This week I have been visiting the Department of Statistical Sciences at Cornell University, home to many venerable statisticians. At first sight, statisticians appear to be spread all over the university, and technically they are, because funding comes from many directions, but almost all are actually located in a suite in Comstock Hall. Professor Paul Velleman is one of the pioneers of data-centric thinking about statistics. He produced the software called `DataDesk` in the early 90s, which some saw as rivaling LispStat and particularly JMP for introductory statistics classes. It combines model fitting with interactive graphics in a way that is still not available in R. It is not open source, though. His textbook for introductory statistics is one of the predominant texts used globally.

It's been an interesting week. Professor Marty Wells has been comparing results from data collected using Amazon’s Mechanical Turk with those from traditional telephone surveys. Professor Jim Booth, chair of Biological Statistics and Computational Biology, has worked in many areas, most recently focusing on high-throughput data, and discussed Gordon Smyth’s new work on VOOM. Professor Giles Hooker, a fellow Australian, has some recent inference results for random forests, which might be useful in my student Natalia Da Silva’s thesis research. I had several detailed discussions with Professor Jacob Jien on R packages and optimization, and on using Hadley’s development tools and Yihui’s documentation tools. Professor Stanislav Volgushev suggested that we could build models from crowdsourcing by having people sketch, which might sometimes do better than smoothing and logistic regression! Professors Marten Wegkamp and Flori Bunea are colleagues from a 2008 semester-long workshop at Cambridge on high-dimensional problems in statistics.

Look out for Amy Willis! She will graduate in about a year, after working on statistics for high-throughput data collected on microbial biomes. She has an R package called `breakaway`.

Here is a link to my slides on Statistical Inference by Crowd-sourcing.