title | subtitle | author | job | logo | framework | highlighter | hitheme | url | widgets | mode | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The Data Science Track |
Jeffrey Leek |
Johns Hopkins Bloomberg School of Public Health |
bloomberg_shield.png |
io2012 |
highlight.js |
tomorrow |
|
|
selfcontained |
"It is not the critic who counts: not the man who points out how the strong man stumbles or where the doer of deeds could have done better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood, who strives valiantly, who errs and comes up short again and again, because there is no effort without error or shortcoming, but who knows the great enthusiasms, the great devotions, who spends himself for a worthy cause; who, at the best, knows, in the end, the triumph of high achievement, and who, at the worst, if he fails, at least he fails while daring greatly, so that his place shall never be with those cold and timid souls who knew neither victory nor defeat."
Theodore Roosevelt, 26th President of the United States
Statistics and the science game
"Ask yourselves, what problem have you solved, ever, that was worth solving, where you knew all of the given information in advance? Where you didn’t have a surplus of information and have to filter it out, or you didn’t have insufficient information and have to go find some?"
Dan Meyer, Mathematics Educator
The key word in data science is not data; it is science
Data intensive statistics in biology and medicine
- Brian Caffo
  - Website http://www.bcaffo.com/
  - Twitter @bcaffo
  - Github https://github.com/bcaffo
- Jeff Leek
  - Website http://biostat.jhsph.edu/~jleek/, http://simplystatistics.org/
  - Twitter @jtleek
  - Github https://github.com/jtleek
- Roger Peng
  - Website http://www.biostat.jhsph.edu/~rpeng/, http://simplystatistics.org/
  - Twitter @rdpeng
  - Github https://github.com/rdpeng
http://www.economist.com/node/15579717
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=0
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=all
- It is free
- It has a comprehensive set of packages
  - Data access
  - Data cleaning
  - Analysis
  - Data reporting
- It has one of the best development environments - RStudio http://www.rstudio.com/
- It has an amazing ecosystem of developers
- Packages are easy to install and "play nicely together"
[Daryl Morey](http://en.wikipedia.org/wiki/Daryl_Morey)
[Hilary Mason](http://www.hilarymason.com/)
http://radar.oreilly.com/2011/09/building-data-science-teams.html
- Introducing you to the track
- Getting tools set up
- Giving you basic background