According to Hal Varian and just about everyone these days, the hot skills to have are some combination of programming, statistics, machine learning, and visualization. Here is a pile of resources that'll help you get some mad data science skills.
Programming
There seem to be a few main platforms widely used for data-intensive programming. R is a statistical environment that is to statisticians what MATLAB is to engineers. It's a weird beast, but it's open source and very powerful, and it has a great community. Python also makes a strong showing, with the help of NumPy, SciPy, and matplotlib. An intriguing new entry is the combination of the Lisp dialect Clojure and Incanter. All these tools mix numerical libraries with functional and scripting programming styles in varying proportions. You'll also want to look into Hadoop to do your big data analytics map-reduce style in the cloud.
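To give a flavor of the map-reduce style mentioned above, here is a minimal sketch in plain Python. It is only an illustration of the idea on a single machine, not how Hadoop itself is programmed; the function names (`map_phase`, `reduce_phase`, `word_count`) and the sample text are made up for this example.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: turn one line of text into partial word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(lines):
    """Apply the map step to every line, then fold the partial counts together."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


if __name__ == "__main__":
    # Hypothetical sample input, just to show the call.
    sample = ["R is a weird beast", "Python is a strong showing"]
    print(word_count(sample).most_common(3))
```

The point of splitting the work this way is that the map step looks at each line independently, so in a real cluster those calls could run on different machines, and only the much smaller partial counts need to be merged.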
Statistics
Machine Learning
Visualization
Classes
Starting on March 5 at the Hacker Dojo in Mountain View (CA), Mike Bowles and Patricia Hoffmann will present a course on Machine Learning where R will be the "lingua franca" for looking at homework problems, discussing them and comparing different solution approaches. The class will begin at the level of elementary probability and statistics and from that background survey a broad array of machine learning techniques including: Unsupervised Learning, Clustering Techniques, and Fault Detection.
Feb 11: Modeling in R (Sudha Purohit -- more details after the jump)
Mar 4: Introduction to R - Data Handling (Paul Murrell)
Apr 15: Programming in R (Hadley Wickham)
Apr 29: Graphics in R (Paul Murrell)
May 20: Introduction to R - Statistical Analysis (John Verzani)
Data bootcamp (slides and code) from the Strata Conference. Tutorials covering a handful of example problems using R and Python.
- Plotting data on maps
- Classifying emails
- A classification problem in image analysis
Cosma Shalizi at CMU teaches a class: Undergraduate Advanced Data Analysis.
More resources
- A great list of machine learning tutorials by Andrew Moore.
- There are so many classes, books and lecture videos online these days, your only limit is the rate at which you can absorb it.
- Hadley Wickham's A philosophy of clean data
- Abhishek Tiwari points us to a Quora thread: How do I become a data scientist?
- Drew Conway's Data Science Venn Diagram, which he expands on in Data science in the US intelligence community. I like Conway's emphasis on the scientific method and hypothesis testing. Drew is coming out with a book soon, Machine Learning for Hackers, that sounds promising.
- Good resources for learning about machine learning
- Machine Learning in Action





R Bloggers