Resources for Learning Statistics and Data Mining

There is an abundance of resources on the web to help novices teach themselves statistics and machine learning, but they can be hard to track down. Below are a few resources that have helped me in the past.

1. Khan Academy has a series of excellent videos on the basics of statistics and probability. They range from 5 minutes to 20 minutes in length. You can create an account to track your progress if you desire. Although good for the basics, Khan Academy does not cover advanced topics.

2. The Elements of Statistical Learning, a free ebook, is a relatively in-depth overview of the major concepts and techniques involved in machine learning. You will need to learn inferential statistics before reading this book, as it assumes that the reader understands the basics.

3. The Statsoft Online Statistics Textbook is a good resource on statistics and machine learning. A reader will need a basic understanding of statistics and probability before reading this text. It is not particularly in depth, but is good for an overview of major concepts, or as a refresher.

4. Machine Learning Videos from mathematicalmonk can be found on YouTube (the link is to the full playlist). Although I have not viewed all of these videos personally, they appear to be a good overview of machine learning techniques.

5. Andrew Ng’s Online Machine Learning Class is a simplified version of the Stanford class CS229. It glosses over most of the mathematics involved, but is a very good introductory resource for machine learning. Mlclass also features quizzes and programming exercises that create additional engagement. I would recommend that the reader study basic statistics before attempting this class.

6. Concepts and Applications of Inferential Statistics is written by a professor at Vassar, and is a good introduction to basic statistics.

7. Online Stat Book is a good resource for basic statistical concepts. It has lots of interactive exercises interspersed throughout the text.

8. Introduction to Statistical Thought covers basic probability and statistics, along with some more advanced topics such as time series and survival analysis. It also includes R code that the reader can run.

9. Linear Algebra, a free-to-download ebook, provides an overview of linear algebra equivalent to an initial undergraduate course. I have not read through it yet.

10. MIT OpenCourseWare is an excellent site that has full video lecture series and problem sets for hundreds of classes, including linear algebra, calculus, and statistics.

Further Reading

This is an interesting blog post on how to learn mathematics as a programmer.