Thursday, April 25, 2013

What I have learnt about statistics

I've been studying statistics part-time (one unit at a time) since 2010. While reading ahead for my current unit (Advanced Topics in Regression), I realized that some parts of statistics were starting to make sense:

  1. Residuals are important. Before, I doubt I would have taken much notice of residuals, except perhaps to check that the standardized residuals were mostly within 2 or 3 standard deviations. Now I see how important they are for model fitting. They tell you which cases (and combinations of predictors) the model doesn't work for.
  2. Choosing the right statistical method to match your research question and data is an important skill. There are techniques other than ANOVA and linear regression.
  3. Feature and variable selection is important.
  4. Exploratory data analysis is important.
  5. Regression is everything: ANOVA = regression = machine learning. There is a unified approach.
  6. Regression is a social construct.
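
A minimal sketch of the point about residuals, in plain Python. The data are made up for illustration (y grows roughly quadratically with x), and `fit_line` is a hypothetical helper name, not something from my coursework. A straight line is fitted by ordinary least squares and the residuals are printed for each case:

```python
# Made-up data: y curves upward, but we deliberately fit a straight line.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 3.9, 9.1, 15.8, 25.3, 35.9, 49.2, 63.8]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Residual = observed value minus fitted value, one per case.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x={x}  residual={r:+.2f}")
```

With this data the residuals are positive at both ends of the x range and negative in the middle. That systematic pattern is the signal that a straight line is the wrong shape for these cases, which a single summary number for the fit would not show.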

Since participating in the Titanic data competition, I'd add a few more things that I've learnt are important:

  1. Cross-validation is an important technique.
  2. Understanding a programming language is an important skill if you want to automate the data processing part (Excel is a slow way to process data, create new variables, etc.).
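
As a rough illustration of cross-validation, here is a sketch of k-fold cross-validation in plain Python. It does not use the Titanic data: the numbers are made up, the model is a simple straight-line fit, and `fit_line` and `k_fold_mse` are hypothetical names chosen for this example. Each fold is held out in turn, the line is fitted on the remaining cases, and the squared prediction error is measured on the held-out cases:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Case i belongs to fold (i % k); every case is held out exactly once.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Made-up, roughly linear data.
xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

cv_mse = k_fold_mse(xs, ys, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.4f}")
```

The error is always measured on cases the model did not see during fitting, so it is a fairer estimate of how the model would do on new data than the error on the training cases. Writing it as a loop is also a small example of the automation that is slow to do by hand in a spreadsheet.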
