- Residuals are important. Before, I doubt I would have paid much attention to residuals, except perhaps to check that the standardized residuals were mostly within 2 or 3 standard deviations. Now I see how important they are for model fitting: they tell you for which cases (and combinations of predictors) the model doesn't work.
- Choosing the right statistical method to match your research question and data is an important skill. There are techniques other than ANOVA and linear regression.
- Feature and variable selection is important.
- Exploratory data analysis is important.
- Regression is everything: ANOVA = Regression = Machine Learning. There is a unified approach.
- Regression is a social construct.
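
The point about residuals can be made concrete with a minimal sketch. It assumes a simple one-predictor least-squares line and made-up data (the `xs` and `ys` values and the function name are illustrative, not from any real dataset), and it uses the common rule of thumb of flagging cases whose standardized residual exceeds 2 in absolute value. The data deliberately contain one case that sits far from the line, which is the kind of case the residuals are meant to expose.

```python
from statistics import mean


def standardized_residuals(xs, ys):
    """Fit y = intercept + slope * x by least squares and return
    the (internally) standardized residual for every case."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Raw residuals: observed minus fitted.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Residual standard error with n - 2 degrees of freedom.
    sigma = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

    # Leverage of each case in simple linear regression.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

    return [r / (sigma * (1 - h) ** 0.5) for r, h in zip(residuals, leverages)]


# Made-up data: roughly y = 2x + 1, with one odd case at x = 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 20.0, 15.1, 16.9, 19.0, 21.1]

z = standardized_residuals(xs, ys)

# Indices of the cases the model fits badly (rule of thumb: |z| > 2).
flagged = [i for i, value in enumerate(z) if abs(value) > 2]
print(flagged)
```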

Since participating in the Titanic data competition, I'd add a few more things that I've learnt are important:

- Cross-validation is an important technique.
- Understanding a programming language is an important skill if you want to automate the data processing part (Excel is a slow way to process data, create new variables, etc.).
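
As a rough illustration of cross-validation, here is a minimal k-fold sketch in plain Python. It is an assumption-laden toy, not the setup used in the Titanic competition: the model is a one-predictor least-squares line rather than a classifier, the data are made up, and the helper names `fit_line` and `cross_val_mse` are hypothetical. The idea it shows is the general one: hold out each fold in turn, fit on the rest, and score only on the held-out cases.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cross_val_mse(xs, ys, k=5, seed=0):
    """Return the held-out mean squared error for each of k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]

        # Fit only on the training cases.
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])

        # Score only on the cases the model has not seen.
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        scores.append(mean(errors))
    return scores


# Made-up data: y = 2x + 1 with a small alternating disturbance.
xs = list(range(1, 21))
ys = [2 * x + 1 + ((-1) ** x) * 0.5 for x in xs]

fold_scores = cross_val_mse(xs, ys, k=5)
print(mean(fold_scores))  # average held-out error across the folds
```

The average of the fold scores is an estimate of how the model would do on new cases, which is usually less flattering than the error on the data it was fitted to.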