I've used a linear regression model to try to predict age more accurately.

Using a regression model with indicator variables for the passengers' titles, we achieve an R-squared of 28.6%.

If we include essentially all the other variables in the regression, the R-squared increases to 43.7%.
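The title-dummy regression above can be sketched as follows. This is an illustrative, self-contained example: the data here is synthetic (the real analysis used the Titanic training set), and the title categories and base ages are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical sketch: OLS with indicator (dummy) variables for passenger
# titles, predicting age. Synthetic data stands in for the Titanic set.
rng = np.random.default_rng(0)
n = 400
titles = rng.choice(["Mr", "Mrs", "Miss", "Master"], size=n)
# Synthetic ages loosely tied to title so the dummies explain some variance
base = {"Mr": 33.0, "Mrs": 36.0, "Miss": 22.0, "Master": 5.0}
age = np.array([base[t] for t in titles]) + rng.normal(0, 10, n)

# Design matrix: intercept + one indicator per title, dropping a
# reference category ("Mr") to avoid the dummy-variable trap.
levels = ["Mrs", "Miss", "Master"]
X = np.column_stack([np.ones(n)] + [(titles == lv).astype(float) for lv in levels])

beta, *_ = np.linalg.lstsq(X, age, rcond=None)
fitted = X @ beta
ss_res = np.sum((age - fitted) ** 2)
ss_tot = np.sum((age - age.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

Adding the remaining predictors (sex, class, fare, and so on) to the same design matrix is what lifts the R-squared further, as described next.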

Using backward elimination, a model with fewer variables had an R-squared of 43.3%.
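Backward elimination can be sketched as below. This is a hedged illustration, not the exact procedure used here (statistics packages typically drop the predictor with the largest p-value; this version drops the smallest |t|, which is equivalent for a single coefficient). The predictor names and data are synthetic.

```python
import numpy as np

# Illustrative backward elimination by t-statistic on synthetic data.
rng = np.random.default_rng(1)
n = 300
names = ["male", "pclass", "sibsp", "fare", "noise1", "noise2"]
X = rng.normal(size=(n, len(names)))
# The response depends only on the first three predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(0, 1, n)

def ols_t_stats(X, y):
    """Return OLS coefficients and t-statistics (intercept included)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = len(y) - Xd.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    return beta, beta / np.sqrt(np.diag(cov))

kept = list(range(len(names)))
while len(kept) > 1:
    _, t = ols_t_stats(X[:, kept], y)
    t_pred = np.abs(t[1:])          # skip the intercept
    worst = int(np.argmin(t_pred))
    if t_pred[worst] >= 2.0:        # rough |t| ~ 2 cutoff (p ≈ 0.05)
        break
    kept.pop(worst)

print([names[i] for i in kept])
```

The small drop in R-squared (43.7% to 43.3%) for a noticeably simpler model is the usual trade the procedure makes.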

It will be interesting to see if and how this improves the predictions of a logistic regression model.

The final model was:

Predictors: (Constant),
fare_per_person, cabin_G, cabin_F, Embarked_Q, Title_Other, Title_Miss, cabin_Y,
Title_Master, Embarked_C, sibsp, male, fare, pclass, Title_Mr

I will do some more work on the residuals and other statistics. For example, the 'all variables in the model' fit had a maximum Mahalanobis value of 712; under the simpler model produced by backward elimination, the Mahalanobis value of that particular case dropped to 6.7, and the maximum was now 218.
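The Mahalanobis check on the predictors can be sketched like this. It is an illustrative example on synthetic data with a planted outlier; the real check was run on the Titanic predictor matrix, and some packages report a leverage-based variant rather than this plain formula.

```python
import numpy as np

# Squared Mahalanobis distance of each case from the predictor centroid.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X[0] = [8.0, -8.0, 8.0, -8.0]    # plant one gross outlier

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
# d2[i] = (x_i - mu)^T  S^{-1}  (x_i - mu)
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

print(int(np.argmax(d2)), round(float(d2.max()), 1))
```

A case whose maximum distance collapses from 712 to 6.7 after dropping variables suggests the extreme value was driven by the removed predictors.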
