Sunday, April 28, 2013

Titanic Data Competition - Submission 8

I tried a couple of variations on my best submission to Kaggle today, but did not improve my score.

Remember that my best score so far (0.79904) is based on a logistic regression model with the following terms:

  • male (i.e. gender)
  • pclass
  • fare
  • fare per person
  • Title
  • age * class
  • sex * class
  • combined age (this is age, with missing values filled in using the median age of the respective Title)
  • family (total of sibsp and parch)
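
As a rough illustration of the two derived terms at the end of this list (family and combined age), here is a minimal Python sketch. My actual work was done in SPSS, so this is not the code behind the submissions; the rows are made up and the key names are assumptions modeled on the Kaggle columns.

```python
from statistics import median

# Made-up rows for illustration; key names are assumed, not taken from my SPSS file.
passengers = [
    {"Title": "Mr", "age": 30.0, "sibsp": 1, "parch": 0},
    {"Title": "Mr", "age": 40.0, "sibsp": 0, "parch": 0},
    {"Title": "Mr", "age": None, "sibsp": 0, "parch": 2},
    {"Title": "Miss", "age": 10.0, "sibsp": 1, "parch": 1},
    {"Title": "Miss", "age": 20.0, "sibsp": 0, "parch": 0},
    {"Title": "Miss", "age": None, "sibsp": 2, "parch": 1},
]

def add_family_and_combined_age(rows):
    """Return copies of the rows with 'family' and 'combined_age' added."""
    # Collect the known ages for each Title.
    ages_by_title = {}
    for row in rows:
        if row["age"] is not None:
            ages_by_title.setdefault(row["Title"], []).append(row["age"])
    median_by_title = {t: median(a) for t, a in ages_by_title.items()}

    result = []
    for row in rows:
        new_row = dict(row)
        # family = total of sibsp and parch
        new_row["family"] = row["sibsp"] + row["parch"]
        # combined age = known age, else the median age for that Title
        # (stays None if no age is known for the Title at all)
        if row["age"] is not None:
            new_row["combined_age"] = row["age"]
        else:
            new_row["combined_age"] = median_by_title.get(row["Title"])
        result.append(new_row)
    return result

enriched = add_family_and_combined_age(passengers)
```

In this toy data the two passengers with missing ages pick up the median of the known ages for their own Title rather than a single overall median.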

My first submission (31) removed family and added sibsp and parch in its place. This resulted in a score of 0.79426, slightly under my best score.

The second submission (32) used backward elimination, which identified the following model:

  • pclass
  • Title
  • sex * class
  • combined age
  • family

This scored 0.78947, not too far under the best score, but with 5 predictors rather than 9 (although the real count will have been higher once SPSS automatically recoded Title into multiple dummy variables).
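
To show why a single categorical term like Title counts as more than one predictor, here is a small Python sketch of dummy (indicator) coding. The Title values, the choice of reference level and the column naming are all assumptions for illustration; this is not a reproduction of what SPSS does internally.

```python
def dummy_code(values, reference, prefix="Title"):
    """Turn one categorical variable into 0/1 columns, one per non-reference level."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical Title values; the real variable may have more levels.
titles = ["Mr", "Mrs", "Miss", "Master", "Mr"]
dummies = dummy_code(titles, reference="Mr")
extra_predictors = len(dummies)
```

A variable with k levels becomes k - 1 indicator columns, so with the four hypothetical levels above, Title alone contributes three predictors to the model.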

The third submission (33) added a new variable, "child". Using the combined age variable, cases aged 18 or under were coded 1, and cases over 18 were coded 0.

This produced a score of 0.79426.
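
The "child" coding can be sketched in Python as follows. This is only an illustration of the rule, not the SPSS recode I ran, and the function name and sample ages are made up.

```python
def code_child(combined_age, cutoff=18):
    """Return 1 for ages less than or equal to the cutoff, otherwise 0."""
    return 1 if combined_age <= cutoff else 0

# Made-up combined ages, including the boundary value of 18.
ages = [4.0, 18.0, 18.5, 35.0]
child = [code_child(a) for a in ages]
```

Because it is built on combined age rather than raw age, every case gets a 0 or 1 with no missing values.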

Still haven't got past the 0.80 threshold!
