Sunday, April 21, 2013

Titanic Data Competition - Submission 6

Five submissions today, and my best result so far. I moved up 766 positions today - I'm now equal 334th.

Today's submissions were:
  • Binomial logistic regression, with the following variables:

          Based on previous best:

          gender, pclass, fare and age (with missing values replaced by age imputed from title median)

          Additional variables

           age & class interaction, class and gender interaction, fare per person, and title

           Score :  0.68900  -  not an improvement.

           However, in this model I had inadvertently classified the age&class interaction as categorical.

  • Same as above, but did not code age & class variable as categorical.
          Amazingly, this moved me up 430 positions, with a score of 0.77990
  •  Same as above, but changed cut point to 0.59.
           This resulted in a lower score : 0.77512

  • Same as above, but changed cut point back to 0.5.
          Also stopped coding more of the variables as categorical.

          This resulted in a further improvement and my best score to date: 0.78947

          I moved up the public leaderboard by 336 places.

  •  Same model as above, but used multinomial logistic regression. Factors and covariates were correctly coded, so this may have resulted in a lesser result. In future, I will see if I can code some of the factors as covariates to see what impact this has.
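
For anyone following along, here is a rough Python sketch of the feature steps and the cut point mentioned above. It is not the tool or code used for these submissions. The sample rows, the function names, and the definition of fare per person as fare divided by family size (sibsp + parch + 1) are all assumptions made for illustration.

```python
import re
from statistics import median

# Hypothetical rows shaped like the Kaggle Titanic data (values invented).
passengers = [
    {"name": "Smith, Mr. John", "pclass": 3, "age": 30.0, "fare": 8.0, "sibsp": 0, "parch": 0},
    {"name": "Jones, Mr. Alan", "pclass": 1, "age": 40.0, "fare": 60.0, "sibsp": 1, "parch": 0},
    {"name": "Brown, Mr. Tom", "pclass": 3, "age": None, "fare": 21.0, "sibsp": 1, "parch": 1},
    {"name": "Gray, Miss. Ann", "pclass": 2, "age": 20.0, "fare": 13.0, "sibsp": 0, "parch": 0},
    {"name": "Hill, Miss. Eve", "pclass": 2, "age": None, "fare": 26.0, "sibsp": 0, "parch": 1},
]


def title_of(name):
    """Return the honorific between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def add_features(rows):
    """Return copies of rows with title, imputed age, age*class and fare per person."""
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    overall_median = median(known_ages)

    # Collect the known ages for each title, then take the median per title.
    ages_by_title = {}
    for r in rows:
        if r["age"] is not None:
            ages_by_title.setdefault(title_of(r["name"]), []).append(r["age"])
    median_by_title = {t: median(a) for t, a in ages_by_title.items()}

    out = []
    for r in rows:
        new = dict(r)
        new["title"] = title_of(r["name"])
        if new["age"] is None:
            # Fall back to the overall median if a title has no known ages.
            new["age"] = median_by_title.get(new["title"], overall_median)
        # Interaction kept as a single numeric covariate.
        new["age_class"] = new["age"] * new["pclass"]
        # Assumed definition: fare shared equally across the family group.
        new["fare_per_person"] = new["fare"] / (new["sibsp"] + new["parch"] + 1)
        out.append(new)
    return out


def dummy_columns(values):
    """Indicator columns needed if values are coded as categorical (one reference level)."""
    return len(set(values)) - 1


def classify(probabilities, cut_point=0.5):
    """Turn predicted survival probabilities into 0/1 predictions."""
    return [1 if p >= cut_point else 0 for p in probabilities]


rows = add_features(passengers)
print([r["age"] for r in rows])
print(dummy_columns([r["age_class"] for r in rows]))
print(classify([0.45, 0.55, 0.62], cut_point=0.5))
print(classify([0.45, 0.55, 0.62], cut_point=0.59))
```

The dummy_columns helper hints at why the categorical coding may have hurt: treated as a factor, the age & class interaction needs one indicator per distinct value instead of a single coefficient, and on the full data set that could be a very large number of sparse columns.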
