Remember that my best score so far (0.79904) is based on a logistic regression model with the following terms:
- male (i.e. gender)
- pclass
- fare
- fare per person
- Title
- age * class
- sex * class
- combined age (this is age, with missing values filled in using the median age of the respective Title)
- family (total of sibsp and parch)
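For illustration, the two derived terms at the end of the list (combined age and family) could be built roughly as follows. This is only a minimal sketch in plain Python: the dictionary keys (`title`, `age`, `sibsp`, `parch`), the helper name `add_derived_features` and the sample rows are all made up for the example, and missing ages are assumed to be stored as `None`.

```python
from statistics import median


def add_derived_features(passengers):
    """Return copies of the passenger dicts with 'combined_age' and 'family' added.

    Hypothetical helper: each passenger is assumed to be a dict with the keys
    'title', 'age' (None when missing), 'sibsp' and 'parch'.
    """
    # Collect the known ages for each Title.
    ages_by_title = {}
    for p in passengers:
        if p["age"] is not None:
            ages_by_title.setdefault(p["title"], []).append(p["age"])

    # Median known age per Title.
    median_age = {title: median(ages) for title, ages in ages_by_title.items()}

    result = []
    for p in passengers:
        row = dict(p)
        if p["age"] is not None:
            # Keep the observed age when there is one.
            row["combined_age"] = p["age"]
        else:
            # Otherwise use the Title median (None if that Title has no known ages).
            row["combined_age"] = median_age.get(p["title"])
        # family is the total of sibsp and parch.
        row["family"] = p["sibsp"] + p["parch"]
        result.append(row)
    return result


# Made-up rows, only to show the shape of the input.
sample = [
    {"title": "Mr", "age": 30.0, "sibsp": 1, "parch": 0},
    {"title": "Mr", "age": 40.0, "sibsp": 0, "parch": 0},
    {"title": "Mr", "age": None, "sibsp": 0, "parch": 2},
    {"title": "Miss", "age": 4.0, "sibsp": 1, "parch": 1},
]
rows = add_derived_features(sample)
```

In the sample, the third passenger has no age, so the combined age falls back to the median of the two known "Mr" ages.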
My first submission (31) was to remove family and add sibsp and parch. This resulted in a score of 0.79426, slightly under my best score.
The second submission (32) was to run a backward elimination, which identified the following model:
- pclass
- Title
- sex * class
- combined age
- family
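The general idea of backward elimination can be sketched as below. This is a simplified, score-driven variant and not necessarily what a statistics package does (those usually drop the least significant term based on a p-value or likelihood-ratio test). The function `backward_eliminate`, the caller-supplied `score` function, the placeholder term names and the toy weights are all assumptions made up for the example.

```python
def backward_eliminate(terms, score):
    """Greedy backward elimination driven by a caller-supplied score.

    terms: list of term names to start from (the full model).
    score: hypothetical function taking a list of terms and returning a
           number where higher is better (e.g. a cross-validated accuracy).
    Repeatedly drops the single term whose removal improves the score the
    most, and stops when no removal improves it.
    """
    current = list(terms)
    best = score(current)
    while len(current) > 1:
        # Score every model that drops exactly one of the current terms.
        candidates = [
            (score([t for t in current if t != dropped]), dropped)
            for dropped in current
        ]
        candidate_score, dropped = max(candidates, key=lambda c: c[0])
        if candidate_score <= best:
            break  # no single removal helps any more
        current.remove(dropped)
        best = candidate_score
    return current


# Toy score with made-up weights: useful terms add to the score,
# and every term in the model costs a small penalty.
toy_weights = {"x1": 3.0, "x2": 2.0, "x3": 0.0, "x4": -1.0}


def toy_score(terms):
    return sum(toy_weights[t] for t in terms) - 0.1 * len(terms)


kept = backward_eliminate(["x1", "x2", "x3", "x4"], toy_score)
```

With the toy weights, the harmful term `x4` is dropped first, then the useless term `x3`, and the two useful terms are kept.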
The third submission (33) was to create a new variable, "child". Using the combined age variable, cases with an age less than or equal to 18 were coded 1, and cases over 18 were coded 0.
This produced a score of 0.79426.
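The "child" coding described above amounts to a simple threshold on combined age. A minimal sketch, where the function name `child_flag` and the `cutoff` parameter are made up for the example:

```python
def child_flag(combined_age, cutoff=18):
    """Return 1 for ages less than or equal to the cutoff, 0 for ages above it."""
    return 1 if combined_age <= cutoff else 0


# Made-up ages, only to show the coding on both sides of the cutoff.
flags = [child_flag(age) for age in (4.0, 18.0, 18.5, 35.0)]
```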
Still haven't got past the 0.80 threshold!