Wednesday, June 5, 2013

Is Title a Significant Predictor of Survival?

To consider whether Title is a significant predictor of “survival”:

The variable “Title” contains 17 levels, most of which have very low frequencies – see table below.

The first step is to consolidate all of the low-frequency levels into a single “Other” level.

This gives us the following frequencies:

- Other: 25
- Master: 40
- Miss: 184
- Mr: 517
- Mrs: 125
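
The consolidation step can be sketched in R roughly as follows. This is a minimal illustration, not the code used for the post: the `title` vector is made-up toy data, and the names `keep`, `title_grouped` and `title_counts` are hypothetical.

```r
# Toy vector of titles (made up for illustration, not the Titanic data)
title <- c("Mr", "Mrs", "Miss", "Master", "Mr", "Dr", "Rev", "Mr", "Miss", "Col")

# Levels frequent enough to keep; every other title is folded into "Other"
keep <- c("Master", "Miss", "Mr", "Mrs")
title_grouped <- ifelse(title %in% keep, title, "Other")
title_grouped <- factor(title_grouped, levels = c(keep, "Other"))

# Frequency table of the consolidated levels
title_counts <- table(title_grouped)
print(title_counts)
```

Listing the levels to keep explicitly, rather than applying a frequency cut-off, is an assumption made here for simplicity; either approach would produce the five levels listed above.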

Then we run a binary logistic regression using just Title. The consolidated variable is replaced by one dummy variable per level, with Mrs left out as the reference level, and these dummy variables are the predictors.
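
A regression of this shape could look like the sketch below. It runs on simulated data, so the coefficients mean nothing; the data frame `toy`, the made-up probabilities in `p`, and the names `fit` and `null_fit` are all assumptions for illustration. Storing Title as a factor with Mrs as the reference level lets `glm()` build the dummy variables itself.

```r
set.seed(1)
n <- 200

# Simulated data: titles drawn at random, survival drawn from made-up probabilities
toy <- data.frame(
  title = factor(sample(c("Master", "Miss", "Mr", "Mrs", "Other"), n, replace = TRUE))
)
p <- c(Master = 0.6, Miss = 0.7, Mr = 0.2, Mrs = 0.8, Other = 0.4)
toy$survived <- rbinom(n, 1, p[as.character(toy$title)])

# Make Mrs the reference level, so dummies are created for the other four levels
toy$title <- relevel(toy$title, ref = "Mrs")

# Binary logistic regression with Title as the only predictor
fit <- glm(survived ~ title, family = binomial(), data = toy)
print(summary(fit)$coefficients)  # one row per dummy, each compared against Mrs

# Optional extra, not part of the original analysis: a likelihood-ratio test
# of Title as a whole against the intercept-only model
null_fit <- glm(survived ~ 1, family = binomial(), data = toy)
print(anova(null_fit, fit, test = "Chisq"))
```

The p-value on each dummy only says whether that level differs from Mrs, so the likelihood-ratio test at the end is one way to judge Title as a single variable.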

Only one variable was not significant in the regression:

So the conclusion is that Title is a significant predictor of survival.

The second test was to submit an entry to Kaggle using my best-performing model with Title excluded:

model <- glm(formula = survived ~ male + pclass + fare + fare_per_person + age_class.interaction + sex_class + combined_age + family + age_squared + age_class_squared, family = binomial(), data = train)

This scored 0.77512, well below my current best score of 0.80861.
