Saturday, March 30, 2013

Kaggle Titanic Competition

There's a great website, which I'm sure you've heard of, called kaggle.com. Kaggle runs predictive analytics competitions.

Some time ago, Kaggle started offering "Getting Started" competitions, which "provide an ideal starting place for people who may not have a lot of experience in data science and machine learning". This semester, the subject I am doing at University (Advanced Topics in Regression) fits in nicely with learning about machine learning. What I like is that the tutorials provided by Kaggle show that high-powered software is not necessary to start learning about the concepts involved. Even Excel can be used to understand them.

So I've submitted my first entry in the Titanic: Machine Learning from Disaster competition - a default naive entry. The majority of people in the training dataset did not survive, so I've predicted that no one in the test dataset will survive.
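As a rough sketch of that default entry, the Python below picks the most common outcome in the training labels and predicts it for every test passenger. The data is a toy example made up for illustration, not the real Kaggle files, and the function names and the `PassengerId` / `Survived` column headers are my assumptions about the submission layout.

```python
from collections import Counter


def majority_class(labels):
    """Return the most common label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def constant_submission(passenger_ids, prediction):
    """Build submission rows that predict the same outcome for everyone."""
    header = ("PassengerId", "Survived")  # assumed column names
    return [header] + [(pid, prediction) for pid in passenger_ids]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data, made up for illustration (0 = did not survive, 1 = survived).
train_survived = [0, 0, 1, 0, 1, 0, 0, 1]
test_ids = [101, 102, 103]

prediction = majority_class(train_survived)
rows = constant_submission(test_ids, prediction)
```

A constant prediction like this scores exactly the share of passengers in the scored data who really had that outcome, which is why it makes a useful floor for judging later models.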

That entry gives me a score of 0.62679 (the proportion of scored test passengers predicted correctly), which places me equal 2175th out of 2295. The top ten scores range from 0.85167 to 0.96172. It will be interesting to see whether any of the current top entries are disqualified, as information about the outcome for the test set is readily available on the internet.

To place this default submission in context, the Kaggle benchmark entries are:

- Gender, price and class - position 430 (0.77990)
- My First Random Forest - position 593 (0.77512)
- Gender - position 1177 (0.76555)

The following histograms show the distribution of scores: