I recently finished a project using the dataset from Pump It Up: Mining the Water Table from DRIVENDATA. I’m going to walk through the Random Forest Classifier, one of the classifiers I tested, which was the one I found to perform the best after tuning its hyperparameters.

I won’t go into it here but there is a significant amount of data cleaning and feature selection to do before the data is ready for a model. …

Mike Erb

Data Scientist with a background in Computer Science and as an Entrepreneur in the Bike industry — Based in Ithaca, NY

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store