Using Scikit-Learn's interface for turning Spaghetti Data Science into Maintainable Software

By Holger Peters

Finding a good structure for number-crunching code can be a problem. This applies especially to the routines that precede the core algorithms: transformations such as data processing and cleanup, as well as feature construction.

With such code, the programmer faces the problem that it easily turns into a sequence of highly interdependent operations that are hard to separate. Such “Data Science Spaghetti code” can be challenging to test, maintain and reuse.

Scikit-Learn offers a simple yet powerful interface for data science algorithms: the estimator and composite classes (called meta-estimators). By example, I show how clever use of meta-estimators can encapsulate elaborate machine learning models in a maintainable tree of objects that is both handy to use and simple to test.
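As a rough illustration of the idea (not taken from the talk itself), the sketch below composes a cleanup step, a scaling step and a regressor into one object using Scikit-Learn's Pipeline meta-estimator. The ClipOutliers class, its parameters and the synthetic data are assumptions made up for this example; only the fit/transform/predict convention and the Scikit-Learn classes are real.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ClipOutliers(TransformerMixin, BaseEstimator):
    """Hypothetical cleanup step: clip each column to percentiles learned in fit."""

    def __init__(self, lower=1.0, upper=99.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore, following the estimator convention.
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# The meta-estimator chains the steps and itself behaves like a single estimator.
model = Pipeline([
    ("clip", ClipOutliers()),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])

# Synthetic data, assumed purely for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

model.fit(X, y)
predictions = model.predict(X[:5])
```

Each step remains reachable by name, for example model.named_steps["clip"], so it can be inspected or swapped without touching the others.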

Looking at examples, I will show how this approach simplifies model development, testing and validation, and how it brings together best practices from both software engineering and data science.
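One way this could look in practice is sketched below: a single pipeline step is unit-tested on hand-computed values, and the whole pipeline is then validated with cross-validation. The make_model helper, the choice of a log feature and the synthetic data are assumptions for this example, not the speaker's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def make_model():
    """Build the (hypothetical) model: a log feature followed by a linear fit."""
    return Pipeline([
        ("log", FunctionTransformer(np.log1p)),
        ("regress", LinearRegression()),
    ])


def test_log_step_in_isolation():
    # A single step can be checked on known values, independent of the other steps.
    step = make_model().named_steps["log"]
    result = step.fit_transform(np.array([[0.0], [np.e - 1.0]]))
    assert np.allclose(result, [[0.0], [1.0]])


def test_pipeline_generalizes():
    # Synthetic data, assumed for illustration: y depends on log(1 + x).
    rng = np.random.RandomState(0)
    X = rng.uniform(0.0, 10.0, size=(100, 1))
    y = 3.0 * np.log1p(X[:, 0]) + rng.normal(scale=0.05, size=100)
    # Because the pipeline is itself an estimator, the preprocessing is refit
    # inside every fold together with the regressor.
    scores = cross_val_score(make_model(), X, y, cv=5)
    assert scores.min() > 0.9
```

The functions follow the pytest naming convention, so a test runner would pick them up as written; they can also be called directly.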

Knowledge of Scikit-Learn is handy but not necessary to follow this talk.

On Tuesday 21 July at 16:45

Comments

Would be perfect if the speaker can share his presentation
— Александр Чекунков, 21 July 2015
Slides are available here: https://github.com/blue-yonder/documents/tree/master/presentations/EuroPython%202015
— Holger Peters, 23 July 2015
