Scikit-learn tutorial

Machine learning is the branch of computer science concerned with developing algorithms that learn from previously seen data in order to make predictions about future data, and it has become an important part of research in many scientific fields. This set of tutorials will introduce the basics of machine learning and show how these learning tasks can be accomplished using Scikit-learn, a machine learning library written in Python and built on NumPy, SciPy, and Matplotlib. By the end of the tutorials, participants will be ready to use Scikit-learn’s wide variety of machine learning algorithms to explore their own data sets.


This tutorial requires the following packages:

  • Python version 2.7 or 3.4+
  • numpy version 1.8 or later
  • scipy version 0.15 or later
  • matplotlib version 1.3 or later
  • scikit-learn version 0.15 or later
  • ipython/jupyter version 3.0 or later, with notebook support
  • seaborn version 0.5 or later, used mainly for plot styling
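
Before installing anything, it may help to see which of these packages are already present. The following is a minimal sketch, not part of the tutorial materials: it assumes Python 3.8 or later (for importlib.metadata), the helper names parse_version and check_requirements are made up for illustration, and the ipython/jupyter entry is left out because its distribution name varies between installs.

```python
# Rough version check for the packages listed above (illustrative sketch only).
from importlib import metadata

# Minimum (major, minor) versions copied from the requirements list.
# ipython/jupyter is omitted: its distribution name depends on the install.
REQUIRED = {
    "numpy": (1, 8),
    "scipy": (0, 15),
    "matplotlib": (1, 3),
    "scikit-learn": (0, 15),
    "seaborn": (0, 5),
}


def parse_version(text):
    """Return the leading (major, minor) of a version string as ints."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1" or "dev0"
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 2:
        parts.append(0)  # treat "2" as (2, 0)
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if parse_version(installed) >= minimum else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Any package reported as missing or too old can then be installed or updated with the instructions that follow.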

The easiest way to get these is to use the conda environment manager. I suggest downloading and installing Miniconda. The following command will install all required packages:

$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn

Alternatively, you can download and install the (very large) Anaconda software distribution, found at

The tutorial will be based on Jupyter notebooks. In order to follow along, you will need to either

  • clone this GitHub repo:
  • or, if you don’t have git installed, download a zip archive from

Loïc Estève

Loïc has a background in particle physics, which is how he discovered
Python towards the end of his PhD. After a few years at an investment
fund, writing mostly C++ and as much Python as possible, he was lured
back to an academic environment. He is now a scikit-learn and joblib
developer in the Parietal team at Inria. He has been involved in a few
different Python open-source projects over the past few years,
including sphinx-gallery and nilearn.