When

Thursday January 12, 2017 from 8:30 AM to 11:30 AM EST

Where

Northwest Building, B1 Level 
52 Oxford Street
Cambridge, MA 02138
 

 

Contact

Harvard Paulson School 
Institute for Applied Computational Science
Sheila Coveney, Program Manager 

617-384-9091 
iacs-info@seas.harvard.edu 

 

IACS ComputeFest Workshop:
Data Science in Python (Day Four)

Thursday January 12, 2017
8:30 AM - 11:30 AM 

This is day four of a five-day workshop (three hours per day) that will take you from being a person with some idea of how to program to a person with some idea of how to do data science. Day four covers classification and model comparison.

Prerequisites:

Attendees must have programmed in some programming language; being math savvy will help but is not necessary. 

Participants must bring a laptop with the Anaconda Python distribution installed: https://www.continuum.io/downloads. We will use Python 2.7 in all sessions.

Overview of the entire week:

We'll work through the parts of Python needed to do data science, starting with numerical Python, and then move on to exploratory data analysis and visualization. From there we'll tackle training machine learning models, both regression (the prediction of continuous outcomes) and classification (the prediction of labels). Along the way we'll cover concepts such as feature selection, cross-validation, and regularization, and (time permitting) the use of ensembles.

Finally, you’ll learn how to train these models when the data is too large for one machine, and how to reduce the computational time required to train them.

Topics covered throughout the week include:

Day 1: Monday, January 9
Intro to Python, NumPy, Matplotlib, and Bokeh.

Day 2: Tuesday, January 10
Exploratory analysis and visualization; the basics of machine learning.

Day 3: Wednesday, January 11
Learning a model (complexity, regularization, cross-validation); regression.

Day 4: Thursday, January 12
Classification and model comparison.

Day 5: Friday, January 13
Large-scale machine learning with joblib, dask, and IPython Parallel. If time permits, ensembles.

Please note that you must register for each workshop separately.