Data Bootcamp: MBA Fall 2018

Where and When

Important Dates

Code Practice Submissions

Assignments will be posted on NYU Classes. Submit your Python code there, either as a PDF or as an IPython notebook.

Week By Week Guide…

Topic 1. Data + Python = Magic!

Day and Date: Thursday 9/27/2018 Handouts: Outline | Book | Three ideas

Examples: Gapminder | cancer screening | Uber in NYC | medical expenditures | mortality | earthquake | Gender pay gap | Fertility | Vaccines
Summary: It’s nice to have skills; installing Anaconda; Spyder and Jupyter/IPython; data; questions; idea machines.

Topic 2. Python fundamentals 1

Day and Date: Thursday 10/04/2018 Handouts: Outline | Book chapter
Summary: Calculations; assignments; strings; lists; tuples; built-in functions; objects; methods; tab completion.

Topic 3. Python fundamentals 2

Day and Date: Thursday 10/11/2018 Handouts: Outline | Book chapter | Code Practice #1 (Due October 11)
Summary: True and False; comparisons; conditionals; slicing; loops; function definitions and returns; dictionaries.

Topic 4. Data input: Packages and Pandas

Day and Date: Thursday 10/18/2018 Handouts: Outline | Book chapter | Code | Code Practice #2 (Due October 18) (code template)
Summary: Packages; import; Pandas; csv files; reading csv/xls files; dataframes; columns; index; APIs.

Topic 5. Python graphics: Matplotlib fundamentals

Day and Date: Thursday 10/25/2018 Handouts: Outline | Book chapter | Code (Download “Raw” as ipynb) | Code Practice #3 (Due by October 25)
Summary: Three approaches to graphics: dataframe plot methods, plot(x,y), and fig/ax objects and methods; lines, scatters, bars, horizontal bars, styles.
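The three approaches can be sketched side by side as follows. This is only an illustration, not the class handout: the small dataframe and its column names x and y are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this sketch
df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 8]})

# Approach 1: dataframe plot method
ax1 = df.plot(x="x", y="y", kind="line")

# Approach 2: plot(x, y) applied to the current figure
plt.figure()
lines = plt.plot(df["x"], df["y"])

# Approach 3: create fig/ax objects, then call methods on ax
fig, ax = plt.subplots()
ax.bar(df["x"], df["y"])
ax.set_title("Bar chart")
```

The fig/ax approach takes a line or two more to set up, but it gives direct handles on the figure and axes for later customization.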

Topic 6. Review & applications

Day and Date: Thursday 11/01/2018 Handouts: Outline | Code (review | applications)
Summary: Exam review, followed by applications to get us thinking about interesting datasets and how to work with them.

Topic 7. Midterm Exam: November 8, 2018

Posted after class: Exam with answers

Topic 8. Thinking about projects and Introduction to Pandas DataFrame

Day and Date: Thursday 11/15/2018 Handouts: Outline | Project Examples | Code (examples | current indicators | demography | Airbnb)
Summary: Projects: say something interesting with data. Idea machines. Examples.

Topic 9. Pandas Cleaning and Statsmodels

Day and Date: Thursday 11/29/2018 Handouts: Outline | Code_Pandas_Cleaning
Summary: Pandas has incredible facilities for managing data. We look at fixing numbers misidentified as strings, managing missing observations, selecting variables and observations, and the isin and contains methods.
We also cover statsmodels, a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically.
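A minimal sketch of the cleaning steps listed above, assuming a small made-up dataframe (the country and gdp columns and their values are hypothetical, not course data):

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, one missing observation
df = pd.DataFrame({
    "country": ["United States", "Canada", "Mexico", "United Kingdom"],
    "gdp": ["1,200", "350", None, "410"],
})

# Fix numbers misidentified as strings: drop the commas, then convert
df["gdp"] = pd.to_numeric(df["gdp"].str.replace(",", "", regex=False))

# Manage missing observations: count them, then drop the incomplete rows
n_missing = int(df["gdp"].isna().sum())
complete = df.dropna(subset=["gdp"])

# Select observations with the isin and contains methods
north_america = df[df["country"].isin(["United States", "Canada", "Mexico"])]
united = df[df["country"].str.contains("United")]
```

Here isin matches whole values against a list, while str.contains matches a substring within each value.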

Topic 10. Pandas Shaping and Scikit-learn

Day and Date: Thursday 12/06/2018 Handouts: Outline | Code_Pandas_Shaping
Summary: Here we introduce four key methods for “shaping” our data: df.set_index, df.reset_index, df.stack, and df.unstack. When we say shaping we mean manipulating the data so we get specific row and column labels.
We also cover scikit-learn, a machine learning package that implements a range of classification, regression, and clustering algorithms.
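The four shaping methods can be tried on a toy example like the one below. It is a sketch only: the dataframe and the column names country, year, and value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data, invented for this sketch
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2017, 2018, 2017, 2018],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# set_index: move columns into the row labels (a two-level index here)
indexed = df.set_index(["country", "year"])

# unstack: move the "year" level from the rows to the columns
wide = indexed["value"].unstack("year")

# stack: move the column labels back into the rows
long = wide.stack()

# reset_index: turn the row labels back into ordinary columns
flat = long.reset_index(name="value")
```

After these steps, wide has one row per country and one column per year, and flat has the same layout as the original df.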

Topic 11. Pandas Combining and Scikit-learn

Day and Date: Thursday 12/13/2018 Handouts: (Code_Pandas_Combining|summarizing)
Summary: Combining dataframes (merge, concatenate). Statistics (mean, median, quantiles), categorical variables, grouping data by categories, counts and statistics by category.
We also cover the remaining scikit-learn topics in this class.
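As a rough illustration of combining and summarizing, the sketch below merges two made-up dataframes, concatenates extra rows, and computes statistics by category. The sales and stores tables and all of their values are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this sketch
sales = pd.DataFrame({
    "store": ["a", "b", "a", "c"],
    "amount": [10.0, 20.0, 30.0, 40.0],
})
stores = pd.DataFrame({
    "store": ["a", "b", "c"],
    "region": ["east", "west", "east"],
})

# merge: attach each store's region to its sales rows
merged = sales.merge(stores, on="store", how="left")

# concatenate: append more rows with the same columns
more = pd.DataFrame({"store": ["b"], "amount": [50.0]})
stacked = pd.concat([sales, more], ignore_index=True)

# group by a categorical variable, then compute counts and statistics
by_region = merged.groupby("region")["amount"].agg(["count", "mean", "median"])

# a quantile of the whole column
q75 = merged["amount"].quantile(0.75)
```

A left merge keeps every row of sales even if a store were missing from stores, which is often the safer default when the first table holds the observations of interest.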

Topic 12. Wrap up.

Day and Date: Thursday 12/20/2018 Discussion regarding project status, challenges, etc.