# Data Bootcamp: MBA Spring 2018

## Where and When

- Instructor: **Benjamin Zweig** (bzweig@stern.nyu.edu)
- Teaching fellow: **Manoj Wajekar** (mw3451@stern.nyu.edu)
- Meeting times: Tuesday, 6 PM to 9 PM
- Meeting place: KMC 4-80

## Important Links

- **THE SYLLABUS.** All the important details about the course, procedures, important dates, etc.
- **THE BOOK.** The topics in the first half are all in the book. We will follow this closely. At the book link, click the large blue Read button to read online, or download the PDF. Both come with links.
- **NOTEBOOKS.** GitHub repository of notebooks used in class.
- **DISCUSSION GROUP.** Post your questions on the NYU Classes forum tab.

## Important Dates

- **Code Practice 1** Due Date: **February 20, 2018**
- **Code Practice 2** Due Date: **February 27, 2018**
- **Code Practice 3** Due Date: **March 6, 2018**
- **Midterm Exam:** **March 27, 2018.** Quick info: in class, open book, open internet if the wireless is up; bring one page of notes.
- **Final Project** Due Date: **End of day, May 4, 2018**

## Code Practice Submissions

Assignments will be posted on NYU Classes. Submit your Python code there, either as a PDF or as an IPython notebook.

### Week By Week Guide…

## Topic 1. Data + Python = Magic!

**Handouts:** Outline | Book (Click on blue “Read” button) | Three ideas

**Examples:** Gapminder | cancer screening | Uber in NYC | medical expenditures | mortality | earthquake | Gender pay gap | Fertility | Vaccines

**Summary:** It’s nice to have skills; installing Anaconda; Spyder and Jupyter/IPython; data; questions; idea machines.

## Topic 2. Python fundamentals 1

**Handouts:** Outline | Book chapter | Code Practice #1 (Due February 20)

**Summary:** Calculations; assignments; strings; lists; tuples; built-in functions; objects; methods; tab completion.

## Topic 3. Python fundamentals 2

**Handouts:** Outline | Book chapter | Code Practice #2 (Due February 27)

**Summary:** True and False; comparisons; conditionals; slicing; loops; function definitions and returns; dictionaries.

## Topic 4. Data input: Packages and Pandas

**Handouts:** Outline | Book chapter | Code (Download “Raw”) | Code Practice #3 (Due Mar 06) (code template)

**Summary:** Packages; import; Pandas; csv files; reading csv/xls files; dataframes; columns; index; APIs.

## Topic 5. Python graphics: Matplotlib fundamentals

**Handouts:** Outline | Book chapter | Code (Download “Raw” as ipynb) | Code Practice A (try by March 20) (Download “Raw” as ipynb)

**Summary:** Three approaches to graphics: dataframe plot methods, plot(x,y), and fig/ax objects and methods; lines, scatters, bars, horizontal bars, styles.

## Topic 6. Review & applications

**Handouts:** Outline | Code (review | applications)

**Summary:** Exam review, followed by applications to get us thinking about interesting datasets and how to work with them.

## Topic 7. Midterm Exam: March 27

**Posted after class:** Exam with answers

## Topic 8. Thinking about projects and Introduction to Pandas

**Handouts:** Outline | Project Examples | Code (examples | current indicators | demography | Airbnb)

**Summary:** Projects: say something interesting with data. Idea machines. Examples.

## Topic 9. Pandas Cleaning and Statsmodels

**Handouts:** Outline | Code_Pandas_Cleaning

**Summary:** Pandas has incredible facilities for managing data. We look at fixing numbers misidentified as strings, managing missing observations, selecting variables and observations, and the `isin` and `contains` methods.
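As a rough illustration of the cleaning steps named in the summary, here is a minimal sketch. The dataframe, its column names, and its values are made up for this example and are not taken from the class notebooks.

```python
import pandas as pd

# Hypothetical data: GDP figures stored as strings, one of them missing
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Japan"],
    "gdp": ["18,036", "2,858", "n/a", "4,383"],
})

# Fix numbers misidentified as strings: drop the thousands separator,
# then convert; anything that is not a number becomes NaN
df["gdp"] = pd.to_numeric(
    df["gdp"].str.replace(",", "", regex=False), errors="coerce"
)

# Manage missing observations: drop rows with no GDP value
clean = df.dropna(subset=["gdp"])

# Select observations whose country is in a given list
subset = df[df["country"].isin(["France", "Japan"])]

# Select observations whose country name contains a substring
united = df[df["country"].str.contains("United")]

print(clean)
print(subset)
print(united)
```

Note that `contains` is reached through the `.str` accessor of a string column, while `isin` is called on the column directly.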

We also cover statsmodels, a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically.

## Topic 10. Pandas Shaping and Scikit-learn

**Handouts:** Outline | Code_Pandas_Shaping

**Summary:** Here we introduce four key methods for “shaping” our data: `df.set_index`, `df.reset_index`, `df.stack`, and `df.unstack`. When we say shaping we mean manipulating the data so we get specific row and column labels.
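The four shaping methods can be sketched on a small example. The dataframe below is invented for illustration (country, year, and GDP values are assumptions, not class data); the point is only how the row and column labels change at each step.

```python
import pandas as pd

# Hypothetical long-format data: one row per country-year
df = pd.DataFrame({
    "country": ["USA", "USA", "France", "France"],
    "year": [2016, 2017, 2016, 2017],
    "gdp": [18.7, 19.5, 2.5, 2.6],
})

# set_index: move columns into the row labels (here a two-level index)
indexed = df.set_index(["country", "year"])

# unstack: pivot the "year" index level into column labels
wide = indexed["gdp"].unstack("year")

# stack: pivot the column labels back into the row index
long_form = wide.stack()

# reset_index: turn the index levels back into ordinary columns
flat = long_form.reset_index(name="gdp")

print(wide)
print(flat)
```

After `unstack`, each country is one row and each year is one column; `stack` followed by `reset_index` returns to one row per country-year.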

We also cover Scikit-learn, a machine learning package that implements a range of classification, regression, and clustering algorithms.

## Topic 11. Pandas Combining and Scikit-learn

**Handouts:** Code_Pandas_Combining | summarizing

**Summary:** Combining dataframes (merge, concatenate). Statistics (mean, median, quantiles), categorical variables, grouping data by categories, counts and statistics by category.
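A minimal sketch of combining and summarizing, assuming two small made-up dataframes (the store, revenue, and region names are hypothetical and not from the class notebooks):

```python
import pandas as pd

# Hypothetical data: sales records and a lookup table of store regions
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "revenue": [100, 150, 200, 50, 75],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["East", "West", "East"],
})

# Merge: attach each store's region to its sales rows
merged = sales.merge(regions, on="store", how="left")

# Concatenate: append extra rows that share the same columns
extra = pd.DataFrame({"store": ["C"], "revenue": [25]})
stacked = pd.concat([sales, extra], ignore_index=True)

# Counts and statistics by category
summary = merged.groupby("region")["revenue"].agg(["count", "mean", "median"])

# A quantile over the whole column
q75 = merged["revenue"].quantile(0.75)

print(summary)
print(q75)
```

`merge` matches rows on a shared key, while `concat` simply stacks dataframes on top of each other, so the two answer different questions about how datasets fit together.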

We will also cover the remaining Scikit-learn topics in this class.

## Topic 12. Review and General Project Discussion.

Discussion regarding project status, challenges, etc.