# Data Bootcamp: MBA Fall 2017

## Where and When

- Instructor: **Benjamin Zweig** (bzweig@stern.nyu.edu)
- Teaching fellow: **Manoj Wajekar** (mw3451@stern.nyu.edu)
- Meeting times: Tuesday, 6 PM to 9 PM
- Meeting place: KMC 4-80

## Important Links

- **THE SYLLABUS.** All the important details about the course, procedures, important dates, etc.
- **THE BOOK.** The topics in the first half are all in the book. We will follow this closely. At the book link, click the large blue Read button to read online, or download the pdf. Both come with links.
- **NOTEBOOKS.** GitHub repository of notebooks used in class.
- **DISCUSSION GROUP.** Post your questions on the NYU Classes forum tab.

## Important Dates

- **Code Practice 1** Due Date: **October 10, 2017**
- **Code Practice 2** Due Date: **October 17, 2017**
- **Code Practice 3** Due Date: **October 24, 2017**
- **Midterm Exam:** **November 7, 2017.** Quick info: 75 minutes, in class, open book, open internet if the wireless is up, bring one page of notes.
- **Final Project** Due Date: **End of day, December 21, 2017**

## Code Practice Submissions

Assignments will be posted on NYU Classes. Submit your Python code on NYU Classes, either as a PDF or as an IPython (Jupyter) notebook.

### Week By Week Guide…

## Topic 1. Data + Python = Magic!

**Handouts:** Outline | Book (Click on blue “Read” button) | Three ideas

**Examples:** Gapminder | cancer screening | Uber in NYC | medical expenditures | mortality | earthquake | Gender pay gap | Fertility | Vaccines

**Summary:** It’s nice to have skills; installing Anaconda; Spyder and Jupyter/IPython; data; questions; idea machines.

## Topic 2. Python fundamentals 1

**Handouts:** Outline | Book chapter | Code Practice #1 (Due October 10)

**Summary:** Calculations; assignments; strings; lists; tuples; built-in functions; objects; methods; tab completion.
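As a rough illustration of the ideas listed above, here is a minimal sketch. The variable names and numbers are made up for this example and do not come from the book or the class notebooks.

```python
# Calculations and assignments
revenue = 120.0
cost = 75.5
profit = revenue - cost

# Strings and a string method
course = "data bootcamp"
title = course.title()            # capitalizes each word

# Lists are mutable, tuples are not
scores = [88, 92, 79]
scores.append(95)                 # a list method that changes the list in place
due_dates = ("October 10", "October 17", "October 24")

# Built-in functions
n_scores = len(scores)
average = sum(scores) / n_scores

print(profit, title, n_scores, average)
```

In Jupyter or Spyder, typing `scores.` and pressing Tab lists the methods available on the object, which is the tab completion mentioned in the summary.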

## Topic 3. Python fundamentals 2

**Handouts:** Outline | Book chapter | Code Practice #2 (Due October 17)

**Summary:** True and False; comparisons; conditionals; slicing; loops; function definitions and returns; dictionaries.
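The pieces in this summary fit together naturally in a few lines. The sketch below is only an illustration: the function `letter_grade`, its cutoffs, and the names and scores are all hypothetical.

```python
def letter_grade(score):
    """Return a letter grade from a numeric score (cutoffs are made up)."""
    if score >= 90:               # a comparison evaluates to True or False
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"

# Dictionary mapping names to scores
scores = {"ana": 93, "ben": 85, "cy": 71}

# Loop over the dictionary and build a new one
grades = {}
for name, score in scores.items():
    grades[name] = letter_grade(score)

# Slicing: take the first two names in alphabetical order
names = sorted(scores)
first_two = names[:2]

# A comparison applied to every value
all_above_70 = all(s > 70 for s in scores.values())

print(grades, first_two, all_above_70)
```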

## Topic 4. Data input: Packages and Pandas

**Handouts:** Outline | Book chapter | Code (Download “Raw”) | Code Practice #3 (Due October 24) (code template)

**Summary:** Packages; import; Pandas; csv files; reading csv/xls files; dataframes; columns; index; APIs.
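A minimal sketch of reading csv data into a dataframe follows. To keep it self-contained, it reads from an in-memory string rather than a file or URL. The column names and values are invented for illustration and are not real data.

```python
import io

import pandas as pd

# Made-up csv content; in class this would usually be a file path or a URL
csv_text = """country,year,value
USA,2015,1.5
USA,2016,2.5
Japan,2015,3.5
Japan,2016,4.5
"""

# read_csv accepts a path, a URL, or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())   # column labels
print(df.index)              # default integer index
print(df.dtypes)             # the type pandas inferred for each column

# Use one of the columns as the index instead
by_country = df.set_index("country")
print(by_country.head())
```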

## Topic 5. Python graphics: Matplotlib fundamentals

**Handouts:** Outline | Book chapter | Code (Download “Raw” as ipynb) | Code Practice A (try by October 31) (Download “Raw” as ipynb)

**Summary:** Three approaches to graphics: dataframe plot methods, plot(x,y), and fig/ax objects and methods; lines, scatters, bars, horizontal bars, styles.
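The three approaches can be compared side by side on one small dataframe. This is a sketch with made-up data. The `Agg` backend is chosen here only so the code runs without a display, and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")             # non-interactive backend; optional in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

# Approach 1: dataframe plot method (returns a matplotlib Axes)
ax1 = df.plot(x="x", y="y", kind="line")

# Approach 2: plot(x, y) through the pyplot interface
plt.figure()
plt.plot(df["x"], df["y"])

# Approach 3: create fig/ax objects and call their methods
fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"])
ax.set_title("Scatter with fig/ax")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The third approach is more verbose, but it gives explicit handles (`fig`, `ax`) for customizing the plot afterwards.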

## Topic 6. Review & applications

**Handouts:** Outline | Code (review | applications)

**Summary:** Exam review, followed by applications to get us thinking about interesting datasets and how to work with them.

## Topic 7. Midterm Exam: November 7

**Posted after class:** Exam with answers

## Topic 8. Thinking about projects and Introduction to Pandas

**Handouts:** Outline | Project Examples | Code (examples | current indicators | demography | Airbnb)

**Summary:** Projects: say something interesting with data. Idea machines. Examples.

## Topic 9. Pandas Cleaning and Seaborn

**Handouts:** Outline | Code_Pandas_Cleaning | Code_seaborn

**Summary:** Pandas has incredible facilities for managing data. We look at fixing numbers misidentified as strings, managing missing observations, selecting variables and observations, and the `isin` and `contains` methods.
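A short sketch of these cleaning steps, using a made-up dataframe (the cities and prices are illustrative only). The particular choices here, stripping commas and coercing unparseable entries to missing values, are one reasonable approach among several and not necessarily the one used in the class notebook.

```python
import pandas as pd

# Made-up data: prices were read in as strings, one of them is not a number
df = pd.DataFrame({
    "city": ["New York", "Newark", "Boston", "Chicago"],
    "price": ["1,200", "950", "n/a", "1,050"],
})

# Fix numbers misidentified as strings: drop the commas, then convert.
# errors="coerce" turns anything unparseable into NaN (missing).
df["price"] = pd.to_numeric(
    df["price"].str.replace(",", "", regex=False), errors="coerce"
)

# Manage missing observations
n_missing = df["price"].isna().sum()
clean = df.dropna(subset=["price"])

# Select observations with isin (exact membership in a list)
selected = df[df["city"].isin(["Boston", "Chicago"])]

# Select observations with str.contains (substring match)
new_cities = df[df["city"].str.contains("New")]

print(df, n_missing, clean, selected, new_cities, sep="\n\n")
```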

We also cover more advanced graphics using seaborn. We show how seaborn can be used to easily construct common yet sophisticated graphics with little additional effort.

## Topic 10. Pandas Shaping and Statsmodels

**Handouts:** Outline | Code_Pandas_Shaping

**Summary:** Here we introduce four key methods for “shaping” our data: `df.set_index`, `df.reset_index`, `df.stack`, and `df.unstack`. When we say shaping we mean manipulating the data so we get specific row and column labels.
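The four shaping methods can be seen working together in a small round trip from long to wide and back. The data below are made up for illustration.

```python
import pandas as pd

# Made-up long-format data: one row per (country, year)
df = pd.DataFrame({
    "country": ["USA", "USA", "Japan", "Japan"],
    "year": [2015, 2016, 2015, 2016],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# set_index: move columns into the row labels (here a two-level index)
indexed = df.set_index(["country", "year"])

# unstack: move the "year" level from the rows into the columns
wide = indexed["value"].unstack("year")

# stack: move the column labels back into the rows
long = wide.stack()

# reset_index: turn the row labels back into ordinary columns
flat = long.reset_index(name="value")

print(wide)
print(flat)
```

Recent pandas releases may print a warning about a newer `stack` implementation. For data without missing values, as here, the result should be the same either way.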

We also cover statsmodels, a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically.
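As one example of model estimation, the sketch below fits an ordinary least squares regression with the statsmodels formula interface. The data are simulated for this example (true intercept 2, true slope 3), and the choice of OLS is an assumption, since the course materials may use other models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: y = 2 + 3x + noise (values chosen for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)
df = pd.DataFrame({"x": x, "y": y})

# Fit y on x; the formula adds an intercept automatically
results = smf.ols("y ~ x", data=df).fit()

print(results.params)      # estimated intercept and slope
print(results.rsquared)    # goodness of fit
```

Calling `results.summary()` prints a full regression table, including standard errors and test statistics.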

## Topic 11. Pandas Combining and Scikit-learn

**Handouts:** Code_Pandas_Combining | summarizing

**Summary:** Combining dataframes (merge, concatenate). Statistics (mean, median, quantiles), categorical variables, grouping data by categories, counts and statistics by category.
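A compact sketch of these operations on made-up store data follows. The dataframe names and values are hypothetical.

```python
import pandas as pd

# Made-up data for illustration
sales = pd.DataFrame({"store": ["A", "A", "B", "B"], "units": [10, 20, 5, 15]})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})
more_sales = pd.DataFrame({"store": ["C"], "units": [30]})

# merge: add columns by matching on a key
merged = sales.merge(regions, on="store", how="left")

# concatenate: append rows from another dataframe
stacked = pd.concat([sales, more_sales], ignore_index=True)

# Statistics on a column
mean_units = merged["units"].mean()
median_units = merged["units"].median()
quartiles = merged["units"].quantile([0.25, 0.5, 0.75])

# Group by a categorical variable: counts and statistics by category
by_region = merged.groupby("region")["units"].agg(["count", "mean", "median"])

print(merged, stacked, quartiles, by_region, sep="\n\n")
```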

We also cover scikit-learn, a machine learning package that provides a wide range of classification, regression, and clustering algorithms.
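The sketch below shows one small example for each of the three task types. The specific estimators (`LinearRegression`, `LogisticRegression`, `KMeans`) and the toy data are assumptions chosen for illustration, since the course may cover different algorithms.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy feature matrix: one feature, two well-separated groups
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])

# Regression: predict a continuous target (here exactly 2x + 1)
y_continuous = 2.0 * X.ravel() + 1.0
reg = LinearRegression().fit(X, y_continuous)

# Classification: predict a 0/1 label
y_label = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_label)
predicted = clf.predict([[0.5], [9.5]])

# Clustering: group the observations without using any labels
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(reg.coef_, reg.intercept_)
print(predicted)
print(km.labels_)
```

All three estimators share the same `fit` interface, which makes it easy to swap one model for another.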

## Topic 12. Review and General Project Discussion

Discussion regarding project status, challenges, etc.