Data Bootcamp: MBA Spring 2018
Where and When
- Instructor: Benjamin Zweig (firstname.lastname@example.org)
- Teaching fellow: Manoj Wajekar (email@example.com)
- Meeting times: Tuesday, 6 PM - 9 PM
- Meeting place: KMC 4-80
THE SYLLABUS. All the important details about the course, procedures, important dates, etc.
THE BOOK. The topics in the first half of the course are all covered in the book, which we will follow closely. At the book link, click the large blue Read button to read online, or download the PDF. Both versions include links.
NOTEBOOKS. GitHub repository of the notebooks used in class.
DISCUSSION GROUP. Post your questions on the NYU Classes forum tab.
Code Practice 1 Due Date: February 20, 2018.
Code Practice 2 Due Date: February 27, 2018.
Code Practice 3 Due Date: March 6, 2018.
Midterm Exam: March 27, 2018. Quick info: in class, open book, open internet if the wireless is up; bring one page of notes.
Final Project Due Date: End of day, May 4, 2018.
Code Practice Submissions
Assignments will be posted on NYU Classes. Submit your Python code on NYU Classes, either as a PDF or as an IPython notebook.
Week By Week Guide…
Topic 1. Data + Python = Magic!
Handouts: Outline | Book (Click on blue “Read” button) | Three ideas
Examples: Gapminder | cancer screening | Uber in NYC | medical expenditures | mortality | earthquake | Gender pay gap | Fertility | Vaccines
Summary: It’s nice to have skills; installing Anaconda; Spyder and Jupyter/IPython; data; questions; idea machines.
Topic 2. Python fundamentals 1
Topic 3. Python fundamentals 2
Topic 4. Data input: Packages and Pandas
Handouts: Outline | Book chapter | Code (Download “Raw”) | Code Practice #3 (Due Mar 06) (code template)
Summary: Packages; import; Pandas; csv files; reading csv/xls files; dataframes; columns; index; APIs.
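As a rough illustration of the ideas in this summary (importing a package, reading csv data into a dataframe, and looking at its columns and index), here is a minimal sketch. The data and the names `csv_text`, `df`, and `df_by_country` are made up for the example; in class the input would normally be a file path or URL rather than inline text.

```python
import io

import pandas as pd  # "import" makes the Pandas package available under the name pd

# Hypothetical csv content, standing in for a file on disk or at a URL
csv_text = """country,year,gdp
USA,2016,18.7
USA,2017,19.5
Japan,2016,4.9
Japan,2017,4.9
"""

# pd.read_csv accepts a file path, a URL, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # the column labels
print(df.index)             # the default index: 0, 1, 2, ...
print(df["gdp"])            # selecting a single column returns a Series

# Use one of the columns as the row index instead of the default integers
df_by_country = df.set_index("country")
print(df_by_country)
```

Reading an Excel file follows the same pattern with `pd.read_excel`, which additionally needs an Excel engine package installed.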
Topic 5. Python graphics: Matplotlib fundamentals
Handouts: Outline | Book chapter | Code (Download “Raw” as ipynb) | Code Practice A (try by March 20) (Download “Raw” as ipynb)
Summary: Three approaches to graphics: dataframe plot methods, plot(x,y), and fig/ax objects and methods; lines, scatters, bars, horizontal bars, styles.
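The three approaches named in this summary can be sketched side by side. This is only an illustration under assumed data: the dataframe and its column names (`sales`, `costs`) are invented, and the `Agg` backend is selected just so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for the example
df = pd.DataFrame(
    {"year": [2015, 2016, 2017], "sales": [10, 12, 15], "costs": [8, 9, 11]}
).set_index("year")

# Approach 1: dataframe plot method (one line per column, index on the x axis)
ax1 = df.plot(kind="line")

# Approach 2: plot(x, y) through the pyplot interface
plt.figure()
plt.plot(df.index, df["sales"])

# Approach 3: create fig/ax objects and call methods on them
fig, ax = plt.subplots()
ax.bar(df.index, df["sales"])
ax.set_title("Sales by year")
ax.set_xlabel("Year")
ax.set_ylabel("Sales")
```

The fig/ax approach takes a little more typing, but it gives direct control over titles, labels, and multiple panels.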
Topic 6. Review & applications
Topic 7. Midterm Exam: March 27
Posted after class: Exam with answers
Topic 8. Thinking about projects and Introduction to Pandas
Topic 9. Pandas Cleaning and Statsmodels
Handouts: Outline | Code_Pandas_Cleaning
Summary: Pandas has incredible facilities for managing data. We look at fixing numbers misidentified as strings, managing missing observations, selecting variables and observations, and the
We also cover statsmodels, a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring data statistically.
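The cleaning steps listed in this topic's summary (numbers stored as strings, missing observations, selecting variables and observations) might look something like the sketch below. The data and column names are invented for illustration, and this is one reasonable way to do each step rather than the only one.

```python
import pandas as pd

# Hypothetical messy data: population arrives as text with thousands separators
df = pd.DataFrame(
    {
        "city": ["NYC", "Boston", "Chicago", "Austin"],
        "population": ["8,623,000", "685,000", "2,716,000", "n/a"],
        "rent": [3100.0, 2700.0, None, 1500.0],
    }
)

# Numbers misidentified as strings: strip the commas, then convert.
# errors="coerce" turns anything unparseable (here "n/a") into NaN.
df["population"] = pd.to_numeric(
    df["population"].str.replace(",", "", regex=False), errors="coerce"
)

# Missing observations: count them per column, or drop incomplete rows
n_missing = df.isna().sum()
complete = df.dropna()

# Selecting observations (rows) with a condition and variables (columns) by name
big = df.loc[df["population"] > 1_000_000, ["city", "population"]]

print(n_missing)
print(complete)
print(big)
```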
Topic 10. Pandas Shaping and Scikit-learn
Handouts: Outline | Code_Pandas_Shaping
Summary: Here we introduce four key methods for “shaping” our data, among them df.unstack. By shaping we mean manipulating the data so that we get specific row and column labels.
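To make the idea of shaping concrete, here is a small sketch of `unstack` moving an index level into the column labels. The data are invented, and the use of `set_index` to build the two-level index first is a choice made for this example.

```python
import pandas as pd

# Hypothetical "long" data: one row per country-year pair
df = pd.DataFrame(
    {
        "country": ["USA", "USA", "Japan", "Japan"],
        "year": [2016, 2017, 2016, 2017],
        "gdp": [18.7, 19.5, 4.9, 4.9],
    }
)

# Move country and year into a two-level row index
long = df.set_index(["country", "year"])

# unstack pivots the innermost index level (year) into column labels,
# leaving one row per country and one column per year
wide = long["gdp"].unstack()

print(wide)
print(wide.loc["USA", 2017])
```

After this step the row labels are countries and the column labels are years, which is often the layout a plot or a table needs.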
We also cover scikit-learn, a machine learning package for building classification, regression, and clustering models.
Topic 11. Pandas Combining and Scikit-learn
Summary: Combining dataframes (merge, concatenate). Statistics (mean, median, quantiles), categorical variables, grouping data by categories, counts and statistics by category.
We will also cover the remaining scikit-learn topics in this class.
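The combining and grouping operations named in this topic's summary can be sketched as follows. The store and sales data are made up for the example, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data: two quarters of sales plus a lookup table of store regions
sales_q1 = pd.DataFrame({"store": ["A", "B"], "sales": [100, 80]})
sales_q2 = pd.DataFrame({"store": ["A", "B"], "sales": [120, 90]})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})

# Concatenate: stack the rows of two dataframes that share the same columns
sales = pd.concat([sales_q1, sales_q2], ignore_index=True)

# Merge: attach each store's region by matching on the shared "store" column
merged = sales.merge(stores, on="store", how="left")

# Statistics on a single column
overall_mean = merged["sales"].mean()
overall_median = merged["sales"].median()
upper_quartile = merged["sales"].quantile(0.75)

# Grouping by category: counts and statistics per region
by_region = merged.groupby("region")["sales"].agg(["count", "mean"])

print(merged)
print(by_region)
```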
Topic 12. Review and General Project Discussion.
Discussion regarding project status, challenges, etc.