Data Bootcamp: Undergrad Fall 2018
This page is your key resource for the course. Everything you need is here! Below are links to key documents such as the syllabus, the book, the blog, and my GitHub repository for the class. Moreover, there is a date by date list of topics, and links to material used in each class. Please watch this site regularly to stay up to date.
Last update: 1/22/2019
Where and When
- Who:
- Michael Waugh (instructor)
Office Hours: in KMC 7-74, Times TBA
- Aditya Vashistha (teaching fellow)
Office Hours: TBA
- Meeting times: Tuesday and Thursday, 2:00 - 3:15
- Meeting place: TISC_UC24
Important Links
- THE SYLLABUS. All the important details about the course, procedures, important dates, etc.
- THE BOOK. The topics in the first half are all in the book. We will follow this closely. At the book link, click the large blue Read button to read online, or download the pdf. Both come with links.
- THE BLOG. Remember this course is a data course that uses Python. In THE BLOG, I’ll discuss interesting uses of data that I find on the web and talk through various issues.
- NOTEBOOKS. GitHub repository of notebooks used in class.
Important Dates
- Problem Sets:
- Take Home Midterm
- Starts March 13 at noon, due 5pm Friday, March 15.
- Guided project
- Basic data exploration (Due by April 8)
- Beer prices (several brands), one year (Due by April 26)
- Beer prices (several brands), all years, with Miller-Weinberg plots (Due by May 6th)
- Independent project
- Three questions: April 11th
- Data report (and final question): May 2nd
- Final project: Monday May 20th
Week By Week Guide…
Topic 1. Introduction: Data + Python = Magic!
Handouts: Book | Three ideas
Summary: It’s nice to have skills; installing Anaconda; Jupyter/IPython; data; questions; idea machines.
Topic 2. Python fundamentals 1
Handouts: Book chapter | Code
Summary: Calculations; assignments; strings; lists; tuples; built-in functions; objects; methods; tab completion.
Topic 3. Python fundamentals 2
Handouts: Book chapter | Code
Summary: True and False; comparisons; conditionals; slicing; loops; function definitions and returns; dictionaries.
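A minimal sketch of how the pieces in this topic fit together: a function definition with a return, comparisons and conditionals, a loop over a dictionary, and slicing. The growth numbers and the classify function are made up for illustration and are not taken from the class notebooks.

```python
# Hypothetical data: GDP growth rates (percent) keyed by country code.
growth = {"USA": 2.9, "JPN": 0.3, "DEU": 1.1, "BRA": -0.5}


def classify(rate):
    """Return a label for a growth rate using comparisons and conditionals."""
    if rate < 0:
        return "contracting"
    elif rate < 1:
        return "slow"
    else:
        return "growing"


# Loop over the dictionary's key/value pairs and build a new dictionary.
labels = {}
for country, rate in growth.items():
    labels[country] = classify(rate)

codes = list(growth)    # the dictionary's keys, in insertion order
first_two = codes[:2]   # slicing: the first two items of the list

print(labels)
print(first_two)
```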
Topic 4. Python fundamentals 3
Handouts: Book chapter | Code
Summary: True and False; comparisons; conditionals; slicing; loops; function definitions and returns; dictionaries.
Topic 5. Intro to Pandas
Handouts: Book chapter | Code
Summary: Packages; import; Pandas; csv files; reading csv/xls files; dataframes; columns; index; APIs.
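As a rough illustration of the ideas in this topic, the sketch below imports Pandas, reads csv-formatted text into a dataframe, and looks at its columns and index. The data is invented, and it is read from an in-memory string (via io.StringIO) only so the example runs without a file; in practice pd.read_csv is usually given a file path or URL.

```python
import io

import pandas as pd

# Made-up csv content standing in for a file on disk or on the web.
csv_text = """country,year,gdp
USA,2017,19.5
JPN,2017,4.9
DEU,2017,3.7
"""

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())   # the column (variable) names
print(df.index.tolist())     # the default index: 0, 1, 2

# Use a column as the index so rows can be selected by country code.
df = df.set_index("country")
print(df.loc["JPN", "gdp"])
```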
Topic 6. Python graphics: Matplotlib fundamentals
Handouts: Book chapter | Code
Summary: Approach to graphics focused on the fig/ax objects and methods; lines, scatters, bars, horizontal bars, histograms, styles.
In class code/lectures:
- GDP and its comovement with subcomponents (fundamentals of plotting, line and scatter plots).
- Why are some countries rich, others poor? (more histograms, bar charts, fancy scatter plots, data wrangling).
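To make the fig/ax approach concrete, here is a small sketch that creates a figure with two axes objects and calls plotting methods on each. The growth numbers are invented, and the Agg backend line is an assumption made so the script runs without a display; it can be dropped in a Jupyter notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Made-up annual growth rates (percent), for illustration only.
years = [2014, 2015, 2016, 2017]
gdp_growth = [2.5, 2.9, 1.6, 2.4]
cons_growth = [3.0, 3.7, 2.7, 2.6]

# One figure (fig) holding two axes objects (ax[0] and ax[1]).
fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: call methods on the first axes object.
ax[0].plot(years, gdp_growth, label="GDP")
ax[0].plot(years, cons_growth, label="Consumption")
ax[0].set_title("Growth over time")
ax[0].set_xlabel("Year")
ax[0].legend()

# Scatter plot: call methods on the second axes object.
ax[1].scatter(gdp_growth, cons_growth)
ax[1].set_title("Comovement")
ax[1].set_xlabel("GDP growth (%)")
ax[1].set_ylabel("Consumption growth (%)")

fig.tight_layout()
```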
Topic 7. Thinking about projects
Handouts: Outline | Project Examples | Code (examples | current indicators | demography | Airbnb)
Summary: Projects: say something interesting with data. Idea machines. Examples.
Topic 8. More Pandas: Combining
Handouts: Code
Summary: Often we need to combine data from two or more dataframes. We explore the merge feature of Pandas. Along the way we take an extended detour to review methods for downloading and unzipping compressed files.
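A minimal sketch of combining two dataframes with merge, assuming both share a key column. The dataframes and their values are made up; the how and indicator arguments are standard pd.merge options.

```python
import pandas as pd

# Two made-up dataframes that share a "country" key column.
gdp = pd.DataFrame({"country": ["USA", "JPN", "DEU"],
                    "gdp": [20.5, 5.0, 4.0]})
population = pd.DataFrame({"country": ["USA", "DEU", "BRA"],
                           "population": [327, 83, 209]})

# Inner merge: keep only countries that appear in both dataframes.
inner = pd.merge(gdp, population, on="country", how="inner")

# Outer merge: keep every country; indicator=True adds a "_merge"
# column recording where each row came from.
outer = pd.merge(gdp, population, on="country", how="outer", indicator=True)

print(inner)
print(outer)
```

The _merge column is a quick way to check which observations failed to match before deciding how to handle them.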
Topic 9. More Pandas: Cleaning
Handouts: Code
Summary: Pandas has incredible facilities for managing data. We look at fixing numbers misidentified as strings, managing missing observations, selecting variables and observations, and the isin and contains methods. Application: What is the price of Guacamole at Chipotle?
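The cleaning steps listed above might look something like the sketch below. The rows are invented and the column names item_name and item_price are assumptions for illustration; this is not the class notebook or the actual Chipotle data.

```python
import pandas as pd

# Made-up order data: prices arrive as strings and one is missing.
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips and Guacamole",
                  "Steak Burrito", "Side of Chips"],
    "item_price": ["$8.75", "$4.45", None, "$1.69"],
})

# Fix numbers misidentified as strings: strip the dollar sign, then
# convert; anything that cannot be converted becomes NaN.
stripped = orders["item_price"].str.replace("$", "", regex=False)
orders["item_price"] = pd.to_numeric(stripped, errors="coerce")

# Manage missing observations: drop rows with no price.
clean = orders.dropna(subset=["item_price"])

# contains: select rows whose item name includes a substring.
guac = clean[clean["item_name"].str.contains("Guacamole")]

# isin: select rows whose item name is in a given list.
selected = clean[clean["item_name"].isin(["Chicken Bowl", "Side of Chips"])]

print(guac)
print(selected)
```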
Topic 10. More Pandas: Shaping
Handouts: Code
Summary: Understand and be able to apply the melt/stack/unstack/pivot methods.
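A short sketch of the four shaping methods on a tiny made-up dataframe: melt turns wide data into long data, pivot reverses that, and stack/unstack move labels between the columns and the row index.

```python
import pandas as pd

# Made-up "wide" data: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["USA", "JPN"],
    "2016": [1.6, 0.5],
    "2017": [2.4, 2.2],
})

# melt: wide -> long, one row per (country, year) pair.
long = wide.melt(id_vars="country", var_name="year", value_name="growth")

# pivot: long -> wide, countries as the index and years as columns.
back = long.pivot(index="country", columns="year", values="growth")

# stack: move the column labels into the row index (a MultiIndex).
stacked = back.stack()

# unstack: move the innermost index level back out to the columns.
unstacked = stacked.unstack()

print(long)
print(back)
print(stacked)
```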