Introduction to Data Engineering with Python (PyData)

The workshop content may change. We usually customize it to the audience level; for example, the student version of the course may differ from the version for practicing professionals.

All sessions are hands-on and practice-based.

PyData – Collecting, Cleaning and Transforming Data

Session-0: Introduction to PyData

  1. What is Machine Learning – The anatomy of a Machine Learning Application
  2. Data from the real world – CSV files, databases, JSON, XML, Web HTML, Text
  3. Unstructured, Semi-structured and Structured Data
  4. The Data pipeline
  5. Where Data Scientists and ML Engineers spend most of their time


Session-1: Python CSV Module

  1. Read a CSV file and display it
  2. Read a CSV file and store it in a Data Frame
  3. Write a Data Frame to a CSV File
  4. Exercises
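
As a taste of this session, here is a minimal sketch of reading and writing CSV with the standard-library csv module. The sample data and the helper names read_rows and write_rows are made up for illustration, and a list of dicts stands in for the Data Frame (in the workshop this would more likely be a pandas Data Frame).

```python
import csv
import io

# Hypothetical sample data; in practice this would be a file opened with open(path, newline="").
SAMPLE = "name,city,age\nAsha,Pune,31\nRavi,Chennai,27\n"


def read_rows(text_stream):
    """Return the CSV contents as a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def write_rows(rows, text_stream):
    """Write a list of dicts back out as CSV, header row first."""
    writer = csv.DictWriter(
        text_stream, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Read the sample and display each record.
rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)

# Write the same records to an in-memory buffer.
out = io.StringIO()
write_rows(rows, out)
```

Note that the csv module reads every value as a string, so a column such as age needs an explicit int() conversion before any arithmetic.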

Session-2: Scraping Data from Websites 

  1. Command-Line Tools
  2. Web Spiders
  3. Link Extractors
  4. Web Crawler
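
A link extractor can be sketched with the standard library alone. The example below is one possible approach, not necessarily the tool used in the session; the class name LinkExtractor, the helper extract_links, and the sample page are assumptions for illustration. It parses an in-memory HTML string, so no network access is needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a crawler would fetch this over HTTP instead.
PAGE = (
    '<html><body><a href="/about">About</a> '
    '<a href="https://example.org/x">X</a>'
    '<a name="top">no href</a></body></html>'
)
links = extract_links(PAGE, "https://example.com/index.html")
print(links)
```

A crawler builds on this by fetching each extracted link in turn and keeping a set of URLs already visited.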

Session-3: Getting Data from RSS Feeds

  1. Introduction to RSS Concepts
  2. Feed Reader
  3. News Aggregator
  4. News Filter
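
The feed reader and news filter ideas can be sketched as follows, assuming a plain RSS 2.0 document. The feed text, the function names read_items and filter_items, and the keyword are invented for the example; a real reader would download the feed from its URL first.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used in place of a downloaded one.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>Python 3 released</title><link>https://example.com/1</link></item>
    <item><title>Weather update</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return each <item> in the feed as a dict with its title and link."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def filter_items(items, keyword):
    """Keep only the items whose title contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [item for item in items if keyword in item["title"].lower()]


items = read_items(FEED)
python_news = filter_items(items, "python")
print(python_news)
```

A news aggregator would call read_items on several feeds and merge the resulting lists before filtering.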


Session-4: Getting Data from APIs

  1. Storing Search results in a Database
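
Storing search results might look like the sketch below, which assumes the API returns JSON and uses an in-memory SQLite database. The response shape, the table name results, and the function store_results are all hypothetical; the actual API and database used in the session may differ.

```python
import json
import sqlite3

# Hypothetical JSON body, standing in for what a search API might return.
RESPONSE = (
    '{"results": ['
    '{"id": 1, "title": "Intro to pandas"}, '
    '{"id": 2, "title": "Cleaning CSV data"}]}'
)


def store_results(conn, payload):
    """Parse the JSON payload and upsert each result into the results table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    rows = [(r["id"], r["title"]) for r in json.loads(payload)["results"]]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO results (id, title) VALUES (?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
stored = store_results(conn, RESPONSE)
titles = [title for (title,) in conn.execute("SELECT title FROM results ORDER BY id")]
print(stored, titles)
```

Using INSERT OR REPLACE keyed on id means the same search can be stored repeatedly without creating duplicate rows.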

Session-5: Pandas – Data Frames

  1. Introduction to Data Frames
  2. Validating Data
  3. Detecting Outliers
  4. Cleaning Data
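
The validate, detect, clean sequence can be sketched on a tiny Data Frame. The data and column names are made up, and the 1.5 x IQR rule shown here is only one common outlier test, chosen for illustration; the session may use a different one.

```python
import pandas as pd

# Hypothetical readings with one missing city, one extreme value and one duplicate row.
df = pd.DataFrame(
    {
        "city": ["Pune", "Chennai", None, "Delhi", "Mumbai", "Pune"],
        "temp_c": [31.0, 33.0, 30.0, 32.0, 95.0, 31.0],
    }
)

# Validate: count missing values in each column.
missing_per_column = df.isna().sum()

# Detect outliers: flag values more than 1.5 * IQR outside the quartiles.
q1 = df["temp_c"].quantile(0.25)
q3 = df["temp_c"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["temp_c"] < q1 - 1.5 * iqr) | (df["temp_c"] > q3 + 1.5 * iqr)

# Clean: drop outliers, rows without a city, and exact duplicates.
clean = (
    df[~is_outlier]
    .dropna(subset=["city"])
    .drop_duplicates()
    .reset_index(drop=True)
)
print(clean)
```

Whether an outlier should be dropped, capped, or kept depends on the data; here it is dropped only to keep the example short.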

Session-6: Visualizing Data

  1. Visualizing CSV Data
  2. Visualizing Pandas Data Frames
  3. Visualizing Database Data