New O'Reilly Video Training: Data Pipelines with Python

Posted on Tue 13 December 2016 in trainings

I'm really excited to announce a new Python video course with O'Reilly on data pipelines. If you are interested in learning some of the popular options available for workflow automation and management in Python, take a look!

In the course, I cover:

  • Using Celery for simple automation
  • Setting up Hadoop for file storage
  • Comparing tools like Airflow and Luigi for your pipeline needs
  • How to parallelize data processing with Dask
  • A brief look at other popular tools like Apache Spark and Django Channels
  • Broader concepts such as testing, DAGs, producers and consumers, and how to be a not-awful systems caretaker.

There is also a public repository with the code and tools used in the course.

I appreciate any and all feedback from students who are enrolled in or have taken the course, so please reach out! :)