I'm really excited to announce a new Python video course with O'Reilly on data pipelines. If you are interested in learning about some of the popular options for workflow automation and management in Python, take a look!
In the course, I cover:
- Using Celery for simple automation
- Setting up Hadoop for file storage
- Comparing tools like Airflow and Luigi for your pipeline needs
- Parallelizing data processing with Dask
- A brief look at other popular tools like Apache Spark and Django Channels
- Broader concepts like testing, DAGs, producers, and consumers, plus how to be a not-awful systems caretaker
There is also a public repository with the code and tools used in the course.
I appreciate any and all feedback from students who are enrolled or have taken the course, so please reach out! :)