Practical Data Cleaning with Python Resources

Posted on Wed 03 May 2017 in trainings

Practical Data Cleaning Resources

(O'Reilly Live Online Training)

This week I will be giving my first O'Reilly Live Online Training via the Safari platform. I'm pretty excited to share some of my favorite data cleaning libraries and tips for validating and testing your data workflows.

This post hopes to be …


Continue reading

PyData Amsterdam Keynote on Ethical Machine Learning

Posted on Fri 07 April 2017 in conferences

I was kindly asked by the PyData Amsterdam organizers to keynote the conference. As a passionate fan of ethical machine learning and the great research being done by data scientists and academics around the world, I am very excited to present the topic at the conference.

My slides are currently …


Continue reading

Ten Tips for First-Time Conference Speakers

Posted on Sat 11 February 2017 in conferences

The saddest moment for me at conferences is when I'm in the middle of an interesting conversation with a bright person and I ask her when her talk is and she says, "Who, me?"

The number of folks I speak with every year at conferences who have amazing stories to …


Continue reading

The Practice of Programming: 18 Years Later

Posted on Fri 20 January 2017 in programming

Over the New Year holiday I had a chance to get away from it all, and snuck up to Finland to sit in a lodge on the Gulf of Finland, sip coffee, take saunas and read. I brought along a few books, the only programming one being Brian W …


Continue reading

New O'Reilly Video Training: Data Pipelines with Python

Posted on Tue 13 December 2016 in trainings

I'm really excited to announce a new Python video course with O'Reilly on data pipelines. If you are interested in learning about some of the popular options available for workflow automation and management in Python, take a look!

In the course, I cover:


Continue reading

DAGs & Dask: How and When to Accelerate your Data Analysis

Posted on Sat 29 October 2016 in conferences

I gave a talk about Directed Acyclic Graphs (DAGs) and Dask at PyConCZ 2016. It was super fun and I had a great time at the conference. If you want to read my slides, here they are! There will be videos available later, so I'll post the link / video …


Continue reading

Introduction to Data Wrangling @ PyConCZ

Posted on Sat 29 October 2016 in conferences

PyConCZ 2016 was such a fun conference! First off, it was the first time I got to see Jackie Kazil since we started writing our O'Reilly book Data Wrangling with Python together, HOORAYYYY!


Continue reading

Chatbot Scraper: Europarl Scraper: 24 Languages of Politics, at your fingertips

Posted on Thu 20 October 2016 in hacking

I participated in a two-day PyDataBerlin Hackathon event in early October and decided to build a scraper for the European Parliament. This was after I found the Europarl parallel corpus a bit underwhelming, as it is messy and not tagged for party, speaker or topic (this is understandable, as it is primarily …


Continue reading

Chatbot Scraper: Using (today's) IRC logs as your NLP datasets

Posted on Thu 29 September 2016 in hacking

I dunno about you, but I often find myself bored with NLP (natural language processing) datasets. Too often they are older, based around something that is not particularly interesting to me or something I've analyzed or used before.

For me, IRC has often been a source of community, fun, sometimes …


Continue reading

Automating your Data Cleanup with Python

Posted on Sat 17 September 2016 in conferences

I gave a talk at PyCon UK 2016 on automating your data cleanup with Python. I want to thank the organizers again for having me, and thank the folks who attended. If you have any questions or are interested in talking about data cleaning problems, feel free to reach out …


Continue reading