Chatbot Scraper: Europarl Scraper: 24 Languages of Politics, at your fingertips

Posted on Thu 20 October 2016 in hacking

I participated in a two-day PyDataBerlin Hackathon event in early October and decided to build a scraper for the European Parliament. This was after I found the Europarl parallel corpus a bit underwhelming, as it is messy and not tagged for party, speaker or topic (this is understandable, as it is primarily …


Continue reading

Chatbot Scraper: Using (today's) IRC logs as your NLP datasets

Posted on Thu 29 September 2016 in hacking

I dunno about you, but I often find myself bored with NLP (natural language processing) datasets. Too often they are older, based around something that is not particularly interesting to me or something I've analyzed or used before.

For me, IRC has often been a source of community, fun, sometimes …


Continue reading

Automating your Data Cleanup with Python

Posted on Sat 17 September 2016 in conferences

I gave a talk at PyCon UK 2016 on automating your data cleanup with Python. I want to again thank the organizers for having me and thank the folks who attended. If you have any questions or are interested in talking about data cleaning problems, feel free to reach out …


Continue reading

Embedded *isms in Vector-Based Natural Language Processing

Posted on Fri 16 September 2016 in research

You may have read recently about machine learning's bias problem, particularly in word embeddings and vectors. It's a massive problem. If you are using word embeddings to generate associated words or phrases, or to do comparisons, you should be aware of the biases you are introducing into your work. In preparation …


Continue reading

Obligatory Women In Tech Post

Posted on Fri 16 September 2016 in life

Question: How does it feel to be a woman in tech?

Answer:

[GIF, via GIPHY]

see also: OG PyLadies Interview


I Hate You, NLP ;)

Posted on Thu 21 July 2016 in conferences

"I had a great time talking about sentiment analysis and natural language processing at EuroPython 2016. Here are my slides for your review; feel free to reach out on Twitter or email if you'd like to chat further about NLP, machine learning and sentiment. I look forward to starting more …


Continue reading

Python Flight Search

Posted on Tue 29 March 2016 in hacking

Like many people, I enjoy travel. With family and friends all across the United States and a home base in Berlin, it's fairly easy to find a reason to travel -- either globally or within the EU. That said, what I find more difficult is to determine what's the best way …


Continue reading

Data Wrangling with Python Course

Posted on Mon 29 February 2016 in trainings

I'll be in New York on July 13th and 14th, teaching how to "big data" with Python. We'll cover Pandas, Hadoop, PySpark and more on automation, acquisition and managing your data.

Next Course: New York City, July 13-14

Tickets are available on Eventbrite with a special Early Bird and Student …


Continue reading

Data Wrangling with Python

Posted on Sun 01 November 2015 in books

Just a quick note that my book, Data Wrangling with Python, is available for pre-order on Amazon as well as in early release on O'Reilly's web site.

Pick up a copy for less than full price now. I'll be posting some examples of problems we work through in the book …


Continue reading

EuroPython 2015

Posted on Thu 23 July 2015 in conferences

Introduction to Data Analysis Tutorial

Want to learn how to analyze data using Python? If you're at #europycon you should drop by my course! If not, watch the video online later today (will post link!)