kjam's blog

Comparing scikit-learn Text Classifiers on a Fake News Dataset

Posted on Mon 28 August 2017 in research

Finding ways to distinguish fake news from real news is a challenge most Natural Language Processing folks I meet and chat with want to solve. It is very difficult to do this properly and without penalizing real news sources.

I was discussing this problem with Miguel Martinez-Alvarez on my last …

Embedded *isms in Vector-Based Natural Language Processing

Posted on Fri 16 September 2016 in research

You may have read recently about machine learning's bias problem, particularly in word embeddings and vectors. It's a massive problem. If you are using word embeddings to generate associative words or phrases, or to do comparisons, you should be aware of the biases you are introducing into your work. In preparation …