Priveedly: your private and personal content reader and recommender
Posted on Thu 23 January 2025 in personal-ai
I'm excited to open-source a project that I've been using for the past 2 and a half years: a private/personal reader and recommender.
It works with:
- RSS feeds
- HackerNews
- Lobste.rs
and comes with an example Jupyter Notebook for training your own text-based recommendation model once you have enough content. For most folks, this will be about 3-6 months of active use -- depending on the amount of content you consume.
Interested in what it looks like? There's a short video introduction on YouTube.
If you just want to get started, head over to the project's GitHub! If you want a little history of why I bothered to build this and how I use it, read on.
Why news and content are personal
Despite what Social Media(TM) wants you to think, your content choices are deeply personal. You like the things you like. Surely others like them too, but your particular combination of interests is probably quite special, and that combination is what guides what you want to read.[1]
The large content providers and social media platforms try to be everything to everyone. When that doesn't work, they try to personalize by tracking you and sorting you into ever smaller bins and cross-sections, so that eventually your feed is "personalized" in a way that is still profitable for them to serve.
Unfortunately, this means that if you are curious about something outside of your normal interactions, one poor click or follow might haunt you and rearrange your content.[2] In my opinion, you shouldn't have to fear that reading something you are mildly interested in dooms you to targeted ads (or even to changes in online prices or search results) just because you clicked on one article.[3]
It can also be fun to decide what you want to expose yourself to, for your own autonomy and purposes. Maybe your ideas change, maybe you are going through a huge life change, or maybe you want to surround yourself with a new bubble on the internet. Either way, deciding and determining directly what you read and see is a cool way to reclaim that autonomy.
For these reasons, I decided to be more deliberate about where I direct my attention when reading content online.
Down the rabbit hole: shouldn't this be easy?
I had long used feed readers, but I wanted to combine that with other content sources, like Reddit, Twitter[4] and other tech news sites. When I first started investigating ways to do this easily with online services (i.e. private services that promised to keep my data private), it was hard to find one that justified the cost. Many other services weren't very clear about whether they actually implemented tracking-free clicks and content.
I assumed there would be some easy open-source options, so I looked there next. There were some great React-based ones that I tried at first, but since I am essentially incompetent at JavaScript, it was hard to figure out how to extend them. For Python-based readers, I tried NewsBlur, which was awesome, but also set up for much larger and more in-depth usage than I was planning on. For me, the obvious options either asked too much (i.e. run a beefy, expensive server) or were too complicated (i.e. learn JavaScript).
Since I know some things about feed and web scraping and language processing, I thought it might be fun to set up a small PoC... heheh -- yes, I know I am this XKCD comic (see below) and I literally cannot stop, don't bother sending help.
Built small-and-simple for one-person use
If you don't need to commercialize it, you can personalize it! Added benefit: you don't have to scale beyond one user! You are already winning if one person can log in and use it. I ran my content server for the first year on a very small $3/month server. :)
Since I already knew how to write scrapers and make a Django-based website, I did that. There are certainly a million other ways to do this, but I did what worked for me.
Over time, I realized that I might want to filter content that I'm not interested in, especially when I get busy and don't log in for a month or two. When that happened, I wanted to only read the potentially interesting stuff and mark everything else as read.
To start building a recommender, I exported my data and played around with simple natural language processing to see which models worked for my data. I didn't overcomplicate or overthink it for my use, which is why I used scikit-learn and not some LLM.
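As a rough illustration of what a simple scikit-learn text recommender can look like, here is a minimal sketch. It is not the code from the Priveedly notebook: the choice of TF-IDF features with logistic regression, the toy titles, the labels and the `score` helper are all assumptions made for this example.

```python
# Minimal sketch of a text-based "interesting or not" model.
# Assumption: TF-IDF + logistic regression; the real notebook may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data: titles you read (1) or skipped (0).
titles = [
    "Understanding Python generators",
    "A gentle introduction to machine learning in Python",
    "Differential privacy for machine learning",
    "Ten celebrity diets you will not believe",
    "Celebrity gossip roundup of the week",
    "You will not believe this one weird trick",
]
interesting = [1, 1, 1, 0, 0, 0]

# Unigrams and bigrams as features, then a linear classifier on top.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(),
)
model.fit(titles, interesting)


def score(text: str) -> float:
    """Return the model's estimated probability that `text` is interesting."""
    # Column 1 corresponds to class label 1 (interesting).
    return float(model.predict_proba([text])[0, 1])


print(score("Python machine learning tips"))
print(score("Celebrity gossip you will not believe"))
```

With real data you would train on months of your own reading history rather than six titles, and hold out part of it to check that the scores are actually useful.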
You might be different and decide:
a) you want to build your own using a different web framework or open-source reader/recommender
b) you want to have an LLM
I say: go for it! It's your project! :)
Note: My current server costs about $6 a month and handles the feed reading, parsing and bulk rating of articles every few hours. If you want to run an LLM, it will cost a lot more and require much more memory.
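The filtering idea (read the potentially interesting stuff, mark everything else as read) can be sketched in a few lines of plain Python. Everything here is hypothetical: `Entry`, `split_by_score`, `mark_all_read`, the 0.5 threshold and the keyword-based `toy_score` are illustrative names and values, not Priveedly's actual models or API. In practice the scorer would be your trained model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a stored feed item."""
    title: str
    read: bool = False


def split_by_score(
    entries: Iterable[Entry],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[Entry], List[Entry]]:
    """Split unread entries into (recommended, rest) using score_fn."""
    recommended: List[Entry] = []
    rest: List[Entry] = []
    for entry in entries:
        if entry.read:
            continue  # already handled, nothing to rate
        if score_fn(entry.title) >= threshold:
            recommended.append(entry)
        else:
            rest.append(entry)
    return recommended, rest


def mark_all_read(entries: Iterable[Entry]) -> None:
    """Mark every given entry as read."""
    for entry in entries:
        entry.read = True


def toy_score(title: str) -> float:
    """Placeholder scorer; a trained model's probability would go here."""
    return 1.0 if "python" in title.lower() else 0.0


backlog = [Entry("Python packaging news"), Entry("Celebrity gossip")]
recommended, rest = split_by_score(backlog, toy_score)
mark_all_read(rest)  # only the recommended entries stay unread
```

Keeping the split and the mark-as-read step separate means you can look at what would be skipped before committing to it.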
The treasure trove of your own data
One cool thing about running your own content reader/recommender is that you can study your data over time. As a data scientist, I think this is really awesome (yes, I am a nerd).
Once you have enough data to do some basic analysis, or whenever you decide to train a model on your data, you can use that analysis or model introspection to investigate more about you. This can be a fun exercise and you can do it in the privacy of your own computer and/or server.[5] There is an example notebook in the GitHub repository to get you started and an accompanying video on YouTube.
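One very basic kind of analysis is to compare which bigrams (2-word combinations) show up in things you read versus things you skipped. The sketch below uses only the standard library and a smoothed log-odds ratio. It is an assumption-laden toy, not the notebook's method: the `bigrams` and `bigram_log_odds` helpers, the tokenizer regex, the smoothing value and the example titles are all invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> List[Bigram]:
    """Lowercase, split into simple word tokens, and pair up neighbors."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def bigram_log_odds(
    liked: List[str], skipped: List[str], smoothing: float = 1.0
) -> Dict[Bigram, float]:
    """Smoothed log-odds per bigram: positive leans liked, negative leans skipped."""
    liked_counts = Counter(bg for text in liked for bg in bigrams(text))
    skipped_counts = Counter(bg for text in skipped for bg in bigrams(text))
    vocab = set(liked_counts) | set(skipped_counts)
    # Additive smoothing so unseen bigrams never produce log(0).
    liked_total = sum(liked_counts.values()) + smoothing * len(vocab)
    skipped_total = sum(skipped_counts.values()) + smoothing * len(vocab)
    return {
        bg: math.log((liked_counts[bg] + smoothing) / liked_total)
        - math.log((skipped_counts[bg] + smoothing) / skipped_total)
        for bg in vocab
    }


# Made-up example data.
liked = ["Machine learning with Python", "Privacy in machine learning"]
skipped = ["Celebrity gossip this week", "More celebrity gossip today"]

scores = bigram_log_odds(liked, skipped)
most_negative = min(scores, key=scores.get)
most_positive = max(scores, key=scores.get)
print(most_negative, most_positive)
```

On a trained linear model you could do something similar by looking at the largest and smallest feature weights instead of raw counts.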
Should you want to change/retrain your model or even change what you read or what you mark interesting based on your model introspection, you can guide that yourself on your own terms. Changing the model is something you can do at any time, without anyone making money or poking you to change what you click so they can make money.
Open-sourcing Priveedly
This project was really just for me for a long time, but I thought now is a hard time for many people to control their news and what they want to read, so I decided to clean it up as best I could and open-source it. If you think you can make Priveedly better by helping with the open requests in the ReadMe or via GitHub Issues, I would be very grateful!
I hope you might be inspired to use Priveedly or whatever service/project you decide gives you the right balance of privacy, autonomy and fun.
If you find this project useful and want to support my work, you can subscribe to my newsletter, buy my most recent book, follow me on YouTube or even hire me for corporate trainings, advisory and speaking engagements on topics like Privacy and Security in ML/AI systems.
[1] I am odd, just probably like you are odd in some ways. My ideal feed is heavy on tech, computer science, machine learning but also on things like my favorite cooking blogs, artsy blogs and artists/comics I like.
[2] I sometimes want to read stuff without teaching the algorithm, just because I am curious what's behind a link. And yes, sometimes it is clickbait and I wish I hadn't clicked, but I try to be kind to myself and tell myself that's okay too.
[3] I don't like BigTech or Third-party-ad-platform trying to target me via my clicks or reading interests. It makes me feel uncomfortable about clicking things. This isn't the internet I signed up for... (sad trombone)
[4] RIP Twitter. The freely available API for your own feed got turned off a few months after The MuskRat took over. :( The original code that worked is still there (it accesses Lists and pulls from them), but I highly doubt that it still works, since the API has probably changed dramatically.
[5] I presented some interesting trends and tokens from my personal recommender model at PyData Paris, including one of the most negative bigrams (2-word-combinations), which was "Elon says". When I first saw this, it made me laugh all day long and was well worth the additional time and effort.