Adversarial Examples Demonstrate Memorization Properties

Posted on Wed 15 January 2025 in ml-memorization

In this article, the last in the problem exploration section of the series, you'll explore adversarial machine learning -- or how to trick a deep learning system.

Adversarial examples demonstrate a different way to look at deep learning memorization and generalization. They can show us how important the learned decision space …
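The full post digs into what these attacks reveal; purely as a hedged illustration of the mechanics (not an example from the article), here is a minimal sketch of the classic fast gradient sign method, assuming a differentiable PyTorch classifier -- `model`, `x`, `y`, `loss_fn`, and `epsilon` are all hypothetical placeholders:

```python
import torch

def fgsm_perturb(model, x, y, loss_fn, epsilon=0.03):
    """One-step fast gradient sign method: nudge x so the loss increases."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # A small step along the sign of the input gradient is often enough
    # to push an example across a learned decision boundary.
    return (x + epsilon * x.grad.sign()).detach()
```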


Continue reading

Differential Privacy as a Counterexample to AI/ML Memorization

Posted on Thu 02 January 2025 in ml-memorization

At this point in the article series on AI/ML memorization, you might be wondering: how did the field get so far without addressing the memorization problem? How did seminal papers like Zhang et al.'s Understanding Deep Learning Requires Rethinking Generalization not fundamentally change machine learning research? And maybe …


Continue reading

How Memorization Happens: Overparametrized Models

Posted on Wed 18 December 2024 in ml-memorization

You've heard claims that we will "run out of data" to train AI systems. Why is that? In this article in the series on machine learning memorization, you'll explore model size as a factor in memorization and the trend toward ever-bigger models as a general problem in machine learning.
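As a toy, hedged illustration of why sheer parameter count matters (my own sketch, not the article's example): a linear model with far more parameters than training examples can fit even pure-noise labels exactly, which is memorization rather than generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_params = 20, 200   # far more parameters than data points
X = rng.normal(size=(n_examples, n_params))
y = rng.normal(size=n_examples)  # pure noise: there is nothing to generalize

# Minimum-norm least-squares fit, standing in for an overparametrized model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ w, y))     # True: the random labels are fit exactly
```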

Prefer …


Continue reading

How memorization happens: Novelty

Posted on Mon 09 December 2024 in ml-memorization

So far in this series on memorization in deep learning, you've learned how massively repeated text and images incentivize training data memorization, but that's not the only training data that machine learning models memorize. Let's take a look at another proven source of memorization: novel examples.

Prefer to learn by video? This …


Continue reading

How memorization happens: Repetition

Posted on Tue 03 December 2024 in ml-memorization

In this article in the deep learning memorization series, you'll learn how one part of memorization happens -- highly repeated data from the "head" of the long-tailed distribution.
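To make the "head" of a long-tailed dataset concrete before you continue, here is a small, hedged sketch with made-up Zipf-distributed counts (not the article's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 100,000 "training examples" whose ids follow a Zipf-like law:
# a handful of ids dominate, while most ids appear only once or twice.
examples = rng.zipf(a=2.0, size=100_000)
ids, counts = np.unique(examples, return_counts=True)
counts = np.sort(counts)[::-1]

head = counts[:10].sum()          # the few massively repeated examples
tail = counts[counts <= 2].sum()  # the long tail of rare examples
print(f"top-10 'head' ids cover {head / counts.sum():.0%} of the data")
print(f"ids seen at most twice cover {tail / counts.sum():.0%} of the data")
```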

Prefer to learn by video? This post is summarized on Probably Private's YouTube.

Recall from the data collection article that some examples are …


Continue reading

Gaming Evaluation - The evolution of deep learning training and evaluation

Posted on Tue 26 November 2024 in ml-memorization

In this article in the series on machine learning memorization, you'll dive deeper into how typical machine learning training and evaluation happens, a crucial step in ensuring the machine learning model actually "learns" something. Let's review the steps that lead up to training a deep learning model.

[Figure: high-level steps to train a deep learning model. Two major steps are shown as rectangular boxes, Data Preparation and Preprocessing and Model Training and Evaluation, each with smaller boxes for substeps: data collection, data cleaning, and data labeling (if needed) for the former; data encoding, model training, and model evaluation for the latter.] High-level steps to …
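As a compressed, hedged sketch of those same steps (a toy dataset and model of my own, not the series' examples), encoding, training, and evaluation can be strung together like this with scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for data that is already collected, cleaned and labeled.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Data encoding/scaling plus model training in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on held-out data the model never saw during training.
print("held-out accuracy:", model.score(X_test, y_test))
```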


Continue reading

Exploring new meadows

Posted on Wed 20 November 2024 in misc

Hello!

We may not know each other, but here you are on my website -- perhaps because you saw a post or someone shared a link. I'm resourceful, determined, intelligent and looking for new challenges. Welcome!

If German is easier for you, please write to me by email (katharine at kjamistan punkt com …


Continue reading

Private and Personalized AI

Posted on Tue 19 November 2024 in personal-ai

I recently had the wonderful experience of keynoting PyData Paris, thanks again for the invite! When deciding on a topic, I was considering my recent research about how AI/ML systems memorize data. As I've mentioned in a few talks, if we indeed embraced the fact that machine learning systems …


Continue reading

Encodings and embeddings: How does data get into machine learning systems?

Posted on Mon 18 November 2024 in ml-memorization

In this series, you've learned a bit about how data is collected for machine learning, but what happens next? You need to turn the collected data -- images, text, video, audio or even just a spreadsheet -- into numbers that a model can learn from. How does this happen?
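As a minimal, hedged sketch of the simplest version of this step (the sentences and vocabulary below are made up, not from the article), text can be mapped to integer ids before any embedding layer gets involved:

```python
# Build a tiny vocabulary, then map each token to an integer id.
# Real systems use learned tokenizers plus embedding layers on top of this.
sentences = ["data becomes numbers", "numbers feed the model"]

vocab = {}
for sentence in sentences:
    for token in sentence.split():
        vocab.setdefault(token, len(vocab))

encoded = [[vocab[token] for token in s.split()] for s in sentences]
print(vocab)    # {'data': 0, 'becomes': 1, 'numbers': 2, 'feed': 3, 'the': 4, 'model': 5}
print(encoded)  # [[0, 1, 2], [2, 3, 4, 5]]
```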

TLDR (too …

Continue reading

Machine Learning dataset distributions, history, and biases

Posted on Wed 13 November 2024 in ml-memorization

You are probably already aware that many machine learning datasets come from scraped internet data. Maybe you received the infamous GPT response: "Please note that my knowledge is limited to information available up until September 2021." You might have also read fear-mongering opinions and articles claiming that companies will "run out …


Continue reading