Quantifying ML Memorization via Membership Inference

14 January 2026 at 2PM
Presented by Tao Jiashu (National University of Singapore)


Abstract

Over the past decade, machine learning (ML) models have evolved from academic curiosities into everyday productivity tools. However, their strong performance comes at a cost: memorization of training data is not a bug, but a necessity for all ML models. It is therefore important to know the extent to which a model has memorized its training data before it is released or deployed. The de facto standard for quantifying memorization in ML models is membership inference. In this talk, I will introduce the formal definition of membership inference and share how prior and current state-of-the-art membership inference algorithms were conceived and formulated. Finally, I will highlight why directly adapting existing memorization quantification frameworks from MLPs and CNNs to large language models (LLMs) can be suboptimal, before presenting our newly proposed token-level framework designed to study fine-grained memorization in LLMs.
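
The abstract does not describe any particular attack, so the sketch below shows only the simplest baseline idea behind membership inference: a loss-threshold test that flags a sample as a training member when the model's loss on it is unusually low. The synthetic loss distributions, the `predict_membership` helper, and the median-based threshold are all illustrative assumptions for this sketch, not the speaker's method or any specific algorithm from the talk.

import numpy as np

# Illustrative sketch (assumed setup): loss-threshold membership inference.
# A sample is predicted to be a training member if the model's loss on it
# falls below a threshold calibrated on known member / non-member losses.

rng = np.random.default_rng(0)

# Synthetic per-sample losses: members tend to have lower loss (memorization).
member_losses = rng.gamma(shape=2.0, scale=0.1, size=1000)      # known training samples
nonmember_losses = rng.gamma(shape=2.0, scale=0.3, size=1000)   # held-out samples

def predict_membership(losses, threshold):
    """Predict 'member' (True) when the loss is below the threshold."""
    return losses < threshold

# Calibrate the threshold, e.g. at the median of the pooled losses (an assumption).
threshold = np.median(np.concatenate([member_losses, nonmember_losses]))

tpr = predict_membership(member_losses, threshold).mean()        # true positive rate
fpr = predict_membership(nonmember_losses, threshold).mean()     # false positive rate
print(f"threshold={threshold:.3f}  TPR={tpr:.2%}  FPR={fpr:.2%}")

On a memorizing model the member losses concentrate near zero, so the gap between the true positive rate and false positive rate serves as a rough proxy for how much the model has memorized; state-of-the-art attacks refine this idea with per-sample calibration rather than a single global threshold.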


See video on YouTube
Zoom link