
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Study shows GPT-style models have a fixed memorization capacity of approximately 3.6 bits per parameter, and that training on more data forces them to memorize less per sample, reducing privacy risk

June 9, 2025 // by Finnovate

A new study from researchers at Meta, Google DeepMind, Cornell University, and NVIDIA finds that GPT-style models have a fixed memorization capacity of approximately 3.6 bits per parameter. One key takeaway from the research is that models do not memorize more when trained on more data. Instead, a model's fixed capacity is distributed across the dataset, so each individual datapoint receives a smaller share of it. Jack Morris, the lead author, explained via the social network X that "training on more data will force models to memorize less per-sample."

These findings may help ease concerns around large models memorizing copyrighted or sensitive content. If memorization is limited and diluted across many examples, the likelihood of reproducing any one specific training example decreases. In essence, more training data leads to safer generalization behavior, not increased risk.

To precisely quantify how much language models memorize, the researchers used an unconventional but powerful approach: they trained transformer models on datasets composed of uniformly random bitstrings. Each bitstring was sampled independently, ensuring that no patterns, structure, or redundancy existed across examples. Because each sample is unique and devoid of shared features, any ability the model shows in reconstructing or identifying these strings during evaluation directly reflects how much information it retained, that is, memorized, during training.

This method allowed the researchers to map a direct relationship between the number of model parameters and the total information stored. By gradually increasing model size and training each variant to saturation, across hundreds of experiments on models ranging from 500K to 1.5 billion parameters, they observed a consistent result: 3.6 bits memorized per parameter, which they report as a fundamental measure of LLM memory capacity.

The study also examined how model precision affects memorization capacity, comparing training in bfloat16 versus float32. The researchers observed a modest increase from 3.51 to 3.83 bits per parameter when switching to full 32-bit precision. However, this gain is far less than the doubling of available bits would suggest, implying diminishing returns from higher precision.

The paper proposes a scaling law that relates a model's capacity and dataset size to the effectiveness of membership inference attacks, which attempt to determine whether a particular data point was part of a model's training set. The research shows that such attacks become unreliable as dataset size grows, supporting the argument that large-scale training helps reduce privacy risk.

By introducing a principled and quantifiable definition of memorization, the study gives developers and researchers new tools for evaluating the behavior of language models. This helps not only with model transparency but also with compliance, privacy, and ethical standards in AI development. The findings suggest that more data, not less, may be the safer path when training large-scale language models.
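The measurement idea can be sketched in a few dozen lines. The toy PyTorch script below is a minimal illustration of the random-bitstring approach, not the paper's actual code; every hyperparameter (model width, layer count, dataset size, step count) is an invented assumption. It trains a tiny causal transformer to saturation on uniformly random bitstrings, then counts memorized information as the gap, in bits, between the uniform baseline (1 bit per binary token) and the model's code length on its own training data:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy illustration of the measurement idea; every size here is an assumption.
V, L, N, D = 2, 64, 256, 128   # vocab (0/1), sequence length, #bitstrings, width

torch.manual_seed(0)
data = torch.randint(0, V, (N, L))   # uniformly random bits: no shared structure

class TinyCausalLM(nn.Module):
    def __init__(self, d=D, layers=2, heads=4):
        super().__init__()
        self.tok = nn.Embedding(V, d)
        self.pos = nn.Embedding(L, d)
        layer = nn.TransformerEncoderLayer(d, heads, 4 * d, dropout=0.0,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d, V)

    def forward(self, x):
        t = x.size(1)
        h = self.tok(x) + self.pos(torch.arange(t, device=x.device))
        causal = torch.full((t, t), float("-inf"), device=x.device).triu(1)
        return self.head(self.blocks(h, mask=causal))

model = TinyCausalLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Train to saturation: random data has no structure to generalize from, so any
# loss below the uniform baseline reflects pure memorization.
for step in range(2000):
    logits = model(data[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, V), data[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Memorized information = uniform baseline (1 bit per binary token) minus the
# model's code length on its own training data, converted from nats to bits.
model.eval()
with torch.no_grad():
    logits = model(data[:, :-1])
    nll_nats = F.cross_entropy(logits.reshape(-1, V), data[:, 1:].reshape(-1),
                               reduction="sum").item()
memorized_bits = max(0.0, data[:, 1:].numel() - nll_nats / math.log(2))
n_params = sum(p.numel() for p in model.parameters())
print(f"memorized ~{memorized_bits:,.0f} bits over {n_params:,} params "
      f"({memorized_bits / n_params:.3f} bits/param)")
```

Note that with a dataset this small (about 16K bits against roughly 400K parameters), the model can memorize nearly everything, so the printed ratio reflects dataset size rather than the 3.6-bit ceiling. To estimate capacity, the paper instead grows the dataset until the total memorized bits plateau, making the model, not the data, the bottleneck.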
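To get a feel for what 3.6 bits per parameter implies, here is a back-of-the-envelope conversion (mine, not the paper's) of the reported capacity into raw storage for the two endpoints of the model-size range studied:

```python
# Back-of-the-envelope capacity estimate using the study's 3.6 bits/parameter.
BITS_PER_PARAM = 3.6

for n_params in (500_000, 1_500_000_000):   # smallest and largest models studied
    capacity_bits = BITS_PER_PARAM * n_params
    capacity_mb = capacity_bits / 8 / 1e6   # bits -> bytes -> megabytes
    print(f"{n_params:>13,} params -> ~{capacity_bits:.2e} bits (~{capacity_mb:,.2f} MB)")
```

On that arithmetic, even the largest model in the study tops out at roughly 675 MB of memorized content, a small fraction of the multi-terabyte corpora such models are typically trained on.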
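The privacy implication can likewise be illustrated with a toy simulation. This is my construction, not the paper's scaling law: hold total memorization capacity fixed, spread it over a growing dataset, and run a simple loss-threshold membership inference attack (flag a sample as a training member if its loss falls below a threshold). All the distributions and constants below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
CAPACITY_BITS = 3.6e6   # toy fixed capacity, e.g. a 1M-parameter model

for n_samples in (10_000, 100_000, 10_000_000):
    per_sample = CAPACITY_BITS / n_samples      # capacity share per example
    # Toy assumption: memorization lowers a member's loss in proportion to
    # its capacity share, capped so losses stay positive.
    drop = min(0.9, per_sample / 400.0)
    member = rng.normal(1.0 - drop, 0.1, 50_000)     # losses of training members
    nonmember = rng.normal(1.0, 0.1, 50_000)         # losses of held-out samples
    thresh = 1.0 - drop / 2                          # midpoint threshold attack
    acc = ((member < thresh).mean() + (nonmember >= thresh).mean()) / 2
    print(f"dataset size {n_samples:>11,}: attack accuracy ~ {acc:.2f}")
```

The numbers are made up; the point is the qualitative trend the paper formalizes: with capacity fixed, larger training sets shrink each member's loss advantage, and the attack's accuracy decays toward a coin flip.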




