
New “selective generation” framework uses a smaller, separate “intervention model” to decide whether the main LLM should generate an answer or abstain

May 28, 2025 //  by Finnovate

A new study from Google researchers introduces “sufficient context,” a novel perspective for understanding and improving retrieval-augmented generation (RAG) systems in large language models (LLMs). The approach makes it possible to determine whether an LLM has enough information to answer a query accurately, a critical factor for developers building real-world enterprise applications where reliability and factual correctness are paramount.

The researchers found that Google’s Gemini 1.5 Pro model, with a single example (1-shot), performed best at classifying context sufficiency, achieving high F1 scores and accuracy. The paper notes, “In real-world scenarios, we cannot expect candidate answers when evaluating model performance. Hence, it is desirable to use a method that works using only the query and context.”

Interestingly, while RAG generally improves overall performance, additional context can also reduce a model’s ability to abstain from answering when it lacks sufficient information. Given the finding that models may hallucinate rather than abstain, especially with RAG compared to a no-RAG setting, the researchers explored techniques to mitigate this.

They developed a new “selective generation” framework. This method uses a smaller, separate “intervention model” to decide whether the main LLM should generate an answer or abstain, offering a controllable trade-off between accuracy and coverage. The framework can be combined with any LLM, including proprietary models such as Gemini and GPT. The study found that using sufficient context as an additional signal in this framework leads to significantly higher accuracy for answered queries across various models and datasets, improving the fraction of correct answers among model responses by 2–10% for Gemini, GPT, and Gemma models.

For enterprise teams looking to apply these insights to their own RAG systems, such as those powering internal knowledge bases or customer support AI, Cyrus Rashtchian, co-author of the study, outlines a practical approach. He suggests first collecting a dataset of query-context pairs that represent the kind of examples the model will see in production, then using an LLM-based autorater to label each example as having sufficient or insufficient context. “This already will give a good estimate of the % of sufficient context,” Rashtchian said. “If it is less than 80-90%, then there is likely a lot of room to improve on the retrieval or knowledge base side of things — this is a good observable symptom.”
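To make the selective-generation idea above concrete, here is a minimal, hypothetical sketch in Python. It is not the paper’s implementation: the two signals (the main model’s self-rated confidence and an autorater’s sufficient-context verdict), the 0.7/0.3 weights, and the threshold are illustrative assumptions standing in for the trained intervention model.

```python
from dataclasses import dataclass


@dataclass
class RagSignals:
    """Signals available before committing to an answer (names are illustrative)."""
    self_confidence: float     # main LLM's self-rated probability its answer is correct
    sufficient_context: bool   # autorater's verdict on the retrieved context


def should_answer(signals: RagSignals, threshold: float = 0.6) -> bool:
    """Toy stand-in for the trained intervention model: combine the two signals
    into a score and answer only when it clears a tunable threshold. Raising the
    threshold trades coverage (fewer answered queries) for accuracy (fewer
    hallucinated answers). The 0.7/0.3 weights are placeholders, not values
    from the study."""
    score = 0.7 * signals.self_confidence + 0.3 * float(signals.sufficient_context)
    return score >= threshold


# With middling confidence, the sufficient-context signal tips the decision.
print(should_answer(RagSignals(self_confidence=0.55, sufficient_context=False)))  # False -> abstain
print(should_answer(RagSignals(self_confidence=0.55, sufficient_context=True)))   # True  -> answer
```

In a real system the score would come from a model fitted on held-out examples with known correctness labels, but the controllable threshold is the essential point: it is the knob that moves the accuracy-versus-coverage trade-off.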

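The diagnostic Rashtchian describes can likewise be sketched in a few lines. Everything here is an assumption for illustration: the prompt wording, the SUFFICIENT/INSUFFICIENT labels, and the `ask_llm` callable (your own Gemini, GPT, or other chat-completion client) are not taken from the paper.

```python
from collections.abc import Callable

# Illustrative autorater prompt; the study's actual prompt may differ.
AUTORATER_PROMPT = (
    "Question: {query}\n\n"
    "Retrieved context:\n{context}\n\n"
    "Does the context contain enough information to answer the question? "
    "Reply with exactly one word: SUFFICIENT or INSUFFICIENT."
)


def sufficient_context_rate(
    pairs: list[tuple[str, str]],   # production-like (query, context) examples
    ask_llm: Callable[[str], str],  # your LLM client, e.g. a Gemini or GPT call
) -> float:
    """Fraction of (query, context) pairs the LLM autorater labels as sufficient."""
    hits = 0
    for query, context in pairs:
        verdict = ask_llm(AUTORATER_PROMPT.format(query=query, context=context))
        if verdict.strip().upper().startswith("SUFFICIENT"):
            hits += 1
    return hits / len(pairs)
```

Per Rashtchian’s rule of thumb, a rate below roughly 80-90% is a good observable symptom that the retrieval or knowledge-base side, rather than the generator, deserves attention first.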

Category: Additional Reading


