Researchers at the University of Illinois Urbana-Champaign and the University of Virginia have developed a new model architecture that could lead to more robust AI systems with more powerful reasoning capabilities. Called an energy-based transformer (EBT), the architecture shows a natural ability to use inference-time scaling to solve complex problems. For the enterprise, this could translate into cost-effective AI applications that generalize to novel situations without the need for specialized fine-tuned models.

EBTs are trained to first verify the compatibility between a context and a prediction, then refine predictions until they find the lowest-energy (most compatible) output. This process effectively simulates a thinking process for every prediction.

The researchers developed two EBT variants: a decoder-only model inspired by the GPT architecture, and a bidirectional model similar to BERT. The architecture of EBTs makes them flexible and compatible with various inference-time scaling techniques.

Crucially, the study found that EBTs generalize better than the other architectures. Even with the same or worse pretraining performance, EBTs outperformed existing models on downstream tasks.

The benefits of EBTs are important for two reasons. First, they suggest that at the massive scale of today's foundation models, EBTs could significantly outperform the classic transformer architecture used in LLMs. Second, EBTs show much better data efficiency. This is a critical advantage in an era where high-quality training data is becoming a major bottleneck for scaling AI.
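The verify-then-refine loop can be illustrated with a minimal sketch. This is not the researchers' implementation: the `energy`, `energy_grad`, and `ebt_predict` functions below are hypothetical stand-ins, with a hand-written quadratic energy substituting for a learned energy network. The core idea it demonstrates is real, though: the model scores the compatibility of a (context, prediction) pair, then improves the prediction by gradient descent on that score, and running more refinement steps is what "inference-time scaling" means here.

```python
import numpy as np

def energy(context, prediction):
    # Hypothetical energy function: lower = more compatible. A real EBT
    # learns this scoring function; here a toy quadratic whose minimum
    # sits at prediction == 2 * context stands in for it.
    return float(np.sum((prediction - 2.0 * context) ** 2))

def energy_grad(context, prediction):
    # Analytic gradient of the toy energy w.r.t. the prediction.
    # A real EBT would obtain this via automatic differentiation.
    return 2.0 * (prediction - 2.0 * context)

def ebt_predict(context, steps=100, lr=0.1):
    """Refine an initial guess by descending the energy surface.

    The loop is the 'thinking' process: each step nudges the prediction
    toward lower energy (higher compatibility with the context), and the
    step count is the inference-time compute budget.
    """
    prediction = np.zeros_like(context)  # uninformed starting guess
    for _ in range(steps):
        prediction = prediction - lr * energy_grad(context, prediction)
    return prediction

context = np.array([1.0, -0.5, 3.0])
pred = ebt_predict(context)
print(energy(context, pred))  # approaches 0 as refinement converges
```

In this toy setting, calling `ebt_predict(context, steps=5)` returns a higher-energy (worse) prediction than the 100-step run, mirroring the paper's point that EBTs can trade extra inference compute for better answers on harder problems.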