GPT-5 is giving ChatGPT more capabilities in at least five areas. Before GPT-5, ChatGPT users could choose between GPT models and the “o” reasoning series of models. GPT-5 merges both capabilities. “GPT-5 will automatically decide to use reasoning or not. Switching should be smoother in the next update,” said Elaine Ya Le, a researcher at OpenAI. ChatGPT Plus subscribers can send up to 160 messages using GPT-5 every three hours. That’s twice the previous limit, according to OpenAI’s community forums. “We are going to double rate limits for Plus users as we finish rollout,” OpenAI CEO Sam Altman confirmed during the Q&A with Redditors. For GPT-5 Thinking, Plus users are capped at 200 messages per week if they manually select this option; when ChatGPT switches to “Thinking” mode by itself, those messages do not count toward the quota. Eric Mitchell from OpenAI’s research team told Redditors that OpenAI “definitely” intends Plus users to have “unlimited access to reasoning.” Users don’t have to toggle tools on or off with GPT-5; tools are enabled automatically depending on what the user needs. Tools include web search, data analysis, image analysis, file analysis, canvas, image generation, memory and custom instructions. There are two ways to use “voice” mode: clicking on the microphone icon in the prompt window and speaking a prompt or query for ChatGPT to process, or activating full “voice” mode and interacting directly with the model. Sulman Choudhry, head of engineering at OpenAI, said voice mode is now better at following instructions with GPT-5. Overall, users should get more advanced capabilities with GPT-5: “GPT-5 is a huge improvement over GPT-4 in a few key areas: It thinks better (reasoning), writes better (creativity), follows instructions more closely and is more aligned to user intent.”
Anthropic’s Claude Sonnet 4 model can now process up to 1 million tokens of context in a single request — a fivefold increase that allows developers to analyze entire software projects
Claude Sonnet 4 artificial intelligence model can now process up to 1 million tokens of context in a single request — a fivefold increase that allows developers to analyze entire software projects or dozens of research papers without breaking them into smaller chunks. The expansion, available now in public beta through Anthropic’s API and Amazon Bedrock, represents a significant leap in how AI assistants can handle complex, data-intensive tasks. With the new capacity, developers can load codebases containing more than 75,000 lines of code, enabling Claude to understand complete project architecture and suggest improvements across entire systems rather than individual files. The extended context capability addresses a fundamental limitation that has constrained AI-powered software development. Eric Simons, CEO of Bolt.new, which integrates Claude into browser-based development platforms, said: “With the 1M context window, developers can now work on significantly larger projects while maintaining the high accuracy we need for real-world coding.” The expanded context enables three primary use cases that were previously difficult or impossible: comprehensive code analysis across entire repositories, document synthesis involving hundreds of files while maintaining awareness of relationships between them, and context-aware AI agents that can maintain coherence across hundreds of tool calls and complex workflows. The 1 million token context window represents significant technical advancement in AI memory and attention mechanisms. Anthropic’s internal testing revealed perfect recall performance across diverse scenarios, a crucial capability as context windows expand. The company embedded specific information within massive text volumes and tested Claude’s ability to find and use those details when answering questions.
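For developers who want to try the expanded window, here is a minimal sketch of a request through Anthropic’s Python SDK. The model ID and the long-context beta flag are assumptions based on Anthropic’s published naming, so check the current documentation before relying on them.

```python
# Minimal sketch: send a large codebase to Claude Sonnet 4 with the
# long-context beta enabled. Model ID and beta flag are assumptions;
# confirm both against Anthropic's docs. Requires ANTHROPIC_API_KEY.
import pathlib
import anthropic

client = anthropic.Anthropic()

# Concatenate a project's source files into one prompt (illustrative only).
codebase = "\n\n".join(
    f"# {path}\n{path.read_text(errors='ignore')}"
    for path in pathlib.Path("my_project").rglob("*.py")
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",          # assumed Sonnet 4 model ID
    max_tokens=2048,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # assumed beta flag
    messages=[{
        "role": "user",
        "content": f"Review this project and suggest cross-file improvements:\n{codebase}",
    }],
)
print(response.content[0].text)
```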
Uno brings vibe coding to the enterprise: users code an application once and Uno Platform makes it easy to ship other types of applications from the existing codebase, for 5x the productivity
Amid the rise of new AI-powered low-code developer tools aimed at hobbyists and non-technical folks, Uno Platform is doubling down on enterprise developers instead. Uno Platform offers a suite of enterprise-grade tools for developers to build cross-platform .NET applications that can be supported on Android, Apple, Linux, and Windows systems. Users code an application once and Uno Platform makes it easy to ship other types of applications from the existing codebase. “If you’re coding something once and it works on five different platforms on desktop, web, and mobile, you’re getting 5x the productivity already,” Uno Platform co-founder and CEO Francois Tanguay said. “Everything we are shipping hasn’t been done before and we [feel] like we have a clear road map in terms of how we can add those extra capabilities [to] make everybody 10x faster,” Tanguay said. “Nobody’s capturing that market yet in the enterprise space.” Uno Platform just raised a C$3.5 million round that will help Uno roll out its premium tooling tier, Uno Platform Studio, and a new feature, “Hot Design,” which allows developers to pause a running application and change its user interface in real time.
Anthropic’s Claude AI model can now handle prompts five times longer than its previous limit (200,000 tokens), and more than double the 400,000-token context window offered by OpenAI’s GPT-5
Anthropic is increasing the amount of information that enterprise customers can send to Claude in a single prompt, part of an effort to attract more developers to the company’s popular AI coding models. For Anthropic’s API customers, the company’s Claude Sonnet 4 AI model now has a 1 million token context window — meaning the AI can handle requests as long as 750,000 words, more than the entire “Lord of the Rings” trilogy, or 75,000 lines of code. That’s roughly five times Claude’s previous limit (200,000 tokens), and more than double the 400,000 token context window offered by OpenAI’s GPT-5. Long context will also be available for Claude Sonnet 4 through Anthropic’s cloud partners, including on Amazon Bedrock and Google Cloud’s Vertex AI. Anthropic’s product lead for the Claude platform, Brad Abrams, expects AI coding platforms to get a “lot of benefit” from this update. When asked if GPT-5 put a dent in Claude’s API usage, Abrams downplayed the concern, saying he’s “really happy with the API business and the way it’s been growing.” Whereas OpenAI generates most of its revenue from consumer subscriptions to ChatGPT, Anthropic’s business centers on selling AI models to enterprises through an API. That’s made AI coding platforms a key customer for Anthropic and could be why the company is throwing in some new perks to attract users in the face of GPT-5. Abrams also said that Claude’s large context window helps it perform better at long agentic coding tasks, in which the AI model is autonomously working on a problem for minutes or hours. With a large context window, Claude can remember all its previous steps in long-horizon tasks. Abrams said that Anthropic’s research team focused on increasing not just the context window for Claude, but also the “effective context window,” suggesting that its AI can understand most of the information it’s given.
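As a rough way to see whether a codebase fits in the new window, the sketch below estimates a token count from file sizes using the common rule of thumb of roughly four characters per token; the ratio is an assumption rather than Anthropic’s tokenizer, so treat the result as a ballpark figure.

```python
# Rough sketch: estimate whether a project fits in a 1M-token context
# window. The ~4 characters/token ratio is a rule-of-thumb assumption,
# not Anthropic's actual tokenizer.
import pathlib

CHARS_PER_TOKEN = 4          # heuristic assumption
CONTEXT_WINDOW = 1_000_000   # Claude Sonnet 4's new long-context limit

def estimate_tokens(root: str, patterns=("*.py", "*.ts", "*.md")) -> int:
    total_chars = 0
    for pattern in patterns:
        for path in pathlib.Path(root).rglob(pattern):
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_tokens("my_project")
print(f"~{tokens:,} tokens; fits in 1M window: {tokens < CONTEXT_WINDOW}")
```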
LFM2-VL, a new generation of vision-language foundation models, can be deployed across a wide range of hardware — from smartphones and laptops to wearables and embedded systems — promising low-latency performance, strong accuracy, and flexibility for real-world applications
Liquid AI has released LFM2-VL, a new generation of vision-language foundation models designed for efficient deployment across a wide range of hardware — from smartphones and laptops to wearables and embedded systems. The models promise low-latency performance, strong accuracy, and flexibility for real-world applications. According to Liquid AI, the models deliver up to twice the GPU inference speed of comparable vision-language models, while maintaining competitive performance on common benchmarks. The release includes two model sizes: LFM2-VL-450M, a hyper-efficient model with less than half a billion parameters (internal settings) aimed at highly resource-constrained environments, and LFM2-VL-1.6B, a more capable model that remains lightweight enough for single-GPU and device-based deployment. Both variants process images at native resolutions up to 512×512 pixels, avoiding distortion or unnecessary upscaling. For larger images, the system applies non-overlapping patching and adds a thumbnail for global context, enabling the model to capture both fine detail and the broader scene. Unlike traditional architectures, Liquid’s approach aims to deliver competitive or superior performance using significantly fewer computational resources, allowing for real-time adaptability during inference while maintaining low memory requirements. This makes LFMs well suited for both large-scale enterprise use cases and resource-limited edge deployments. LFM2-VL uses a modular architecture combining a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector. The projector includes a two-layer MLP connector with pixel unshuffle, reducing the number of image tokens and improving throughput. Users can adjust parameters such as the maximum number of image tokens or patches, allowing them to balance speed and quality depending on the deployment scenario. The training process involved approximately 100 billion multimodal tokens, sourced from open datasets and in-house synthetic data.
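The tiling behavior described above can be sketched with a little arithmetic: images at or below 512×512 pass through at native resolution, while larger ones are split into non-overlapping 512×512 patches plus a thumbnail for global context. The exact bookkeeping below is an illustrative assumption, not Liquid AI’s implementation.

```python
import math

def lfm2_vl_patches(width: int, height: int, tile: int = 512) -> dict:
    """Sketch of the described tiling: native processing up to 512x512,
    otherwise non-overlapping 512x512 patches plus a global thumbnail."""
    if width <= tile and height <= tile:
        return {"patches": 1, "thumbnail": False}
    cols = math.ceil(width / tile)   # patches across
    rows = math.ceil(height / tile)  # patches down
    return {"patches": cols * rows, "thumbnail": True}

print(lfm2_vl_patches(512, 512))    # {'patches': 1, 'thumbnail': False}
print(lfm2_vl_patches(1920, 1080))  # {'patches': 12, 'thumbnail': True}
```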
Shortcut, an MIT startup’s system of AI agents, automates multi-step Excel work, adapting in real time to complex workflows and enabling full financial modeling and analysis from natural language prompts
Fundamental Research Labs, an artificial intelligence startup launched out of MIT, has created Shortcut, a system of AI agents that can do multi-step work on Excel, such as creating discounted cash flow models in finance. Shortcut is accessed through its website, which is designed to look quite similar to Excel with its green lines and tabs. There is a sidebar that opens on the right side where users can prompt the AI agents. Users can also open or upload Excel files on Shortcut. The user writes a prompt in natural language and uploads documents for the AI agents to “read.” The group of AI agents behind the tool then gets to work to create business models, financial statements and the like. Unlike macro scripts or cloud-based automation, Shortcut can adapt mid-task if something changes. That flexibility makes it more likely to handle the messy, inconsistent workflows that dominate office life. It also keeps sensitive data on-device, a selling point for regulated industries. Shortcut CEO Nico Christie said, “It’s not about replacing Excel — it’s about replacing the need to open Excel in the first place.” Asked how Shortcut is different from the Microsoft Copilot built into Excel, Christie said that Copilot does specific tasks the user tells it to do, like write formulas or create charts, while Shortcut does full financial modeling and analysis. Shortcut scores over 80% on cases presented at the Microsoft Excel World Championship, described as a “thrilling” competition among Excel users. Christie said Shortcut finished the cases in about 10 minutes, or 10 times faster than humans.
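To make the “discounted cash flow” example concrete, the sketch below shows the kind of multi-step spreadsheet math Shortcut is described as automating: discount projected cash flows to present value and add a terminal value. The figures and rates are purely illustrative and are not taken from the product.

```python
def dcf_value(cash_flows, discount_rate, terminal_growth):
    """Toy DCF: present value of projected cash flows plus a
    Gordon-growth terminal value, discounted back to today."""
    pv = sum(cf / (1 + discount_rate) ** t
             for t, cf in enumerate(cash_flows, start=1))
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv_terminal = terminal / (1 + discount_rate) ** len(cash_flows)
    return pv + pv_terminal

# Five years of projected free cash flow (in $M), 10% discount rate, 2% terminal growth.
print(round(dcf_value([12, 14, 16, 18, 20], 0.10, 0.02), 1))
```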
Refold AI’s multi-layered agentic integration platform allows companies to automate ERP, CRM, and SaaS API connectivity using adaptive, intelligent agents
Refold AI, an enterprise startup that provides artificial intelligence to automate API integrations, has launched with $6.5 million in seed funding. Refold developed AI software that supports everyone, from advanced engineers to end users. At its core, Refold offers Workflow Code Agents, which provide engineering teams the ability to generate, test and maintain integration logic without the need to use templates. In addition, there are MCP Chains, which enable the use of natural language to describe outcomes to agents and have them generate operating workflows automatically. For software-as-a-service teams, Refold has launched an Embedded Integrations Platform, providing a plug-and-play toolkit with prebuilt user interface components. Ideally, by using AI agents instead of pre-written middleware or humans who have to react to tickets when things become a problem, software can inform users when something has gone wrong and help fix it automatically. This includes detecting when fields change and need to match up again, or when a software version modifies how an API works. An AI agent can be tasked with detecting failures, rerouting tasks and repairing integrations without user intervention, and then informing the development team so that it can review or modify the change. Refold works with more than 1,000 pre-built connectors, including 1Password, HubSpot, Salesforce, Gmail, OpenAI and Slack.
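As a loose illustration of the field-drift problem such agents are meant to handle, here is a minimal sketch that detects a renamed field in an upstream record and remaps it before passing the data on. The field names, alias table, and repair logic are hypothetical and are not Refold’s API.

```python
# Hypothetical sketch: detect that an upstream API renamed a field and
# remap it before handing the record to a downstream system.
EXPECTED_FIELDS = {"email", "full_name", "company"}          # illustrative schema
ALIASES = {"emailAddress": "email", "name": "full_name", "org": "company"}

def repair_record(record: dict) -> dict:
    repaired = {}
    for key, value in record.items():
        if key in EXPECTED_FIELDS:
            repaired[key] = value
        elif key in ALIASES:                 # field renamed upstream: remap it
            repaired[ALIASES[key]] = value
    missing = EXPECTED_FIELDS - repaired.keys()
    if missing:                              # surface the failure for review
        print(f"unresolved fields: {missing}")
    return repaired

print(repair_record({"emailAddress": "a@b.com", "name": "Ada", "org": "Acme"}))
```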
Google’s tiny AI model brings advanced, quantization-ready AI that fits on smartphones—empowering efficient, on-device reasoning and quick adaptation to enable private, offline AI for specialized and enterprise tasks
Google’s DeepMind AI research team has unveiled a new open source AI model, Gemma 3 270M — far smaller than the many frontier LLMs with 70 billion or more parameters (parameters being the number of internal settings governing the model’s behavior). While more parameters generally translate to a larger and more powerful model, Google’s focus with this model is nearly the opposite: high efficiency, giving developers a model small enough to run directly on smartphones and locally, without an internet connection, as shown in internal tests on a Pixel 9 Pro SoC. Yet the model is still capable of handling complex, domain-specific tasks and can be quickly fine-tuned in mere minutes to fit an enterprise or indie developer’s needs. Google DeepMind Staff AI Developer Relations Engineer Omar Sanseviero added that Gemma 3 270M can also run directly in a user’s web browser, on a Raspberry Pi, and “in your toaster,” underscoring its ability to operate on very lightweight hardware. Gemma 3 270M combines 170 million embedding parameters — thanks to a large 256k vocabulary capable of handling rare and specific tokens — with 100 million transformer block parameters. According to Google, the architecture supports strong performance on instruction-following tasks right out of the box while staying small enough for rapid fine-tuning and deployment on devices with limited resources, including mobile hardware. One of the model’s defining strengths is its energy efficiency. In internal tests using the INT4-quantized model on a Pixel 9 Pro SoC, 25 conversations consumed just 0.75% of the device’s battery. This makes Gemma 3 270M a practical choice for on-device AI, particularly in cases where privacy and offline functionality are important. The release includes both a pretrained and an instruction-tuned model, giving developers immediate utility for general instruction-following tasks. Quantization-Aware Trained (QAT) checkpoints are also available, enabling INT4 precision with minimal performance loss and making the model production-ready for resource-constrained environments. Google frames Gemma 3 270M as part of a broader philosophy of choosing the right tool for the job rather than relying on raw model size. For functions like sentiment analysis, entity extraction, query routing, structured text generation, compliance checks, and creative writing, the company says a fine-tuned small model can deliver faster, more cost-effective results than a large general-purpose one. By fine-tuning a Gemma 3 4B model for multilingual content moderation, the team outperformed much larger proprietary systems. Gemma 3 270M is designed to enable similar success at an even smaller scale, supporting fleets of specialized models tailored to individual tasks.
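For a quick local test of the instruction-tuned variant, a minimal sketch using the Hugging Face transformers pipeline is below; it assumes the checkpoint is published under google/gemma-3-270m-it and that the Gemma license has been accepted on Hugging Face, so verify both before running.

```python
# Minimal sketch: run the instruction-tuned Gemma 3 270M locally with the
# Hugging Face transformers pipeline. The model ID is assumed to be
# "google/gemma-3-270m-it"; accepting the Gemma license is required.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")

prompt = "Classify the sentiment of this review as positive or negative: 'Battery life is fantastic.'"
result = generator(prompt, max_new_tokens=32)
print(result[0]["generated_text"])
```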
Multiverse’s Model Zoo offers compact, high-performing AI for device commands and for local reasoning—bringing powerful intelligence to home appliances, smartphones, and PCs via quantum compression
AI startup Multiverse Computing has released two AI models that are the world’s smallest high-performing models, able to handle chat, speech and, in one case, even reasoning. These new tiny models are intended to be embedded into Internet of Things devices, as well as run locally on smartphones, tablets, and PCs. “We can compress the model so much that they can fit on devices,” founder Román Orús said. “You can run them on premises, directly on your iPhone, or on your Apple Watch.” Its two new models are so small that they can bring chat AI capabilities to just about any IoT device and work without an internet connection. It humorously calls this family the Model Zoo because it’s naming the products based on animal brain sizes. A model it calls SuperFly is a compressed version of Hugging Face’s open source model SmolLM2-135M. The original has 135 million parameters and was developed for on-device uses. SuperFly is 94 million parameters, which Orús likens to the size of a fly’s brain. “This is like having a fly, but a little bit more clever,” he said. SuperFly is designed to be trained on very restricted data, like a device’s operations. Multiverse envisions it embedded into home appliances, allowing users to operate them with voice commands like “start quick wash” for a washing machine. Or users can ask troubleshooting questions. With a little processing power (like an Arduino), the model can handle a voice interface. The other model, named ChickBrain, is larger at 3.2 billion parameters, but it is also far more capable and has reasoning capabilities. It’s a compressed version of Meta’s Llama 3.1 8B model, Multiverse says. Yet it’s small enough to run on a MacBook, no internet connection required. More importantly, Orús said that ChickBrain actually slightly outperforms the original in several standard benchmarks, including the language-skill benchmark MMLU-Pro, math skills benchmarks Math 500 and GSM8K, and the general knowledge benchmark GPQA Diamond.
