Microsoft expanded its Phi line of open-source language models with two new models optimized for multimodal processing and hardware efficiency.

The first addition, the text-only Phi-4-mini, features 3.8 billion parameters, making it compact enough to run on mobile devices. It's based on the ubiquitous transformer neural network architecture that underpins most large language models. Phi-4-mini also uses a performance optimization technique called grouped query attention, or GQA, which reduces the hardware usage of the model's attention mechanism. A language model's attention mechanism helps it determine which data points are most relevant to a given processing task. Phi-4-mini can generate text, translate existing documents and take actions in external applications.

Phi-4-multimodal is an upgraded version of Phi-4-mini with 5.6 billion parameters that can also process visual and audio input. Microsoft trained the model using a new technique it dubs Mixture of LoRAs. According to Microsoft, Phi-4-multimodal outperformed Gemini-2.0 Flash "by a large margin." It also bested InternOmni, an open-source LLM that is built specifically to process multimodal data and has a higher parameter count. Microsoft says that both models significantly outperform comparably sized alternatives at certain tasks.
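The grouped query attention idea can be illustrated with a minimal sketch: several query heads share one key/value head, so the model stores and computes fewer keys and values than standard multi-head attention. This is a toy numpy illustration of the general GQA technique, not Phi-4-mini's actual implementation; all function and variable names here are invented for the example.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy GQA: n_q_heads query heads share n_kv_heads key/value heads
    (n_q_heads must be a multiple of n_kv_heads)."""
    seq, d_model = x.shape
    d_head = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per shared KV head

    # Project into per-head queries, and a *smaller* set of keys/values.
    q = (x @ wq).reshape(seq, n_q_heads, d_head)
    k = (x @ wk).reshape(seq, n_kv_heads, d_head)
    v = (x @ wv).reshape(seq, n_kv_heads, d_head)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # index of the KV head this query head shares
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        # Row-wise softmax turns scores into attention weights.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, d_model)
```

With, say, 4 query heads and 2 key/value heads, the key and value projections are half the size of their multi-head counterparts, which is where the memory and bandwidth savings come from.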
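Microsoft has not detailed Mixture of LoRAs in this article, but the LoRA building block it is named after is well known: a frozen base weight matrix is augmented with a small trainable low-rank update, and one plausible reading of the "mixture" is selecting a modality-specific adapter per input. The sketch below illustrates only that generic LoRA mechanism under those assumptions; the function, the adapter dictionary, and the per-modality selection are all hypothetical, not Microsoft's method.

```python
import numpy as np

def lora_forward(x, w_base, adapters, modality, alpha=1.0):
    """Frozen base projection plus a low-rank (LoRA) update chosen
    per modality: y = x @ W + alpha * (x @ A) @ B.
    `adapters` maps a modality name to a pair (A, B) with small rank r,
    A: (d_in, r), B: (r, d_out)."""
    a, b = adapters[modality]  # hypothetical per-modality adapter lookup
    return x @ w_base + alpha * (x @ a) @ b
```

Because only the small A and B matrices are trained, each modality can get its own cheap adapter while the large base weights stay shared and frozen.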