MLCommons’s new standard for measuring the performance of LLMs on PCs adds support for NVIDIA and Apple Mac GPUs and new prompt categories, including structured prompts for code analysis and experimental long-context summarization tests using 4,000- and 8,000-token inputs

August 1, 2025 //  by Finnovate

MLCommons, the consortium behind the industry-standard MLPerf benchmarks, released MLPerf Client v1.0, a benchmark that sets a new standard for measuring the performance of LLMs on PCs and other client-class systems. MLPerf Client v1.0 introduces an expanded set of supported models, including Llama 2 7B Chat, Llama 3.1 8B Instruct, and Phi 3.5 Mini Instruct, with Phi 4 Reasoning 14B added as an experimental option to preview next-generation high-reasoning-capable LLMs. These additions reflect real-world use cases across a broader range of model sizes and capabilities.

The benchmark expands its evaluation scope with new prompt categories, including structured prompts for code analysis and experimental long-context summarization tests using 4,000- and 8,000-token inputs. Hardware and platform support has also grown significantly: MLPerf Client v1.0 supports AMD and Intel NPUs and GPUs via ONNX Runtime, Ryzen AI SDK, and OpenVINO, with additional support for NVIDIA GPUs and Apple Mac GPUs through llama.cpp.

The benchmark offers both command-line and graphical user interfaces. The GUI includes real-time compute and memory usage, persistent results history, comparison tables, and CSV exports, while the CLI supports automation and scripting for regression testing and large-scale evaluations, making MLPerf Client v1.0 a comprehensive tool for benchmarking LLMs on client systems.
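To give a feel for how the CLI's scripting support could slot into a regression-testing workflow, here is a minimal Python sketch that sweeps a few model/backend combinations and aggregates per-run CSV exports. The `mlperf-client` binary name, its `--model`, `--backend`, and `--output` flags, the model and backend identifiers, and the CSV column names are all illustrative assumptions for this sketch, not the tool's documented interface.

```python
"""Illustrative automation sketch around an MLPerf Client-style CLI.

Assumptions (hypothetical, not the documented interface): an `mlperf-client`
binary accepting --model/--backend/--output flags and writing a CSV with
`prompt_category` and `tokens_per_second` columns.
"""
import csv
import subprocess
from pathlib import Path

MODELS = ["llama-3.1-8b-instruct", "phi-3.5-mini-instruct"]  # hypothetical model IDs
BACKENDS = ["onnxruntime", "openvino", "llama.cpp"]          # hypothetical backend names
RESULTS_DIR = Path("results")


def run_benchmark(model: str, backend: str) -> Path:
    """Invoke one benchmark run and return the path of its CSV export."""
    RESULTS_DIR.mkdir(exist_ok=True)
    out_file = RESULTS_DIR / f"{model}_{backend}.csv"
    subprocess.run(
        ["mlperf-client", "--model", model, "--backend", backend,
         "--output", str(out_file)],
        check=True,  # fail the sweep if a run errors out
    )
    return out_file


def summarize(csv_path: Path) -> dict[str, float]:
    """Average tokens/second per prompt category from one run's CSV export."""
    samples: dict[str, list[float]] = {}
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            samples.setdefault(row["prompt_category"], []).append(
                float(row["tokens_per_second"]))
    return {cat: sum(vals) / len(vals) for cat, vals in samples.items()}


if __name__ == "__main__":
    for model in MODELS:
        for backend in BACKENDS:
            path = run_benchmark(model, backend)
            print(model, backend, summarize(path))
```

A wrapper along these lines is what makes large-scale evaluations practical: each run's CSV becomes a comparable record, so nightly jobs can diff averages across driver or runtime updates and flag regressions automatically.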


Category: AI & Machine Economy, Innovation Topics
