One of the easiest ways to minimize AI's environmental impact may be to move where the processing is done, per new academic research conducted in partnership with Qualcomm. Running AI on devices instead of in the cloud slashes the power consumption of queries by about 90%, the study finds.

The industry has long touted the benefits of running models locally on devices rather than in the cloud, not just for energy savings but also for potentially lower costs and better privacy.

Researchers at the University of California, Riverside ran a series of experiments comparing the performance of various generative AI models, both in the cloud and on phones powered by Qualcomm chips. Running any of six different models on the phones consumed anywhere from 75% to 95% less power, with correspondingly sharp decreases in water consumption and overall carbon footprint.

Qualcomm is also developing an AI simulator and calculator that illustrates, for any given query and user location, what the responses would look like on-device versus in the cloud, and how much less power and water they would use. In one example, running a coding-skills question on the Llama-2-7B model in California was 94% more power efficient and 96% more water efficient on-device.

For all six models in the study, inference time on the phones, measured in seconds, was higher than in the cloud. Narrowing or eliminating that gap, particularly for the most powerful and popular models, will be crucial to accelerating on-device adoption.

For many AI users, the data center in your pocket might be all you need.
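As a rough illustration of the arithmetic behind such comparisons, the sketch below computes percentage savings from assumed per-query power and water costs. This is not Qualcomm's actual calculator, and all input figures are hypothetical placeholders chosen only to show how headline numbers like "94% more power efficient" are derived.

```python
# Illustrative sketch of an on-device vs. cloud per-query comparison.
# All numbers are hypothetical placeholders, not figures from the study
# or from Qualcomm's calculator.

def percent_savings(cloud: float, device: float) -> float:
    """Percentage reduction from running on-device instead of in the cloud."""
    return (cloud - device) / cloud * 100

# Hypothetical per-query costs for one model, query, and location.
cloud_energy_wh = 5.0    # assumed cloud energy per query, watt-hours
device_energy_wh = 0.3   # assumed on-device energy per query, watt-hours
cloud_water_ml = 25.0    # assumed cloud water per query, milliliters
device_water_ml = 1.0    # assumed on-device water per query, milliliters

print(f"Power savings: {percent_savings(cloud_energy_wh, device_energy_wh):.0f}%")
print(f"Water savings: {percent_savings(cloud_water_ml, device_water_ml):.0f}%")
# With these placeholder inputs, the savings land at 94% and 96%,
# matching the magnitude of the article's example; real values depend
# on the model, the query, and the local grid and water intensity.
```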