Google AI compression technology saves data center energy

We have seen the future of AI via Large Language Models. And it's smaller than you think.

That much was clear in 2025, when we first saw China's DeepSeek, a slimmer, lighter LLM that required way less data center energy to do its job and performed surprisingly well on benchmark tests against heftier American AI models. (Ironically, it was built atop an open source U.S. model, Meta's Llama.)

DeepSeek may have foundered on privacy concerns, but the trend towards smaller and smarter AI isn't going away. The evolution is on display again in TurboQuant, a compression algorithm that Google quietly unveiled this week via a Google Research paper.

The paper itself is pretty impenetrable if you're not an AI nerd who talks tokens and high-dimensional vectors. We'll get into a more detailed explanation below. But here's the TL;DR: The TurboQuant algorithm can shrink LLMs' memory usage by a factor of six.

What does that mean? Less energy usage, perhaps to the point where running a powerful AI model on your smartphone becomes possible. Less RAM usage, right on time for the ongoing RAM shortage.

Certainly, algorithms like this can help LLMs make more efficient use of the data centers they're hosted in — either by using the extra space to run more complex models, or, hear me out, by allowing us not to rush into building so many unpopular new data centers in the first place.

And that, paradoxically, could be a problem for the AI economy, at least as it's currently structured.

Why smaller and smarter will mess up NVIDIA

For the past three years, tech stocks have been riding ever higher on the back of one company alone: NVIDIA. And NVIDIA has been riding ever higher on the assumption that we're in the middle of what CEO Jensen Huang called this month "the largest infrastructure buildout in history" — an explosion of data centers, for which NVIDIA will be the chief provider of chips.

But that infrastructure build-out, if you look at data centers actually built versus data centers promised, is already stumbling, as a fresh New York Times investigation just made clear. What's the holdup? Not just opposition from concerned citizens across the U.S., now including the NAACP. It's also permits, applications, inspections, and the other unsexy but often necessary parts of the local government machinery.

Not least of the problems: A dearth of power generation and transmission, which doesn't sit well with the AI industry's seemingly unquenchable need to soak up electricity and suck up water.

What happens when the desire for more AI runs into a lack of infrastructure? Well, then necessity becomes the mother of invention. We learn to do more with less. And that's exactly what TurboQuant does.

Middle-out compression

Here's that explanation — although since TurboQuant is a compression algorithm, you'd be forgiven for imagining Google had the same NSFW "middle out" compression algorithm inspiration that drove the plot of the HBO comedy Silicon Valley.

So there are a couple of memory "bottlenecks" when AI models reach for something they really want and frequently use. One is called the key-value cache, which is like a really hot library that keeps the most-used information close at hand so the model doesn't have to work it out again. The other is vector search, which finds items whose numerical representations are similar. TurboQuant effectively lubricates both at once, making memory grabs faster, smoother, and less fraught.

TurboQuant "helps unclog key-value cache bottlenecks by reducing the size of key-value pairs," Google's paper says, in part by the "clever" move of "randomly rotating the data vectors."
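
For the nerds who do want a feel for why "randomly rotating the data vectors" helps, here is a minimal Python sketch. It is an illustration under assumptions, not Google's algorithm: the function names, the 64-number test vector with one big outlier, and the plain 4-bit rounding step are all made up for the example, and the paper's own quantizer is more elaborate. The idea it shows is simply that rotating a lopsided vector spreads its values more evenly, so crude low-bit rounding loses less information.

```python
import math
import random


def random_rotation(dim, rng):
    """Build a random orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # strip out the parts that overlap rows already chosen
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Apply the rotation (matrix times vector)."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def unrotate(matrix, vec):
    """Undo the rotation; an orthogonal matrix's inverse is its transpose."""
    dim = len(vec)
    return [sum(matrix[i][j] * vec[i] for i in range(dim)) for j in range(dim)]


def quantize(vec, bits):
    """Round each coordinate to a signed integer code that fits in `bits` bits.

    Stand-in quantizer for illustration only (an assumption, not the paper's).
    """
    levels = 2 ** (bits - 1) - 1  # 4 bits -> codes from -7 to 7
    scale = max(abs(x) for x in vec) or 1.0
    codes = [round(x / scale * levels) for x in vec]
    return codes, scale


def dequantize(codes, scale, bits):
    """Turn integer codes back into approximate floating-point values."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compare_errors(seed=0, dim=64, bits=4):
    """Return (error without rotation, error with rotation) for one test vector."""
    rng = random.Random(seed)
    # Hypothetical data: one large outlier, everything else small.
    vec = [10.0] + [rng.uniform(-1.0, 1.0) for _ in range(dim - 1)]

    codes, scale = quantize(vec, bits)
    plain_err = squared_error(vec, dequantize(codes, scale, bits))

    rot = random_rotation(dim, rng)
    codes, scale = quantize(rotate(rot, vec), bits)
    restored = unrotate(rot, dequantize(codes, scale, bits))
    rotated_err = squared_error(vec, restored)
    return plain_err, rotated_err


if __name__ == "__main__":
    plain, rotated = compare_errors()
    print(f"error without rotation: {plain:.2f}")
    print(f"error with rotation:    {rotated:.2f}")
```

In this toy setup the rotated version should come out with the smaller error, because the single outlier no longer forces every other number onto a coarse grid. The memory saving comes from storing the small integer codes (plus one scale value) instead of full-precision numbers.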

Got that? No? Well, it doesn't really matter. All you need to know is that there's a promising new field of extremely complex computational mathematics, and it works the way compression algorithms have long worked — making new technology faster, lighter, easier to run.

First, it was ZIP file downloads, then the video compression that enabled the streaming revolution, and now it's AI. The result could allow a more powerful LLM to run entirely on your phone, or it could crash the global economy, or both at the same time. Isn't life in 2026 wild?