Google unveils memory-saving AI compression technique

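Google says its TurboQuant system can reduce the working memory that chatbots use during conversations by up to a factor of six without degrading performance. The technique targets a major inference bottleneck by compressing, in real time, the key-value (KV) cache, which stores the attention keys and values already computed for earlier tokens so the model does not have to recompute them.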
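The announcement, as summarized here, does not describe how TurboQuant works internally. The sketch below is a generic illustration only and is not Google's method. It shows why the KV cache is worth compressing and what "storing it in fewer bits" means. It estimates cache size for a hypothetical model shape (layer count, head count, head dimension and context length are all assumed values) and applies simple per-vector min-max quantization, one standard way to shrink cached keys and values.

```python
import random

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Approximate KV-cache size in bytes, ignoring quantization metadata."""
    # One key vector and one value vector (hence the factor 2) are cached
    # per layer, per KV head, per token.
    num_values = 2 * layers * kv_heads * head_dim * seq_len
    return num_values * bits_per_value / 8

def quantize(vec, bits):
    """Uniform min-max quantization of one vector to 2**bits levels."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]  # integers in [0, levels]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate values from integer codes."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32768)
    gib = 2 ** 30
    fp16 = kv_cache_bytes(**cfg, bits_per_value=16)
    q3 = kv_cache_bytes(**cfg, bits_per_value=3)
    print(f"16-bit cache: {fp16 / gib:.2f} GiB")  # 4.00 GiB
    print(f"3-bit cache:  {q3 / gib:.2f} GiB")    # 0.75 GiB
    print(f"ratio: {fp16 / q3:.2f}x")             # 5.33x

    # Quantize one random key vector and measure the worst-case error.
    random.seed(0)
    key = [random.gauss(0.0, 1.0) for _ in range(128)]
    codes, lo, scale = quantize(key, bits=3)
    approx = dequantize(codes, lo, scale)
    err = max(abs(a - b) for a, b in zip(key, approx))
    print(f"max abs error: {err:.4f} (bound: {scale / 2:.4f})")
```

The byte count ignores the per-vector offset and scale that a real cache must also store, so actual savings are somewhat below the simple 16/bits ratio. Production schemes generally use more sophisticated quantizers than the plain rounding shown here.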