Used Optane memory runs trillion-parameter model on one GPU

A workstation built with second-hand Intel Optane persistent memory modules ran Kimi K2.5 locally with a single GPU. The setup highlights renewed interest in a memory tier that sits between DRAM and SSDs for large language model inference.