PrismML has emerged from stealth with a $16.25 million seed round and an open source release of what it describes as a “1-bit” large language model family. Founded by Caltech researchers, the company is targeting one of the central pressures in AI infrastructure: tightening memory constraints and rising energy costs. Its pitch is to compress the model itself rather than only optimize the surrounding inference stack.
The flagship of the Bonsai family is Bonsai 8B, an 8-billion-parameter model trained on Google v4 TPUs. According to PrismML, the model achieves competitive performance on benchmark suites including MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFCL v3, but with a memory footprint of roughly 1GB, compared to about 16GB for a typical 16-bit equivalent. PrismML is also releasing 1-bit Bonsai 4B and 1.7B models, with memory footprints of 0.5GB and 0.24GB, respectively. The company says the models are fully binarized end to end, with all weights constrained to a single bit across embeddings, attention layers, and MLP blocks, without higher-precision exceptions.
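The quoted footprints line up with simple back-of-the-envelope arithmetic: one bit per weight is one eighth of a byte. The sketch below counts raw weight storage only (in decimal gigabytes) and ignores activations, the KV cache, and any per-layer scale factors a real 1-bit implementation might keep, which may explain the small gap on the 1.7B figure.

```python
# Rough weight-storage footprints implied by the article's figures.

def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Memory to store the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

for name, params in [("Bonsai 8B", 8_000_000_000),
                     ("Bonsai 4B", 4_000_000_000),
                     ("Bonsai 1.7B", 1_700_000_000)]:
    print(f"{name}: 1-bit ~ {weight_memory_gb(params, 1):.2f} GB, "
          f"16-bit ~ {weight_memory_gb(params, 16):.1f} GB")
# Bonsai 8B: 1-bit ~ 1.00 GB, 16-bit ~ 16.0 GB
```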
PrismML attributes the results to a new mathematical framework developed at Caltech, although it has not disclosed the training methods or stabilization techniques behind the approach. CEO Babak Hassibi described the work as a new paradigm for AI designed to adapt across diverse hardware environments. The company claims its 1-bit models can deliver up to eight times faster processing and reduce energy consumption by as much as 75% to 80% on existing hardware. PrismML also argues that future hardware optimized for 1-bit operations could improve efficiency further by replacing more complex multiplications with simpler arithmetic.
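The claim about replacing multiplications follows from how binarized dot products are generally computed: with weights and activations restricted to {-1, +1} and packed as bits, a dot product reduces to an XNOR plus a bit count, with no multiplies at all. This is a sketch of that standard technique, not PrismML's undisclosed method.

```python
# Dot product of two {-1, +1} vectors packed as n-bit integers
# (bit set = +1, bit clear = -1). Matching bits contribute +1 to the
# sum, mismatched bits contribute -1, so the result is 2*matches - n.

def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    mask = (1 << n) - 1
    matches = bin(~(x_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n

# x = 0b1011 encodes [+1, +1, -1, +1] (LSB first); w = 0b1101 encodes
# [+1, -1, +1, +1]. Conventional dot product: 1 - 1 - 1 + 1 = 0.
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```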
The company and its investors frame the technology as a way to move advanced AI beyond centralized data centers and onto consumer and edge devices. PrismML says the models are designed to run on smartphones, wearables, and robotics, potentially enabling more capable local deployments without depending on cloud infrastructure. In a blog post, the company also introduced “intelligence density,” a metric intended to measure how much capability a model delivers per unit of size.
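PrismML has not published a formula for "intelligence density"; the most natural reading is a benchmark score divided by model size. The helper below implements that assumed definition, and the scores in the usage line are placeholders, not real results.

```python
# Hypothetical capability-per-gigabyte metric; the definition is an
# assumption based on the article's description, not PrismML's formula.

def intelligence_density(benchmark_score: float, size_gb: float) -> float:
    return benchmark_score / size_gb

# Illustration with made-up numbers: equal scores at 1 GB vs 16 GB
# would differ in "density" by a factor of 16.
print(intelligence_density(70.0, 1.0) / intelligence_density(70.0, 16.0))  # -> 16.0
```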
Key questions remain unresolved. PrismML’s claim that a fully 1-bit model can match higher-precision systems has not been validated beyond the company’s own benchmark results, and extreme quantization has historically struggled with complex reasoning tasks. Independent third-party testing and real-world deployments will determine whether PrismML’s approach is a genuine breakthrough or a narrower efficiency optimization. Even so, the launch underscores the industry’s broader shift toward efficiency-focused AI design as model scaling becomes more expensive.
