Hacker News discussion on IBM Granite 4.0 hybrid models, tooling support and early benchmarks

Developers on Hacker News dissect IBM’s Granite 4.0 large language models, focusing on the new hybrid Mamba and Transformer architecture, local-run options, and mixed early performance signals. The thread highlights rapid community tooling support alongside questions about real-world benchmarks and governance.

A Hacker News thread centers on IBM’s Granite 4.0 large language models and their hybrid Mamba and Transformer design, with commenters trading hands-on notes, links to official resources, and early impressions. Several participants pointed readers to IBM’s own announcement and a company explainer on the Mamba architecture, while noting that community write-ups can be more informative than high-level coverage. One commenter also highlighted IBM’s reference to ISO/IEC 42001 certification for Artificial Intelligence management systems and asked what concrete practices that implies in product design and deployment.

Tooling and local inference support featured prominently. Contributors reported that support for Granite 4.0's hybrid architecture landed in llama.cpp earlier this year, and that Ollama's engine uses GGML directly while falling back to llama.cpp for models it does not yet support. Unsloth released dynamic GGUF conversions for a 32 billion parameter mixture-of-experts variant and shared a support-agent fine-tuning notebook. Users tested local setups across LM Studio, Vulkan and ROCm backends, and different quantizations, with one noting that switching to ROCm resolved a GPU loading issue. Another user tried an Ollama package that ran quickly at a roughly 1.9 GB download size, though without Mamba components and with default context limits lower than the claimed maximum.

Early performance anecdotes were mixed. A practitioner ran the 32 billion parameter "Small" mixture-of-experts variant as a quantized build of around 19 GB on disk, growing to roughly 20 GB at a 100,000 token context, consuming about 26 GB of VRAM in one runtime and 22 GB in another, and generating around 30 tokens per second. The same commenter judged coding ability to be underwhelming in initial tests, later citing third-party dashboards showing approximately 25.1 percent on LiveCodeBench, 2 percent on a terminal benchmark, and 16 percent on a coding index for one Granite 4.0 variant. Elsewhere in the thread, developers asked for head-to-head comparisons with leading closed models and noted that third-party benchmarks to date appear less favorable than vendor materials.
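The reported on-disk figure lines up with a simple back-of-envelope calculation. A minimal sketch, assuming an effective rate of about 4.75 bits per weight (an assumed figure typical of 4-bit GGUF quantizations, not stated in the thread):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: parameters x bits / 8.

    params_billion * 1e9 weights, bits_per_weight / 8 bytes each,
    divided by 1e9 to return decimal gigabytes. Per-block scale
    overheads are folded into the effective bits_per_weight figure.
    """
    return params_billion * bits_per_weight / 8


# A 32B-parameter model at ~4.75 effective bits per weight (assumed)
# lands near the ~19 GB reported in the thread.
print(round(quantized_size_gb(32, 4.75), 1))  # → 19.0
```

One plausible reading of the small gap between 19 GB at rest and roughly 20 GB at a 100,000 token context is the hybrid design itself: Mamba-style layers keep a fixed-size recurrent state rather than a key-value cache that grows with context length.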

The conversation also placed Granite 4.0 in a broader context of long-context and hybrid architectures. Commenters referenced other systems, including a model with a 256,000 token context and a newly released model that slows markedly beyond 40,000 tokens. Some expressed caution rooted in past experiences with IBM’s Artificial Intelligence offerings and skepticism about marketing claims, while others praised the pace of open tooling and the ability to run models locally. Overall, the thread captures an active, early-stage evaluation: strong momentum in ecosystem support, clear enterprise and governance positioning, and a wait-and-see posture on independent benchmarks and real-world task performance.

Impact Score: 55

Apple plans Intel 18A-P for M7 and 14A for A21

Apple is expected to use Intel’s 18A-P process for M7 chips in MacBook models and Intel’s 14A process for A21 chips in iPhones. The shift points to a broader supplier strategy as Apple moves beyond TSMC for parts of its future silicon roadmap.

Google and other chatbots surface real phone numbers

Generative Artificial Intelligence chatbots are surfacing real phone numbers and other personal details, sometimes by pulling from obscure public sources and sometimes by inventing plausible but wrong contact information. Privacy experts say users have few reliable ways to find out whether their data is in model training sets or to force its removal.

U.S. and China revisit Artificial Intelligence emergency talks

Washington and Beijing are exploring renewed talks on an emergency communication channel for Artificial Intelligence as fears grow over the capabilities of Anthropic’s Mythos model. The shift reflects rising concern in both capitals that competitive pressure is outpacing safeguards.

Artificial Intelligence divides employers as hiring and headcount shift

U.S. hiring beat expectations in April, but employers remain split on whether Artificial Intelligence should drive layoffs, productivity gains, or internal redeployment. At the same time, candidate use of Artificial Intelligence is outpacing employer adoption in hiring, adding new pressure to screening and entry-level recruiting.
