A Hacker News thread centers on IBM’s Granite 4.0 large language models and their hybrid Mamba/Transformer design, with commenters trading hands-on notes, links to official resources, and early impressions. Several participants pointed readers to IBM’s own announcement and a company explainer on the Mamba architecture, while noting that community write-ups can be more informative than high-level coverage. One commenter also highlighted IBM’s reference to ISO/IEC 42001 certification for AI management systems and asked what concrete practices that implies for product design and deployment.
Tooling and local inference support featured prominently. Contributors reported that support for Granite 4.0’s hybrid architecture landed in llama.cpp earlier this year, and that Ollama’s engine uses GGML directly while falling back to llama.cpp for models it does not yet support. Unsloth released dynamic GGUF conversions for a 32-billion-parameter mixture-of-experts variant and shared a support-agent fine-tuning notebook. Users tested local setups across LM Studio, Vulkan and ROCm backends, and different quantizations, with one noting that switching to ROCm resolved a GPU loading issue. Another user tried an Ollama package that downloaded at roughly 1.9 GB and ran quickly, though without Mamba components and with a default context window well below the claimed maximum.
Early performance anecdotes were mixed. A practitioner running a quantized build of the 32-billion-parameter mixture-of-experts variant reported a roughly 19 GB footprint, growing to about 20 GB at a 100,000-token context, consuming about 26 GB of VRAM in one runtime and 22 GB in another, and generating around 30 tokens per second. The same commenter judged coding ability to be underwhelming in initial tests, later citing third-party dashboards showing approximately 25.1 percent on LiveCodeBench, 2 percent on a terminal-use benchmark, and 16 percent on a coding index for one Granite 4.0 variant. Elsewhere in the thread, developers asked for head-to-head comparisons with leading closed models and noted that third-party benchmarks to date appear less favorable than vendor materials.
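The reported footprint is roughly consistent with simple arithmetic: a quantized model’s on-disk size is approximately parameter count times bits per weight, plus some overhead for embeddings, quantization scales, and metadata. A minimal sketch, assuming a 32B model quantized near 4.5–5 bits per weight and a hypothetical 5 percent overhead factor (both are illustrative assumptions, not figures from the thread):

```python
def quantized_size_gb(n_params_billion: float,
                      bits_per_weight: float,
                      overhead: float = 1.05) -> float:
    """Rough on-disk size of a quantized model in GB.

    n_params_billion: total parameter count in billions.
    bits_per_weight: average bits per weight after quantization.
    overhead: multiplier for scales, embeddings, and metadata (assumed).
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A 32B model at ~4.6 bits per weight lands close to the ~19 GB
# figure reported in the thread.
print(round(quantized_size_gb(32, 4.6), 1))
```

The small gap between the 19 GB base size and roughly 20 GB at a 100,000-token context also fits the hybrid design’s pitch: Mamba layers carry fixed-size state rather than a per-token KV cache, so long contexts add far less memory than in a pure Transformer.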
The conversation also placed Granite 4.0 in the broader context of long-context and hybrid architectures. Commenters referenced other systems, including a model with a 256,000-token context window and a newly released model that slows markedly beyond 40,000 tokens. Some expressed caution rooted in past experiences with IBM’s AI offerings and skepticism about marketing claims, while others praised the pace of open tooling and the ability to run models locally. Overall, the thread captures an active, early-stage evaluation: strong momentum in ecosystem support, clear enterprise and governance positioning, and a wait-and-see posture on independent benchmarks and real-world task performance.
