Nvidia Vera CPUs hit by PCIe flaw with third-party GPUs

Nvidia's Vera server CPUs include a PCIe controller bug that can break compatibility with non-Nvidia GPUs and accelerators, particularly under DMA-heavy workloads. The flaw is tied to how the chips generate memory addresses during specific MMIO write operations on Arm-based systems.

Nvidia has introduced its Vera CPUs as standalone systems-on-chip aimed at third-party customers, positioning them as competitors to Intel’s Xeon and AMD’s EPYC server processors. Although these processors are expected to function as general-purpose CPUs that can interface with a wide range of accelerators, the Vera generation appears to be tuned specifically for Nvidia GPUs. The system-on-chip contains a hardware bug that triggers errors when it is paired with non-Nvidia graphics cards or accelerators, raising concerns about reliable operation and deployment in heterogeneous environments.

The source of the problem lies in how the Vera PCIe controllers generate memory addresses. Under certain conditions, they produce invalid addresses that disrupt communication with third-party accelerators. The issue occurs during PCIe memory-mapped I/O (MMIO) writes, when the CPU writes to an MMIO region with only some of the byte enables set. The problem becomes more severe when these regions are mapped with Arm’s Normal Non-cacheable memory attribute, ‘MT_NORMAL_NC,’ which creates significant compatibility challenges for devices that rely on precise memory ordering and addressing.
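For readers unfamiliar with the term, a "partial byte enable" write is one that does not cover every byte of the 4-byte words (DWORDs) it touches, so the PCIe request has to flag which bytes are valid. The Python sketch below is only an illustration of that general PCIe concept: it computes the first and last DWORD byte-enable masks for a contiguous write and reports whether the write is partial. It does not model or reproduce the Vera erratum, whose exact trigger conditions are not described here, and the function names `byte_enables` and `is_partial_write` are hypothetical.

```python
def byte_enables(addr: int, length: int) -> tuple[int, int, int]:
    """Return (dword_count, first_be, last_be) for a contiguous write.

    Each mask has one bit per byte of a DWORD (bit 0 = lowest address).
    For a write that fits in a single DWORD, last_be is 0, mirroring the
    PCIe convention for one-DWORD requests.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = addr + length - 1
    start = addr & 0x3        # offset of the first byte inside its DWORD
    end = last_byte & 0x3     # offset of the last byte inside its DWORD
    dword_count = (last_byte >> 2) - (addr >> 2) + 1
    if dword_count == 1:
        # Bytes start..end inclusive are enabled in the only DWORD.
        first_be = ((1 << (end + 1)) - 1) & ~((1 << start) - 1) & 0xF
        last_be = 0
    else:
        first_be = (0xF << start) & 0xF   # bytes start..3 of the first DWORD
        last_be = (1 << (end + 1)) - 1    # bytes 0..end of the last DWORD
    return dword_count, first_be, last_be


def is_partial_write(addr: int, length: int) -> bool:
    """True if any DWORD touched by the write is only partly covered."""
    dword_count, first_be, last_be = byte_enables(addr, length)
    if dword_count == 1:
        return first_be != 0xF
    return first_be != 0xF or last_be != 0xF


if __name__ == "__main__":
    for addr, length in [(0x1000, 4), (0x1001, 2), (0x1002, 4), (0x1000, 8)]:
        dwords, first, last = byte_enables(addr, length)
        print(f"addr={addr:#x} len={length} dwords={dwords} "
              f"first_be={first:04b} last_be={last:04b} "
              f"partial={is_partial_write(addr, length)}")
```

In this model, an aligned 4-byte or 8-byte store is a full write, while a 2-byte store at offset 1, or a 4-byte store that straddles two DWORDs, is partial. Writes of the second kind are the category the report associates with the flaw.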

Because Arm applies more relaxed memory ordering to Normal Non-cacheable mappings, accesses to such regions can trigger the erratum in Vera, leading to erroneous address generation, data corruption, and potential PCIe device failure. These failures are particularly likely during DMA-intensive workloads such as artificial intelligence training or large-scale HPC simulations, where accelerators perform frequent and heavy data transfers. Nvidia GPUs are co-designed with Vera CPUs and their specific memory ordering model, so no such issues have been reported when Vera is paired with Nvidia’s own accelerators. That stands in sharp contrast with the experience on AMD GPUs and other third-party hardware.
