Charles Srisuwananukorn Discusses Scaling Artificial Intelligence Infrastructure at Together AI

Charles Srisuwananukorn, VP of Engineering at Together AI, reveals the complexity and demands of building and operating physical infrastructure for cutting-edge Artificial Intelligence development.

Charles Srisuwananukorn, Founding Vice President of Engineering at Together AI, shared insights during a Chat8VC fireside chat about navigating the demands of scaling physical infrastructure for advanced Artificial Intelligence applications. Recounting his career, from work at Snorkel AI and Apple to building core infrastructure at Together AI, he emphasized the distinct challenges of managing large physical GPU clusters as opposed to virtualized environments. This hands-on approach has become integral to Together AI's growth and its mission to provide robust compute resources and infrastructure for foundation model development.

Srisuwananukorn discussed major gaps in the open-source ecosystem, particularly the scarcity of clean, high-quality datasets, which led Together AI to launch the RedPajama initiative. He also highlighted the need for better reinforcement learning tools as models grow more sophisticated. Together AI's clusters, equipped with the latest NVIDIA GPUs such as H100s and H200s, serve both internal research and external client workloads, offering customized orchestration and optimized system performance via proprietary software such as the Together Kernel Collection. This focus on deep technical optimization, spanning networking, kernel design, and systems reliability, lets clients train models faster and more efficiently, often delivering a notable performance boost out of the box.
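The talk does not describe the Together Kernel Collection's internals, but the class of optimization it points to, fusing elementwise operations to cut kernel launches and memory traffic, can be illustrated with stock PyTorch. A minimal sketch, with a hypothetical bias-plus-GELU function standing in for a real workload:

```python
import torch
import torch.nn.functional as F

def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # In eager mode, the add and the GELU each launch their own kernel,
    # reading and writing the full tensor in GPU memory both times.
    return F.gelu(x + bias)

# torch.compile traces the function and fuses the elementwise ops into
# fewer kernels, the same class of speedup a hand-tuned fused kernel gives.
bias_gelu_fused = torch.compile(bias_gelu)

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4096, 4096, device=device)
bias = torch.randn(4096, device=device)
out = bias_gelu_fused(x, bias)
```

Hand-written fused kernels in a vendor library can go further than a generic compiler, which is where "out of the box" gains over stock implementations come from.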

As the company scales to tens of thousands of GPUs, Srisuwananukorn described tackling unexpected low-level operational challenges such as hardware reliability, overheating, and maintaining consistent performance. Automation is key, yet physical interventions, such as resolving hardware failures, remain necessary. On infrastructure flexibility, he addressed the evolving demand for both giant and smaller, faster models, noting Together AI's investments in edge computing to reduce latency for real-world Artificial Intelligence applications. Despite the operational pressures, Srisuwananukorn expressed optimism about recent breakthroughs in model accessibility, which allow increasingly sophisticated models to run on consumer hardware, forecasting a wave of innovation in the Artificial Intelligence ecosystem.
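As a loose illustration of the automation he describes, the sketch below polls GPU health with nvidia-smi and flags devices that need a human in the data center; the thresholds and escalation rules are hypothetical, not Together AI's actual tooling.

```python
import subprocess

# Hypothetical thresholds for pulling a GPU out of the scheduling pool.
MAX_TEMP_C = 85          # sustained temperatures above this suggest a cooling problem
MAX_UNCORRECTED_ECC = 0  # any uncorrected ECC error warrants inspection

def check_gpus() -> list[str]:
    """Query nvidia-smi and return alerts for GPUs needing physical attention."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,temperature.gpu,ecc.errors.uncorrected.volatile.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    alerts = []
    for line in result.stdout.strip().splitlines():
        index, temp, ecc = (field.strip() for field in line.split(","))
        if int(temp) > MAX_TEMP_C:
            alerts.append(f"GPU {index}: running hot at {temp} C")
        # ECC counters read "[N/A]" when ECC is disabled, so guard the parse.
        if ecc.isdigit() and int(ecc) > MAX_UNCORRECTED_ECC:
            alerts.append(f"GPU {index}: {ecc} uncorrected ECC errors; flag for swap")
    return alerts

if __name__ == "__main__":
    for alert in check_gpus():
        print(alert)
```

Real fleets layer checks like this into schedulers and ticketing systems so that only the failures software cannot fix, a dead cable or a failed board, reach a technician.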

Samsung shows 96% power reduction in NAND flash

Samsung researchers report a design that combines ferroelectric materials with oxide semiconductors to cut NAND flash string-level power by up to 96%. The team says the approach supports high density, including up to 5 bits per cell, and could reduce power consumption in data centers and in mobile and edge Artificial Intelligence devices.

The Download: fossil fuels and new endometriosis tests

This edition of The Download highlights how this year’s UN climate talks again omitted the phrase “fossil fuels” and why new noninvasive tests could shorten the nearly 10 years it now takes to diagnose endometriosis.

SAP unveils EU Artificial Intelligence Cloud: a unified vision for Europe’s sovereign Artificial Intelligence and cloud future

SAP launched EU Artificial Intelligence Cloud, a sovereign offering that brings together its existing cloud and Artificial Intelligence capabilities into a single full-stack framework. The offering supports EU data residency and gives customers flexible sovereignty and deployment choices across SAP data centers, trusted European infrastructure, or fully managed on-site solutions.

HPC won’t be an x86 monoculture forever

x86 dominance in high-performance computing is receding – its share of the TOP500 has fallen from almost nine in ten machines a decade ago to 57 percent today. The rise of GPUs, Arm, and RISC-V, together with the demands of Artificial Intelligence and hyperscale workloads, is reshaping processor choices.

A trillion dollars is a terrible thing to waste

Gary Marcus argues that the machine learning mainstream’s prolonged focus on scaling large language models may have cost roughly a trillion dollars and produced diminishing returns. He urges a pivot toward new ideas such as neurosymbolic techniques and built-in inductive constraints to address persistent problems.
