DeepInfra Closes $107M Series B to Power Production-Scale AI Inference

Funding fuels global expansion of DeepInfra’s purpose-built inference cloud as AI demand shifts from model training to production scale

PALO ALTO, Calif., May 04, 2026 (GLOBE NEWSWIRE) — DeepInfra, a purpose-built cloud platform for high-throughput AI inference, today announced $107 million in Series B funding to scale its inference cloud and global capacity. Processing nearly five trillion tokens per week, DeepInfra enables enterprises and scaleups to run open-source and agent-driven AI workloads with improved cost, performance and security.

DeepInfra was developed by the team behind the popular messenger app imo, which has scaled to more than 200 million users globally. The company’s latest round is co-led by 500 Global and Georges Harik, one of Google’s earliest engineers, with participation from A.Capital Ventures, Crescent Cove, Felicis, NVIDIA, Peak6, Samsung Next, Supermicro and Upper90.

“When we launched nearly four years ago, we believed inference would become the dominant driver of enterprise AI workloads – and we are now at this inflection point,” said Nikola Borisov, co-founder and CEO, DeepInfra. “What’s happening now is incredibly exciting – open-source models are rapidly reaching parity with proprietary systems, unlocking a new wave of innovation at a fraction of the cost and enabling widespread adoption. At the same time, agent-based systems are driving continuous, high-volume demand. Inference is no longer a thin layer – it’s the system constraint that will define the majority of workloads. Most cloud platforms weren’t built for this always-on, distributed model, so we built DeepInfra from the ground up to deliver better economics, performance, and security.”

The investment reflects 500 Global’s portfolio thesis across the AI stack. The firm’s conviction is that infrastructure will be as defining a category as the models themselves.

“Demand for AI is causing every layer of the AI stack to innovate, and inference is no exception. In the agentic age, new workflows are arising on a rapid basis, as evidenced recently by OpenClaw and AutoResearch. Enterprises and developers building with open source and agent-driven AI need infrastructure that was designed to be flexible, fast and reliable. We backed DeepInfra because, in our assessment, this team has already proven they can build and operate distributed systems at global scale, and because we believe purpose-built inference infrastructure will be as fundamental to the next phase of AI as compute was to the last,” said Tony Wang, Managing Partner, 500 Global.