Cumulus Labs

Performant serverless GPU inference

Winter 2026 · Active · B2B · Infrastructure · AIOps · Cloud Computing · San Francisco, CA, USA
Cumulus Labs is a serverless GPU cloud with a proprietary inference stack, purpose-built for AI teams who want faster performance, lower costs, and zero infrastructure work. If your team uses GPUs — for inference, training, fine-tuning, or any compute workload — Cumulus replaces the painful parts of your stack with a platform that just works.

Most teams today are stuck choosing between bad options. Self-hosting inference means wrestling with vLLM or SGLang configurations, debugging CUDA issues, and babysitting infrastructure that breaks at scale. Managed API providers like OpenRouter or Fireworks are convenient but expensive, and you're paying their margin on top of the compute. Serverless GPU platforms give you flexibility but hit you with slow cold starts, idle billing, and no help on the inference layer — you're still on your own to make models run fast.

Cumulus eliminates that tradeoff. The platform assigns containers in seconds, scales to zero when you're idle, and scales up to as many instances as you need with no waitlists or approvals. You're billed by the second for exactly the compute you use — nothing more.

For inference, Cumulus ships Ion — a proprietary engine that supports all major LLMs, VLMs, and MoE architectures out of the box. Ion is optimized for latency and throughput beyond what teams typically achieve managing vLLM or SGLang themselves, with zero configuration required. Whether you're serving your own fine-tuned model or hosting an open-source model like Llama or Mixtral, Ion handles the performance layer so your team doesn't have to. Cumulus also supports checkpointing, model compilation, and LoRA serving, and the team forward-deploys custom optimizations directly for customers.

For training, fine-tuning, and general container workloads, teams bring any job to Cumulus and run it on the same serverless infrastructure — no cluster management, no GPU debugging, no orchestration setup.
If you're paying for inference APIs, Cumulus lets you run the same models yourself for less. If you're self-hosting, Cumulus makes your models faster without the operational burden. If you need GPUs for any workload at all, Cumulus is the simplest and most performant way to get them.
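The cost argument above rests on the difference between per-second billing with scale-to-zero and paying for an always-on instance. A minimal sketch of that arithmetic, using an assumed $/hr GPU rate and traffic pattern (these numbers are illustrative, not Cumulus pricing):

```python
# Hypothetical comparison of per-second billing with scale-to-zero
# vs. a dedicated always-on GPU instance. The hourly rate and the
# traffic pattern below are assumptions for illustration only.

HOURLY_RATE = 2.50                      # assumed GPU price, $/hr
PER_SECOND_RATE = HOURLY_RATE / 3600    # same rate expressed per second

def serverless_cost(active_seconds: float) -> float:
    """Bill only the seconds a container is actually running."""
    return active_seconds * PER_SECOND_RATE

def always_on_cost(wall_clock_hours: float) -> float:
    """A dedicated instance bills for every hour, idle or not."""
    return wall_clock_hours * HOURLY_RATE

# Example: 2 hours of real inference traffic spread across a 24-hour day.
daily_serverless = serverless_cost(2 * 3600)  # 2 hours billed
daily_always_on = always_on_cost(24)          # full day billed
print(f"serverless: ${daily_serverless:.2f}, always-on: ${daily_always_on:.2f}")
```

At low or bursty utilization the gap is large; as utilization approaches 100% the two converge, which is why the pitch targets teams with idle-heavy workloads.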

Verdict

High Signal
Market Opportunity
Serverless GPU inference is a massive and rapidly growing B2B infrastructure market, as every AI team needs inference compute. ICP is clear: AI teams running open-source models who want to avoid self-hosting pain and managed API margins. Pay-per-second billing and scale-to-zero are a proven monetization model (see Lambda Labs, Modal, Replicate). TAM easily $10B+ given the GPU cloud market trajectory.
Medium Signal
Founder Signal
Suryaa was Lead Engineer at TensorDock building a distributed GPU marketplace and then a Forward Deployed SWE at Palantir — directly relevant infra and AI deployment experience. Veer was Sr. Software Engineer at AiRANACULUS leading a Space Force SBIR contract for 2+ years through college. Both graduated December 2025, so they're fresh grads, but both carried real engineering responsibility during school. Technically credible for this domain, though neither has prior startup exits or senior industry roles post-graduation.
Low Signal
Competition
The serverless GPU inference space is extremely crowded: Modal, Replicate, RunPod, Banana, Baseten, Fireworks, Together AI, and AWS/GCP/Azure serverless GPU offerings all compete here. Cumulus's differentiation rests on cold start speed (12.5s vs Modal's claimed 70s) and the proprietary Ion inference engine — real technical claims, but unverified externally. Big cloud providers and well-funded incumbents like Modal have significant head starts and distribution advantages.
Medium Signal
Product
Live console, docs, and API exist with working code snippets shown. Key differentiator is 12.5s cold start vs Modal's 70s (internally benchmarked), plus a proprietary inference engine called Ion. No named customer logos, no revenue figures, and still gated behind 'Request Access' — so traction is unproven publicly.
Overall: B Tier

Cumulus is attacking a real, large market with a technically credible angle — a proprietary inference stack (Ion) and claimed ~5x faster cold starts than Modal (12.5s vs 70s). Both founders have hands-on GPU infrastructure experience that's directly relevant, which is more important than their fresh-grad status. The core risk is competitive: Modal, Replicate, Baseten, and Fireworks are well-capitalized and already have customer bases, making differentiation on cold start speed alone a thin moat. No public customer traction or revenue data is visible, and the NVIDIA Inception badge is a marketing program, not a customer. If Ion genuinely delivers measurable latency/throughput improvements at scale and they can show real enterprise customers, this could break into A territory — but right now it's a technically promising team in a brutally competitive infrastructure market without proof of traction.

Active Founders

Suryaa Rajinikanth
Founder

Suryaa Rajinikanth studied computer science at Georgia Tech, where he concurrently worked at TensorDock as a Lead Engineer, building the first distributed GPU marketplace serving thousands of consumers and businesses. He went on to deploy critical AI systems and infrastructure in high-performance environments at Palantir.

Veer Shah
Founder

Veer studied Computer Science at the University of Wisconsin–Madison, graduating in December 2025. During college, he worked at an aerospace startup where he led a Space Force SBIR contract for military satellite communications and contributed to several NASA SBIR programs, two of which were commercialized and are currently being flight tested in space. Before college, he captained his FIRST Robotics Team 5422: Stormgears, qualifying for Worlds all four years.

Cumulus Labs
Tier: B Tier
Batch: Winter 2026
Team Size: 2
Status: Active
Location: San Francisco, CA, USA