Polymath

Automating RL environment generation

Winter 2026 | Active | B2B | Reinforcement Learning | Big Data | Infrastructure | AI | San Francisco, CA, USA

Company

Polymath is building world generation models and systems to automate the creation of reinforcement learning (RL) environments. As demand for AI shifts from question answering to autonomous agents that can operate over long horizons, models must be trained in environments that reflect the real world.

Today, RL environment generation is bottlenecked by human labor: companies hire contractors to hand-build artifacts one by one. This approach is expensive and doesn’t scale. Moreover, human data alone will never lead to superintelligence.

We’re building the core technology for automated environment generation that requires far less human effort than traditional approaches, and eventually none. This enables more complex and realistic worlds, along with higher quality, scale, and diversity of tasks, which will be essential to unlock RL scaling. Our end goal is to create realistic, long-horizon environments from a text description alone, enabling worlds of arbitrary complexity and scale. This is foundational for training and evaluating autonomous, superintelligent AI agents.

We’re a team of researchers and engineers from UC Berkeley, Hume AI, Plaid, and Amazon, with years of experience post-training frontier models in industry and building large-scale data systems.

Verdict

High Signal

Market Opportunity

RL environment generation for AI agents is a direct infrastructure play targeting every major AI lab and every enterprise deploying coding agents, a multi-billion-dollar addressable market. The ICP (ideal customer profile) is clear: AI companies training long-horizon coding agents. The monetization path is straightforward: benchmarks, environment APIs, or data licensing to labs.

Medium Signal

Founder Signal

Dylan Ma has ~2 years at AWS SageMaker (shipped ML inference features; received an Amazon Inventor Award) plus ~1.5 years at Hume AI doing SFT/RL on frontier voice models, which is directly relevant experience. Naren Yenuganti has ~2 years at Amazon (ML for logistics) and ~1 year at Plaid (early-stage product). Both are UC Berkeley EECS graduates (class of ~2022), giving each ~3-4 years of post-graduate experience. Solid but not exceptional: no prior exits and no staff/principal-level roles at top AI labs.

Medium Signal

Competition

Direct competitors include SWE-bench (an academic benchmark), AgentBench, and internal environment teams at OpenAI, Anthropic, and Google DeepMind. The claimed differentiation is automated environment generation versus manual construction. However, large labs building this capability internally pose a real threat, and no proprietary data moat has been demonstrated yet.

Medium Signal

Product

They have a named product, Horizon-SWE, described as a long-horizon, multi-tool SWE agent benchmark, which is a concrete artifact. However, there is no pricing page, no customer logos, and no usage metrics, and the website is largely a landing page with minimal substance: no live demo or API docs are visible.

Overall: B Tier

Polymath is attacking a real and timely problem: RL environment generation is genuinely bottlenecked today and matters to everyone training agentic models. The founders have directly relevant experience (Dylan's RL/SFT work at Hume AI is a strong signal), but neither has lab-level research credentials or prior exits. The product is early-stage, with only a benchmark announced and no visible revenue or customers. The biggest risk is that the largest potential customers (OpenAI, Anthropic, Google) build this internally, and that the team lacks the research pedigree to credibly out-execute them on the core technical challenge.

Active Founders

Dylan Ma

Founder

Co-Founder / CEO @ Polymath. Previously @ Hume AI, AWS, UC Berkeley

Naren Yenuganti

Founder

Co-Founder / CTO @ Polymath. Previously @ Plaid, Amazon, UC Berkeley

Polymath

Tier: B Tier

Batch: Winter 2026

Team Size: 2

Status: Active

Location: San Francisco, CA, USA