Concavity is a quantitative trading firm built around reinforcement-learned LLM agents. We don't bolt models onto a legacy book — every signal, sizing decision and risk veto is generated by policy networks trained against a decade of high-resolution market simulation.
Strategies trained with our drawdown-aware reward function produce equity curves that are visibly more concave: faster recovery from shocks, shorter underwater periods, and tail losses small enough that gains keep compounding.
Concavity is a closed loop: market state in, policy out, P&L back as reward. Three components do the heavy lifting.
A 32B-parameter base model with continued pretraining on filings, transcripts, order-flow narratives, and tick data. It emits dense, tradeable embeddings, not text.
PPO with a reward function that penalizes drawdown shape, not just magnitude. The agent learns to be wrong cheaply and to be fully invested when it is right.
A separate agent decides how to slice and route orders. It is trained against our internal limit-order-book (LOB) simulator with 50µs replay fidelity across 12 venues.
We treat each trading session as an episode. The LLM proposes; the policy network decides position, size, and timing; outcomes flow back as gradient signal.
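To make the session-as-episode idea concrete, here is a minimal sketch of one episode in Python. It is an illustration, not our production loop: `run_session`, `Step`, `momentum_proposer`, and `clamp_policy` are hypothetical names, the proposer is a toy momentum rule standing in for the language model, and the policy is a fixed clamp rather than a trained network. The returned trajectory is the kind of record a PPO learner would consume.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    signal: float    # what the proposer emitted (stand-in for model output)
    position: float  # what the policy chose, in [-1, 1]
    reward: float    # P&L of holding `position` over the next price move


def run_session(
    prices: Sequence[float],
    propose: Callable[[Sequence[float]], float],
    policy: Callable[[float], float],
) -> List[Step]:
    """Play one session as one episode and return its trajectory."""
    trajectory: List[Step] = []
    # Start at t=1 so the proposer has at least two prices of history,
    # and stop one step early so there is always a next price to score against.
    for t in range(1, len(prices) - 1):
        signal = propose(prices[: t + 1])                 # the proposer proposes
        position = policy(signal)                         # the policy decides
        reward = position * (prices[t + 1] - prices[t])   # outcome flows back
        trajectory.append(Step(signal, position, reward))
    return trajectory


def momentum_proposer(history: Sequence[float]) -> float:
    """Hypothetical proposer: the most recent price change."""
    return history[-1] - history[-2]


def clamp_policy(signal: float) -> float:
    """Hypothetical policy: follow the signal, capped at full long or short."""
    return max(-1.0, min(1.0, signal))


session = run_session([100.0, 101.0, 102.0, 103.0], momentum_proposer, clamp_policy)
total_reward = sum(step.reward for step in session)
```

In a real training loop the per-step rewards would be turned into advantages and used to update the policy; that update is omitted here.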
Most quant systems optimize for mean P&L and discover too late that variance kills them. Our reward function explicitly rewards concave equity curves — fast recovery, shallow troughs, and asymmetry between winning and losing streaks.
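One way to picture a reward that cares about drawdown shape is sketched below. It is an assumption for illustration, not our actual reward function: `drawdown_series`, `shaped_rewards`, and the `depth_weight` and `duration_weight` parameters are hypothetical. Each step earns its return, minus a penalty for how deep the curve sits below its running peak and for how long it has stayed there.

```python
from typing import List, Sequence


def drawdown_series(equity: Sequence[float]) -> List[float]:
    """Fractional drawdown from the running peak at each step.

    Assumes every equity value is positive.
    """
    peak = equity[0]
    drawdowns: List[float] = []
    for value in equity:
        peak = max(peak, value)
        drawdowns.append((peak - value) / peak)
    return drawdowns


def shaped_rewards(
    equity: Sequence[float],
    depth_weight: float = 1.0,      # hypothetical weight on drawdown depth
    duration_weight: float = 0.01,  # hypothetical weight on steps spent underwater
) -> List[float]:
    """Per-step reward: simple return minus depth and duration penalties."""
    drawdowns = drawdown_series(equity)
    rewards: List[float] = []
    underwater_steps = 0
    for t in range(1, len(equity)):
        step_return = equity[t] / equity[t - 1] - 1.0
        # Count consecutive steps below the running peak; reset on a new high.
        underwater_steps = underwater_steps + 1 if drawdowns[t] > 0 else 0
        penalty = depth_weight * drawdowns[t] + duration_weight * underwater_steps
        rewards.append(step_return - penalty)
    return rewards


# Two curves with the same 10% maximum drawdown and the same final equity,
# differing only in how quickly they recover.
fast_recovery = [100.0, 90.0, 100.0, 101.0, 102.0]
slow_recovery = [100.0, 90.0, 92.0, 94.0, 102.0]

fast_total = sum(shaped_rewards(fast_recovery))
slow_total = sum(shaped_rewards(slow_recovery))
```

Under these assumed weights the fast-recovery curve scores higher than the slow one even though both share the same maximum drawdown, which is the distinction between shape and magnitude.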
All figures are net of fees, audited quarterly, and reported against a 60/40 global benchmark over the same window.
Every layer is owned in-house — from data ingestion to broker — so the policy never sees a feature it can't trust.
Plenty of funds run ML inside an analyst-first process. We invert it: the agent owns the book, humans set the constraints.