Live · Cohort 04 training on 2.1M market sessions

Trading built
AI-first.
Drawdowns down.
Conviction up.

Concavity is a quantitative trading firm built around reinforcement-learned LLM agents. We don't bolt models onto a legacy book — every signal, sizing decision and risk veto is generated by policy networks trained against a decade of high-resolution market simulation.

Sharpe (live, ytd)
2.84
Max drawdown
3.6%
Yearly return
+312%
Markets covered
137
SPX +0.42% · NDX +0.71% · DJX +0.12% · VIX −2.3% · BTC +1.4% · ETH +0.9% · ES1 +0.38% · NQ1 +0.62% · CL1 −0.4% · GC1 +0.21% · ZN1 +0.05% · DXY −0.18% · EUR +0.14% · JPY −0.22% · GBP +0.07% · TSLA +2.1% · NVDA +1.7% · META +0.5% · AAPL −0.3% · MSFT +0.4%
01 · Live performance

Same alpha. One-third the pain.

Strategies trained with our drawdown-aware reward function produce visibly more concave equity curves: faster recovery from shocks, shorter underwater periods, and smaller tail losses that leave more capital to compound.
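For readers who want to see what "drawdown" and "underwater period" mean in practice, here is a minimal sketch of how both can be measured on an equity curve. The function name `drawdown_stats` and the sample curve are illustrative assumptions, not Concavity's reporting code or data.

```python
from typing import List, Tuple


def drawdown_stats(equity: List[float]) -> Tuple[float, int]:
    """Return (max drawdown as a fraction of the prior peak,
    longest underwater run measured in steps).

    Assumes a non-empty curve of strictly positive equity values.
    """
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    max_dd = 0.0
    underwater = 0          # length of the current run below the running peak
    longest_underwater = 0
    for value in equity:
        if value >= peak:
            # New high (or back at the old one): the underwater run ends.
            peak = value
            underwater = 0
        else:
            underwater += 1
            longest_underwater = max(longest_underwater, underwater)
            max_dd = max(max_dd, (peak - value) / peak)
    return max_dd, longest_underwater


# Made-up sample curve, for illustration only.
curve = [100.0, 104.0, 101.0, 98.8, 103.0, 106.0, 105.0, 108.0]
max_dd, longest = drawdown_stats(curve)
print(f"max drawdown: {max_dd:.1%}, longest underwater run: {longest} steps")
```

On this sample the deepest trough is 98.8 against a peak of 104, and the curve spends three steps below that peak before making a new high.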

Strategy
CCV-Core / Multi-asset
▲ Live since Jan 2024
Yearly return
+312%
▲ benchmark +18.4%
Max DD
3.6%
▲ benchmark 11.8%
Sortino
4.31
▲ benchmark 0.94
[Chart: equity curves for Concavity CCV-Core, the 60/40 Benchmark, and the Sector-equal hedge index · live · 1m candles]
02 · The system

An agent stack, not a screener.

Concavity is a closed loop: market state in, policy out, P&L back as reward. Three components do the heavy lifting.

01 / Encoder

LLM that reads markets the way analysts do.

A 32B-parameter base model with continued pretraining on filings, transcripts, order-flow narratives, and tick data. It emits dense, tradeable embeddings, not text.

02 / Policy

Reinforcement learning, shaped for downside.

PPO with a reward function that penalizes drawdown shape, not just magnitude. The agent learns to be wrong cheaply and right fully invested.

03 / Executor

Microstructure-aware execution.

A separate agent decides how to slice and route orders. It is trained against our internal limit-order-book (LOB) simulator with 50µs replay fidelity across 12 venues.

03 · Method

Reinforcement learning, applied to language.

We treat each trading session as an episode. The LLM proposes; the policy network decides position, size, and timing; outcomes flow back as gradient signal.

The agent learns the shape of being wrong.

Most quant systems optimize for mean P&L and discover too late that variance kills them. Our reward function explicitly rewards concave equity curves — fast recovery, shallow troughs, and asymmetry between winning and losing streaks.
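One way to picture a reward that cares about shape is the sketch below. It is an assumption-laden illustration, not the actual Concavity reward: the function name `shaped_rewards`, the squared-depth penalty, the time-underwater penalty, and both weights are hypothetical choices made for this example.

```python
import math
from typing import List


def shaped_rewards(
    equity: List[float],
    depth_weight: float = 2.0,       # hypothetical weight on drawdown depth
    duration_weight: float = 0.001,  # hypothetical weight on time underwater
) -> List[float]:
    """Per-step reward: log return minus penalties for drawdown depth and duration."""
    rewards = []
    peak = equity[0]
    underwater = 0
    for prev, curr in zip(equity, equity[1:]):
        log_return = math.log(curr / prev)
        if curr >= peak:
            peak = curr
            underwater = 0
        else:
            underwater += 1
        drawdown = (peak - curr) / peak
        # Squaring the depth makes deep troughs cost disproportionately more
        # than shallow ones; the duration term penalizes slow recoveries.
        penalty = depth_weight * drawdown ** 2 + duration_weight * underwater
        rewards.append(log_return - penalty)
    return rewards


# Two made-up curves with the same start and end, but different paths.
shallow = [100.0, 99.0, 101.0, 103.0, 105.0]
deep = [100.0, 92.0, 94.0, 99.0, 105.0]
print(sum(shaped_rewards(shallow)), sum(shaped_rewards(deep)))
```

Because log returns sum to the same total for both curves, any difference in total reward comes from the penalties alone: the curve with the deeper, longer trough scores lower even though both end at the same value.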

01
Encode the regime
The LLM ingests news, filings, and intraday order flow, and produces a 4,096-d state vector.
02
Propose an action
The policy head emits a continuous position vector across 137 instruments.
03
Simulate / live
An LOB simulator (or the live order management system) returns realized P&L, slippage, and risk exposure.
04
Reward, shaped
Drawdown-aware reward signals propagate backward. Bad shapes are punished disproportionately.
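The four steps above can be sketched as a toy episode loop. Everything here is a placeholder under stated assumptions: `encode_regime`, `propose_action`, `simulate_step`, and `run_episode` are hypothetical names, the state vector and instrument count are shrunk to 8 and 3, random numbers stand in for the LLM and the simulator, and the policy update itself is left out.

```python
import math
import random
from typing import List

STATE_DIM = 8        # stand-in for the 4,096-d state vector
N_INSTRUMENTS = 3    # stand-in for the 137 instruments


def encode_regime(rng: random.Random) -> List[float]:
    """Step 01 placeholder: a random vector instead of an LLM embedding."""
    return [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]


def propose_action(state: List[float], weights: List[List[float]]) -> List[float]:
    """Step 02 placeholder: a linear policy head squashed into [-1, 1] positions."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def simulate_step(positions: List[float], rng: random.Random) -> float:
    """Step 03 placeholder: P&L from random per-instrument returns."""
    returns = [rng.gauss(0.0, 0.01) for _ in positions]
    return sum(p * r for p, r in zip(positions, returns))


def run_episode(n_steps: int, seed: int = 0) -> List[float]:
    """Run one toy session and return the per-step shaped rewards."""
    rng = random.Random(seed)
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM)]
        for _ in range(N_INSTRUMENTS)
    ]
    equity, peak = 1.0, 1.0
    rewards = []
    for _ in range(n_steps):
        state = encode_regime(rng)
        positions = propose_action(state, weights)
        pnl = simulate_step(positions, rng)
        equity += pnl
        peak = max(peak, equity)
        drawdown = (peak - equity) / peak
        # Step 04: a hypothetical drawdown-aware reward (weight 2.0 is arbitrary).
        rewards.append(pnl - 2.0 * drawdown ** 2)
    return rewards


rewards = run_episode(n_steps=50)
print(f"{len(rewards)} steps, total shaped reward {sum(rewards):.4f}")
```

In a trained system the rewards would feed a policy-gradient update such as PPO; the sketch stops at collecting them so the data flow of one episode stays visible.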
04 · By the numbers

Built to compound, not impress.

All figures are net of fees, audited quarterly, and reported against a 60/40 global benchmark over the same window.

Drawdown reduction
vs. matched-Sharpe baseline
Sharpe · live
trailing 12 months
Yearly return
+312%
benchmark · +18.4%
05 · Architecture

One pipeline, fully vertical.

Every layer is owned in-house — from data ingestion to broker — so the policy never sees a feature it can't trust.

06 · Difference

What an AI-first firm actually means.

Plenty of funds run ML inside an analyst-first process. We invert it: the agent owns the book, humans set the constraints.

Conventional quant

  • PMs source signals; ML scores them.
  • Drawdown limits are external risk overlays.
  • Backtest-then-deploy with manual gating.
  • Models retrained quarterly, by hand.
  • Execution is a vendor.
  • Edge attributed to people.

Concavity

  • The policy network sources its own signals.
  • Drawdown shape is in the reward function.
  • Continual training, live shadow + canary cohorts.
  • Daily checkpoint, gated by held-out distribution shift.
  • Execution is a learned agent we own.
  • Edge attributable to compute and method.
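As an illustration of what gating a daily checkpoint on distribution shift could look like, the sketch below compares a held-out reference sample with recent data using the population stability index (PSI). This is a swapped-in, generic technique chosen for the example; the page does not say which shift measure Concavity uses. The names `histogram`, `population_stability_index`, and `should_promote`, the bin edges, and the 0.1 threshold are all assumptions.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Bin proportions, floored so empty bins do not produce log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[idx] += 1
    total = len(values)
    return [max(c / total, 1e-6) for c in counts]


def population_stability_index(
    reference: Sequence[float], candidate: Sequence[float], edges: Sequence[float]
) -> float:
    """PSI = sum over bins of (candidate - reference) * ln(candidate / reference)."""
    ref = histogram(reference, edges)
    cand = histogram(candidate, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cand))


def should_promote(
    reference: Sequence[float],
    candidate: Sequence[float],
    edges: Sequence[float],
    max_psi: float = 0.1,  # hypothetical threshold
) -> bool:
    """Allow the checkpoint through only if the measured shift is small."""
    return population_stability_index(reference, candidate, edges) <= max_psi


# Made-up feature samples, for illustration only.
edges = [0.25, 0.5, 0.75]
reference = [i / 100 for i in range(100)]        # spread evenly over [0, 1)
similar = [i / 200 for i in range(200)]          # same shape, more samples
shifted = [0.5 + i / 200 for i in range(100)]    # everything in the upper half

print(should_promote(reference, similar, edges))
print(should_promote(reference, shifted, edges))
```

The first check passes because both samples fill the bins evenly; the second fails because the shifted sample leaves the lower two bins empty.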