Live · Cohort 04 training on 2.1M market sessions

Trading built
AI-first.
Drawdowns down.
Conviction up.

Concavity is a quantitative trading firm built around reinforcement-learned LLM agents. We don't bolt models onto a legacy book — every signal, sizing decision and risk veto is generated by policy networks trained against a decade of high-resolution market simulation.


Sharpe (live, ytd): 2.84
Max drawdown: 3.6%
Yearly return: +312%
Markets covered: 137

SPX +0.42% · NDX +0.71% · DJX +0.12% · VIX −2.3% · BTC +1.4% · ETH +0.9% · ES1 +0.38% · NQ1 +0.62% · CL1 −0.4% · GC1 +0.21% · ZN1 +0.05% · DXY −0.18% · EUR +0.14% · JPY −0.22% · GBP +0.07% · TSLA +2.1% · NVDA +1.7% · META +0.5% · AAPL −0.3% · MSFT +0.4%

01 · Live performance

Same alpha. One-third the pain.

Strategies trained with our drawdown-aware reward function produce equity curves that are visibly more concave: faster recovery from shocks, shorter underwater periods, and tail behavior that compounds.
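To make "underwater periods" and drawdown concrete, here is a minimal sketch of how both can be measured on an equity curve. The function name `drawdown_stats` and the sample curve are illustrative assumptions, not Concavity's code or data. Drawdown is taken as the fractional drop from the running peak, and a bar counts as underwater while equity sits below that peak.

```python
from typing import Sequence, Tuple


def drawdown_stats(equity: Sequence[float]) -> Tuple[float, int]:
    """Return (max drawdown as a fraction of the prior peak,
    longest underwater run in bars). Assumes strictly positive equity."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    max_dd = 0.0
    underwater = 0  # bars since the last new high
    longest = 0
    for value in equity:
        if value >= peak:
            # New high (or back at the old one): the drawdown is over.
            peak = value
            underwater = 0
        else:
            underwater += 1
            longest = max(longest, underwater)
            max_dd = max(max_dd, (peak - value) / peak)
    return max_dd, longest


# Illustrative curve only; not live performance data.
curve = [100.0, 104.0, 101.0, 99.0, 103.0, 106.0, 105.0, 108.0]
max_dd, longest = drawdown_stats(curve)
print(f"max drawdown: {max_dd:.2%}, longest underwater run: {longest} bars")
```

Two curves with the same maximum drawdown can still differ sharply in the second number, which is the distinction the paragraph above is drawing.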

Strategy: CCV-Core / Multi-asset (▲ live since Jan 2024)
Yearly return: +312% (▲ vs benchmark +18.4%)
Max DD: 3.6% (▲ benchmark 11.8%)
Sortino: 4.31 (▲ benchmark 0.94)
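The page does not say how the Sharpe and Sortino figures are calculated. As a reference point, the sketch below uses one common convention: per-period returns, a zero target, downside deviation taken over all periods, and square-root-of-time annualization. The function names, default parameters, and sample returns are assumptions for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    vol = stdev(excess)  # requires at least two observations
    if vol == 0.0:
        raise ValueError("zero volatility; Sharpe is undefined")
    return mean(excess) / vol * math.sqrt(periods_per_year)


def sortino_ratio(returns: Sequence[float], target: float = 0.0,
                  periods_per_year: int = 252) -> float:
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        raise ValueError("no downside observations; Sortino is undefined")
    return mean(excess) / downside * math.sqrt(periods_per_year)


# Illustrative daily returns only; not live performance data.
sample = [0.01, -0.005, 0.007, 0.012, -0.002]
print(f"Sharpe:  {sharpe_ratio(sample):.2f}")
print(f"Sortino: {sortino_ratio(sample):.2f}")
```

Because Sortino ignores upside volatility, a strategy whose variance comes mostly from gains will show a Sortino well above its Sharpe.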

[Chart: equity curves for Concavity CCV-Core, 60/40 Benchmark, and Sector-equal hedge index. Updated live, 1m candles.]

02 · The system

An agent stack, not a screener.

Concavity is a closed loop: market state in, policy out, P&L back as reward. Three components do the heavy lifting.

01 / Encoder

LLM that reads markets the way analysts do.

A 32B-parameter base model with continued pretraining on filings, transcripts, order-flow narratives, and tick data. It emits dense, tradeable embeddings, not text.

02 / Policy

Reinforcement learning, shaped for downside.

PPO with a reward function that penalizes drawdown shape, not just magnitude. The agent learns to be wrong cheaply and right fully invested.
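The exact reward function is not published. One way to penalize drawdown shape rather than magnitude alone is to charge for both how deep the curve sits below its peak and how long it stays there. The class `DrawdownShapedReward`, its two weights, and the sample paths below are hypothetical, a sketch under those assumptions rather than the production reward.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DrawdownShapedReward:
    initial_equity: float
    depth_weight: float = 2.0      # hypothetical coefficient on squared depth
    duration_weight: float = 0.01  # hypothetical coefficient on depth x time

    def __post_init__(self) -> None:
        self.peak = self.initial_equity
        self.prev_equity = self.initial_equity
        self.bars_underwater = 0

    def step(self, equity: float) -> float:
        """Per-step reward: simple return minus a depth-and-duration penalty."""
        step_return = equity / self.prev_equity - 1.0
        self.prev_equity = equity
        if equity >= self.peak:
            # Back at or above the high-water mark: no penalty.
            self.peak = equity
            self.bars_underwater = 0
            return step_return
        self.bars_underwater += 1
        drawdown = 1.0 - equity / self.peak
        penalty = (self.depth_weight * drawdown ** 2
                   + self.duration_weight * drawdown * self.bars_underwater)
        return step_return - penalty


def episode_reward(equity_path: Sequence[float]) -> float:
    shaper = DrawdownShapedReward(initial_equity=equity_path[0])
    return sum(shaper.step(e) for e in equity_path[1:])


# Same 5% trough and same final equity; only the time spent underwater differs.
fast_recovery = [100.0, 95.0, 100.0, 101.0]
slow_recovery = [100.0, 95.0, 95.0, 95.0, 100.0, 101.0]
print(episode_reward(fast_recovery), episode_reward(slow_recovery))
```

Both paths have identical maximum drawdown, yet the slow recovery scores lower, which is the kind of shape sensitivity a magnitude-only penalty cannot express.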

03 / Executor

Microstructure-aware execution.

A separate agent decides how to slice and route. Trained against our internal LOB simulator with 50µs replay fidelity across 12 venues.

03 · Method

Reinforcement learning, applied to language.

We treat each trading session as an episode. The LLM proposes; the policy network decides position, size, and timing; outcomes flow back as gradient signal.

The agent learns the shape of being wrong.

Most quant systems optimize for mean P&L and discover too late that variance kills them. Our reward function explicitly rewards concave equity curves — fast recovery, shallow troughs, and asymmetry between winning and losing streaks.

Encode the regime

The LLM ingests news, filings, and intraday order flow, and produces a 4,096-dimensional state vector.

Propose an action

The policy head emits a continuous position vector across 137 instruments.

Simulate / live

An LOB simulator (or live OMS) returns realized P&L, slippage, and risk exposure.

Reward, shaped

Drawdown-aware reward signals propagate backward. Bad shapes are punished disproportionately.
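The four steps above can be sketched as a single episode loop. Everything here is a toy stand-in: `encode` is a stub where the LLM would sit, the policy head is a fixed random linear layer, the simulator is a linear-cost fill model rather than an LOB simulator, and the dimensions are shrunk from 4,096 and 137. No gradient update is shown; in a PPO setup the collected rewards would feed the policy update.

```python
import math
import random
from typing import List, Tuple

STATE_DIM = 8      # stand-in for the 4,096-dimensional state vector
N_INSTRUMENTS = 5  # stand-in for the 137 instruments


def encode(observation: List[float]) -> List[float]:
    """Hypothetical encoder stub; the real system uses an LLM here."""
    return [math.tanh(x) for x in observation]


def propose(state: List[float], weights: List[List[float]],
            max_gross: float = 1.0) -> List[float]:
    """Linear head with tanh squashing, scaled down to a gross-exposure cap."""
    raw = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]
    gross = sum(abs(p) for p in raw)
    scale = min(1.0, max_gross / gross) if gross > 0 else 0.0
    return [p * scale for p in raw]


def simulate(positions: List[float], prev_positions: List[float],
             asset_returns: List[float], cost_per_turnover: float = 0.0005) -> float:
    """Toy fill model: mark-to-market P&L minus a linear turnover cost."""
    pnl = sum(p * r for p, r in zip(positions, asset_returns))
    turnover = sum(abs(p - q) for p, q in zip(positions, prev_positions))
    return pnl - cost_per_turnover * turnover


def run_episode(n_steps: int = 50, seed: int = 0) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 0.5) for _ in range(STATE_DIM)]
               for _ in range(N_INSTRUMENTS)]
    positions = [0.0] * N_INSTRUMENTS
    equity, peak, rewards = 1.0, 1.0, []
    for _ in range(n_steps):
        observation = [rng.gauss(0, 1) for _ in range(STATE_DIM)]
        state = encode(observation)                     # 1. encode the regime
        new_positions = propose(state, weights)         # 2. propose an action
        asset_returns = [rng.gauss(0, 0.01) for _ in range(N_INSTRUMENTS)]
        step_pnl = simulate(new_positions, positions, asset_returns)  # 3. simulate
        equity *= 1.0 + step_pnl
        peak = max(peak, equity)
        drawdown = 1.0 - equity / peak
        rewards.append(step_pnl - 2.0 * drawdown ** 2)  # 4. shaped reward
        positions = new_positions
    return equity, rewards


final_equity, rewards = run_episode()
print(f"final equity: {final_equity:.4f}, steps: {len(rewards)}")
```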

04 · By the numbers

Built to compound, not impress.

All figures are net of fees, audited quarterly, and reported against a 60/40 global benchmark over the same window.

Drawdown reduction

vs. matched-Sharpe baseline

Sharpe · live

0.00

trailing 12 months

Yearly return

+312%

benchmark · +18.4%

05 · Architecture

One pipeline, fully vertical.

Every layer is owned in-house — from data ingestion to broker — so the policy never sees a feature it can't trust.

06 · Difference

What an AI-first firm actually means.

Plenty of funds run ML inside an analyst-first process. We invert it: the agent owns the book, humans set the constraints.

Conventional quant

PMs source signals; ML scores them.
Drawdown limits are external risk overlays.
Backtest-then-deploy with manual gating.
Models retrained quarterly, by hand.
Execution is a vendor.
Edge attributed to people.

Concavity

The policy network sources its own signals.
Drawdown shape is in the reward function.
Continual training, live shadow + canary cohorts.
Daily checkpoint, gated by held-out distribution shift.
Execution is a learned agent we own.
Edge attributable to compute and method.
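The page does not describe how a daily checkpoint is "gated by held-out distribution shift". One plausible reading is that a new checkpoint is promoted only while recent inputs still resemble the held-out reference data. The sketch below uses the population stability index (PSI) as the shift statistic; the choice of PSI, the 0.2 threshold (a common rule of thumb), and the function names are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(reference: Sequence[float], recent: Sequence[float],
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between two samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins

    def bin_fractions(sample: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            # Values outside the reference range fall into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(sample), eps) for c in counts]

    ref, cur = bin_fractions(reference), bin_fractions(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_promote(reference: Sequence[float], recent: Sequence[float],
                   threshold: float = 0.2) -> bool:
    """Gate: promote the checkpoint only if measured shift is below the threshold."""
    return population_stability_index(reference, recent) < threshold


# Illustrative feature samples only.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
print(should_promote(reference, reference), should_promote(reference, shifted))
```

A production gate would likely check many features and the policy's own outputs, not a single scalar series.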
