Policy frontier · Risk brake · Rotation lane

Historical replay target-match — walk-forward scoped, not a live trading guarantee.

Intelligence Lab · Product 10

RL Policy Lab

Research-only RL Policy Lab for scenario profiles, disagreement, volatility, event risk, and transparent decision-support boundaries.

Current answer

The live decision read

RL Policy Lab should compare decision-support profiles across states — reduce risk, lean into continuation, prefer relative value, hedge first, or wait — while making the reasoning, confidence, scenario sensitivity, and trade-offs easy to inspect.

Review this first
Use the current commodity snapshot to decide whether the watchlist exposure needs research review.

What would flip this decision?
A fresh price reversal, model disagreement, or catalyst miss would move this workflow route back to watch-only.

Ideal buyer

Advanced users, PMs, and desks that need transparent, audit-friendly policy review logic and regime-aware scenario review

Best tier

Enterprise / Premium Desk

Primary visual

Scenario grid + response map + episode replay

Continue your saved workflow

Answer preview is available now. Save a workflow later if this module becomes decision-critical for your names.

What problem this solves

Policy decisions need to be explainable under uncertainty

Without a clear framework, reinforcement learning can feel opaque and difficult to trust. This module should show how policy profiles change across states, which regimes they fit, what risks are being managed, and where confidence is still fragile.

What the product actually does

Make the policy layer observable and operational

Live decision workspace

Preview the module on a live commodity

This workspace uses the current CommodityNode data stack and your saved workflow context so each product page behaves like a live decision-support surface instead of static brochure copy.

Saved workspaces use account context when available; this browser-saved preview remains useful without setup.

Decision console

Current scenario review

Current scenario snapshot ready

The decision console opens with the latest verified scenario signal, confidence, and baseline comparison, then refreshes when live model data is available.

Trust strip

How reliable is this scenario signal right now?

Freshness, stability, guardrails, and policy readiness are visible before a team uses the signal in a decision-support workflow.

Policy observability board

What state is the model seeing right now?

Expose the live scenario context, state pressure, profile governance, and baseline comparison in one board so the user can audit the setup before deeper review.

Latest verified state, profile choice, and baseline comparison are shown; live model refresh can refine this board.

Animated policy field

See policy pressure moving across the action space

This field turns scenario-response probabilities into a living motion surface so the model feels like an active decision-support engine instead of a static table.

Policy guardrail Probability-weighted policy field using the latest verified artifact

Reduce risk: 0%

Hold: 0%

Add continuation: 0%

Add hedge: 0%

Relative-value rotation: 0%
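
As a rough illustration of how a probability-weighted field over these five actions could be produced, the sketch below applies a softmax to raw action scores. The action labels come from the policy field above; the scores, the function name `action_probabilities`, and the choice of softmax are assumptions for illustration, not the module's documented method.

```python
import math

# Action labels as listed in the policy field; the scores below are made up.
ACTIONS = [
    "Reduce risk",
    "Hold",
    "Add continuation",
    "Add hedge",
    "Relative-value rotation",
]

def action_probabilities(scores):
    """Softmax over raw action scores (hypothetical policy logits)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

example_scores = [0.2, 1.0, 0.5, -0.3, 0.1]  # illustrative values only
probs = action_probabilities(example_scores)
field = dict(zip(ACTIONS, probs))

# The action with the largest probability mass is the headline lean.
leading_action = max(field, key=field.get)
```

Reading the whole `field` rather than only `leading_action` is the point of the surface: a narrow lead over the second action is a weaker signal than a dominant one.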

Baseline comparison

How does the scenario response compare with baseline policies?

Show hold, offline, PPO bootstrap, and Neural PPO side-by-side so the user can see whether the workflow route is actually better than the current baseline.

Policy	Action	Confidence	Replay reward	Replay uplift	Walk uplift	Verdict
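
One plausible reading of the uplift columns is a policy's reward minus the hold baseline's reward over the same replay. The sketch below assumes that definition; the reward figures, the policy keys, and the helper `uplift_vs_hold` are hypothetical.

```python
def uplift_vs_hold(policy_rewards, hold_reward):
    """Return each policy's replay reward minus the hold baseline's reward."""
    return {name: reward - hold_reward for name, reward in policy_rewards.items()}

# Made-up replay rewards for the non-hold policies named in the comparison.
replay_rewards = {"offline": 0.5, "ppo_bootstrap": 1.25, "neural_ppo": 1.0}
uplift = uplift_vs_hold(replay_rewards, hold_reward=0.75)

# A simple verdict: a policy only earns its place if it beats hold.
verdict = {
    name: ("beats hold" if value > 0 else "no better than hold")
    for name, value in uplift.items()
}
```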

Core-5 policy bake-off

Which policy survives across the full commodity basket?

Rank the active policies on the same five-commodity slate, then show the weakest commodity explicitly so the workflow route is backed by evidence instead of average-case optimism.

Cross-commodity bake-off summary uses the latest verified Core-5 policy slate.
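
A minimal sketch of such a bake-off, assuming policies are ranked by mean reward across the slate and the weakest commodity is reported alongside. The ranking rule, the commodity names, the policy names, and all figures are assumptions; the page does not specify them.

```python
def bake_off(rewards_by_policy):
    """Rank policies by mean reward and expose each one's weakest commodity."""
    rows = []
    for policy, by_commodity in rewards_by_policy.items():
        weakest = min(by_commodity, key=by_commodity.get)
        rows.append({
            "policy": policy,
            "mean": sum(by_commodity.values()) / len(by_commodity),
            "weakest": weakest,
            "worst": by_commodity[weakest],
        })
    return sorted(rows, key=lambda row: row["mean"], reverse=True)

# Hypothetical five-commodity slate with made-up rewards.
slate = {
    "policy_a": {"crude_oil": 1.0, "natural_gas": 0.5, "copper": 0.75,
                 "gold": 0.25, "wheat": 0.5},
    "policy_b": {"crude_oil": 2.0, "natural_gas": -1.0, "copper": 1.0,
                 "gold": 1.0, "wheat": 0.5},
}
ranking = bake_off(slate)
```

In this toy slate `policy_b` ranks first on average yet loses money on one commodity, which is the kind of gap the weakest-commodity column is meant to surface.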

Why this workflow route?

What is driving the current action?

Explain the strongest positive and negative drivers, then ground the workflow route with simple reason codes and comparable historical states.

Scenario workbench

What would flip this decision?

Explore how event risk, volatility, agreement, disagreement, and trend strength change the workflow route before a user commits to action.
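
The flip question can be sketched as a one-feature sweep: hold the state fixed, vary one input, and report the first value at which the route changes. The routing rule `toy_route`, the 0 to 100 score scale, and the example state are all hypothetical stand-ins for the real policy.

```python
def toy_route(state):
    """Hypothetical rule: fall back to watch-only when risk outweighs support."""
    risk = state["event_risk"] + state["volatility"] + state["disagreement"]
    support = state["trend_strength"] + state["agreement"]
    return "watch-only" if risk > support else "review"

def flip_point(state, feature, grid):
    """Return the first grid value of `feature` that changes the route, else None."""
    base = toy_route(state)
    for value in grid:
        trial = dict(state, **{feature: value})  # copy with one feature overridden
        if toy_route(trial) != base:
            return value
    return None

# Made-up state on a 0-100 scale.
state = {"event_risk": 20, "volatility": 30, "disagreement": 10,
         "trend_strength": 50, "agreement": 40}
event_risk_flip = flip_point(state, "event_risk", range(0, 101, 25))
```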

Policy audit trail

What changed, and when?

Keep a timeline of workflow route events, profile selection changes, alerts, and replay evidence so users can audit the workflow route path instead of trusting a static score.

Policy probability surface

Which action is the model leaning toward?

See how confidence is distributed across defensive, neutral, and continuation actions instead of trusting a single headline action.

Policy evolution

How did the policy change through training?

Compare offline policy, PPO bootstrap, and Neural PPO so people can see the decision layer actually moving.

State vector

Which signals are dominating the current state?

Normalize the live observation so users can see whether agreement, event risk, volatility, disagreement, or trend is actually pushing the policy.
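
One way to make the signals comparable is to z-score each one against its own recent history and then rank by absolute size. This is an assumed normalization, not necessarily the one the module uses; the signal values and the helper `normalize_state` are illustrative.

```python
from statistics import mean, pstdev

def normalize_state(observation, history):
    """Z-score each signal against its own history (0.0 if history is flat)."""
    z = {}
    for name, value in observation.items():
        past = history[name]
        spread = pstdev(past)
        z[name] = 0.0 if spread == 0 else (value - mean(past)) / spread
    return z

# Made-up history and live observation for three of the named signals.
history = {
    "event_risk": [1.0, 2.0, 3.0],
    "volatility": [10.0, 20.0, 30.0],
    "trend": [0.0, 0.0, 0.0],
}
observation = {"event_risk": 4.0, "volatility": 20.0, "trend": 0.0}

z_scores = normalize_state(observation, history)
# The signal furthest from its own norm is the one dominating the state.
dominant = max(z_scores, key=lambda name: abs(z_scores[name]))
```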

Regime pressure map

Where is the policy actually earning trust?

Contrast hit-rate and opportunity density across continuation, risk-off, hedge, and rotation regimes so blind spots are visible immediately.

Walk-forward windows

Did it hold up on unseen periods?

Each window shows out-of-sample reward, risk band, and dominant action so the policy cannot hide behind one lucky backtest.
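
Walk-forward evaluation can be sketched as a series of rolling splits in which each test window starts strictly after its training window ends. The window sizes and the helper name `walk_forward_windows` below are assumptions for illustration.

```python
def walk_forward_windows(n_periods, train_size, test_size):
    """Yield rolling (train, test) index ranges; test always follows train."""
    windows = []
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size  # slide forward by one test window
    return windows

# Ten periods, four for fitting and two held out per window (made-up sizes).
windows = walk_forward_windows(n_periods=10, train_size=4, test_size=2)
```

Because every test range lies after its training range, a reward measured on a test window is out-of-sample for the policy fitted on that window's training data.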

Profile governance

Why is this profile the active workflow route profile?

Show device, timesteps, prior weight, and selection score so users know the current workflow route came from governed profile selection, not hand-picked marketing output.

Episode replay

What decisions did the policy actually make?

Replay the latest path step-by-step to show how the model reacts as the state shifts instead of pretending it is a black box.

Regime playbook matrix

What does the policy want to do across regimes?

Surface the best action and realized value in continuation, risk-off, hedge, and rotation states so the page feels like a regime console instead of a single headline action.

Reward stack

Where did performance come from?

Break historical replay outcome, review cost, risk band, concentration, and event-gap costs apart so users understand what is helping and hurting the policy.
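
A reward stack of this kind can be sketched as a set of signed components that sum to the net reward, split into what helps and what hurts. The component keys mirror the terms named above; the figures and the additive form are assumptions.

```python
def reward_stack(components):
    """Split signed reward components into helping and hurting, plus the net."""
    net = sum(components.values())
    helping = {name: value for name, value in components.items() if value > 0}
    hurting = {name: value for name, value in components.items() if value < 0}
    return net, helping, hurting

# Made-up decomposition: one positive outcome term and four cost terms.
components = {
    "replay_outcome": 2.0,
    "review_cost": -0.25,
    "risk_band": -0.5,
    "concentration": -0.25,
    "event_gap": -0.5,
}
net, helping, hurting = reward_stack(components)
largest_drag = min(hurting, key=hurting.get)
```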

Decision trace

What is the model deciding right now?

Current workflow route, confidence spread, and replay stats make the model feel alive instead of static marketing copy.

Success dashboard

Operational readiness for this module

Policy confidence

Verified

Latest decision snapshot available.

Walk uplift vs hold

Guardrailed

Fallback copy keeps the surface useful while live model data refreshes.

Replay uplift vs hold

Actionable

Open the linked workflow for the next decision step.

Visualization system

How users should read it

State Vector Radar
Action Probability Profile
Policy Frontier Scatter
Episode Replay Timeline
Baseline Comparison
Trust & Limitations Card

Decision value

What this helps decide

Premium differentiation for advanced users who need scenario-aware decision support.
Good enterprise narrative when positioned as a policy audit and action-review surface.
Strong complement to simulator and stress testing.

Inputs

Data required

Regime labels
Agreement and anomaly context
Stress test outcomes
Curated policy profiles / reward heuristics

Verified access depth

What Pro adds to the workflow

Free: concept only
Pro: teaser state/profile map
Enterprise: full policy workbench, scenario audit trail, and exports

Access level

Choose the right access level for RL Policy Lab

This earns revenue when it is presented as a transparent policy workflow for advanced users who need explainable, scenario-linked review logic and governance-friendly exports.

Free

Public preview

Conceptual explanation only so trust is built before any upsell.

Pro

Self-directed research workflow

A teaser state-action map for advanced users who want trust-first policy support.

Desk

Workflow depth

Policy frontier, replay view, scenario-linked review logic, and explicit what-would-flip-this review.

Best fit · Enterprise / Premium Desk

Enterprise

Team / API / exports

Full policy workbench, exports, governance-friendly action history, and scenario audit trails.