Offline Environments

Offline environments are designed for training on historical data (backtesting). These are not "offline RL" methods like CQL or IQL, but rather environments that use pre-collected market data instead of live exchange APIs. For deploying trained policies to real exchanges, see Online Environments.

Unified Architecture

TorchTrade provides 3 unified environment classes that each support both spot and futures trading via configuration. Set leverage=1 for spot (long-only) or leverage>1 for futures (with margin management and liquidation mechanics). Use negative action_levels to enable short positions.

Environment	Bracket Orders	One-Step	Best For
SequentialTradingEnv	-	-	Standard sequential trading
SequentialTradingEnvSLTP	Yes	-	Risk management with SL/TP
OneStepTradingEnv	Yes	Yes	GRPO, contextual bandits

Sequential (SequentialTradingEnv) — Step-by-step trading with fractional position sizing. Action values represent the fraction of capital to deploy (e.g., 0.5 = 50% allocation).

SL/TP (SequentialTradingEnvSLTP) — Extends sequential with bracket order risk management. Each trade includes configurable stop-loss and take-profit levels with a combinatorial action space.

OneStep (OneStepTradingEnv) — Optimized for fast episodic training with GRPO. The agent takes one action, and the environment simulates a rollout until SL/TP triggers or max rollout length. Policies can be deployed to SequentialTradingEnvSLTP for step-by-step execution.

Extensible Framework

Users can create custom environments by inheriting from existing base classes. See Building Custom Environments.

Account State

All environments expose a universal 6-element account_state tensor as part of the observation:

Index	Element	Description	Spot	Futures
0	`exposure_pct`	Position value / portfolio value	0.0–1.0	0.0–N (with leverage)
1	`position_direction`	Sign of position size	0 or +1	-1, 0, or +1
2	`unrealized_pnl_pct`	Unrealized P&L as % of entry price	≥0	Any
3	`holding_time`	Steps since position opened	≥0	≥0
4	`leverage`	Current leverage	1.0	1–125
5	`distance_to_liquidation`	Normalized distance to liquidation price	1.0 (no risk)	Calculated

This structure is shared across offline and online environments, ensuring policies transfer seamlessly between training and live deployment.
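As a reading aid, the six positions in the table above can be given names. The sketch below is not part of TorchTrade: `AccountState` and `unpack_account_state` are hypothetical helpers, and it assumes the state arrives as a flat sequence of six numbers (a tensor could be converted with `.tolist()` first).

```python
from typing import NamedTuple, Sequence


class AccountState(NamedTuple):
    """Named view of the 6-element account_state vector (hypothetical helper)."""
    exposure_pct: float
    position_direction: float
    unrealized_pnl_pct: float
    holding_time: float
    leverage: float
    distance_to_liquidation: float


def unpack_account_state(values: Sequence[float]) -> AccountState:
    """Map a flat 6-element sequence onto named fields, in table order."""
    if len(values) != 6:
        raise ValueError(f"expected 6 elements, got {len(values)}")
    return AccountState(*(float(v) for v in values))


# Example: a spot account that is 50% invested in a long position.
state = unpack_account_state([0.5, 1.0, 0.02, 12.0, 1.0, 1.0])
print(state.exposure_pct, state.position_direction)
```

Naming the fields avoids magic indices such as `account_state[5]` when writing reward functions or logging code.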

Fractional Position Sizing

SequentialTradingEnv uses action_levels to define discrete fractional position sizes in [-1.0, 1.0]:

Magnitude = fraction of balance to allocate (0.5 = 50%, 1.0 = 100%)
Sign = direction (positive = long, negative = short, zero = flat/close)
With leverage: position size = balance × |action| × leverage / price

action_levels = [-1.0, 0.0, 1.0]              # Coarse: full short/flat/full long
action_levels = [0.0, 0.25, 0.5, 0.75, 1.0]   # Long-only with granularity
action_levels = [-0.5, -0.25, 0.0, 0.25, 0.5]  # Conservative, no full positions
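The sizing rule above (position size = balance × |action| × leverage / price) can be written out as a small function. This is a minimal sketch of the arithmetic only, not the environment's internal code: `position_size` is a hypothetical name, and fees and slippage are ignored.

```python
def position_size(balance: float, action: float, leverage: float, price: float) -> float:
    """Return the signed position size in units of the asset.

    Positive values are long, negative values are short, zero is flat.
    Hypothetical helper: fees and slippage are deliberately ignored.
    """
    if not -1.0 <= action <= 1.0:
        raise ValueError("action must lie in [-1.0, 1.0]")
    if price <= 0 or leverage < 1:
        raise ValueError("price must be positive and leverage at least 1")
    units = balance * abs(action) * leverage / price
    return units if action >= 0 else -units


# Spot: 50% of a $1,000 balance at a price of $50 buys 10 units.
print(position_size(1000, 0.5, leverage=1, price=50))
# Futures: a full short with $10,000 at 5x leverage and a price of $100.
print(position_size(10000, -1.0, leverage=5, price=100))
```

The action only scales the fraction of the balance; leverage enters as a separate multiplier, so the same action level maps to a larger position when leverage is raised.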

SLTP environments use include_hold_action (default True) to optionally include a HOLD/no-op action alongside the SL/TP bracket combinations.

Leverage

Leverage is a fixed global parameter, not part of the action space. action=0.5 always means "deploy 50% of capital" regardless of leverage setting. This keeps the action space small and separates risk management (leverage) from the learned policy (position sizing).

Timeframe Format

Always use canonical forms: ["1min", "5min", "15min", "1hour", "1day"]. Non-canonical forms like "60min" create different observation keys (market_data_60Minute vs market_data_1Hour), breaking model compatibility.
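One way to guard against non-canonical strings is to normalize them before building the config. The helper below is a hypothetical sketch, not a TorchTrade function. It assumes that the canonical form is the largest unit (day, hour, min) that divides the duration evenly, which matches the examples above but is not confirmed for every timeframe the library accepts.

```python
import re

# Minutes per unit; the unit names follow the canonical examples in this section.
_UNIT_MINUTES = {"min": 1, "hour": 60, "day": 1440}


def canonical_timeframe(tf: str) -> str:
    """Rewrite a timeframe string using the largest unit that divides it evenly.

    Hypothetical helper: the "largest unit" rule is an assumption.
    """
    match = re.fullmatch(r"(\d+)(min|hour|day)", tf.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized timeframe: {tf!r}")
    total = int(match.group(1)) * _UNIT_MINUTES[match.group(2)]
    if total == 0:
        raise ValueError("timeframe must be positive")
    for unit in ("day", "hour", "min"):
        size = _UNIT_MINUTES[unit]
        if total % size == 0:  # always true for "min", so the loop returns
            return f"{total // size}{unit}"


print([canonical_timeframe(tf) for tf in ["60min", "15min", "24hour"]])
```

Applying such a check once, at config construction time, keeps the observation keys stable between training and deployment.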

SequentialTradingEnv

The core sequential trading environment. Trading mode is determined by configuration: leverage=1 for spot, leverage>1 for futures.

Configuration

Spot Trading

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig

config = SequentialTradingEnvConfig(
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    action_levels=[0.0, 0.5, 1.0],    # Close / 50% / 100% long
    initial_cash=1000,
    transaction_fee=0.0025,
    slippage=0.001,
)

env = SequentialTradingEnv(df, config)

Futures Trading

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig

config = SequentialTradingEnvConfig(
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    leverage=5,
    action_levels=[-1.0, -0.5, 0.0, 0.5, 1.0],  # Short/neutral/long
    initial_cash=10000,
    transaction_fee=0.0004,
    slippage=0.001,
)

env = SequentialTradingEnv(df, config)

Observation Space

observation = {
    "market_data_1Minute": Tensor([12, num_features]),    # 1m window
    "market_data_5Minute": Tensor([8, num_features]),     # 5m window
    "market_data_15Minute": Tensor([8, num_features]),    # 15m window
    "market_data_1Hour": Tensor([24, num_features]),      # 1h window
    "account_state": Tensor([6]),                         # See Account State above
}
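The relationship between `time_frames`, `window_sizes`, and the observation keys can be sketched as a small lookup. This is an illustration rather than library code: `expected_observation_shapes` is a hypothetical helper, the key pattern is inferred from the example above, and the `Day` suffix is an assumption since no daily key appears in this section.

```python
import re
from typing import Dict, Sequence, Tuple

# Suffixes inferred from the keys shown above; "Day" is an assumption.
_SUFFIX = {"min": "Minute", "hour": "Hour", "day": "Day"}


def expected_observation_shapes(
    time_frames: Sequence[str],
    window_sizes: Sequence[int],
    num_features: int,
) -> Dict[str, Tuple[int, ...]]:
    """Return the observation keys and shapes implied by a config (hypothetical helper)."""
    if len(time_frames) != len(window_sizes):
        raise ValueError("time_frames and window_sizes must have the same length")
    shapes: Dict[str, Tuple[int, ...]] = {}
    for tf, window in zip(time_frames, window_sizes):
        match = re.fullmatch(r"(\d+)(min|hour|day)", tf)
        if match is None:
            raise ValueError(f"unrecognized timeframe: {tf!r}")
        key = f"market_data_{match.group(1)}{_SUFFIX[match.group(2)]}"
        shapes[key] = (window, num_features)
    shapes["account_state"] = (6,)  # fixed 6-element account vector
    return shapes


shapes = expected_observation_shapes(
    ["1min", "5min", "15min", "1hour"], [12, 8, 8, 24], num_features=4
)
for key, shape in shapes.items():
    print(key, shape)
```

Comparing such a table against the keys of an actual observation is a quick way to catch a mismatch between a saved model and a new config.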

Liquidation (Futures)

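When leverage > 1, a position is liquidated once its margin becomes insufficient. For example, with $10k cash fully allocated at 10x leverage, the position has a notional value of $100k, so an adverse price move of roughly 10% wipes out the entire $10k of equity. A 20% adverse move would exceed the equity, so the position is liquidated long before that.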
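The arithmetic behind this example can be checked with a few lines. The sketch below uses hypothetical helper names and models only a long position's equity under a price move. It ignores fees, funding, and any maintenance-margin rule, so the real liquidation point in the environment may come earlier.

```python
def equity_after_move(cash: float, leverage: float, price_move_pct: float,
                      allocation: float = 1.0) -> float:
    """Equity of a long position after a relative price move (simplified model)."""
    notional = cash * allocation * leverage
    return cash + notional * price_move_pct


def wipeout_move(leverage: float, allocation: float = 1.0) -> float:
    """Relative price move at which a long position's equity reaches zero."""
    return -1.0 / (leverage * allocation)


# $10k at 10x leverage: a 10% drop erases the equity, a 20% drop exceeds it.
print(wipeout_move(10))
print(equity_after_move(10_000, 10, -0.10))
print(equity_after_move(10_000, 10, -0.20))
```

In this simplified model the wipeout move is 1 / leverage, which is why higher leverage leaves less room for adverse price movement.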

SequentialTradingEnvSLTP

Extends SequentialTradingEnv with bracket order risk management. Supports both spot and futures modes via leverage.

Configuration

from torchtrade.envs.offline import SequentialTradingEnvSLTP, SequentialTradingEnvSLTPConfig

config = SequentialTradingEnvSLTPConfig(
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.05, 0.10],
    include_hold_action=True,

    # Futures parameters (leverage > 1 enables short bracket orders)
    leverage=5,
    margin_call_threshold=0.2,

    time_frames=["1min", "5min", "15min"],
    window_sizes=[12, 8, 8],
    execute_on=(5, "Minute"),
    initial_cash=10000,
    transaction_fee=0.0004,
    slippage=0.001,
)

env = SequentialTradingEnvSLTP(df, config)

Action Space

With 2 SL levels, 2 TP levels, and leverage > 1 (futures):

Action 0: HOLD / Close position
Actions 1-4: LONG with SL/TP combinations
Actions 5-8: SHORT with SL/TP combinations

Formula: 1 + 2 × (num_sl × num_tp) = 9 actions. Without HOLD (include_hold_action=False): 2 × (num_sl × num_tp) = 8 actions.
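The action count can be reproduced by enumerating the combinations explicitly. The sketch below is illustrative only: `build_sltp_actions` is a hypothetical helper, and the ordering of the SL/TP pairs within each side is an assumption. Only the overall layout (HOLD first, then long, then short) is taken from the list above.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Action = Tuple[str, Optional[float], Optional[float]]


def build_sltp_actions(
    stoploss_levels: Sequence[float],
    takeprofit_levels: Sequence[float],
    allow_short: bool = True,
    include_hold_action: bool = True,
) -> List[Action]:
    """Enumerate (side, stop-loss, take-profit) actions (hypothetical helper)."""
    actions: List[Action] = []
    if include_hold_action:
        actions.append(("hold", None, None))
    sides = ["long", "short"] if allow_short else ["long"]
    for side in sides:
        # Pair ordering within a side is an assumption, not confirmed by the docs.
        for sl, tp in product(stoploss_levels, takeprofit_levels):
            actions.append((side, sl, tp))
    return actions


actions = build_sltp_actions([-0.02, -0.05], [0.05, 0.10])
for index, action in enumerate(actions):
    print(index, action)
```

Adding one more stop-loss or take-profit level grows the action space multiplicatively, which is worth keeping in mind when choosing the level lists.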

OneStepTradingEnv

One-step episodic environment for GRPO and contextual bandits. The agent takes a single action, and the environment simulates a rollout until the stop-loss or take-profit level triggers or the rollout reaches its maximum length (rollout_steps). Supports spot and futures via leverage.
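The rollout logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the environment's implementation: `simulate_bracket_rollout` is a hypothetical function, it checks one price per step rather than intrabar highs and lows, it assumes fills exactly at the bracket price, and it ignores fees, slippage, and liquidation.

```python
from typing import Sequence, Tuple


def simulate_bracket_rollout(
    entry_price: float,
    future_prices: Sequence[float],
    stoploss: float,      # negative fraction, e.g. -0.02
    takeprofit: float,    # positive fraction, e.g. 0.05
    rollout_steps: int,
    side: int = 1,        # +1 for long, -1 for short
) -> Tuple[str, int, float]:
    """Return (exit_reason, steps_held, return_pct) for a single bracket trade."""
    window = list(future_prices[:rollout_steps])
    if not window:
        raise ValueError("need at least one future price")
    sl_price = entry_price * (1 + side * stoploss)
    tp_price = entry_price * (1 + side * takeprofit)
    for step, price in enumerate(window, start=1):
        if side * (price - sl_price) <= 0:   # stop-loss is checked first
            return "stop_loss", step, side * (sl_price / entry_price - 1)
        if side * (price - tp_price) >= 0:
            return "take_profit", step, side * (tp_price / entry_price - 1)
    # No bracket hit: close at the last price inside the rollout window.
    return "max_rollout", len(window), side * (window[-1] / entry_price - 1)


result = simulate_bracket_rollout(
    100.0, [100.5, 101.0, 99.0, 97.5, 103.0], -0.02, 0.05, rollout_steps=24
)
print(result)
```

Because the whole trade resolves inside one environment step, each episode yields a single (observation, action, return) sample, which suits GRPO-style and bandit-style training.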

Deployment

Policies trained on OneStepTradingEnv can be deployed directly to SequentialTradingEnvSLTP — both share the same observation and action spaces.

Configuration

from torchtrade.envs.offline import OneStepTradingEnv, OneStepTradingEnvConfig

config = OneStepTradingEnvConfig(
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.05, 0.10],
    include_hold_action=True,
    rollout_steps=24,

    # Futures parameters (leverage > 1 enables short bracket orders)
    leverage=5,
    margin_call_threshold=0.2,

    time_frames=["1min", "5min", "15min"],
    window_sizes=[12, 8, 8],
    execute_on=(5, "Minute"),
    initial_cash=10000,
    transaction_fee=0.0004,
)

env = OneStepTradingEnv(df, config)

Visualization

All offline environments support render_history() to visualize episode performance:

env.render_history()  # Display after running an episode
fig = env.render_history(return_fig=True)  # Or get the figure

All environments render 3 subplots: price + actions, portfolio vs buy-and-hold, and exposure history.

See Visualization Guide for details.

Next Steps

Online Environments - Deploy to live exchanges
Feature Engineering - Add technical indicators and custom rewards