
Offline Environments

Offline environments are designed for training on historical data (backtesting). These are not "offline RL" methods like CQL or IQL, but rather environments that use pre-collected market data instead of live exchange APIs. For deploying trained policies to real exchanges, see Online Environments.

Unified Architecture

TorchTrade provides three unified environment classes (SequentialTradingEnv, SequentialTradingEnvSLTP, and OneStepTradingEnv), plus experimental vectorized variants of the two sequential ones. Each supports both spot and futures trading via configuration: set leverage=1 for spot (long-only) or leverage>1 for futures (with margin management and liquidation mechanics). Use negative action_levels to enable short positions.

| Environment | Bracket Orders | One-Step | Best For |
|---|---|---|---|
| SequentialTradingEnv | - | - | Standard sequential trading |
| VectorizedSequentialTradingEnv | - | - | High-throughput training (experimental) |
| SequentialTradingEnvSLTP | Yes | - | Risk management with SL/TP |
| VectorizedSequentialTradingEnvSLTP | Yes | - | High-throughput SL/TP training (experimental) |
| OneStepTradingEnv | Yes | Yes | GRPO, contextual bandits |

Sequential (SequentialTradingEnv) — Step-by-step trading with fractional position sizing. Action values represent the fraction of capital to deploy (e.g., 0.5 = 50% allocation).

SL/TP (SequentialTradingEnvSLTP) — Extends sequential with bracket order risk management. Each trade includes configurable stop-loss and take-profit levels with a combinatorial action space.

OneStep (OneStepTradingEnv) — Optimized for fast episodic training with GRPO. The agent takes one action, and the environment simulates a rollout until SL/TP triggers or max rollout length. Policies can be deployed to SequentialTradingEnvSLTP for step-by-step execution.

Extensible Framework

Users can create custom environments by inheriting from existing base classes. See Building Custom Environments.


Account State

All environments expose a universal 6-element account_state tensor as part of the observation:

| Index | Element | Description | Spot | Futures |
|---|---|---|---|---|
| 0 | exposure_pct | Position value / portfolio value | 0.0–1.0 | 0.0–N (with leverage) |
| 1 | position_direction | Sign of position size | 0 or +1 | -1, 0, or +1 |
| 2 | unrealized_pnl_pct | Unrealized P&L as % of entry price | Any | Any |
| 3 | holding_time | Steps since position opened | ≥0 | ≥0 |
| 4 | leverage | Current leverage | 1.0 | 1–125 |
| 5 | distance_to_liquidation | Normalized distance to liquidation price | 1.0 (no risk) | Calculated |
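For readability in policy code or logging, the six indices in the table can be given names. The sketch below is a hypothetical helper (`unpack_account_state` and `ACCOUNT_STATE_FIELDS` are not part of TorchTrade); it only assumes the index order shown above and accepts either a plain list or a 1-D tensor.

```python
# Index order taken from the account_state table above.
ACCOUNT_STATE_FIELDS = (
    "exposure_pct",
    "position_direction",
    "unrealized_pnl_pct",
    "holding_time",
    "leverage",
    "distance_to_liquidation",
)


def unpack_account_state(account_state):
    """Hypothetical helper: map a 6-element account_state to named floats."""
    # float() works on list items and on 0-d tensors yielded by a 1-D tensor
    values = [float(v) for v in account_state]
    if len(values) != len(ACCOUNT_STATE_FIELDS):
        raise ValueError(f"expected 6 elements, got {len(values)}")
    return dict(zip(ACCOUNT_STATE_FIELDS, values))


# Example: a flat spot account (values chosen for illustration)
state = unpack_account_state([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
print(state["leverage"], state["distance_to_liquidation"])
```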

This layout is shared by offline and online environments, so a policy sees the same account-state features in backtesting and in live deployment.

Fractional Position Sizing

SequentialTradingEnv uses action_levels to define discrete fractional position sizes in [-1.0, 1.0]:

  • Magnitude = fraction of balance to allocate (0.5 = 50%, 1.0 = 100%)
  • Sign = direction (positive = long, negative = short, zero = flat/close)
  • With leverage: position size = balance × |action| × leverage / price
action_levels = [-1.0, 0.0, 1.0]              # Coarse: full short/flat/full long
action_levels = [0.0, 0.25, 0.5, 0.75, 1.0]   # Long-only with granularity
action_levels = [-0.5, -0.25, 0.0, 0.25, 0.5]  # Conservative, no full positions
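The sizing rule from the bullets above can be written out as a small function. `position_size` is a hypothetical helper for illustration, not a TorchTrade function, and it ignores fees and slippage. It also illustrates the point made under Leverage below: the action fixes the share of balance committed, and leverage only scales the resulting position.

```python
def position_size(balance, action, leverage, price):
    """Hypothetical helper: base-asset units for one action level.

    Implements balance * |action| * leverage / price; the sign of the
    result is the direction (positive = long, negative = short).
    Fees and slippage are ignored in this sketch.
    """
    if not -1.0 <= action <= 1.0:
        raise ValueError("action levels must lie in [-1.0, 1.0]")
    units = balance * abs(action) * leverage / price
    return units if action >= 0 else -units


action_levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
for a in action_levels:
    # $10,000 balance, 5x leverage, asset priced at $50,000
    print(a, position_size(10_000, a, leverage=5, price=50_000))
```

With these numbers, action 0.5 yields 0.5 units ($25,000 notional); the same action at leverage=1 yields 0.1 units, while the share of balance committed stays at 50%.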

SLTP environments use include_hold_action (default True) to optionally include a HOLD/no-op action alongside the SL/TP bracket combinations.

Leverage

Leverage is a fixed global parameter, not part of the action space. action=0.5 always means "deploy 50% of capital" regardless of leverage setting. This keeps the action space small and separates risk management (leverage) from the learned policy (position sizing).

Timeframe Format

Always use canonical forms: ["1min", "5min", "15min", "1hour", "1day"]. Non-canonical forms like "60min" create different observation keys (market_data_60Minute vs market_data_1Hour), breaking model compatibility.
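One way to guard against this is to normalize timeframe strings before building the config. The helpers below are hypothetical (not TorchTrade APIs): they convert any `<n>min`, `<n>hour`, or `<n>day` string to the largest whole unit and build the key following the `market_data_<n><Unit>` pattern seen in the observation examples. The `Day` suffix for daily frames is an assumption extrapolated from the `Minute` and `Hour` keys.

```python
import re

# Minutes per unit, largest unit first (dicts keep insertion order).
_MINUTES_PER_UNIT = {"day": 1440, "hour": 60, "min": 1}
# Assumed key suffixes; "Day" is extrapolated, not confirmed by the docs.
_KEY_SUFFIX = {"day": "Day", "hour": "Hour", "min": "Minute"}


def canonical_timeframe(tf):
    """Hypothetical helper: "60min" -> "1hour", "1440min" -> "1day"."""
    m = re.fullmatch(r"(\d+)\s*(min|hour|day)", tf.strip().lower())
    if m is None or int(m.group(1)) == 0:
        raise ValueError(f"unrecognized timeframe: {tf!r}")
    minutes = int(m.group(1)) * _MINUTES_PER_UNIT[m.group(2)]
    for unit, size in _MINUTES_PER_UNIT.items():
        if minutes % size == 0:  # always true for "min", so this returns
            return f"{minutes // size}{unit}"


def observation_key(tf):
    """Hypothetical helper: observation key in the market_data_<n><Unit> style."""
    m = re.fullmatch(r"(\d+)(min|hour|day)", canonical_timeframe(tf))
    return f"market_data_{m.group(1)}{_KEY_SUFFIX[m.group(2)]}"


print(canonical_timeframe("60min"))  # 1hour
print(observation_key("60min"))      # market_data_1Hour
```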


SequentialTradingEnv

The core sequential trading environment. Trading mode is determined by configuration: leverage=1 for spot, leverage>1 for futures.

Configuration

Spot (long-only):

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig

config = SequentialTradingEnvConfig(
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    action_levels=[0.0, 0.5, 1.0],    # Close / 50% / 100% long
    initial_cash=1000,
    transaction_fee=0.0025,
    slippage=0.001,
)

env = SequentialTradingEnv(df, config)

Futures (leverage > 1, long and short):

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig

config = SequentialTradingEnvConfig(
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    leverage=5,
    action_levels=[-1.0, -0.5, 0.0, 0.5, 1.0],  # Short/neutral/long
    initial_cash=10000,
    transaction_fee=0.0004,
    slippage=0.001,
)

env = SequentialTradingEnv(df, config)

Observation Space

observation = {
    "market_data_1Minute": Tensor([12, num_features]),    # 1m window
    "market_data_5Minute": Tensor([8, num_features]),     # 5m window
    "market_data_15Minute": Tensor([8, num_features]),    # 15m window
    "market_data_1Hour": Tensor([24, num_features]),      # 1h window
    "account_state": Tensor([6]),                         # See Account State above
}

Liquidation (Futures)

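When leverage > 1, positions are liquidated if margin is insufficient. E.g., with $10k cash fully deployed (action=1.0) at 10x leverage, the position's notional value is $100k, so a 10% adverse price move already loses the full $10k of equity; a 20% move would lose twice the equity, so liquidation triggers well before that point.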
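For intuition, a simplified liquidation-price estimate for an isolated position can be computed as below. This is a textbook approximation, not TorchTrade's formula: it ignores fees, funding, maintenance margin, and the environment's margin_call_threshold, so actual liquidation typically happens somewhat earlier. `approx_liquidation_price` is a hypothetical name.

```python
def approx_liquidation_price(entry_price, leverage, direction):
    """Hypothetical helper: price at which losses consume the full initial margin.

    Simplified: ignores fees, funding and maintenance margin.
    direction: +1 for long, -1 for short.
    """
    if leverage <= 1:
        return None  # unleveraged spot positions cannot be liquidated
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (long) or -1 (short)")
    # A 1/leverage adverse move wipes out the margin.
    return entry_price - direction * entry_price / leverage


# 10x positions opened at $50,000
print(approx_liquidation_price(50_000, 10, +1))  # 45000.0 (long)
print(approx_liquidation_price(50_000, 10, -1))  # 55000.0 (short)
```

At 10x leverage the estimate sits 10% from the entry price, which is why the 20% adverse move in the example above is far past the liquidation point.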

Vectorized Version (Experimental)

VectorizedSequentialTradingEnv is a batched tensor implementation of SequentialTradingEnv that processes N environments in a single _step() call using pure tensor operations. It achieves 20-400x higher throughput compared to ParallelEnv by eliminating inter-process communication overhead.

Experimental

This environment is still experimental. While it passes extensive scalar-vectorized equivalence tests, it has not been battle-tested in production training runs. Use with caution and verify results against the standard SequentialTradingEnv.

from torchtrade.envs.offline import VectorizedSequentialTradingEnv, VectorizedSequentialTradingEnvConfig

config = VectorizedSequentialTradingEnvConfig(
    num_envs=64,
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    action_levels=[0.0, 0.5, 1.0],
    initial_cash=1000,
    transaction_fee=0.0025,
    leverage=1,  # or >1 for futures
)

env = VectorizedSequentialTradingEnv(df, config)

See the PPO Vectorized example for a complete training setup.

Vectorized SLTP Version (Experimental)

VectorizedSequentialTradingEnvSLTP extends the vectorized environment with bracket order risk management (stop-loss/take-profit). It provides the same 20-400x throughput improvement over ParallelEnv while supporting SL/TP bracket orders.

Experimental

This environment is still experimental. While it passes extensive scalar-vectorized equivalence tests against SequentialTradingEnvSLTP, it has not been battle-tested in production training runs.

from torchtrade.envs.offline import (
    VectorizedSequentialTradingEnvSLTP,
    VectorizedSequentialTradingEnvSLTPConfig,
)

config = VectorizedSequentialTradingEnvSLTPConfig(
    num_envs=64,
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.05, 0.10],
    time_frames=["1min", "5min", "15min", "1hour"],
    window_sizes=[12, 8, 8, 24],
    execute_on=(5, "Minute"),
    initial_cash=1000,
    transaction_fee=0.0025,
    leverage=1,  # or >1 for futures with short bracket orders

    # Position sizing (see Position Sizing section below)
    trade_mode="fractional",     # "fractional", "notional", or "quantity"
    position_fraction=0.1,       # 10% of portfolio per trade
)

env = VectorizedSequentialTradingEnvSLTP(df, config)

The action space matches SequentialTradingEnvSLTP: with include_hold_action=True, there are 1 + (num_sl × num_tp) actions for spot and 1 + 2 × (num_sl × num_tp) for futures.


SequentialTradingEnvSLTP

Extends SequentialTradingEnv with bracket order risk management. Supports both spot and futures modes via leverage.

Configuration

from torchtrade.envs.offline import SequentialTradingEnvSLTP, SequentialTradingEnvSLTPConfig

config = SequentialTradingEnvSLTPConfig(
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.05, 0.10],
    include_hold_action=True,

    # Futures parameters (leverage > 1 enables short bracket orders)
    leverage=5,
    margin_call_threshold=0.2,

    # Position sizing: "fractional" (% of portfolio), "notional" (fixed USD), or "quantity" (fixed units)
    trade_mode="fractional",     # "fractional", "notional", or "quantity"
    position_fraction=0.1,       # 10% of portfolio per trade (fractional mode)

    # Position locking (for OneStep policy evaluation parity)
    lock_position_until_sltp=False,  # If True, positions can only exit via SL/TP/liquidation

    time_frames=["1min", "5min", "15min"],
    window_sizes=[12, 8, 8],
    execute_on=(5, "Minute"),
    initial_cash=10000,
    transaction_fee=0.0004,
    slippage=0.001,
)

env = SequentialTradingEnvSLTP(df, config)

Action Space

With 2 SL levels, 2 TP levels, and leverage > 1 (futures):

  • Action 0: HOLD / Close position
  • Actions 1-4: LONG with SL/TP combinations
  • Actions 5-8: SHORT with SL/TP combinations

Formula: 1 + 2 × (num_sl × num_tp) = 9 actions. Without HOLD (include_hold_action=False): 2 × (num_sl × num_tp) = 8 actions.
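The count above can be reproduced by enumerating the bracket combinations. In this hypothetical sketch (`sltp_action_table` is not a TorchTrade function), HOLD sits at index 0 and long brackets precede short ones, matching the listing above; the SL-outer/TP-inner order within each side is an assumption.

```python
from itertools import product


def sltp_action_table(stoploss_levels, takeprofit_levels, leverage=1,
                      include_hold_action=True):
    """Hypothetical enumeration of the SL/TP bracket action space."""
    actions = [("HOLD", None, None)] if include_hold_action else []
    # Short brackets only exist in futures mode (leverage > 1).
    sides = ["LONG", "SHORT"] if leverage > 1 else ["LONG"]
    for side in sides:
        # Assumed ordering: SL is the outer loop, TP the inner loop.
        for sl, tp in product(stoploss_levels, takeprofit_levels):
            actions.append((side, sl, tp))
    return actions


table = sltp_action_table([-0.02, -0.05], [0.05, 0.10], leverage=5)
print(len(table))  # 9 = 1 + 2 * (2 * 2)
for idx, action in enumerate(table):
    print(idx, action)
```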

Position Sizing

SLTP environments support three position sizing modes via trade_mode. This applies to all SLTP environments: SequentialTradingEnvSLTP, VectorizedSequentialTradingEnvSLTP, and OneStepTradingEnv.

| Mode | Config field | Formula | Use case |
|---|---|---|---|
| "fractional" | position_fraction | portfolio_value × fraction × leverage / price | Training + adaptive sizing (default) |
| "notional" | quantity_per_trade | quantity_per_trade / price | Fixed USD per trade |
| "quantity" | quantity_per_trade | quantity_per_trade directly | Fixed base-asset units per trade |
# Fractional: allocate 10% of portfolio per bracket order (default mode)
config = SequentialTradingEnvSLTPConfig(
    trade_mode="fractional",     # default
    position_fraction=0.1,       # 10% of portfolio
    leverage=5,
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.03, 0.06],
    ...
)

# Notional: always trade $500 USD worth
config = SequentialTradingEnvSLTPConfig(
    trade_mode="notional",
    quantity_per_trade=500.0,    # $500 per trade
    ...
)

# Quantity: always trade exactly 0.05 BTC
config = SequentialTradingEnvSLTPConfig(
    trade_mode="quantity",
    quantity_per_trade=0.05,     # 0.05 BTC per trade
    ...
)
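The three formulas in the table can be collected into one function. `trade_quantity` is a hypothetical helper for illustration; fees, slippage, and exchange lot-size rounding are ignored. As the table shows, only fractional mode scales with leverage; notional and quantity modes size the order independently of it.

```python
def trade_quantity(trade_mode, price, portfolio_value=0.0, position_fraction=1.0,
                   quantity_per_trade=0.0, leverage=1):
    """Hypothetical helper: base-asset units for one bracket order."""
    if trade_mode == "fractional":
        # portfolio_value x fraction x leverage / price
        return portfolio_value * position_fraction * leverage / price
    if trade_mode == "notional":
        # fixed USD amount converted to units
        return quantity_per_trade / price
    if trade_mode == "quantity":
        # fixed base-asset units, used directly
        return quantity_per_trade
    raise ValueError(f"unknown trade_mode: {trade_mode!r}")


price = 50_000.0  # illustrative BTC price
print(trade_quantity("fractional", price, portfolio_value=10_000,
                     position_fraction=0.1, leverage=5))
print(trade_quantity("notional", price, quantity_per_trade=500.0))
print(trade_quantity("quantity", price, quantity_per_trade=0.05))
```

With a $10,000 portfolio, position_fraction=0.1 and leverage=5, fractional mode opens $5,000 of notional (0.1 BTC at $50,000), while notional mode with $500 gives 0.01 BTC.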

Train-Deploy Consistency

Live SLTP environments (Binance, Bybit, Bitget) support the same three modes with the same config fields. Train offline with trade_mode="fractional", then deploy live with identical settings for consistent behavior.

The default is trade_mode="fractional" with position_fraction=1.0 (all-in), which preserves backward compatibility with previous versions.

Position Locking

When lock_position_until_sltp=True, the agent's actions are ignored while a position is open. Positions can only exit via SL trigger, TP trigger, liquidation, or episode truncation — matching OneStepTradingEnv behavior.

config = SequentialTradingEnvSLTPConfig(
    lock_position_until_sltp=True,  # Ignore actions while in position
    ...
)

This is useful for evaluating policies trained on OneStepTradingEnv (where positions are inherently locked) on SequentialTradingEnvSLTP with matching conditions.
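The locking rule amounts to a gate in front of the action handler. The sketch below is hypothetical: `effective_action` is not part of TorchTrade, and treating an ignored action as HOLD (index 0, present when include_hold_action=True) is an assumption about how "ignored" is implemented.

```python
HOLD = 0  # assumed index of the HOLD action when include_hold_action=True


def effective_action(requested_action, in_position, lock_position_until_sltp):
    """Hypothetical gate: with locking on, actions taken in a position become HOLD.

    The open position then exits only via SL/TP, liquidation, or truncation,
    which the environment handles separately from the agent's action.
    """
    if lock_position_until_sltp and in_position:
        return HOLD
    return requested_action


print(effective_action(3, in_position=True, lock_position_until_sltp=True))   # 0
print(effective_action(3, in_position=True, lock_position_until_sltp=False))  # 3
```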


OneStepTradingEnv

One-step episodic environment for GRPO and contextual bandits. The agent takes a single action, and the environment simulates a rollout until SL/TP triggers or max rollout length. Supports spot and futures via leverage.

Deployment

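Policies trained on OneStepTradingEnv can be deployed directly to SequentialTradingEnvSLTP, since both share the same observation and action spaces. To evaluate under matching conditions, set lock_position_until_sltp=True (see Position Locking).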

Configuration

from torchtrade.envs.offline import OneStepTradingEnv, OneStepTradingEnvConfig

config = OneStepTradingEnvConfig(
    stoploss_levels=[-0.02, -0.05],
    takeprofit_levels=[0.05, 0.10],
    include_hold_action=True,
    rollout_steps=24,

    # Futures parameters (leverage > 1 enables short bracket orders)
    leverage=5,
    margin_call_threshold=0.2,

    # Position sizing (see Position Sizing section above)
    trade_mode="fractional",     # "fractional", "notional", or "quantity"
    position_fraction=0.1,       # 10% of portfolio per trade

    time_frames=["1min", "5min", "15min"],
    window_sizes=[12, 8, 8],
    execute_on=(5, "Minute"),
    initial_cash=10000,
    transaction_fee=0.0004,
)

env = OneStepTradingEnv(df, config)
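The rollout described at the top of this section can be sketched as a loop over future prices that stops at the first SL/TP trigger or after rollout_steps bars. This is a simplified illustration, not TorchTrade's implementation: `simulate_bracket_rollout` is hypothetical, triggers are checked on close prices only, exits fill at that close, and fees, slippage, and liquidation are ignored.

```python
def simulate_bracket_rollout(entry_price, closes, stoploss_pct, takeprofit_pct,
                             direction=1, rollout_steps=24):
    """Hypothetical one-step rollout: hold until SL/TP triggers or time runs out.

    stoploss_pct < 0 and takeprofit_pct > 0, as in the config levels.
    direction: +1 for long, -1 for short.
    Returns (exit_reason, steps_held, return_pct).
    """
    sl_price = entry_price * (1 + direction * stoploss_pct)
    tp_price = entry_price * (1 + direction * takeprofit_pct)
    exit_price, reason, steps = entry_price, "timeout", 0
    for steps, price in enumerate(closes[:rollout_steps], start=1):
        exit_price = price  # assumption: exits fill at the bar's close
        hit_sl = price <= sl_price if direction > 0 else price >= sl_price
        hit_tp = price >= tp_price if direction > 0 else price <= tp_price
        if hit_sl:
            reason = "stop_loss"
            break
        if hit_tp:
            reason = "take_profit"
            break
    return reason, steps, direction * (exit_price - entry_price) / entry_price


closes = [100.5, 101.0, 99.0, 97.5, 98.0]
print(simulate_bracket_rollout(100.0, closes, stoploss_pct=-0.02, takeprofit_pct=0.05))
```

Here the long position hits its 2% stop on the fourth bar. Because the agent acts only once, the episode's reward depends entirely on which barrier is reached first, which is what makes the setup suit GRPO and contextual bandits.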

Visualization

All offline environments support render_history() to visualize episode performance:

env.render_history()  # Display after running an episode
fig = env.render_history(return_fig=True)  # Or get the figure

All environments render 3 subplots: price + actions, portfolio vs buy-and-hold, and exposure history.

See Visualization Guide for details.


Next Steps