Offline RL

TorchTrade supports offline reinforcement learning, enabling agents to learn from pre-collected datasets without requiring live environment interaction during training.

Overview

TorchTrade provides TensorDict-based datasets that can be loaded and used directly with TorchRL's replay buffers. These datasets contain pre-collected trading trajectories for offline RL research and are being published on HuggingFace/Torch-Trade (see Provided Datasets below).
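
Once a dataset is published, it could be pulled from the Hub and loaded into a replay buffer along these lines. The repository id, file layout, and memory-mapped format below are assumptions for illustration, not the actual published layout:

# Hypothetical sketch: download a TensorDict dataset from the Hugging Face Hub
# and load it into a TorchRL replay buffer (repo id and memmap layout are assumed)
from huggingface_hub import snapshot_download
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

local_dir = snapshot_download(repo_id="Torch-Trade/example-dataset", repo_type="dataset")  # assumed repo id
data = TensorDict.load_memmap(local_dir)  # assumes a memory-mapped TensorDict layout

replay_buffer = TensorDictReplayBuffer(storage=LazyTensorStorage(max_size=data.numel()))
replay_buffer.extend(data.reshape(-1))  # flatten trajectories into individual transitions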

Offline RL can be performed using datasets collected from two sources:

  1. Offline Environment Interactions - Collect trajectories by running policies in backtesting environments (SequentialTradingEnv, SequentialTradingEnvSLTP, etc.)
  2. Real Online Environment Interactions - Record actual trading data from live exchanges (Alpaca, Binance, Bitget)

This approach is particularly valuable for:

  • Learning from expert demonstrations or historical trading data
  • Training without market risk or transaction costs
  • Developing policies when live interaction is expensive or dangerous
  • Bootstrapping learning before deploying to real markets

Example: IQL (Implicit Q-Learning)

TorchTrade provides an example implementation of offline RL using Implicit Q-Learning (IQL) in examples/offline/iql/.

# Example: Training IQL on a pre-collected dataset
from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

# 1. Create environment (for evaluation only)
env = SequentialTradingEnv(df, config)

# 2. Load the pre-collected dataset into a replay buffer
# Each transition should contain: (observation, action, reward, next_observation, done)
replay_buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(max_size=1_000_000),
    batch_size=256,
)
replay_buffer.extend(dataset)  # dataset: a TensorDict of pre-collected transitions

# 3. Train IQL from offline data (no environment interaction needed)
for _ in range(num_updates):
    batch = replay_buffer.sample()
    loss_td = iql_loss_module(batch)  # IQLLoss returns a TensorDict of loss terms
    loss = loss_td["loss_actor"] + loss_td["loss_qvalue"] + loss_td["loss_value"]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
For a complete implementation, see examples/offline/iql/.

Dataset Collection

From Offline Environments

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig
from torchrl.collectors import SyncDataCollector
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

# Backtesting environment to roll out in
env = SequentialTradingEnv(df, config)

# Collect trajectories with any policy (random, rule-based, pre-trained);
# passing policy=None makes the collector sample random actions
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=10_000,
    total_frames=1_000_000,
)

# Store the collected transitions in a replay buffer
replay_buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(max_size=1_000_000),
)

for batch in collector:
    replay_buffer.extend(batch.reshape(-1))  # flatten the batch into transitions
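
The collected transitions can then be persisted so an offline training run can reload them later. One way to do this, assuming TorchRL's replay-buffer checkpointing API (dumps/loads), is:

# Persist the buffer to disk for later offline training
replay_buffer.dumps("./offline_dataset")
# ...and in the offline training script:
# replay_buffer.loads("./offline_dataset")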

From Real Online Environments

from torchrl.envs.utils import step_mdp
from torchtrade.envs.alpaca import AlpacaTorchTradingEnv, AlpacaTradingEnvConfig

# Collect real trading data (paper trading recommended)
env = AlpacaTorchTradingEnv(config)

# Record interactions
for episode in range(num_episodes):
    td = env.reset()
    done = False
    while not done:
        td = policy(td)            # policy writes "action" into the tensordict
        td = env.step(td)          # env writes the "next" entries (obs, reward, done)
        replay_buffer.add(td)      # store the full transition
        done = td["next", "done"].item()
        td = step_mdp(td)          # promote "next" entries to the root for the next step
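
The recording policy can be anything that writes an "action" entry into the tensordict: a pre-trained actor, a random policy, or a hand-written rule. A purely illustrative rule-based policy is sketched below; the signal logic and the integer action encoding are assumptions, not part of TorchTrade's API:

# Hypothetical rule-based recording policy wrapped in a TensorDictModule
from tensordict.nn import TensorDictModule

def momentum_signal(observation):
    # Placeholder rule: go long (action 1) if the last observation feature is positive, else stay flat (0)
    return (observation[..., -1] > 0).long()

policy = TensorDictModule(momentum_signal, in_keys=["observation"], out_keys=["action"])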

Provided Datasets

Coming Soon

We plan to provide pre-collected datasets on HuggingFace for offline RL research, including:

  • Expert demonstrations from rule-based strategies
  • Random policy trajectories for benchmarking
  • Real market interaction data (paper trading)

Stay tuned at HuggingFace/Torch-Trade!

Additional Offline RL Algorithms

TorchTrade's offline RL support is compatible with any offline RL algorithm from TorchRL, including:

  • IQL (Implicit Q-Learning), as in the example above
  • CQL (Conservative Q-Learning)
  • TD3+BC
  • Decision Transformer (DT)
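
For instance, the IQL objective in the training loop above can be swapped for another TorchRL loss module without touching the data pipeline. A hedged sketch using CQLLoss, reusing the actor and Q-value networks from the IQL sketch earlier, might look like:

# Hypothetical: swap the IQL objective for Conservative Q-Learning (CQL)
from torchrl.objectives import CQLLoss

loss_module = CQLLoss(
    actor_network=actor,    # same probabilistic actor as in the IQL sketch
    qvalue_network=qvalue,  # same Q(s, a) critic
)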

Next Steps

  • Run the complete IQL example in examples/offline/iql/
  • Collect your own dataset with SyncDataCollector as shown above
  • Watch HuggingFace/Torch-Trade for pre-collected datasets as they are released

References

  • Kostrikov, I., Nair, A., & Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. arXiv:2110.06169.