Offline RL

TorchTrade supports offline reinforcement learning, enabling agents to learn from pre-collected datasets without requiring live environment interaction during training.

Overview

TorchTrade provides TensorDict-based datasets that can be loaded and used directly with TorchRL's replay buffers. These datasets contain pre-collected trading trajectories for offline RL research and are being published on HuggingFace/Torch-Trade (see Provided Datasets below).
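
Once a dataset is published, it could be pulled from the Hub and loaded into a replay buffer along these lines. The repository id, file layout, and memory-mapped format below are assumptions for illustration, not the actual published layout:

# Hypothetical sketch: download a TensorDict dataset from the Hugging Face Hub
# and load it into a TorchRL replay buffer (repo id and memmap layout are assumed)
from huggingface_hub import snapshot_download
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

local_dir = snapshot_download(repo_id="Torch-Trade/example-dataset", repo_type="dataset")  # assumed repo id
data = TensorDict.load_memmap(local_dir)  # assumes a memory-mapped TensorDict layout

replay_buffer = TensorDictReplayBuffer(storage=LazyTensorStorage(max_size=data.numel()))
replay_buffer.extend(data.reshape(-1))  # flatten trajectories into individual transitions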

Offline RL can be performed using datasets collected from two sources:

  1. Offline Environment Interactions - Collect trajectories by running policies in backtesting environments (SequentialTradingEnv, SequentialTradingEnvSLTP, etc.)
  2. Real Online Environment Interactions - Record actual trading data from live exchanges (Alpaca, Binance, Bitget)

This approach is particularly valuable for:

  • Learning from expert demonstrations or historical trading data
  • Training without market risk or transaction costs
  • Developing policies when live interaction is expensive or dangerous
  • Bootstrapping learning before deploying to real markets

Example: IQL (Implicit Q-Learning)

TorchTrade provides an example implementation of offline RL using Implicit Q-Learning (IQL) in examples/offline/iql/.

# Example: Training IQL on a pre-collected dataset
from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

# 1. Create environment (for evaluation only)
env = SequentialTradingEnv(df, config)

# 2. Load the pre-collected dataset into a replay buffer
# Each transition should contain: (observation, action, reward, next_observation, done)
replay_buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(max_size=1_000_000),
    batch_size=256,
)
replay_buffer.extend(dataset)  # dataset: a TensorDict of pre-collected transitions

# 3. Train IQL from offline data (no environment interaction needed)
for _ in range(num_updates):
    batch = replay_buffer.sample()
    loss_td = iql_loss_module(batch)  # IQLLoss returns a TensorDict of loss terms
    loss = loss_td["loss_actor"] + loss_td["loss_qvalue"] + loss_td["loss_value"]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
For a complete implementation, see examples/offline/iql/.

Dataset Collection

From Offline Environments

from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig
from torchrl.collectors import SyncDataCollector
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

# Backtesting environment to roll out in
env = SequentialTradingEnv(df, config)

# Collect trajectories with any policy (random, rule-based, pre-trained);
# passing policy=None makes the collector sample random actions
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=10_000,
    total_frames=1_000_000,
)

# Store the collected transitions in a replay buffer
replay_buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(max_size=1_000_000),
)

for batch in collector:
    replay_buffer.extend(batch.reshape(-1))  # flatten the batch into transitions
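
The collected transitions can then be persisted so an offline training run can reload them later. One way to do this, assuming TorchRL's replay-buffer checkpointing API (dumps/loads), is:

# Persist the buffer to disk for later offline training
replay_buffer.dumps("./offline_dataset")
# ...and in the offline training script:
# replay_buffer.loads("./offline_dataset")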

From Real Online Environments

from torchrl.envs.utils import step_mdp
from torchtrade.envs.alpaca import AlpacaTorchTradingEnv, AlpacaTradingEnvConfig

# Collect real trading data (paper trading recommended)
env = AlpacaTorchTradingEnv(config)

# Record interactions
for episode in range(num_episodes):
    td = env.reset()
    done = False
    while not done:
        td = policy(td)            # policy writes "action" into the tensordict
        td = env.step(td)          # env writes the "next" entries (obs, reward, done)
        replay_buffer.add(td)      # store the full transition
        done = td["next", "done"].item()
        td = step_mdp(td)          # promote "next" entries to the root for the next step
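
The recording policy can be anything that writes an "action" entry into the tensordict: a pre-trained actor, a random policy, or a hand-written rule. A purely illustrative rule-based policy is sketched below; the signal logic and the integer action encoding are assumptions, not part of TorchTrade's API:

# Hypothetical rule-based recording policy wrapped in a TensorDictModule
from tensordict.nn import TensorDictModule

def momentum_signal(observation):
    # Placeholder rule: go long (action 1) if the last observation feature is positive, else stay flat (0)
    return (observation[..., -1] > 0).long()

policy = TensorDictModule(momentum_signal, in_keys=["observation"], out_keys=["action"])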

Provided Datasets

Coming Soon

We plan to provide pre-collected datasets on HuggingFace for offline RL research, including:

  • Expert demonstrations from rule-based strategies
  • Random policy trajectories for benchmarking
  • Real market interaction data (paper trading)

Stay tuned at HuggingFace/Torch-Trade!

Additional Offline RL Algorithms

TorchTrade's offline RL support is compatible with any offline RL algorithm from TorchRL, including:

  • IQL (Implicit Q-Learning), as in the example above
  • CQL (Conservative Q-Learning)
  • TD3+BC
  • Decision Transformer (DT)
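
For instance, the IQL objective in the training loop above can be swapped for another TorchRL loss module without touching the data pipeline. A hedged sketch using CQLLoss, reusing the actor and Q-value networks from the IQL sketch earlier, might look like:

# Hypothetical: swap the IQL objective for Conservative Q-Learning (CQL)
from torchrl.objectives import CQLLoss

loss_module = CQLLoss(
    actor_network=actor,    # same probabilistic actor as in the IQL sketch
    qvalue_network=qvalue,  # same Q(s, a) critic
)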

Next Steps

  • Run the complete IQL example in examples/offline/iql/
  • Collect your own dataset with SyncDataCollector as shown above
  • Watch HuggingFace/Torch-Trade for pre-collected datasets as they are released

References

  • Kostrikov, I., Nair, A., & Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. arXiv:2110.06169.