Feature Engineering¶
TorchTrade allows you to add custom technical indicators and features to your market observations. This guide shows you how to preprocess your OHLCV data with custom features before it's fed to your policy.
How It Works¶
The feature_preprocessing_fn parameter in environment configs transforms raw OHLCV data into custom features. This function is called on each resampled timeframe during environment initialization.
IMPORTANT: All feature columns must start with the features_ prefix (e.g., features_close, features_rsi_14). Only columns with this prefix are included in the observation space.
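The two points above can be illustrated with a minimal sketch in plain pandas. It is not TorchTrade's internal code: the resample_ohlcv helper, the pandas resampling rules, and the synthetic bars are assumptions made for illustration only. It shows a preprocessing function being applied once per resampled timeframe, and which columns would survive a features_ prefix filter.

```python
import numpy as np
import pandas as pd


def resample_ohlcv(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Hypothetical helper: aggregate 1-minute OHLCV bars to a coarser timeframe."""
    agg = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    return df.resample(rule).agg(agg).dropna()


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["features_close"] = df["close"]
    df["features_return"] = df["close"].pct_change()
    df["range"] = df["high"] - df["low"]  # no features_ prefix, so it would be ignored
    return df.fillna(0)


# Synthetic 1-minute bars (illustrative data only)
index = pd.date_range("2024-01-01", periods=60, freq="min")
close = pd.Series(100.0 + np.arange(60), index=index)
raw = pd.DataFrame(
    {"open": close, "high": close + 1, "low": close - 1, "close": close, "volume": 10.0},
    index=index,
)

# Apply the same function once per timeframe, then keep only prefixed columns
for rule in ["1min", "5min", "15min"]:
    processed = preprocess(resample_ohlcv(raw, rule))
    feature_cols = [c for c in processed.columns if c.startswith("features_")]
    print(rule, len(processed), feature_cols)
```

Because the function runs separately on each timeframe, any statistic it computes (a rolling mean, a scaler) is computed per timeframe as well.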
Timeframe Format Matters
When specifying time_frames, use canonical forms to avoid confusion:
- ✅ Use: "1hour", "2hours", "1day"
- ❌ Avoid: "60min", "120min", "24hour", "1440min"
Why? Different formats create different observation keys:
- time_frames=["60min"] → observation key: "market_data_60Minute"
- time_frames=["1hour"] → observation key: "market_data_1Hour"
These are treated as DIFFERENT timeframes. Models trained with one format won't work with the other. The framework will issue a warning if you use non-canonical forms like "60min" to guide you toward cleaner observation keys.
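If your timeframe strings come from user input or config files, one way to avoid mixing formats is to rewrite them before building the config. The helper below is a hypothetical sketch, not part of TorchTrade: the accepted unit names and the plural spelling for multi-day values ("2days") are assumptions extrapolated from the examples above.

```python
import re

_UNIT_MINUTES = {"min": 1, "hour": 60, "day": 1440}


def canonical_timeframe(spec: str) -> str:
    """Hypothetical helper: rewrite a timeframe string using the largest whole unit."""
    match = re.fullmatch(r"(\d+)(min|hour|day)s?", spec.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized timeframe: {spec!r}")
    total_minutes = int(match.group(1)) * _UNIT_MINUTES[match.group(2)]
    # Try the largest unit first so that 1440 minutes becomes a day, not 24 hours
    for unit in ("day", "hour"):
        size = _UNIT_MINUTES[unit]
        if total_minutes % size == 0:
            count = total_minutes // size
            return f"{count}{unit}" + ("s" if count > 1 else "")
    return f"{total_minutes}min"


time_frames = [canonical_timeframe(tf) for tf in ["5min", "60min", "120min", "1440min"]]
print(time_frames)
```

Normalizing once, at the point where the config is built, keeps training and inference on the same observation keys.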
Basic Usage¶
Example 1: Adding Technical Indicators¶
import pandas as pd
import ta  # Technical Analysis library
from torchtrade.envs.offline import SequentialTradingEnv, SequentialTradingEnvConfig

def custom_preprocessing(df: pd.DataFrame) -> pd.DataFrame:
    """
    Add technical indicators as features.
    IMPORTANT: All feature columns must start with 'features_' prefix.
    """
    # Basic OHLCV features (always include these)
    df["features_open"] = df["open"]
    df["features_high"] = df["high"]
    df["features_low"] = df["low"]
    df["features_close"] = df["close"]
    df["features_volume"] = df["volume"]

    # RSI (Relative Strength Index)
    df["features_rsi_14"] = ta.momentum.RSIIndicator(
        df["close"], window=14
    ).rsi()

    # MACD (Moving Average Convergence Divergence)
    macd = ta.trend.MACD(df["close"])
    df["features_macd"] = macd.macd()
    df["features_macd_signal"] = macd.macd_signal()
    df["features_macd_histogram"] = macd.macd_diff()

    # Bollinger Bands
    bollinger = ta.volatility.BollingerBands(df["close"], window=20, window_dev=2)
    df["features_bb_high"] = bollinger.bollinger_hband()
    df["features_bb_mid"] = bollinger.bollinger_mavg()
    df["features_bb_low"] = bollinger.bollinger_lband()

    # Fill NaN values (important: indicators are NaN during their warm-up period)
    df.fillna(0, inplace=True)
    return df

# Use in environment config
config = SequentialTradingEnvConfig(
    feature_preprocessing_fn=custom_preprocessing,
    time_frames=["1min", "5min", "15min"],  # Note: use "1hour" not "60min"
    window_sizes=[12, 8, 8],
    execute_on=(5, "Minute"),
    initial_cash=1000
)

# df is your raw OHLCV DataFrame, loaded beforehand
env = SequentialTradingEnv(df, config)
Example 2: Normalized Features (Recommended)¶
Feature normalization is critical for stable RL training. The recommended approach is to normalize features during preprocessing using sklearn's StandardScaler, which avoids device-related issues with TorchRL's VecNorm transforms.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from torchtrade.envs.offline import SequentialTradingEnvConfig

def normalized_preprocessing(df: pd.DataFrame) -> pd.DataFrame:
    """
    Normalize features using StandardScaler for stable training.
    This approach is preferred over VecNormV2/ObservationNorm transforms
    which can have device compatibility issues on GPU.
    """
    # Basic OHLCV
    df["features_open"] = df["open"]
    df["features_high"] = df["high"]
    df["features_low"] = df["low"]
    df["features_close"] = df["close"]
    df["features_volume"] = df["volume"]

    # Price changes (returns)
    df["features_return"] = df["close"].pct_change()

    # Normalize features (each feature column is scaled to zero mean, unit variance)
    scaler = StandardScaler()
    feature_cols = [col for col in df.columns if col.startswith("features_")]
    df[feature_cols] = scaler.fit_transform(df[feature_cols])

    # Fill NaN values
    df.fillna(0, inplace=True)
    return df

config = SequentialTradingEnvConfig(
    feature_preprocessing_fn=normalized_preprocessing,
    ...
)
Alternative approaches:
- TorchRL transforms - VecNormV2 and ObservationNorm are available but may have device compatibility issues
- Network level - Use BatchNorm, LayerNorm, or other normalization layers in your policy network
Advanced Normalization
StandardScaler applies one fixed mean and standard deviation per feature, computed from the data it is fitted on. For data with regime changes, consider rolling window normalization or per-regime scalers. For most use cases, StandardScaler is sufficient.
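As a rough sketch of the rolling window option, the function below z-scores each value against the mean and standard deviation of a trailing window, so the statistics adapt over time and never use later rows. The window length of 50, the function names, and the choice to emit 0 during warm-up are assumptions for illustration, not TorchTrade defaults.

```python
import numpy as np
import pandas as pd


def rolling_zscore(series: pd.Series, window: int = 50) -> pd.Series:
    """Z-score each value against its trailing window (current bar included)."""
    mean = series.rolling(window, min_periods=window).mean()
    std = series.rolling(window, min_periods=window).std()
    z = (series - mean) / std.replace(0, np.nan)  # flat windows would divide by zero
    return z.fillna(0.0)  # warm-up rows and flat windows become 0


def rolling_normalized_preprocessing(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["features_return"] = rolling_zscore(df["close"].pct_change())
    df["features_volume"] = rolling_zscore(df["volume"])
    return df
```

The trade-off is that the first window - 1 rows carry no information, and the feature scale now depends on the chosen window length.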
Important Rules¶
- Feature prefix: All columns MUST start with features_ (e.g., features_rsi_14). Columns without this prefix are ignored.
- Handle NaN: Technical indicators produce NaN at the start of the series, during their warm-up period. Always call df.fillna(0, inplace=True) (or ffill/bfill, keeping in mind that bfill copies later values into earlier rows).
- Return the DataFrame: Your function must return df.
- No lookahead bias: Only use past data. Never use .shift(-1) or future values.
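One way to test the lookahead rule during development is to run the preprocessing function on the full data and on a truncated copy, then compare the overlapping rows: if removing later rows changes earlier features, the function is using future information. The has_lookahead helper and the two toy functions below are hypothetical and only sketch the idea.

```python
import numpy as np
import pandas as pd


def has_lookahead(preprocess, df: pd.DataFrame, cut: int) -> bool:
    """Return True if features for the first `cut` rows change when later rows are removed."""
    full = preprocess(df.copy())
    partial = preprocess(df.iloc[:cut].copy())
    cols = [c for c in full.columns if c.startswith("features_")]
    return not np.allclose(full[cols].iloc[:cut], partial[cols], equal_nan=True)


def causal(df: pd.DataFrame) -> pd.DataFrame:
    df["features_return"] = df["close"].pct_change()
    return df.fillna(0)


def leaky(df: pd.DataFrame) -> pd.DataFrame:
    df["features_next_return"] = df["close"].pct_change().shift(-1)  # peeks one bar ahead
    return df.fillna(0)


prices = pd.DataFrame({"close": 100.0 + np.arange(100)})
print(has_lookahead(causal, prices, cut=50))
print(has_lookahead(leaky, prices, cut=50))
```

Note that a scaler fitted on the whole DataFrame would also be flagged by this check, since its mean and standard deviation depend on later rows.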
Common Technical Indicators¶
Quick Reference Table¶
| Category | Indicator | ta Library Code | Use Case |
|---|---|---|---|
| Momentum | RSI | ta.momentum.RSIIndicator(close, window=14).rsi() | Overbought/oversold detection |
| Momentum | Stochastic | ta.momentum.StochasticOscillator(high, low, close).stoch() | Momentum confirmation |
| Momentum | Williams %R | ta.momentum.WilliamsRIndicator(high, low, close, lbp=14).williams_r() | Short-term overbought/oversold |
| Trend | SMA | ta.trend.SMAIndicator(close, window=20).sma_indicator() | Trend direction |
| Trend | EMA | ta.trend.EMAIndicator(close, window=20).ema_indicator() | Responsive trend following |
| Trend | MACD | ta.trend.MACD(close).macd() | Trend changes |
| Trend | ADX | ta.trend.ADXIndicator(high, low, close, window=14).adx() | Trend strength |
| Volatility | Bollinger Bands | ta.volatility.BollingerBands(close, window=20) | Volatility and price bounds |
| Volatility | ATR | ta.volatility.AverageTrueRange(high, low, close, window=14).average_true_range() | Volatility measurement |
| Volatility | Keltner | ta.volatility.KeltnerChannel(high, low, close) | Alternative to Bollinger |
| Volume | OBV | ta.volume.OnBalanceVolumeIndicator(close, volume).on_balance_volume() | Accumulation/distribution |
| Volume | VPT | ta.volume.VolumePriceTrendIndicator(close, volume).volume_price_trend() | Volume-price confirmation |
| Volume | ADI | ta.volume.AccDistIndexIndicator(high, low, close, volume).acc_dist_index() | Money flow |
Usage Pattern¶
import pandas as pd
import ta

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Basic OHLCV
    df["features_close"] = df["close"]
    df["features_volume"] = df["volume"]
    # ... other OHLCV ...

    # Pick indicators from table above
    df["features_rsi_14"] = ta.momentum.RSIIndicator(df["close"], window=14).rsi()
    df["features_sma_20"] = ta.trend.SMAIndicator(df["close"], window=20).sma_indicator()
    df["features_atr"] = ta.volatility.AverageTrueRange(
        df["high"], df["low"], df["close"], window=14
    ).average_true_range()

    # Replace warm-up NaNs before returning
    df.fillna(0, inplace=True)
    return df
Performance Tips¶
- Vectorize: Use pandas operations (df["close"].pct_change()) instead of Python loops; on large DataFrames they are often orders of magnitude faster.
- Check NaN: Add df.isna().sum() during development to catch indicator issues before fillna.
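The NaN check can be wrapped in a small helper that reports only feature columns. This is a minimal sketch: nan_report is a hypothetical name, and pandas rolling and pct_change stand in for library indicators so the example has no extra dependencies.

```python
import pandas as pd


def nan_report(df: pd.DataFrame) -> pd.Series:
    """Count NaNs per feature column, largest first."""
    cols = [c for c in df.columns if c.startswith("features_")]
    return df[cols].isna().sum().sort_values(ascending=False)


df = pd.DataFrame({"close": [float(100 + i) for i in range(30)]})
df["features_return"] = df["close"].pct_change()        # vectorized, 1 warm-up NaN
df["features_sma_20"] = df["close"].rolling(20).mean()  # 19 warm-up NaNs
report = nan_report(df)
print(report)
```

A count much larger than the indicator's window usually points at a gap in the input data or a misconfigured indicator rather than normal warm-up.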
Recommended Libraries¶
| Library | Indicators | Installation | Best For |
|---|---|---|---|
| ta | 40+ | pip install ta | Standard indicators, easy API |
| pandas-ta | 130+ | pip install pandas-ta | Comprehensive collection |
| TA-Lib | 150+ | pip install TA-Lib | Performance, industry standard |
| sklearn | N/A | pip install scikit-learn | Feature scaling, normalization |
Recommendation: Start with ta for simplicity; switch to TA-Lib if you need maximum performance.
Next Steps¶
- Reward Functions - Design reward signals that work with your features
- Understanding the Sampler - How multi-timeframe sampling works
- Transforms - Alternative feature engineering with Chronos embeddings
- Offline Environments - Apply custom features to environments