# Loss Functions

TorchTrade provides specialized loss functions for training RL trading agents, built on TorchRL's `LossModule` interface.
## Available Loss Functions
| Loss Function | Type | Use Case |
|---|---|---|
| GRPOLoss | Policy Gradient | One-step RL with SLTP environments |
| CTRLLoss | Representation Learning | Self-supervised encoder training |
| CTRLPPOLoss | Combined | Joint policy + representation learning |
For standard multi-step RL (PPO, DQN, SAC, IQL), use TorchRL's built-in loss modules directly.
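For example, a multi-step PPO setup needs nothing from `torchtrade.losses`. The sketch below uses TorchRL directly; hyperparameters are illustrative, and `actor`, `critic`, `collector`, and `optimizer` are assumed to come from your own training script.

```python
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

# Advantage estimation and clipped PPO objective, both straight from TorchRL
advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=critic)
loss_module = ClipPPOLoss(actor, critic)

for batch in collector:
    advantage_module(batch)  # writes "advantage" and "value_target" into the batch
    loss_td = loss_module(batch)
    loss = loss_td["loss_objective"] + loss_td["loss_critic"] + loss_td["loss_entropy"]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```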
## GRPOLoss
Group Relative Policy Optimization for one-step RL. Designed for `OneStepTradingEnv`, where each episode is a single decision with SL/TP bracket orders. Advantages are normalized within each batch: `advantage = (reward - mean) / std` (see the sketch after the parameter table).
| Parameter | Default | Description |
|---|---|---|
| `actor_network` | Required | Policy network (`ProbabilisticTensorDictSequential`) |
| `entropy_coeff` | 0.01 | Entropy regularization coefficient |
| `epsilon_low` / `epsilon_high` | 0.2 | Clipping bounds for policy ratio |
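The group-relative advantage is just a standardization of rewards over the batch (the "group"). Here is a standalone sketch of that computation, not the module's internal code; the `eps` term is an assumption for numerical stability.

```python
import torch

def group_relative_advantage(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Standardize rewards within the batch so advantages have zero mean and unit std."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = torch.tensor([0.02, -0.01, 0.05, -0.03])  # illustrative per-trade returns
advantages = group_relative_advantage(rewards)       # ≈ tensor([ 0.357, -0.500,  1.214, -1.071])
```

In practice you only call the loss module; it performs this normalization internally: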
```python
from torchtrade.losses import GRPOLoss

loss_module = GRPOLoss(actor_network=actor, entropy_coeff=0.01)

for batch in collector:
    loss_td = loss_module(batch)
    loss = loss_td["loss_objective"] + loss_td["loss_entropy"]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Paper: DeepSeekMath (arXiv:2402.03300) — Section 2.2
## CTRLLoss
Cross-Trajectory Representation Learning for self-supervised encoder training. Trains encoders to recognize behavioral similarity across trajectories without rewards, improving zero-shot generalization.
| Parameter | Default | Description |
|---|---|---|
| `encoder_network` | Required | Encoder that produces embeddings |
| `embedding_dim` | Required | Dimension of encoder output |
| `num_prototypes` | 512 | Learnable prototype vectors |
| `sinkhorn_iters` | 3 | Sinkhorn-Knopp iterations |
| `temperature` | 0.1 | Softmax temperature |
| `myow_coeff` | 1.0 | MYOW loss coefficient |
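The `sinkhorn_iters` parameter refers to the Sinkhorn-Knopp balancing step used when softly assigning embeddings to prototypes (as in SwAV-style objectives). Below is a minimal, standalone sketch of that step under the common formulation; CTRLLoss's actual internals may differ.

```python
import torch

@torch.no_grad()
def sinkhorn(scores: torch.Tensor, n_iters: int = 3, eps: float = 0.05) -> torch.Tensor:
    """Balanced soft assignments from a (batch, num_prototypes) similarity matrix."""
    Q = torch.exp(scores / eps).T        # (num_prototypes, batch)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)  # each prototype receives equal total mass
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)  # each sample distributes one unit of mass
        Q /= B
    return (Q * B).T                     # rows sum to 1: per-sample assignment distribution

embeddings = torch.nn.functional.normalize(torch.randn(32, 128), dim=1)
prototypes = torch.nn.functional.normalize(torch.randn(512, 128), dim=1)
assignments = sinkhorn(embeddings @ prototypes.T, n_iters=3)
```

Typical usage of the loss module itself: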
```python
from torchtrade.losses import CTRLLoss

ctrl_loss = CTRLLoss(
    encoder_network=encoder,
    embedding_dim=128,
    num_prototypes=512,
)

for batch in collector:
    loss_td = ctrl_loss(batch)

    optimizer.zero_grad()
    loss_td["loss_ctrl"].backward()
    optimizer.step()
```
Paper: Cross-Trajectory Representation Learning (arXiv:2106.02193)
## CTRLPPOLoss
Combines ClipPPOLoss with CTRLLoss for joint policy and encoder training. The encoder learns useful representations while the policy learns to act.
```python
from torchtrade.losses import CTRLLoss, CTRLPPOLoss
from torchrl.objectives import ClipPPOLoss

combined_loss = CTRLPPOLoss(
    ppo_loss=ClipPPOLoss(actor, critic),
    ctrl_loss=CTRLLoss(encoder, embedding_dim=128),
    ctrl_coeff=0.5,
)

for batch in collector:
    loss_td = combined_loss(batch)
    total_loss = loss_td["loss_objective"] + loss_td["loss_critic"] + loss_td["loss_ctrl"]

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```
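If you want finer control over the weighting, a roughly equivalent manual combination is sketched below. It assumes `ctrl_coeff` simply scales the CTRL term, includes the entropy term that `ClipPPOLoss` returns by default, and takes `actor`, `critic`, `encoder`, `collector`, and `optimizer` from your own setup.

```python
from torchrl.objectives import ClipPPOLoss
from torchtrade.losses import CTRLLoss

ppo_loss = ClipPPOLoss(actor, critic)
ctrl_loss = CTRLLoss(encoder_network=encoder, embedding_dim=128)

for batch in collector:
    ppo_td = ppo_loss(batch)
    ctrl_td = ctrl_loss(batch)
    total_loss = (
        ppo_td["loss_objective"]
        + ppo_td["loss_critic"]
        + ppo_td["loss_entropy"]
        + 0.5 * ctrl_td["loss_ctrl"]  # plays the role of ctrl_coeff
    )

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```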
## See Also
- Examples - Training scripts using these losses
- Environments - Compatible environment types
- TorchRL Objectives - Built-in PPO, DQN, SAC, IQL losses