2,596 questions
0
votes
0
answers
72
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
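For anyone doing the same exercise, here is a minimal sketch of the core GRPO objective: group-normalized advantages combined with a PPO-style clipped ratio. The per-token formulation and KL penalty that TRL adds are omitted, and all tensor names are illustrative.

import torch

def grpo_loss(logp_new, logp_old, rewards, group_size, clip_eps=0.2):
    # Group-relative advantages: normalize each reward within its sampled group.
    r = rewards.view(-1, group_size)
    adv = (r - r.mean(dim=1, keepdim=True)) / (r.std(dim=1, keepdim=True) + 1e-8)
    adv = adv.view(-1)

    # PPO-style clipped surrogate on the sequence-level likelihood ratio.
    ratio = torch.exp(logp_new - logp_old.detach())
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()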
0
votes
1
answer
47
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended. But I can't seem to find any indication of when and how to use them.
As an example, let's ...
Advice
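For reference, a minimal sketch of how the two flags are usually supplied to torchrl's TD0Estimator, assuming the current default keys (the network, shapes, and key names are illustrative and may differ between torchrl versions): "terminated" marks a true episode end, so the next-state value is not bootstrapped, while "done" is terminated-or-truncated and marks where a trajectory stops.

import torch
from tensordict import TensorDict
from torchrl.modules import ValueOperator
from torchrl.objectives.value import TD0Estimator

# Toy value network over a 4-dimensional observation.
value_net = ValueOperator(torch.nn.Linear(4, 1), in_keys=["observation"])
estimator = TD0Estimator(gamma=0.99, value_network=value_net)

batch = 8
td = TensorDict({
    "observation": torch.randn(batch, 4),
    "next": TensorDict({
        "observation": torch.randn(batch, 4),
        "reward": torch.randn(batch, 1),
        # done = terminated OR truncated; terminated alone disables bootstrapping.
        "done": torch.zeros(batch, 1, dtype=torch.bool),
        "terminated": torch.zeros(batch, 1, dtype=torch.bool),
    }, [batch]),
}, [batch])

estimator(td)  # writes "advantage" and "value_target" into the tensordict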
0
votes
1
replies
95
views
How to read a large Python project (for example, a project of Deep Learning or Reinforcement Learning)
I've downloaded many Python projects about Reinforcement Learning from GitHub, but each takes me too much time to read.
It's easy to comprehend a simple Python project with only a few *.py files, but ...
Advice
0
votes
0
replies
31
views
When using TensorDictPrioritizedReplayBuffer, should I apply the priority weight manually or not?
With Prioritized Experience Replay (PER), we use the beta parameter to compute importance-sampling weights that offset the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...
Advice
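For context, a sketch of how the importance-sampling weight typically surfaces when sampling from TensorDictPrioritizedReplayBuffer; the "_weight" key and the constructor arguments below are assumptions based on recent torchrl versions, so check the docs for your version.

import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, TensorDictPrioritizedReplayBuffer

# alpha controls how strongly priorities skew sampling, beta the bias correction.
buffer = TensorDictPrioritizedReplayBuffer(
    alpha=0.6, beta=0.4,
    storage=LazyTensorStorage(10_000),
    priority_key="td_error",
)
buffer.extend(TensorDict({"obs": torch.randn(32, 4), "td_error": torch.rand(32)}, [32]))

sample = buffer.sample(16)
per_sample_loss = torch.randn(16)  # placeholder for your elementwise TD loss
# The importance-sampling weight is applied manually by the user:
loss = (sample["_weight"] * per_sample_loss).mean()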
1
vote
0
replies
34
views
How can I design “story-driven NPCs” in a reinforcement-learned environment? Looking for development directions and architectural advice
I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...
3
votes
0
answers
69
views
KeyError: 'advantages' in PPO MARL using Ray RLLib
I am using Ray 2.50.1 to implement a MARL model with PPO.
However, I am running into the following problem:
'advantages'
KeyError: 'advantages'
During handling of the above exception, another exception occurred:
...
0
votes
0
answers
83
views
Trouble configuring R-group substitution in REINVENT 4 (AstraZeneca) — validation errors for RLConfig and ScorerConfig
I’m using AstraZeneca’s REINVENT 4 (v4.6.27) to generate SMILES from a scaffold via R-group substitution, optimizing for 5-HT2A / D2 / 5-HT1A (maximize) and minimizing H1 / M1 / α1A, with DockStream ...
1
vote
0
answers
79
views
Pytorch on Import torchrl.data: AttributeError: __provides__. Did you mean: '__providedBy__'?
I'm new at this and I'm trying to dabble in PyTorch and TorchRL. However, as the title states, before I can even load the model, I get that AttributeError message. This is the full error message ...
0
votes
1
answer
95
views
PermissionError: [Errno 13] Permission denied: 'Qwen3-0.6B-SFT'
I am getting the following error when running training using the TRL library in the following Hugging Face Space: vishaljoshi24/trl-4-dnd.
My SDK is Docker, and as far as I'm aware there are not ...
0
votes
1
answer
177
views
Getting different results across different machines while training RL
While training my RL algorithm using SBX, I am getting different results across my HPC cluster and my PC. However, I did find that results are consistently the same within the same machine. They just diverge ...
1
vote
0
answers
46
views
Does Stable_Baselines3 store the seed rng while saving?
I was wondering whether a model might give different performance if we load it at different times while running a stochastic program, because depending on when the model is loaded, various functions (...
0
votes
0
answers
66
views
Agent instantly teleports or moves too fast to capture the flag
I'm creating a Capture-the-Flag style game in Unity using ML-Agents. The setup includes:
2 Agents (one per team)
Each team has a flag and a base
A NavMesh is also added to the floor, and NavMesh agents ...
1
vote
1
answer
54
views
How to resolve the type error in pomegranate?
I am trying to set up dummy code for pomegranate (below), but for some reason I am getting an error when I try to run ConditionalCategorical(). How do I resolve it?
from pomegranate....
1
vote
0
answers
51
views
Actor loss can't decrease in TD3-HER
I am trying to train a TD3-HER-based agent in CARLA; the training environment is Endless-v0, but the actor and critic loss curves during training look like this:
The curves
and the videos of agents show ...
1
vote
0
answers
42
views
Difference between tokens generated on a configuration in two different contexts
I have a model that, given a configuration or state (of a Rubik's cube, but in any case a sequence of integers), generates a move (from 0 to 5). This move can be used to bring the ...
1
vote
1
answer
142
views
How can I properly add seed/options to a dmc2gym environment with Gymnasium? [closed]
import gymnasium as gym
import dmc2gym
gymenv = gym.make("CartPole-v0")
gymenv.reset(seed=42, options=None) # This works fine, no problem
dmcenv = dmc2gym.make(domain_name="quadruped"...
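If it helps, a sketch of the usual workaround, assuming the dmc2gym.make signature from the widely used denisyarats/dmc2gym package (the task name and seed below are illustrative): dmc2gym builds old-Gym-API environments, so the Gymnasium-style reset(seed=..., options=...) is not supported and the seed is instead passed at construction time.

import dmc2gym

# Assumption: dmc2gym.make accepts a seed argument; the returned env follows
# the old Gym API, so reset() takes no seed/options.
dmcenv = dmc2gym.make(domain_name="quadruped", task_name="walk", seed=42)
obs = dmcenv.reset()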
1
vote
1
answer
136
views
Error Raised with SAC for Centralized Training, Decentralized Execution in Ray RLlib
I'm using a slight variant of the RockPaperScissors multi-agent environment from the Ray RLlib documentation as a test environment to verify that a custom RLModule for Centralized Training, ...
1
vote
1
answer
92
views
DQN fails to learn good policy for Atari Pong
I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment.
I've tested my Deep Q-Network on a simple test environment, where ...
0
votes
1
answer
96
views
How does TorchRL deal with multiple trajectories in the same batch?
I am trying to understand how the DQN algorithm with RNNs works in PyTorch's RL API through this tutorial. However, the way some of the classes handle episodes and batches during training is unclear ...
1
vote
0
answers
67
views
My RandomSampler() is always generating the same parameters
I used TPESampler and set it as follows while optimizing with Optuna: sampler=optuna.samplers.TPESampler(multivariate=True, n_startup_trials=10, seed=None). But during the 10 startup trials, it ...
0
votes
0
answers
85
views
RL Trading Agent Can't Learn Sensible Behavior Even on a Simple Sine Wave — What Am I Doing Wrong?
I’ve been building a reinforcement learning trading agent using a synthetic sine wave as the price series — basically the simplest dataset I could imagine to test whether an agent can learn to buy low ...
0
votes
0
answers
52
views
Unable to reproduce training results in a DummyVecEnv using Stable-Baselines3
I created a custom Gymnasium environment and trained an agent using Stable-Baselines3 with DummyVecEnv and VecNormalize. The agent performs well during training and consistently reaches the goal. ...
0
votes
1
answer
82
views
When using TensorDictPrioritizedReplayBuffer, should I store the "td_error" field in the TensorDict data?
Let's say you are going to train DDPG or any algorithm that uses a Prioritized Replay Buffer. When using torchrl's TensorDictPrioritizedReplayBuffer, after you calculate the td_error, you are going to use it to call ...
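For reference, a sketch of this workflow under the assumption that torchrl exposes update_tensordict_priority (names and arguments follow recent torchrl versions; verify against your install): td_error does not need to be stored at insert time, since new samples receive a default priority; you write it into the sampled batch and push it back.

import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, TensorDictPrioritizedReplayBuffer

buffer = TensorDictPrioritizedReplayBuffer(
    alpha=0.6, beta=0.4,
    storage=LazyTensorStorage(10_000),
    priority_key="td_error",
)
# Insert transitions without td_error: they receive the default (max) priority.
buffer.extend(TensorDict({"obs": torch.randn(32, 4)}, [32]))

sample = buffer.sample(16)
td_error = torch.rand(16)  # placeholder: |Q(s, a) - TD target| from your update step
sample.set("td_error", td_error)
buffer.update_tensordict_priority(sample)  # refresh priorities for the sampled indices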
0
votes
0
answers
150
views
RecurrentPPO model from sb3-contrib always gives me policy gradient loss and explained variance close to 0
I'm working on training an RPPO agent to handle a temperature control system. Here's a snippet of the code.
class TempControlSeqEnv(gym.Env):
def __init__(self, curriculum_phase, time_steps=5):
...
0
votes
0
answers
49
views
Trouble training DQN agent in CarRacing environment — poor learning progress
I'm trying to train a DQN agent on my CarRacing environment, but I'm struggling to make the agent learn anything meaningful — the total episode reward stays very low and doesn't improve over time.
def ...
0
votes
0
answers
137
views
Why does the PyBullet physics sim behave unrealistically in a simple experiment?
I want to know if PyBullet is accurate enough to simulate physics environments for training machine learning models. I want to create a line-following robot that follows a line based on what it ...
0
votes
0
answers
54
views
Soft Actor Critic -- actor forward pass outputs NaN immediately
I have the following code for the Actor network in my Soft Actor-Critic. However, once the batch is full and it starts to do backprop (batch size: 128), after about 5 iterations of updating the ...
0
votes
0
answers
31
views
Why can't I implement ENERGYM?
I am trying to implement the ENERGYM library in Python as per the following paper: https://www.mdpi.com/2076-3417/11/8/3518 and https://bsl546.github.io/energym-pages/sources/install_min.html
After ...
0
votes
0
answers
43
views
Multiple environment resets per iteration using Ray RLlib DQN (D3QN) with a simple custom Gym environment
Hello,
I’m experimenting with Ray RLlib’s DQN (Dueling Double DQN) on a minimal custom environment, but I keep seeing many resets in a single training iteration, even though each episode completes ...
0
votes
0
answers
20
views
wandb Sweep Agent Not Injecting ${args} into Command (Hyperparameters Never Passed)
I’m trying to run a W&B hyperparameter sweep against my Python training script, but none of the sweep’s hyperparameter flags ever get passed to the script. Instead, the agent simply runs my base ...
0
votes
0
answers
27
views
How could I get the images in custom policy in metadrive?
I am trying to customize a policy using the MetaDrive framework. To make a decision, I want to get the rgb_camera information from the vehicle. I've already checked the official files and I only know I can get frame ...
0
votes
0
answers
53
views
Reinforcement learning Atari video recording crashes
I am learning Reinforcement Learning and picked the Atari Breakout environment to learn with. I have trained the neural nets and am now trying to create some videos to visualize the results.
I am using ...
2
votes
1
answer
193
views
I keep getting this error (CUDA is available): 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu'
I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error:
RuntimeError: Expected all tensors to be on the same device, but found
at least two devices, ...
0
votes
1
answer
59
views
Why does the CityLearn tutorial keep throwing me a ValueError?
I am following this CityLearn tutorial.
I got through the first part (RBC) without incident. However, when I implement the second part (Q-learning, literally copy and paste from the site), I keep ...
2
votes
1
answer
379
views
Reproducibility of JAX calculations
I am using JAX to run Reinforcement Learning (RL) & Multi-Agent Reinforcement Learning (MARL) calculations. I have noticed the following behaviour:
In RL, my results are always fully ...
1
vote
2
answers
311
views
Ray rllib episode_reward_mean not showing
Can anyone explain to me why the episode_reward_mean is NOT part of the results dictionary?
Is it replaced by a different key in the latest API?
I see env_runners/episode_return_mean and env_runners/...
1
vote
0
answers
30
views
MATLAB Reinforcement Learning: issue obtaining the gradient from a Q-value critic using dlfeval, dlgradient, dlarray
I'm trying to implement a custom agent, and inside my agent I'm running into issues with obtaining the gradient of the Q value with respect to my actor network parameters. I have my code below, main ...
0
votes
0
answers
31
views
Error when training the model for sensor data using Kafka and RDL: index 4096 is out of bounds for axis 0 with size 4096
This is my code using Python, RDL (Reinforcement Deep Learning), and Kafka. Both the ZooKeeper and Kafka servers are working with no issues. When I run the following code in Jupyter:
import json
import ...
2
votes
1
answer
33
views
How can I import the internal `VectorSARTTrajectory` from ReinforcementLearningCore.jl?
I'm trying to use the VectorSARTTrajectory type from the ReinforcementLearningCore.jl package, because it is also mentioned in the Introduction to RL.jl. I found the implementation of ...
1
vote
0
answers
68
views
In Gymnasium, how can one run a vector environment's function in parallel similar to how step() can be run in parallel?
I have a custom Gymnasium environment, RLToy-v0 from the library MDP Playground. It separates out the transition function and reward function from the step function and calls them individually inside ...
0
votes
0
answers
67
views
Performance of my MountainCar gym environment does not improve at all
I am new to reinforcement learning and I am trying to solve different environments in the Gymnasium library. But no matter what I try, the reward for the MountainCar env turns out to be -200 both during ...
0
votes
0
answers
50
views
DQN agent: loss decreases, cumul. reward stagnates, q-values are very similar over all actions and get higher and higher
My loss is decreasing, but my agent isn't learning in a very easy environment... The cumulative reward stagnates, and when I print the Q-values they are identical or at least very similar across all ...
0
votes
0
answers
49
views
I created a very simple RL model for trading stocks but its output is the same regardless of input
I've been trying to build a basic Neural Net that analyzes a stock's price and chooses whether to buy, sell, or hold.
# Importing Libraries
import torch
import torch.nn as nn
import torch.optim as ...
0
votes
0
answers
53
views
Cartpole with Q-Learning not Learning Anything
So I was trying to solve the cartpole problem. This is a common problem when dealing with reinforcement learning. Essentially, you have a cart that is balancing a pole. The cart can move left or right....
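For reference, the tabular Q-learning update this setup relies on, sketched with an illustrative state discretization (the bin count, alpha, and gamma are arbitrary example values; CartPole's continuous observation must be binned before a Q-table applies):

import numpy as np

n_bins, n_actions = 10, 2
Q = np.zeros((n_bins,) * 4 + (n_actions,))  # one bin per observation dimension

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s + (a,)] += alpha * (target - Q[s + (a,)])

# s and s_next are 4-tuples of bin indices, e.g. s = (3, 5, 4, 6).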
1
vote
0
answers
92
views
Calculate correlation on dict type variables
I have a dataframe named hyperparam_df which looks like the following:
repo_name file_name \
0 DeepCoMP deepcomp/util/simulation.py ...
0
votes
0
answers
40
views
IndexError: too many indices for array in reinforcement learning model using Pennylane and PyTorch
I am working on a quantum reinforcement learning model using PennyLane, PyTorch, and a stock trading environment from the FinRL library. When I run my training function, I get the following error. Any help is ...
1
vote
1
answer
417
views
How to use jax.vmap with a tuple of flax TrainStates as input?
I am setting up a Deep MARL framework and I need to assess my actor policies. Ideally, this would entail using jax.vmap over a tuple of actor flax TrainStates. I have tried the following:
import jax
...
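One common pattern, sketched under the assumption that every agent's network shares the same structure: instead of vmapping over a Python tuple of TrainStates, stack the per-agent parameter pytrees along a new leading axis and vmap over that axis together with the per-agent observations. The parameter shapes and policy function below are purely illustrative.

import jax
import jax.numpy as jnp

# Illustrative per-agent policy: params is a small pytree, obs a feature vector.
def policy_apply(params, obs):
    return jnp.tanh(obs @ params["w"] + params["b"])

# Per-agent parameter pytrees (in practice: [ts.params for ts in train_states]).
params_list = [{"w": jnp.ones((4, 2)) * i, "b": jnp.zeros(2)} for i in range(3)]

# Stack each leaf along a new leading "agent" axis so one pytree holds all agents.
stacked = jax.tree_util.tree_map(lambda *xs: jnp.stack(xs), *params_list)

obs = jnp.ones((3, 4))                          # one observation per agent
actions = jax.vmap(policy_apply)(stacked, obs)  # maps over the agent axis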
0
votes
0
answers
97
views
Using Pytorch Sequential for reinforcement model without gym
I'm trying to play a game with AI, but I want to do it in real time. Because of that, I'm not using gym to create an environment.
I want to take a screenshot, preprocess it, then pass it through the ...
0
votes
1
answer
209
views
EvalCallback hangs in stable-baselines3
I'm trying to train an A2C model in stable-baselines3 and the EvalCallback appears to freeze when it is called. I cannot figure out why. Below you will find a script that recreates this problem. ...
-1
votes
1
answer
69
views
Actor-Critic behaved strange on cliff walking [closed]
0. Background
I was trying to train an AC-based agent for a task with a large observation space. The task is similar to a huge cliff-walking task. The agent starts at some random point on a 20 * 20 * 5 * 9 ...