0 votes
0 answers
72 views

I am new to reinforcement learning. So as an educational exercise, I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
csnate
  • 1,661
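Since the excerpt above is about boiling GRPO down to its loss function, here is a minimal PyTorch sketch of that loss as it is usually described: a group-relative (per-prompt) normalized advantage fed into a PPO-style clipped objective plus a KL penalty. This is an illustrative sketch, not TRL's implementation; the tensor shapes and coefficients are assumptions.

```python
import torch

def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2, kl_coef=0.04):
    """Sequence-level GRPO loss sketch.

    logp_new / logp_old: (num_prompts, group_size) summed log-probs of each
    sampled completion under the current and the sampling policy.
    rewards: (num_prompts, group_size) scalar reward per completion.
    """
    # Group-relative advantage: normalize each reward against the other
    # completions generated for the same prompt (no learned value baseline).
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-8
    )

    # PPO-style clipped surrogate on the importance ratio.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()

    # KL penalty toward a reference policy (here, crudely, the sampling
    # policy), using the k3 estimator exp(x) - x - 1 with x = logp_ref - logp_new.
    x = logp_old - logp_new
    kl = (torch.exp(x) - x - 1).mean()
    return policy_loss + kl_coef * kl
```

In a full implementation the advantage is broadcast over completion tokens and the log-probabilities are per-token, but the group normalization and the absence of a critic are the parts that distinguish GRPO from PPO.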
0 votes
1 answer
47 views

Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended or not, but I can't seem to find any indication of when and how to use them. As an example, let's ...
Bejo
  • 13
Advice
0 votes
1 replies
95 views

I've downloaded many Python projects about reinforcement learning from GitHub, but each takes me too much time to read. It's easy to comprehend a simple Python project with only a few *.py files, but ...
Xingrui Zhuang
Advice
0 votes
0 replies
31 views

With Prioritized Experience Replay (PER), we use the beta parameter to compute the weights that offset the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...
Bejo
  • 13
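For context on the question above: independent of any particular buffer implementation, beta controls the importance-sampling correction w_i = (N * P(i))^(-beta), normalized by the largest weight, which is then multiplied into each sampled transition's loss. A small NumPy sketch of that computation (variable names are illustrative):

```python
import numpy as np

def per_is_weights(priorities, sampled_idx, alpha=0.6, beta=0.4):
    """Importance-sampling weights that offset PER's non-uniform sampling."""
    # Sampling distribution over the buffer: P(i) proportional to p_i^alpha.
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()

    # w_i = (N * P(i))^(-beta), normalized so the largest weight equals 1.
    n = len(priorities)
    weights_all = (n * probs) ** (-beta)
    weights_all /= weights_all.max()  # rarest (lowest-priority) sample -> weight 1
    return weights_all[sampled_idx]

# High-priority transitions are over-sampled, so they receive smaller weights.
priorities = np.array([1.0, 2.0, 0.5, 4.0])
print(per_is_weights(priorities, sampled_idx=np.array([0, 3])))
```

Beta is typically annealed toward 1.0 over training so the bias correction becomes exact near convergence.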
Advice
1 vote
0 replies
34 views

I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...
DucTruong
3 votes
0 answers
69 views

I am using Ray 2.50.1 to implement a MARL model with PPO. However, I run into the following problem: KeyError: 'advantages'. During handling of the above exception, another exception occurred: ...
geniusadven
0 votes
0 answers
83 views

I’m using AstraZeneca’s REINVENT 4 (v4.6.27) to generate SMILES from a scaffold via R-group substitution, optimizing for 5-HT2A / D2 / 5-HT1A (maximize) and minimizing H1 / M1 / α1A, with DockStream ...
Reuben Udohaya
1 vote
0 answers
79 views

I'm new at this and I'm trying to dabble in PyTorch and PyTorchRL. However, as the topic states, before I can even load up the model, I get that AttributeError message. This is the full error message ...
Steve Brother
0 votes
1 answer
95 views

I am getting the following error when running training, using the TRL library in the following HuggingFace space: vishaljoshi24/trl-4-dnd. My SDK is Docker and as far as I'm aware there are not ...
Vishal Joshi
0 votes
1 answer
177 views

While training my RL algorithm using SBX, I am getting different results across my HPC cluster and PC. However, I did find that results are consistently the same within the same machine. They just diverge ...
desert_ranger
1 vote
0 answers
46 views

I was wondering whether a model might provide different performance if we load it at different times while running a stochastic program, because depending on when the model is loaded, various functions (...
desert_ranger
0 votes
0 answers
66 views

I'm creating a Capture-the-Flag style game in Unity using ML-Agents. The setup includes 2 agents (one per team), a flag and a base for each team, a NavMesh added to the floor, and NavMesh agents ...
Avi Garg
1 vote
1 answer
54 views

I am trying to set up dummy code for pomegranate (below), but for some reason I am getting an error when I try to run ConditionalCategorical(). How do I resolve it? from pomegranate....
Isaac A
  • 589
1 vote
0 answers
51 views

I am trying to train a TD3-HER based agent in CARLA, and the training environment is Endless-v0, but the actor and critic loss curves during training look like this. The curves and the videos of the agents show ...
Jiashu Li
1 vote
0 answers
42 views

I have a model that, given a configuration, or state (of a Rubik's cube, but essentially just a sequence of integers), generates a move (from 0 to 5). This move can be used to bring the ...
Nikio
  • 111
1 vote
1 answer
142 views

import gymnasium as gym import dmc2gym gymenv = gym.make("CartPole-v0") gymenv.reset(seed=42, options=None) # It won't go wrong, no problem dmcenv = dmc2gym.make(domain_name="quadruped&...
Xingrui Zhuang
1 vote
1 answer
136 views

I'm using a slight variant of the RockPaperScissors multi-agent environment from the Ray RLlib documentation as a test environment to verify that a custom RLModule for Centralized Training, ...
Nelson Salazar
1 vote
1 answer
92 views

I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment. I've tested my Deep Q-Network on a simple test environment, where ...
Rohan Patel
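For reference on the 2015 recipe mentioned above, the core update is a bootstrapped target computed from a separate, periodically synced target network, optimized with a Huber loss. A compact PyTorch sketch, assuming `q_net` and `target_net` map state batches to per-action Q-values (the names are illustrative):

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions that were actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones.float()) * next_q

    # Huber loss as in the paper, then a gradient step.
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Two frequent silent failure modes when moving from a toy environment to Pong are forgetting the (1 - done) mask and never syncing the target network with the online network.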
0 votes
1 answer
96 views

I am trying to understand how the DQN algorithm with RNNs works in PyTorch's RL API through this tutorial. However, the way some of the classes handle episodes and batches during training is unclear ...
Ícaro Lorran
1 vote
0 answers
67 views

I used TPESampler and set it as follows while optimizing with Optuna: sampler=optuna.samplers.TPESampler(multivariate=True, n_startup_trials=10, seed=None). But during the 10 startup trials, it ...
YYYC
  • 11
0 votes
0 answers
85 views

I’ve been building a reinforcement learning trading agent using a synthetic sine wave as the price series — basically the simplest dataset I could imagine to test whether an agent can learn to buy low ...
Oleg Bizin
0 votes
0 answers
52 views

I created a custom Gymnasium environment and trained an agent using Stable-Baselines3 with DummyVecEnv and VecNormalize. The agent performs well during training and consistently reaches the goal. ...
Amir Hosein Nourian
0 votes
1 answer
82 views

Let's say you are going to train DDPG or any algorithm that uses a prioritized replay buffer. When using TorchRL's TensorDictPrioritizedReplayBuffer, after you calculate the td_error, you use it to call ...
Bejo
  • 13
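The exact call is truncated in the excerpt above, so here is only the generic PER bookkeeping behind it: after the learning step, each sampled transition's priority is refreshed from its new TD-error, typically as |td_error| + eps (with the alpha exponent applied either here or inside the buffer). A tiny PyTorch sketch; the buffer call itself is left as a placeholder:

```python
import torch

def refreshed_priorities(td_error, eps=1e-6):
    """Turn freshly computed TD-errors into replay priorities."""
    # Larger |TD-error| -> sampled more often next time; eps keeps
    # zero-error transitions from becoming unreachable.
    return td_error.detach().abs() + eps

# After the critic update on a sampled batch:
#   td_error = q_sa - target                                  # shape: (batch,)
#   <buffer priority-update call>(sampled_indices, refreshed_priorities(td_error))
```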
0 votes
0 answers
150 views

I'm working on training an RPPO agent to handle a temperature control system. Here's a snippet of the code: class TempControlSeqEnv(gym.Env): def __init__(self, curriculum_phase, time_steps=5): ...
Anthony
0 votes
0 answers
49 views

I'm trying to train a DQN agent on my CarRacing environment, but I'm struggling to make the agent learn anything meaningful — the total episode reward stays very low and doesn't improve over time. def ...
papierowka
0 votes
0 answers
137 views

I want to know if PyBullet is accurate enough to simulate physics environments for training machine learning models. I want to create a line-following robot that follows a line based on what it ...
alienare 4422
0 votes
0 answers
54 views

I have the following code for the Actor network in my Soft Actor-Critic. However, once the batch is full (batch size: 128) and backpropagation starts, after about 5 iterations of updating the ...
Zubin Oommen
0 votes
0 answers
31 views

I am trying to implement the ENERGYM library in Python as per the following paper: https://www.mdpi.com/2076-3417/11/8/3518 and https://bsl546.github.io/energym-pages/sources/install_min.html After ...
Matthew Fleishman
0 votes
0 answers
43 views

Hello, I’m experimenting with Ray RLlib’s DQN (Dueling Double DQN) on a minimal custom environment, but I keep seeing many resets in a single training iteration, even though each episode completes ...
طه الشريف
0 votes
0 answers
20 views

I’m trying to run a W&B hyperparameter sweep against my Python training script, but none of the sweep’s hyperparameter flags ever get passed to the script. Instead, the agent simply runs my base ...
Dalek
  • 4,388
0 votes
0 answers
27 views

I am trying to customize a policy using the MetaDrive framework. To make a decision, I want to get the rgb_camera information from the vehicle. I've already checked the official files, and I only know I can get the frame ...
Jiangde Tu
0 votes
0 answers
53 views

I am learning Reinforcement Learning and picked the Atari Breakout environment to learn. I have trained the neural nets, and now am trying to create some videos to visualize the results. I am using ...
Leo
  • 85
2 votes
1 answer
193 views

I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, ...
Thanasis Mpoulionis
0 votes
1 answer
59 views

I am following this CityLearn tutorial. I got through the first part (RBC) without incident. However, when I implement the second part (Q-learning, literally copy and paste from the site), I keep ...
Matthew Fleishman
2 votes
1 answer
379 views

I am using JAX to run Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL) calculations. I have noticed the following behaviour: in RL, my results are always fully ...
amavrits
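JAX computations are deterministic for a fixed PRNG key, so differences between RL and MARL runs usually come down to how keys are threaded through the multi-agent code (or to non-deterministic GPU kernels). A small sketch of the explicit key-splitting pattern that keeps multi-agent sampling reproducible (names are illustrative):

```python
import jax
import jax.numpy as jnp

def rollout_noise(key, n_agents, n_steps):
    # One sub-key per agent: the same master seed gives bit-identical results.
    agent_keys = jax.random.split(key, n_agents)
    return jnp.stack([jax.random.normal(k, (n_steps,)) for k in agent_keys])

key = jax.random.PRNGKey(42)
print(rollout_noise(key, n_agents=3, n_steps=4))
print(rollout_noise(key, n_agents=3, n_steps=4))  # identical to the first call
```

Re-running with the same master key reproduces the exact same noise; a key that is reused across agents or derived from wall-clock time somewhere in the multi-agent path breaks that.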
1 vote
2 answers
311 views

Can anyone explain to me why the episode_reward_mean is NOT part of the results dictionary? Is it replaced by a different key in the latest API? I see env_runners/episode_return_mean and env_runners/...
aaden
  • 23
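On RLlib's new API stack the episode metrics are produced by the env runners, so the mean return appears nested under env_runners in the result dict rather than as the old top-level episode_reward_mean. A small helper that tolerates both layouts (the key strings are the only assumption here):

```python
def episode_return_mean(result: dict):
    """Fetch the mean episode return from an RLlib train() result dict."""
    # New API stack: nested under the env runner results.
    if "env_runners" in result:
        return result["env_runners"].get("episode_return_mean")
    # Legacy API stack: reported at the top level.
    return result.get("episode_reward_mean")
```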
1 vote
0 answers
30 views

I'm trying to implement a custom agent, and inside my agent I'm running into issues with obtaining the gradient of the Q value with respect to my actor network parameters. I have my code below, main ...
Sliferslacker
0 votes
0 answers
31 views

This is my code using Python, RDL (Reinforcement Deep Learning), and Kafka. Both the ZooKeeper and Kafka servers are working with no issues. When I run the following code in Jupyter: import json import ...
Developer
2 votes
1 answer
33 views

I'm trying to use the VectorSARTTrajectory type from the ReinforcementLearningCore.jl package, because it is also mentioned in the Introduction to RL.jl. I found the implementation of ...
Paul Weis
1 vote
0 answers
68 views

I have a custom Gymnasium environment, RLToy-v0 from the library MDP Playground. It separates out the transition function and reward function from the step function and calls them individually inside ...
Warm_Duscher
  • 1,474
0 votes
0 answers
67 views

I am new to reinforcement learning and I am trying to solve different environments in the Gymnasium library. But no matter what I try, the reward for the Mountain Car env turns out to be -200 both during ...
Lakshmanan M
0 votes
0 answers
50 views

My loss is decreasing, but my agent isn't learning in a very easy environment... The cumulative reward stagnates, and when printing the Q-values they are identical, or at least very similar, across all ...
Sum
  • 11
0 votes
0 answers
49 views

I've been trying to build a basic Neural Net that analyzes a stock's price and chooses whether to buy, sell, or hold. # Importing Libraries import torch import torch.nn as nn import torch.optim as ...
Agastya P
0 votes
0 answers
53 views

So I was trying to solve the cartpole problem. This is a common problem when dealing with reinforcement learning. Essentially, you have a cart that is balancing a pole. The cart can move left or right....
artemisFowl47
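Since the excerpt above walks through what CartPole is, here is the minimal Gymnasium loop for it with a random policy; the learning algorithm is the only part that changes from there:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()              # replace with a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated                  # pole fell, or step limit hit

print(f"episode return: {total_reward}")
env.close()
```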
1 vote
0 answers
92 views

I have a dataframe named hyperparam_df which looks like the following: repo_name file_name \ 0 DeepCoMP deepcomp/util/simulation.py ...
Brie MerryWeather
0 votes
0 answers
40 views

I am working on a quantum reinforcement learning model using PennyLane, PyTorch, and a stock trading environment from the FinRL library. When I run my training function, I get the following error. Any help is ...
user1566490
1 vote
1 answer
417 views

I am setting up a Deep MARL framework and I need to assess my actor policies. Ideally, this would entail using jax.vmap over a tuple of actor flax TrainStates. I have tried the following: import jax ...
amavrits
0 votes
0 answers
97 views

I'm trying to play a game with AI, but I want to do it in real time. Because of that, I'm not using gym to create an environment. I want to take a screenshot, preprocess it, then pass it through the ...
Controller816
0 votes
1 answer
209 views

I'm trying to train an A2C model in stable-baselines3 and the EvalCallback appears to freeze when it is called. I cannot figure out why. Below you will find a script that recreates this problem. ...
Finncent Price
-1 votes
1 answer
69 views

0. Background: I was trying to train an AC-based agent for a task with a large observation space. The task is similar to a huge cliff-walking task. The agent starts at some random point on a 20 * 20 * 5 * 9 ...
Eric Monlye
