Add Diffusion Policy for Reinforcement Learning #9824
sayakpaul merged 34 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

It seems like we're putting in the content of some repository within
@sayakpaul Thank you for the feedback! I have made the changes. Now, it includes only an inference example of using diffusers for diffusion policy.
return action.transpose(1, 2)  # [batch_size, sequence_length, action_dim]

if __name__ == "__main__":
    policy = DiffusionPolicy()
Should we load any pre-trained model here?
Thanks for the valuable thought!
Diffusion policies are frequently tailored to specific use cases, and incorporating pretrained weights into the inference example could limit its general applicability and confuse users working on different tasks. Although I have pretrained weights available for a specific task that I could add here, to maintain the example's universality I recommend initializing the model without loading them. This would let users train their own models or integrate pretrained weights relevant to their own applications!
I beg to differ. I think if we can document it sufficiently it would make more sense to showcase this with a pre-trained model.
Sounds good! I have made the changes. Now, the example loads from a pretrained model and contains comprehensive documentation.
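For readers following along, here is a minimal sketch of what loading pretrained weights into such a policy network can look like. The repo id, file name, and 1D UNet configuration below are illustrative assumptions, not the exact setup of the example script:

```python
import torch
from huggingface_hub import hf_hub_download
from diffusers import UNet1DModel

# Hypothetical repo id and file name, used purely for illustration.
ckpt_path = hf_hub_download("someuser/diffusion-policy-pusht", "policy_weights.pt")

# The architecture must match whatever the checkpoint was trained with;
# this compact RL-style 1D UNet config is only a placeholder.
unet = UNet1DModel(
    in_channels=2,  # action dimension
    out_channels=2,
    block_out_channels=(32, 64),
    down_block_types=("DownResnetBlock1D", "DownResnetBlock1D"),
    up_block_types=("UpResnetBlock1D",),
    mid_block_type="MidResTemporalBlock1D",
    out_block_type="OutConv1DBlock",
    use_timestep_embedding=True,
    time_embedding_type="positional",
    act_fn="mish",
)

# Assumes a plain tensor-only state dict; see the add_safe_globals discussion
# further below for checkpoints that pickle custom classes.
unet.load_state_dict(torch.load(ckpt_path, map_location="cpu", weights_only=True))
unet.eval()
```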
sayakpaul left a comment
Thanks! Left some further comments.
from diffusers import DDPMScheduler, UNet1DModel

add_safe_globals(
After setting weights_only=True (from False), an error occurs for any pretrained model unless we use add_safe_globals to make custom or third-party classes available (since weights_only=True skips the configuration loading). I believe it is a preventative measure by HuggingFace for security reasons, because the error message explicitly states that if we use weights_only=False, we must trust the authors of the model.
It is happening at torch.load(), so I don't think it has anything to do with Hugging Face. Which torch version are you using?
I see, thank you - I am using 2.5.1
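For reference, the allow-listing pattern being discussed looks roughly like this (PyTorch 2.4+). Which globals actually need to be registered depends on what the checkpoint pickles, so the classes and the file name below are only examples:

```python
import torch
from torch.serialization import add_safe_globals
from diffusers import DDPMScheduler, UNet1DModel

# With weights_only=True, torch.load only unpickles allow-listed globals,
# so any non-default class stored in the checkpoint must be registered first.
add_safe_globals([DDPMScheduler, UNet1DModel])

checkpoint = torch.load("policy_checkpoint.pt", weights_only=True)  # hypothetical path
```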
It appears the one failing check is unrelated to the changes in this PR and may be due to external factors. Does it need to be addressed?

Indeed. That is not merge-blocking. Thanks for the PR!
* enable cpu ability
* model creation + comprehensive testing
* training + tests
* all tests working
* remove unneeded files + clarify docs
* update train tests
* update readme.md
* remove data from gitignore
* undo cpu enabled option
* Update README.md
* update readme
* code quality fixes
* diffusion policy example
* update readme
* add pretrained model weights + doc
* add comment
* add documentation
* add docstrings
* update comments
* update readme
* fix code quality
* Update examples/reinforcement_learning/README.md (Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>)
* Update examples/reinforcement_learning/diffusion_policy.py (Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>)
* suggestions + safe globals for weights_only=True
* suggestions + safe weights loading
* fix code quality
* reformat file

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
What does this PR do?
Adds Diffusion Policy, a diffusion model that predicts robotic action sequences in reinforcement learning tasks, using the Hugging Face diffusers library. Demonstrates how diffusion models can generate smooth and multimodal action trajectories for robotic control. Features a robotic arm learning to push a T-shaped block into a target area by predicting an optimal trajectory using diffusion-based denoising.
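As a rough illustration of the denoising idea described above, the sketch below turns Gaussian noise into an action trajectory with a 1D UNet and a DDPM scheduler. The network is randomly initialized here, and the action dimension, horizon, and UNet configuration are assumptions for illustration rather than the exact setup of the example script:

```python
import torch
from diffusers import DDPMScheduler, UNet1DModel

action_dim, horizon = 2, 16

# Compact RL-style 1D UNet over action sequences (randomly initialized here).
unet = UNet1DModel(
    in_channels=action_dim,
    out_channels=action_dim,
    block_out_channels=(32, 64),
    down_block_types=("DownResnetBlock1D", "DownResnetBlock1D"),
    up_block_types=("UpResnetBlock1D",),
    mid_block_type="MidResTemporalBlock1D",
    out_block_type="OutConv1DBlock",
    use_timestep_embedding=True,
    time_embedding_type="positional",
    act_fn="mish",
)
scheduler = DDPMScheduler(num_train_timesteps=100)
scheduler.set_timesteps(100)

# Start from pure noise over the action sequence: [batch, action_dim, horizon].
sample = torch.randn(1, action_dim, horizon)
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(sample, t).sample          # predict the noise residual
    sample = scheduler.step(noise_pred, t, sample).prev_sample  # denoise one step

actions = sample.transpose(1, 2)  # [batch, horizon, action_dim], ready for the controller
```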
Before submitting

- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@sayakpaul @yiyixuxu @DN6 @a-r-r-o-w