DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Cluttered Environments
Create a project directory and clone this repo inside it:
git clone https://github.com/DavidLXu/DexSinGrasp.git
The project directory should have a structure like this:
PROJECT
├── Logs
│   └── Results
│       ├── results_train
│       ├── results_distill
│       └── results_trajectory
├── Assets
│   ├── meshdatav3_pc_fps
│   ├── meshdatav3_scaled
│   ├── textures
│   └── urdf
└── DexSinGrasp
    ├── results
    └── dexgrasp
Download Assets.zip and unzip it to PROJECT/Assets.
Download random_arrangements.zip and unzip it to PROJECT/DexSinGrasp/dexgrasp/random_arrangements.
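For example, assuming the archives were downloaded into the project root (the archive layout is an assumption; adjust the -d targets if the zips contain a top-level folder):
cd PROJECT
unzip Assets.zip -d Assets
unzip random_arrangements.zip -d DexSinGrasp/dexgrasp/random_arrangements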
(Optional) Download pretrained teacher models and student model for testing.
Python 3.8 is required.
conda create -n dexgrasp python=3.8
conda activate dexgrasp
Install IsaacGym: download IsaacGym first.
cd path_to_isaacgym/python
pip install -e .
Install DexSinGrasp:
cd PROJECT/DexSinGrasp
bash install.sh
During this step, you may hit a CUDA version mismatch while building pytorch3d:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.
If so, install CUDA 11.7 into a custom directory and point your environment variables at it, as sketched below.
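A minimal sketch, assuming CUDA 11.7 was installed under ~/cuda-11.7 (the prefix is an assumption; adjust to your install path):
# Point the toolchain at the side-by-side CUDA 11.7 install (prefix is an assumption).
export CUDA_HOME=$HOME/cuda-11.7
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
nvcc --version   # should now report 11.7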
Install other dependencies:
cd pytorch_kinematics-default
pip install -e .
Enter the working directory
cd PROJECT/DexSinGrasp/dexgrasp/
Start by training a single-object grasping policy by setting --surrounding_obj_num to 0:
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 2 --surrounding_obj_num 0
The saved weights are in PROJECT/Logs/Results/results_train/
Curriculums
Building on the single-object grasping policy, we train the curriculum in the sequence D-4, D-6, D-8 followed by R-4, R-6, R-8, where D and R stand for dense and random arrangements respectively, and the number is the count of surrounding objects.
For dense-arrangement environments, we use --expert_id 2.
# Training D-4 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 2 --surrounding_obj_num 4 --model_dir path_to_single_obj_ckpt.pt

# Training D-6 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 2 --surrounding_obj_num 6 --model_dir path_to_D4_ckpt.pt

# Training D-8 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 2 --surrounding_obj_num 8 --model_dir path_to_D6_ckpt.pt

The resulting D-8 expert is used for data collection in the distillation phase.
For random-arrangement environments, we use --expert_id 3. (Note that --expert_id 1 is deprecated.)
# Training R-4 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 3 --surrounding_obj_num 4 --model_dir path_to_D8_ckpt.pt

# Training R-6 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 3 --surrounding_obj_num 6 --model_dir path_to_R4_ckpt.pt

# Training R-8 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy.yaml --expert_id 3 --surrounding_obj_num 8 --model_dir path_to_R6_ckpt.pt

The resulting R-8 expert is used for data collection in the distillation phase.
Pretrained D-8/R-8 teacher policies can be downloaded here.
For the D-n/R-n policies, we test with 10 environments for 10 iterations.
# Testing D-8 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --config dedicated_policy.yaml --expert_id 2 --surrounding_obj_num 8 --num_envs 10 --test --test_iteration 10 --model_dir path_to_D8_ckpt.pt

# Testing R-8 Policy
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 --config dedicated_policy.yaml --expert_id 3 --surrounding_obj_num 8 --num_envs 10 --test --test_iteration 10 --model_dir path_to_R8_ckpt.pt

For example, to collect 200 episodes in the dense environment with 8 surrounding objects (50 envs × 4 test iterations):
python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 \
--num_envs 50 --max_iterations 10000 --config dedicated_policy.yaml --test --test_iteration 4 \
--save --save_train --save_render --model_dir path_to_D8_teacher.pt \
--save_camera True --table_dim_z 0.6 --expert_id 2 --surrounding_obj_num 8

Refer to run_collection.sh for details on collecting trajectories in various environments. You can download the collected data here to skip this part.
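As an illustration only (run_collection.sh is the authoritative script; this loop is a hypothetical sketch built from the flags above):
# Hypothetical sweep over arrangements and clutter sizes; NOT the actual run_collection.sh.
# path_to_teacher.pt must point at the matching D-8 or R-8 teacher checkpoint.
for expert in 2 3; do            # 2 = dense (D), 3 = random (R)
  for n in 4 6 8; do             # number of surrounding objects
    python run_online.py --task StateBasedGrasp --algo ppo --seed 0 --rl_device cuda:0 \
      --num_envs 50 --max_iterations 10000 --config dedicated_policy.yaml --test --test_iteration 4 \
      --save --save_train --save_render --model_dir path_to_teacher.pt \
      --save_camera True --table_dim_z 0.6 --expert_id $expert --surrounding_obj_num $n
  done
done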
NOTE:
- DO NOT pause the IsaacGym visualization during data collection, otherwise the collected pointclouds will be static.
- We set --table_dim_z to 0.6 to avoid a bug causing PCA errors. In this case, raw pointclouds are collected with a lowest height of 0.6 (normalized during training; see the sketch below). Other states in the observation space are invariant to table height.
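A minimal Python illustration of the height handling described above (the array layout and exact normalization are assumptions, not the project's actual code):
import numpy as np

TABLE_DIM_Z = 0.6  # matches the --table_dim_z used during collection

def normalize_pointcloud_height(pc: np.ndarray) -> np.ndarray:
    # Shift z so the table surface sits at 0; assumes pc has shape (N, 3) with z in column 2.
    pc = pc.copy()
    pc[:, 2] -= TABLE_DIM_Z
    return pc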
Dynamic visualization of pointclouds
python dexgrasp/pointcloud_vis_pkl_dyn.py
Train distilled vision-based policy
python run_offline.py --config universal_policy_vision_based.yaml --device cuda:0

The checkpoints are saved to PROJECT/Logs/Results/results_distill/random/universal_policy/distill_student_model.
You can modify config['Offlines']['train_epochs'] and config['Offlines']['train_batchs'] in run_offline.py to change the epochs and batch size.
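For example (a sketch; the values shown are illustrative, not the project defaults):
# In run_offline.py, before launching distillation (illustrative values only):
config['Offlines']['train_epochs'] = 50     # number of training epochs
config['Offlines']['train_batchs'] = 8192   # batch size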
You can also download the pretrained student model here and save it as PROJECT/Logs/Results/results_distill/random/universal_policy/distill_student_model/model_best.pt
Test the distilled vision-based policy on dense arrangements:

python run_online.py --task StateBasedGrasp --algo dagger_value --seed 0 --rl_device cuda:0 --num_envs 10 --config universal_policy_vision_based.yaml --test --test_iteration 10 --model_dir distill_student_model --save_camera True --table_dim_z 0.6 --expert_id 2 --surrounding_obj_num 8

Test the distilled vision-based policy on random arrangements:

python run_online.py --task StateBasedGrasp --algo dagger_value --seed 0 --rl_device cuda:0 --num_envs 10 --config universal_policy_vision_based.yaml --test --test_iteration 10 --model_dir distill_student_model --save_camera True --table_dim_z 0.6 --expert_id 3 --surrounding_obj_num 8

To customize your own task:
- In dexgrasp/tasks/state_based_grasp_customed.py, write class StateBasedGraspCustomed (a minimal skeleton is sketched after these steps).
- In dexgrasp/utils/config.py, inside def retrieve_cfg(args, use_rlg_config=False), add:

elif args.task == "StateBasedGraspCustomed":
    return os.path.join(args.logdir, "state_based_grasp_customed/{}/{}".format(args.algo, args.algo)), "cfg/{}/config.yaml".format(args.algo), "cfg/state_based_grasp_customed.yaml"
- Add dexgrasp/cfg/state_based_grasp_customed.yaml, copying from an existing config file; some of its content is unused and will be overwritten by other code for now. Remember to change env_name to the Python task file name.
- Add in dexgrasp/utils/parse_task.py:
from tasks.state_based_grasp_customed import StateBasedGraspCustomed
- Call the task class from there:
elif args.task_type == "Python":
try:
# ... previous if and elif ...
elif cfg['env']['env_name'] == "state_based_grasp_customed":
task = StateBasedGraspCustomed(
cfg=cfg,
sim_params=sim_params,
physics_engine=args.physics_engine,
device_type=args.device,
device_id=device_id,
headless=args.headless,
is_multi_agent=False)
- Train the customized task:
python run_online.py --task StateBasedGraspCustomed --algo ppo --seed 0 --rl_device cuda:0 --num_envs 1000 --max_iterations 10000 --config dedicated_policy_customed.yaml
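A minimal skeleton for the new task file (a sketch only; the base-class name and its module path are assumptions inferred from the existing StateBasedGrasp task, and the constructor arguments mirror the parse_task.py call above):
# dexgrasp/tasks/state_based_grasp_customed.py
# Sketch only: assumes the StateBasedGrasp base task lives in tasks.state_based_grasp.
from tasks.state_based_grasp import StateBasedGrasp

class StateBasedGraspCustomed(StateBasedGrasp):
    def __init__(self, cfg, sim_params, physics_engine,
                 device_type, device_id, headless, is_multi_agent=False):
        super().__init__(cfg, sim_params, physics_engine,
                         device_type, device_id, headless, is_multi_agent)
        # Customize the task by overriding observation, reward,
        # or reset methods of the base task here.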
This project builds upon and extends the work from UniGraspTransformer and UniDexGrasp++. We gratefully acknowledge the contributions of the researchers and developers behind these projects.