Commit e4574cd

Added model weights
1 parent cf44be4 commit e4574cd

1 file changed

+36
-4
lines changed

README.md

Lines changed: 36 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,13 @@
11
## Language-Grounded Indoor 3D Semantic Segmentation in the Wild
22
### Implementation for our ECCV 2022 paper
3-
*Note: The full release of this repo is under progress - the necessary code snippets and the training pipeline is published already, but comments and model checkpoints will be released over the next few days.*
43

54
<div align="center">
65
<img src="docs/teaser.jpg" width = 100% >
76
</div>
87

98
**Abstract -**
109
Recent advances in 3D semantic segmentation with deep neural networks have shown remarkable success, with rapid performance increase on available datasets.
11-
However, current 3D semantic segmentation benchmarks contain only a small number of categories -- less than 30 for ScanNet and SemanticKITTI, for instance, which are not enough to reflect the diversity of real environments (e.g., semantic image understanding covers hundreds to thousands of classes).
10+
However, current 3D semantic segmentation benchmarks contain only a small number of categories -- less than 30 for ScanNet and SemanticKITTI, for instance, which are not enough to reflect the diversity of real environments (e.g., semantic image understanding covers hundreds to thousands of classes).
1211

1312
Thus, we propose to study a larger vocabulary for 3D semantic segmentation with a new extended benchmark on ScanNet data with 200 class categories, an order of magnitude more than previously studied.
1413
This large number of class categories also induces a large natural class imbalance, both of which are challenging for existing 3D semantic segmentation methods.
@@ -44,13 +43,14 @@ conda env create -f config/lg_semseg.yml
4443
conda activate lg_semseg
4544
```
4645

47-
Additionaly [MinkowskiEngine] has to be installed manually with specified CUDA version.
46+
Additionally, [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) has to be installed manually for your specific CUDA version.
4847
For example, for CUDA 11.1 run:
4948

5049
```sh
5150
export CUDA_HOME=/usr/local/cuda-11.1
5251
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas=openblas"
5352
```
53+
Note: We use MinkowskiEngine (ME) 0.5.x releases; the pretrained weights are not compatible with models trained with 0.4.x ME releases.
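
One way to guard against this mismatch is to check the installed version before loading a checkpoint. The following is only a minimal sketch: the helper `is_supported_me_version` is hypothetical and not part of this repo, and the script assumes the package exposes its version string as `MinkowskiEngine.__version__`.

```python
def is_supported_me_version(version: str) -> bool:
    """Return True if `version` belongs to the 0.5.x series (hypothetical helper)."""
    parts = version.strip().split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "5"


if __name__ == "__main__":
    try:
        # Assumption: the installed package exposes `__version__`.
        import MinkowskiEngine as ME
    except ImportError:
        print("MinkowskiEngine is not installed")
    else:
        version = getattr(ME, "__version__", "")
        if not is_supported_me_version(version):
            raise SystemExit(f"Expected MinkowskiEngine 0.5.x, found {version!r}")
        print(f"MinkowskiEngine {version} is in the 0.5.x series")
```

Keeping the version comparison in a small pure function makes it easy to reuse in a training script without importing the engine at check time.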
5454

5555
## Dataset
5656

@@ -73,6 +73,26 @@ cd lib/preprocessing
7373
python scannet200_insseg.py --input <SCANNET_PATH>
7474
```
7575

76+
Once the ScanNet200 dataset is preprocessed, download the [extracted data files](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/feature_data.zip) that we precomputed for our method.
77+
The ZIP file linked above contains all the necessary content and should be placed in the same folder as the processed data files.
78+
Please refer to our paper for how these files were created and what they are used for.
79+
The preprocessed dataset should then look something like this:
80+
81+
```
82+
feature_data/
83+
|--clip_feats_scannet_200.pkl
84+
|--dataset_frequencies.pkl
85+
|--scannet200_category_weights.pkl
86+
|-- ...
87+
train/
88+
|--scene0000_00.ply
89+
|--scene0000_01.ply
90+
|--...
91+
train.txt
92+
val.txt
93+
```
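
The layout above can be sanity-checked before training. The sketch below is illustrative and not part of this repo: `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and only the entries spelled out in the listing are checked (anything hidden behind `...` is ignored).

```python
from pathlib import Path
from typing import List

# Entries copied from the layout above; elided entries ("...") are not checked.
EXPECTED_ENTRIES = [
    "feature_data/clip_feats_scannet_200.pkl",
    "feature_data/dataset_frequencies.pkl",
    "feature_data/scannet200_category_weights.pkl",
    "train",
    "train.txt",
    "val.txt",
]


def missing_entries(root: str) -> List[str]:
    """Return the expected files or folders that do not exist under `root`."""
    base = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]
```

Calling `missing_entries` with the folder that holds the processed data returns an empty list when every listed entry is present, and otherwise the relative paths that still need to be created or downloaded.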
94+
95+
7696
## Language Grounded Pretraining
7797

7898
The goal of this stage is to anchor the representation space to the much more structured
@@ -89,7 +109,7 @@ source scripts/text_representation_train.sh <BATCH_SIZE> <TRAIN_NAME_POSTFIX> <A
89109

90110
Refer to our [config](config/config.py) file for additional training and evaluation parameters.
91111

92-
We also provide a pretrained model checkpoints for both [Res16UNet34C]() and [Res16UNet34D]() and the precomputed CLIP features for anchoring.
112+
We also provide pretrained model checkpoints for different model sizes, as well as the precomputed CLIP features for anchoring the pretraining stage.
93113

94114
## Downstream Semantic Segmentation
95115

@@ -116,5 +136,17 @@ cd downstream/insseg
116136
. scripts/train_scannet_slurm.sh <BATCH_SIZE> <MODEL> <TRAINING_POSTFIX> <PRETRAINED_CHECKPOINT>
117137
```
118138

139+
## Model Zoo
140+
141+
We provide trained models from our method at different stages. The Pretrain stage means that we only anchored the model representations to the CLIP text encodings, while Finetune models can be directly evaluated on ScanNet200.
142+
143+
| Model Architecture | Pretrain Strategy | Stage | Link |
144+
|:-------------------|:-----------------:|:--------:|:----------------------------------------------------------------------------------------------------------------:|
145+
| Res16UNet34D | Ours | Pretrain | [download](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/Weights/34D/34D_CLIP_pretrain.ckpt) |
146+
| Res16UNet34D | Ours | Finetune | [download](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/Weights/34D/34D_CLIP_finetune.ckpt) |
147+
| Res16UNet34C | Ours | Pretrain | [download](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/Weights/34C/34C_CLIP_pretrain.ckpt) |
148+
| Res16UNet34C | Ours | Finetune | [download](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/Weights/34C/34C_CLIP_finetune.ckpt) |
149+
150+
119151
## Acknowledgment
120152
We thank the authors of [CSC](https://github.com/facebookresearch/ContrastiveSceneContexts) and [SpatioTemporalSegmentation](https://github.com/chrischoy/SpatioTemporalSegmentation) for their valuable work and open-source implementations.
