## Language-Grounded Indoor 3D Semantic Segmentation in the Wild
### Implementation for our ECCV 2022 paper
*Note: The full release of this repo is in progress - the necessary code snippets and the training pipeline are already published, but comments and model checkpoints will be released over the next few days.*
<div align="center">
<img src="docs/teaser.jpg" width = 100% >
</div>
**Abstract -**
Recent advances in 3D semantic segmentation with deep neural networks have shown remarkable success, with rapid performance increase on available datasets.
However, current 3D semantic segmentation benchmarks contain only a small number of categories -- less than 30 for ScanNet and SemanticKITTI, for instance, which are not enough to reflect the diversity of real environments (e.g., semantic image understanding covers hundreds to thousands of classes).
Thus, we propose to study a larger vocabulary for 3D semantic segmentation with a new extended benchmark on ScanNet data with 200 class categories, an order of magnitude more than previously studied.
This large number of class categories also induces a large natural class imbalance, both of which are challenging for existing 3D semantic segmentation methods.
After the ScanNet200 dataset is preprocessed, we additionally provide [extracted data files](https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/feature_data.zip) used by our method.
The zip file with all the necessary content can be downloaded from the link above and should be placed in the same folder as the processed data files.
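If you prefer to script this step, a minimal Python sketch along these lines downloads and unpacks the archive; the destination path and the assumption that the zip already contains a `feature_data/` folder are ours, not part of the repo:

```python
# Minimal sketch: fetch and unpack the feature data next to the processed
# ScanNet200 files. Assumes the archive already contains the feature_data/
# folder shown below; adjust the destination to your data root if needed.
import urllib.request
import zipfile

URL = "https://kaldir.vc.in.tum.de/rozenberszki/language_grounded_semseg/feature_data.zip"
archive, _ = urllib.request.urlretrieve(URL, "feature_data.zip")

with zipfile.ZipFile(archive) as zf:
    zf.extractall(".")  # extract into the folder holding the processed data
```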
Please refer to our paper for details on how these files were created and what they are used for.
The preprocessed dataset folder should then look something like this:
```
feature_data/
|--clip_feats_scannet_200.pkl
|--dataset_frequencies.pkl
|--scannet200_category_weights.pkl
|-- ...
train/
|--scene0000_00.ply
|--scene0000_01.ply
|--...
train.txt
val.txt
```
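To sanity-check the downloaded files, a short snippet like the following can be used; the exact contents of each pickle are described in the paper and are not assumed here:

```python
# Quick sanity check of the provided pickle files (sketch only; the exact
# structure of each file is documented in the paper, not assumed here).
import pickle
from pathlib import Path

feature_dir = Path("feature_data")
for name in ("clip_feats_scannet_200.pkl",
             "dataset_frequencies.pkl",
             "scannet200_category_weights.pkl"):
    with open(feature_dir / name, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{name}: {type(obj).__name__}, {size} entries")
```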
## Language Grounded Pretraining
The goal of this stage is to anchor the representation space to the much more structured CLIP text embedding space.
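As a rough illustration of the idea (a sketch, not this repository's actual training code), per-point features from a 3D backbone can be pulled toward the frozen CLIP text embeddings of their ground-truth class names with a similarity-based classification loss; the feature dimension, temperature, and loss form below are assumptions.

```python
# Illustrative sketch of language-grounded pretraining, not the repo's code.
# point_feats: per-point features from a 3D backbone, projected to the CLIP
# embedding dimension; text_embeds: frozen CLIP text embeddings of the class
# names (e.g. from clip_feats_scannet_200.pkl); labels: per-point class ids.
import torch
import torch.nn.functional as F

def text_anchoring_loss(point_feats: torch.Tensor,
                        labels: torch.Tensor,
                        text_embeds: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Cross-entropy over cosine similarities between point features and the
    class-name text embeddings, pulling each point toward its class text."""
    point_feats = F.normalize(point_feats, dim=-1)         # (N, D)
    text_embeds = F.normalize(text_embeds, dim=-1)         # (C, D)
    logits = point_feats @ text_embeds.t() / temperature   # (N, C)
    return F.cross_entropy(logits, labels)
```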
We provide trained models from our method at different stages. Pretrain stage means that we only anchored the model representations to the CLIP text encodings, while finetuned models can be directly evaluated on ScanNet200.
| Model Architecture | Pretrain Strategy | Stage | Link |
| --- | --- | --- | --- |
We thank the authors of [CSC](https://github.com/facebookresearch/ContrastiveSceneContexts) and [SpatioTemporalSegmentation](https://github.com/chrischoy/SpatioTemporalSegmentation) for their valuable work and open-source implementations.