English | 简体中文
IFAformer (AAAI SA'23) is a novel dual Multimodal Transformer model with implicit feature alignment, which uses the Transformer structure uniformly in both the visual and textual modalities without explicitly designing a modal alignment structure (details in https://arxiv.org/pdf/2211.07504.pdf).
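For intuition only, here is a toy sketch of the implicit-alignment idea, not the authors' implementation: each modality attends over the concatenation of both modalities' token sequences, so alignment emerges from attention itself rather than from a dedicated alignment module. All class names, dimensions and shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitAlignmentLayer(nn.Module):
    """Toy cross-modal attention layer (illustrative, not IFAformer itself)."""

    def __init__(self, dim=768, heads=12):
        super().__init__()
        # nn.MultiheadAttention expects (seq_len, batch, dim) tensors here.
        self.text_attn = nn.MultiheadAttention(dim, heads)
        self.vis_attn = nn.MultiheadAttention(dim, heads)

    def forward(self, text_tokens, visual_tokens):
        # Each modality queries the concatenation of both token sequences,
        # so no explicit alignment structure is needed.
        mixed = torch.cat([text_tokens, visual_tokens], dim=0)
        text_tokens = text_tokens + self.text_attn(text_tokens, mixed, mixed)[0]
        visual_tokens = visual_tokens + self.vis_attn(visual_tokens, mixed, mixed)[0]
        return text_tokens, visual_tokens

# Example: 32 text tokens and 49 visual patch tokens, batch size 2.
layer = ImplicitAlignmentLayer()
t, v = layer(torch.randn(32, 2, 768), torch.randn(49, 2, 768))
```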
The overall experimental results of IFAformer on the Multi-Modal RE task are as follows:
| Modality | Methods | Acc | Precision | Recall | F1 |
|---|---|---|---|---|---|
| text | PCNN* | 73.36 | 69.14 | 43.75 | 53.59 |
| text | BERT* | 71.13 | 58.51 | 60.16 | 59.32 |
| text | MTB* | 75.34 | 63.28 | 65.16 | 64.20 |
| text+image | BERT+SG+Att | 74.59 | 60.97 | 66.56 | 63.64 |
| text+image | ViLBERT | 74.89 | 64.50 | 61.86 | 63.61 |
| text+image | MEGA | 76.15 | 64.51 | 68.44 | 66.41 |
| Ours | Vanilla IFAformer | 87.75 | 69.90 | 68.11 | 68.99 |
| Ours | w/o Text Attn. | 76.21 | 66.95 | 61.72 | 64.23 |
| Ours | w/ Visual Objects | 92.38 | 82.59 | 80.78 | 81.67 |
- python == 3.8
- torch == 1.5
- transformers == 3.4.0
- hydra-core == 1.0.6
- deepke
**Attention!**
Here `transformers == 3.4.0` is the environment requirement of DeepKE as a whole, but loading the `openai/clip-vit-base-patch32` model used in the multimodal parts actually requires `transformers == 4.11.3`. We therefore recommend downloading the pretrained model from Hugging Face and loading it from a local path, as in the sketch below.
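For example, a minimal sketch of loading the model from a local copy (the directory path below is a hypothetical placeholder; download `openai/clip-vit-base-patch32` from Hugging Face first):

```python
from transformers import CLIPModel, CLIPProcessor

# Hypothetical local directory containing a pre-downloaded copy of
# openai/clip-vit-base-patch32 (loading it requires transformers 4.x).
LOCAL_CLIP_PATH = "./pretrained/clip-vit-base-patch32"

model = CLIPModel.from_pretrained(LOCAL_CLIP_PATH)
processor = CLIPProcessor.from_pretrained(LOCAL_CLIP_PATH)
```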
```bash
git clone https://github.com/zjunlp/DeepKE.git
cd DeepKE/example/re/multimodal
```

- Create and enter a Python virtual environment.
- Install dependencies: `pip install -r requirements.txt`
## Dataset

- Download the dataset to this directory. The MNRE dataset comes from https://github.com/thecharm/Mega, many thanks.

  You can download the MNRE dataset with detected visual objects using the following commands:

  ```bash
  wget 121.41.117.246:8080/Data/re/multimodal/data.tar.gz
  tar -xzvf data.tar.gz
  ```

- The MNRE dataset with detected visual objects is stored in `data`:
  - `img_detect`: objects detected using RCNN
  - `img_vg`: objects detected using visual grounding
  - `img_org`: original images
  - `txt`: text set
  - `vg_data`: bounding images corresponding to `img_vg`
  - `ours_rel2id.json`: relation set (inspected in the sketch after this list)

- We use the RCNN-detected objects and the visual grounding objects as visual local information, where RCNN detection is done via faster_rcnn and visual grounding via onestage_grounding.
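As a quick sanity check after extraction, here is a minimal sketch for inspecting the relation set; it assumes `data.tar.gz` was unpacked into `./data` and that `ours_rel2id.json` maps relation names to integer ids:

```python
import json

# Load the MNRE relation-to-id mapping (assumed to be a name -> id dict).
with open("data/ours_rel2id.json", "r", encoding="utf-8") as f:
    rel2id = json.load(f)

print(f"{len(rel2id)} relations, e.g. {list(rel2id)[:3]}")
```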
## Training

- Parameters, model paths and configuration for training are in the `conf` folder; users can modify them before training (see the config-loading sketch after this list).

- Training on MNRE:

  ```bash
  python run.py
  ```

- The trained model is stored in the `checkpoint` directory by default; you can change this by modifying `save_path` in `train.yaml`.

- To start training from the last-trained model, modify `load_path` in `train.yaml` to the path of the last-trained model.

- Logs for training are stored in the current directory by default; the path can be configured by modifying `log_dir` in `.yaml`.
## Prediction

- Modify `load_path` in `predict.yaml` to the trained model path. In addition, we provide a model trained on the MNRE dataset for users to predict with directly.

  ```bash
  python predict.py
  ```
If you use or extend our work, please cite the following paper:
```bibtex
@article{DBLP:journals/corr/abs-2211-07504,
  author     = {Lei Li and
                Xiang Chen and
                Shuofei Qiao and
                Feiyu Xiong and
                Huajun Chen and
                Ningyu Zhang},
  title      = {On Analyzing the Role of Image for Visual-enhanced Relation Extraction},
  journal    = {CoRR},
  volume     = {abs/2211.07504},
  year       = {2022},
  url        = {https://doi.org/10.48550/arXiv.2211.07504},
  doi        = {10.48550/arXiv.2211.07504},
  eprinttype = {arXiv},
  eprint     = {2211.07504},
  timestamp  = {Tue, 27 Dec 2022 08:22:45 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2211-07504.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```