Easy Start

English | 简体中文

Model

IFAformer (AAAI SA'23) is a novel dual multimodal Transformer model with implicit feature alignment: it uses the Transformer structure uniformly for both the visual and textual modalities, without explicitly designing a modal alignment structure (details in https://arxiv.org/pdf/2211.07504.pdf).
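The sketch below illustrates the general idea in PyTorch; it is not the authors' implementation, and the layer layout, dimensions and fusion details are simplifying assumptions. Visual patch features simply join the keys/values of the text-side attention, so no explicit alignment module is needed.

import torch
import torch.nn as nn

class ImplicitFusionLayer(nn.Module):
    """Toy text-side Transformer layer that also attends over visual features."""

    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)   # expects (seq, batch, dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, text, visual):
        # text: (Lt, B, D) token features; visual: (Lv, B, D) projected patch features
        kv = torch.cat([visual, text], dim=0)           # visual features join the keys/values
        out, _ = self.attn(text, kv, kv)
        text = self.norm1(text + out)
        return self.norm2(text + self.ffn(text))

text_feats = torch.randn(32, 2, 768)    # e.g. BERT token embeddings
vis_feats = torch.randn(50, 2, 768)     # e.g. CLIP-ViT patch embeddings after projection
print(ImplicitFusionLayer()(text_feats, vis_feats).shape)   # torch.Size([32, 2, 768])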

Experiment

The overall experimental results of IFAformer on the multimodal RE task are as follows:

Methods                          Acc     Precision  Recall  F1
text        PCNN*                73.36   69.14      43.75   53.59
            BERT*                71.13   58.51      60.16   59.32
            MTB*                 75.34   63.28      65.16   64.20
text+image  BERT+SG+Att          74.59   60.97      66.56   63.64
            ViLBERT              74.89   64.50      61.86   63.61
            MEGA                 76.15   64.51      68.44   66.41
Ours        Vanilla IFAformer    87.75   69.90      68.11   68.99
             w/o Text Attn.      76.21   66.95      61.72   64.23
             w/ Visual Objects   92.38   82.59      80.78   81.67

Requirements

  • python == 3.8
  • torch == 1.5
  • transformers == 3.4.0
  • hydra-core == 1.0.6
  • deepke

Attention! transformers == 3.4.0 is the environment requirement of DeepKE as a whole, but loading the openai/clip-vit-base-patch32 model used in the multimodal parts actually requires transformers == 4.11.3. We therefore recommend downloading the pretrained model from Hugging Face in advance and loading it from a local path.
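For example, the weights can be fetched once in advance with huggingface_hub (a sketch, assuming a recent huggingface_hub is installed in whatever environment you use for the download; the target folder name is arbitrary), and the model path in the conf folder can then be pointed at the resulting local directory:

from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="openai/clip-vit-base-patch32",
    local_dir="./clip-vit-base-patch32",   # any local folder works
)
print(local_path)   # use this directory as the CLIP model path in the conf files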

Download Code

git clone https://github.com/zjunlp/DeepKE.git
cd DeepKE/example/re/multimodal

Install with Pip

  • Create and enter a Python virtual environment.
  • Install dependencies: pip install -r requirements.txt.

Train and Predict

  • Dataset

    • Download the dataset to this directory.

      The MNRE dataset comes from https://github.com/thecharm/Mega, many thanks.

      You can download the MNRE dataset with detected visual objects using the following command:

      wget 121.41.117.246:8080/Data/re/multimodal/data.tar.gz
      tar -xzvf data.tar.gz
    • The MNRE dataset with detected visual objects is stored in data:

      • img_detect: Detected objects using RCNN
      • img_vg: Detected objects using visual grounding
      • img_org: Original images
      • txt: Text set
      • vg_data: Bounding boxes relating the original images and img_vg
      • ours_rel2id.json: Relation set
    • We use the RCNN-detected objects and the visual grounding objects as local visual information, where the RCNN objects are obtained via faster_rcnn and the visual grounding objects via onestage_grounding. A quick sanity check of the data layout is sketched below.
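      As a quick sanity check of the downloaded layout, the directory and file names above can be inspected with a few lines of Python (illustrative only; it assumes ours_rel2id.json is a mapping from relation names to ids):

      import json, os

      data_dir = "data"
      print(sorted(os.listdir(data_dir)))   # expect img_detect, img_vg, img_org, txt, vg_data, ours_rel2id.json

      with open(os.path.join(data_dir, "ours_rel2id.json")) as f:
          rel2id = json.load(f)
      print(len(rel2id), "relation types")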

  • Training

    • Parameters, model paths and configuration for training are in the conf folder and users can modify them before training.

    • Training on MNRE

      python run.py
    • The trained model is stored in the checkpoint directory by default and you can change it by modifying "save_path" in train.yaml.

    • Resume training from a previously trained model

      Modify load_path in train.yaml to the path of the last-trained model (a small helper sketch is given below).

    • Training logs are stored in the current directory by default; the path can be changed by modifying log_dir in the .yaml configuration files.
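      For example, load_path can be rewritten programmatically before launching python run.py. This is a convenience sketch only: it assumes load_path is a top-level key of conf/train.yaml (key name taken from this README), the checkpoint path is a placeholder, and yaml.safe_dump will drop any comments in the file.

      import yaml

      cfg_file = "conf/train.yaml"
      with open(cfg_file) as f:
          cfg = yaml.safe_load(f)

      cfg["load_path"] = "checkpoint/your_last_model.pth"   # placeholder: your saved checkpoint
      with open(cfg_file, "w") as f:
          yaml.safe_dump(cfg, f)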

  • Prediction

    Modify "load_path" in predict.yaml to the path of the trained model. In addition, we provide a model trained on the MNRE dataset so that users can run prediction directly.

    python predict.py

Cite

If you use or extend our work, please cite the following paper:

@article{DBLP:journals/corr/abs-2211-07504,
  author    = {Lei Li and
               Xiang Chen and
               Shuofei Qiao and
               Feiyu Xiong and
               Huajun Chen and
               Ningyu Zhang},
  title     = {On Analyzing the Role of Image for Visual-enhanced Relation Extraction},
  journal   = {CoRR},
  volume    = {abs/2211.07504},
  year      = {2022},
  url       = {https://doi.org/10.48550/arXiv.2211.07504},
  doi       = {10.48550/arXiv.2211.07504},
  eprinttype = {arXiv},
  eprint    = {2211.07504},
  timestamp = {Tue, 27 Dec 2022 08:22:45 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2211-07504.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}