Description
Hi,
I am trying to run pytorch-pretrained-BERT through the JIT using the tracing API. First, I ran the example run_squad.py without any changes, using the following command, and it worked without any issues.
CUDA_VISIBLE_DEVICES="0" python run_squad.py \
--bert_model bert-large-uncased \
--fp16 \
--do_train \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--train_batch_size 6 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 512 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/
To run the script with the JIT, I changed the following lines:
model.train()
for _ in trange(int(args.num_train_epochs), desc="Epoch"):
    for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])):
        if n_gpu == 1:
            batch = tuple(t.to(device) for t in batch)  # multi-gpu does scattering it-self
        input_ids, input_mask, segment_ids, start_positions, end_positions = batch
        loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
to be:
model.train()
traced = False
for _ in trange(int(args.num_train_epochs), desc="Epoch"):
    for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])):
        if n_gpu == 1:
            batch = tuple(t.to(device) for t in batch)  # multi-gpu does scattering it-self
        input_ids, input_mask, segment_ids, start_positions, end_positions = batch
        if not traced:
            model = torch.jit.trace(model, (input_ids, segment_ids, input_mask, start_positions, end_positions), check_trace=False)
            traced = True
            logger.info("Tracing complete")
        loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
I also disabled the FusedLayerNorm here so that the model would work with tracing.
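For context, disabling FusedLayerNorm just means skipping the apex import in modeling.py and always using the pure-PyTorch fallback. Roughly (this is the fallback class as I understand it from pytorch-pretrained-bert 0.6.2; the exact code in other versions may differ):

import torch
from torch import nn

class BertLayerNorm(nn.Module):
    # TF-style layer norm: the epsilon sits inside the square root.
    def __init__(self, hidden_size, eps=1e-12):
        super(BertLayerNorm, self).__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        u = x.mean(-1, keepdim=True)
        s = (x - u).pow(2).mean(-1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.variance_epsilon)
        return self.weight * x + self.bias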
I ran the modified script with the same command, but it failed with a CUDA out-of-memory (OOM) error.
Error Log: log
Since the unmodified script trains within the available GPU memory, I expected the traced module to fit as well. Am I doing something wrong?
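For what it's worth, here is a minimal sketch of how the eager and traced variants could be compared on a single batch using the standard torch.cuda memory counters (model and the input tensors stand in for the ones in run_squad.py, and this ignores the fp16 optimizer wrapping):

import torch

def log_peak_memory(tag):
    # High-water mark of CUDA tensor allocations since the last reset.
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    print("%s: peak allocated %.0f MiB" % (tag, peak_mib))
    torch.cuda.reset_max_memory_allocated()

# One eager forward/backward step:
loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
loss.backward()
log_peak_memory("eager")

# One traced forward/backward step on the same batch:
traced_model = torch.jit.trace(
    model, (input_ids, segment_ids, input_mask, start_positions, end_positions),
    check_trace=False)
loss = traced_model(input_ids, segment_ids, input_mask, start_positions, end_positions)
loss.backward()
log_peak_memory("traced")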
Environment
PyTorch version: 1.1.0
Is debug build: Yes
CUDA used to build PyTorch: 10.0.130
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.14.0
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-PCIE-16GB
GPU 1: Tesla V100-PCIE-16GB
Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7.4.2
Versions of relevant libraries:
[pip] numpy==1.16.3
[pip] pytorch-pretrained-bert==0.6.2
[pip] torch==1.1.0
[conda] blas 1.0 mkl
[conda] magma-cuda100 2.5.0 1 pytorch
[conda] mkl 2019.3 199
[conda] mkl-include 2019.3 199
[conda] mkl_fft 1.0.12 py36ha843d7b_0
[conda] mkl_random 1.0.2 py36hd81dba3_0
[conda] pytorch-pretrained-bert 0.6.2
[conda] torch 1.1.0
Thanks,
Tapan