BERT Fine-tuning

Model Architecture

  • Predict the intent and slots at the same time from one BERT model (i.e., a joint model)

  • total_loss = intent_loss + coef * slot_loss (change coef with the --slot_loss_coef option; see the sketch below)

  • To add a CRF layer on top of the slot classifier, pass the --use_crf option
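
A minimal sketch of the joint model, for illustration only (not the repo's exact code; bert-base-uncased, the head names, and PAD being slot index 0 are assumptions):

import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class JointBertSketch(nn.Module):
    def __init__(self, num_intents, num_slots, slot_loss_coef=1.0, use_crf=False):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # fed by the pooled [CLS] output
        self.slot_head = nn.Linear(hidden, num_slots)      # fed by the per-token outputs
        self.slot_loss_coef = slot_loss_coef
        self.crf = CRF(num_slots, batch_first=True) if use_crf else None

    def forward(self, input_ids, attention_mask, intent_labels, slot_labels):
        # transformers==3.0.2 returns a tuple: (sequence_output, pooled_output, ...)
        sequence_output, pooled_output = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        )[:2]
        intent_logits = self.intent_head(pooled_output)
        slot_logits = self.slot_head(sequence_output)

        intent_loss = nn.CrossEntropyLoss()(intent_logits, intent_labels)
        if self.crf is not None:
            # pytorch-crf returns a log likelihood, so negate it for a loss
            slot_loss = -self.crf(slot_logits, slot_labels,
                                  mask=attention_mask.byte(), reduction="mean")
        else:
            # index 0 is assumed to be the PAD slot label and is ignored
            slot_loss = nn.CrossEntropyLoss(ignore_index=0)(
                slot_logits.view(-1, slot_logits.size(-1)), slot_labels.view(-1))

        return intent_loss + self.slot_loss_coef * slot_loss  # total_loss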

Dependencies

  • python>=3.6

  • torch==1.6.0

  • transformers==3.0.2

  • seqeval==0.0.12

  • pytorch-crf==0.7.2
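
These can be installed with pip, for example:

$ pip install torch==1.6.0 transformers==3.0.2 seqeval==0.0.12 pytorch-crf==0.7.2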

Dataset

         Train    Dev   Test   Intent Labels   Slot Labels
ATIS     4,478    500    893              21           120
Snips   13,084    700    700               7            72

  • The number of labels is based on the train dataset.

  • Add a UNK label for intent and slot labels that appear only in the dev and test datasets (see the sketch after this list).

  • Add a PAD label for slot labels.
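
A hedged sketch of how such label vocabularies can be assembled (build_label_vocab and the index layout are illustrative, not the repo's actual helpers):

def build_label_vocab(train_labels, is_slot_vocab):
    """Map label strings to ids, reserving the special entries."""
    # PAD (slot vocabulary only) pads token-level labels to a fixed length;
    # UNK absorbs any label that appears only in the dev/test splits.
    specials = ["PAD", "UNK"] if is_slot_vocab else ["UNK"]
    labels = specials + sorted(set(train_labels) - set(specials))
    return {label: idx for idx, label in enumerate(labels)}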

Training & Evaluation

$ python3 main.py --task {task_name} \
                  --model_type {model_type} \
                  --model_dir {model_dir_name} \
                  --do_train --do_eval \
                  --use_crf

# For ATIS
$ python3 main.py --task atis \
                  --model_type bert \
                  --model_dir atis_model \
                  --do_train --do_eval
# For Snips
$ python3 main.py --task snips \
                  --model_type bert \
                  --model_dir snips_model \
                  --do_train --do_eval

Prediction

$ python3 predict.py --input_file {INPUT_FILE_PATH} --output_file {OUTPUT_FILE_PATH} --model_dir {SAVED_CKPT_PATH}
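
For example, to run the ATIS model trained above (file names here are illustrative; the input file is assumed to hold one raw utterance per line):

$ python3 predict.py --input_file sample_pred_in.txt --output_file sample_pred_out.txt --model_dir atis_model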

Results

  • Run 5 to 10 epochs and record the best result.

  • Tested only with uncased models.

  • ALBERT xxlarge sometimes fails to converge well on slot prediction.

                      Intent acc (%)   Slot F1 (%)   Sentence acc (%)
Snips   BERT                   99.14         96.90              93.00
        BERT + CRF             98.57         97.24              93.57
        ALBERT + CRF           99.00         96.55              92.57
ATIS    BERT                   97.87         95.59              88.24
        BERT + CRF             97.98         95.93              88.58
