# BERT Fine-tuning
## Model Architecture
- Predict `intent` and `slot` at the same time from one BERT model (= Joint model)
- total_loss = intent_loss + coef * slot_loss (change coef with the `--slot_loss_coef` option); see the sketch below
- To use a CRF layer, pass the `--use_crf` option
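A minimal sketch of this joint objective, assuming hypothetical tensor names and shapes (`intent_logits`, `slot_logits`) and a `pad_label_id` convention that are not spelled out above:

```python
import torch.nn as nn

def joint_loss(intent_logits, slot_logits, intent_labels, slot_labels,
               slot_loss_coef=1.0, pad_label_id=0):
    # Assumed shapes: intent_logits (batch, num_intents),
    # slot_logits (batch, seq_len, num_slots).
    intent_loss = nn.CrossEntropyLoss()(intent_logits, intent_labels)
    # Token-level slot classification: flatten the sequence dimension and
    # skip padded positions (the PAD slot label) via ignore_index.
    slot_loss = nn.CrossEntropyLoss(ignore_index=pad_label_id)(
        slot_logits.view(-1, slot_logits.size(-1)),
        slot_labels.view(-1),
    )
    # total_loss = intent_loss + coef * slot_loss, with coef set by --slot_loss_coef
    return intent_loss + slot_loss_coef * slot_loss
```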
## Dependencies

- python>=3.6
- torch==1.6.0
- transformers==3.0.2
- seqeval==0.0.12
- pytorch-crf==0.7.2
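Assuming a pip-based environment, the pinned versions above can be installed with:

```bash
pip install "torch==1.6.0" "transformers==3.0.2" "seqeval==0.0.12" "pytorch-crf==0.7.2"
```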
## Dataset

|       | Train  | Dev | Test | Intent Labels | Slot Labels |
| ----- | ------ | --- | ---- | ------------- | ----------- |
| ATIS  | 4,478  | 500 | 893  | 21            | 120         |
| Snips | 13,084 | 700 | 700  | 7             | 72          |
- The number of labels is based on the train dataset.
- Add `UNK` for labels (for intent and slot labels that appear only in the dev and test datasets); see the sketch below
- Add `PAD` for the slot labels
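A minimal sketch of building these label vocabularies; the one-label-per-line file format and function names are assumptions for illustration:

```python
def load_slot_labels(path):
    # Reserve PAD (for padded positions) and UNK (for slot labels that
    # appear only in the dev/test splits) ahead of the train-set labels.
    labels = ["PAD", "UNK"]
    with open(path, encoding="utf-8") as f:
        labels += [line.strip() for line in f if line.strip()]
    return labels

def load_intent_labels(path):
    # Intents need only UNK; there is no padding at the sentence level.
    labels = ["UNK"]
    with open(path, encoding="utf-8") as f:
        labels += [line.strip() for line in f if line.strip()]
    return labels
```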
## Training & Evaluation
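A typical invocation might look like the sketch below; the `main.py` entry point and the `--task`, `--model_type`, `--model_dir`, `--do_train`, and `--do_eval` flags are assumptions, while `--use_crf` and `--slot_loss_coef` are the options described above:

```bash
# Entry point and flags other than --use_crf / --slot_loss_coef are assumed
python3 main.py --task atis \
                --model_type bert \
                --model_dir atis_model \
                --do_train --do_eval \
                --use_crf
```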
## Prediction
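Likewise, a prediction run on a trained checkpoint might look as follows; the `predict.py` script name, its flags, and the file names are assumptions for illustration:

```bash
# Script name, flags, and file names are assumed
python3 predict.py --input_file sample_input.txt \
                   --output_file sample_output.txt \
                   --model_dir atis_model
```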
## Results
- Run 5 ~ 10 epochs (record the best result)
- Tested only with the `uncased` model
- ALBERT xxlarge sometimes does not converge well for slot prediction.
|           |              | Intent acc (%) | Slot F1 (%) | Sentence acc (%) |
| --------- | ------------ | -------------- | ----------- | ---------------- |
| **Snips** | BERT         | 99.14          | 96.90       | 93.00            |
|           | BERT + CRF   | 98.57          | 97.24       | 93.57            |
|           | ALBERT + CRF | 99.00          | 96.55       | 92.57            |
| **ATIS**  | BERT         | 97.87          | 95.59       | 88.24            |
|           | BERT + CRF   | 97.98          | 95.93       | 88.58            |