BERT Fine-Tuning
Model Architecture
- Predict `intent` and `slot` at the same time from one BERT model (= Joint model)
- total_loss = intent_loss + coef * slot_loss (change coef with the `--slot_loss_coef` option; see the sketch below)
- If you want to use the CRF layer, give the `--use_crf` option
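The joint objective can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the class and head names are made up, and the `PAD` slot label is assumed to sit at index 0.

```python
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel

class JointModelSketch(nn.Module):
    """One BERT encoder with two heads: sentence-level intent, token-level slots."""

    def __init__(self, num_intents, num_slots, slot_loss_coef=1.0, pad_slot_id=0):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # fed the pooled [CLS] vector
        self.slot_head = nn.Linear(hidden, num_slots)      # fed every token vector
        self.slot_loss_coef = slot_loss_coef               # the --slot_loss_coef value
        self.pad_slot_id = pad_slot_id                     # assumption: PAD is index 0

    def forward(self, input_ids, attention_mask, intent_labels, slot_labels):
        sequence_output, pooled_output = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        )[:2]
        intent_loss = F.cross_entropy(self.intent_head(pooled_output), intent_labels)
        slot_logits = self.slot_head(sequence_output)
        slot_loss = F.cross_entropy(
            slot_logits.view(-1, slot_logits.size(-1)),
            slot_labels.view(-1),
            ignore_index=self.pad_slot_id,  # don't score padding positions
        )
        # total_loss = intent_loss + coef * slot_loss
        return intent_loss + self.slot_loss_coef * slot_loss
```

With a CRF layer (`--use_crf`), the token-level cross-entropy above would be replaced by the CRF negative log-likelihood over the slot tag sequence.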
Dependencies
- python>=3.6
- torch==1.6.0
- transformers==3.0.2
- seqeval==0.0.12
- pytorch-crf==0.7.2
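When installing from PyPI, the pinned versions above translate to the following line (a sketch; it assumes these exact versions are still available for your platform):

```bash
pip install torch==1.6.0 transformers==3.0.2 seqeval==0.0.12 pytorch-crf==0.7.2
```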
Dataset
|       | Train  | Dev | Test | Intent Labels | Slot Labels |
| ----- | ------ | --- | ---- | ------------- | ----------- |
| ATIS  | 4,478  | 500 | 893  | 21            | 120         |
| Snips | 13,084 | 700 | 700  | 7             | 72          |
- The number of labels is based on the train dataset.
- Add `UNK` for labels that appear only in the dev and test datasets (both intent and slot labels; see the sketch below)
- Add `PAD` for the slot label
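As an illustration of this label-vocabulary convention, the mapping could be built as follows. This is a sketch under assumptions: the function names and example labels are not from the repository.

```python
def build_label_vocab(train_labels, with_pad=False):
    """Build a label list from the train split only, reserving UNK (and PAD for slots)."""
    vocab = ["PAD"] if with_pad else []
    vocab.append("UNK")
    for label in train_labels:
        if label not in vocab:
            vocab.append(label)
    return vocab

def label_to_id(label, vocab):
    """Labels seen only in dev/test fall back to UNK."""
    return vocab.index(label) if label in vocab else vocab.index("UNK")

intent_vocab = build_label_vocab(["atis_flight", "atis_airfare"])
slot_vocab = build_label_vocab(["O", "B-fromloc", "I-fromloc"], with_pad=True)
print(label_to_id("atis_ground_service", intent_vocab))  # -> index of UNK
```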
Training & Evaluation
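A typical run might look like the sketch below. `--use_crf` and `--slot_loss_coef` are the options described above; the `main.py` entry point and the remaining flags are assumptions about the repository's CLI.

```bash
# Train the joint model on ATIS and evaluate on its test set
# (entry point and flags other than --use_crf / --slot_loss_coef are assumptions).
python3 main.py --task atis \
                --model_type bert \
                --model_dir atis_model \
                --do_train --do_eval \
                --use_crf
```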
Prediction
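Inference on raw text could then be run along these lines (again a sketch; the `predict.py` script, its flags, and the file names are assumptions):

```bash
python3 predict.py --input_file sample_pred_in.txt \
                   --output_file sample_pred_out.txt \
                   --model_dir atis_model
```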
Results
- Run 5 ~ 10 epochs (record the best result)
- Only tested with the `uncased` model
- ALBERT xxlarge sometimes fails to converge well for slot prediction
Snips
| Model        | Intent acc (%) | Slot F1 (%) | Sentence acc (%) |
| ------------ | -------------- | ----------- | ---------------- |
| BERT         | 99.14          | 96.90       | 93.00            |
| BERT + CRF   | 98.57          | 97.24       | 93.57            |
| ALBERT + CRF | 99.00          | 96.55       | 92.57            |
ATIS
| Model      | Intent acc (%) | Slot F1 (%) | Sentence acc (%) |
| ---------- | -------------- | ----------- | ---------------- |
| BERT       | 97.87          | 95.59       | 88.24            |
| BERT + CRF | 97.98          | 95.93       | 88.58            |