CS224N 2019

winter-2019

Course materials:

Study notes for reference:

CS224n-2019 study notes

Table of contents of study notes for Stanford CS224N: Natural Language Processing with Deep Learning, Winter 2019

Reference books:

Neural network fundamentals:

Lecture 01: Introduction and Word Vectors

  1. The course (10 mins)

  2. Human language and word meaning (15 mins)

  3. Word2vec introduction (15 mins)

  4. Word2vec objective function gradients (25 mins)

  5. Optimization basics (5 mins)

  6. Looking at word vectors (10 mins or less)

Slides

Suggested Readings

Additional reading

Assignment 1: Exploring Word Vectors

[code] [preview]

Notes

Lecture 02: Word Vectors 2 and Word Senses

  1. Finish looking at word vectors and word2vec (12 mins)

  2. Optimization basics (8 mins)

  3. Can we capture this essence more effectively by counting? (15 mins)

  4. The GloVe model of word vectors (10 min)

  5. Evaluating word vectors (15 mins)

  6. Word senses (5 mins)

Slides

Suggested Readings

Additional Readings:

Additional reading

Python review [slides]

review

GloVe: the core idea, a step-by-step breakdown of the algorithm, and code (objective sketched below)

Methods for evaluating word vectors
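
As a companion to the GloVe notes above, here is a minimal NumPy sketch of the GloVe weighted least-squares objective, assuming a precomputed co-occurrence matrix `X`; all variable names and hyperparameter values are illustrative, not taken from the course code.

```python
import numpy as np

def glove_loss_and_grads(X, W, W_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """GloVe objective: J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    X is a (V, V) co-occurrence count matrix; W and W_tilde are (V, d)
    center/context embedding matrices; b and b_tilde are (V,) bias vectors.
    """
    i, j = np.nonzero(X)                         # only co-occurring pairs contribute
    x = X[i, j]
    f = np.minimum((x / x_max) ** alpha, 1.0)    # weighting function caps very frequent pairs
    diff = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(x)
    loss = np.sum(f * diff ** 2)
    coef = (2.0 * f * diff)[:, None]
    grad_W = np.zeros_like(W)
    grad_W_tilde = np.zeros_like(W_tilde)
    np.add.at(grad_W, i, coef * W_tilde[j])      # dJ/dW, accumulated per center word
    np.add.at(grad_W_tilde, j, coef * W[i])      # dJ/dW~, accumulated per context word
    return loss, grad_W, grad_W_tilde

# toy usage with fake co-occurrence counts
V, d = 200, 25
X = np.random.poisson(0.1, size=(V, V)).astype(float)
loss, gW, gWt = glove_loss_and_grads(X, np.random.rand(V, d), np.random.rand(V, d),
                                     np.zeros(V), np.zeros(V))
```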

Lecture 03: Word Window Classification, Neural Networks, and Matrix Calculus

  1. Course information update (5 mins)

  2. Classification review/introduction (10 mins)

  3. Neural networks introduction (15 mins)

  4. Named Entity Recognition (5 mins)

  5. Binary true vs. corrupted word window classification (15 mins)

  6. Matrix calculus introduction (20 mins)

Slides

Suggested Readings:

Additional Readings:

Assignment 2

[code] [handout]

review

NER

Gradients

Lecture 04: Backpropagation and Computation Graphs

  1. Matrix gradients for our simple neural net and some tips [15 mins]

  2. Computation graphs and backpropagation [40 mins]

  3. Stuff you should know [15 mins]

    a. Regularization to prevent overfitting

    b. Vectorization

    c. Nonlinearities

    d. Initialization

    e. Optimizers

    f. Learning rates

Slides

Suggested Readings:

Lecture 05: Linguistic Structure: Dependency Parsing

  1. Syntactic Structure: Constituency and Dependency (25 mins)

  2. Dependency Grammar and Treebanks (15 mins)

  3. Transition-based dependency parsing (15 mins)

  4. Neural dependency parsing (15 mins)

cs224n-2019-lecture05-dep-parsing [scrawled-on slides]

  • Phrase-structure (constituency) grammar vs. dependency grammar (a transition-based parsing sketch follows)
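
A minimal sketch of the transition-based (arc-standard) parsing loop covered in this lecture; the `oracle` callback is a hypothetical stand-in for the neural classifier that picks the next transition, and its name and signature are assumptions for illustration.

```python
def parse_arc_standard(words, oracle):
    """Arc-standard transition loop; `oracle(stack, buffer)` returns
    'SHIFT', 'LEFT-ARC', or 'RIGHT-ARC'."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)               # second-from-top depends on top
            arcs.append((stack[-1], dep))     # (head, dependent)
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()                 # top depends on second-from-top
            arcs.append((stack[-1], dep))
        else:
            break                             # illegal action; stop in this sketch
    return arcs

# toy usage with a scripted "oracle" for the sentence "I saw her"
script = iter(["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
print(parse_arc_standard("I saw her".split(), lambda stack, buffer: next(script)))
# -> [('saw', 'I'), ('saw', 'her'), ('ROOT', 'saw')]
```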

cs224n-2019-notes04-dependencyparsing

Suggested Readings:

Assignment 3

[code] [handout]

Lecture 06: The probability of a sentence? Recurrent Neural Networks and Language Models

Recurrent Neural Networks (RNNs) and why they’re great for Language Modeling (LM).

cs224n-2019-lecture06-rnnlm

  • Language models (bigram sketch below)

  • RNN
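
To make the language-model bullet concrete, here is a tiny maximum-likelihood bigram model over a made-up toy corpus; the corpus and numbers are purely illustrative (an RNN LM replaces these counts with a learned next-word distribution).

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()   # hypothetical toy corpus
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "cat"))   # 2/3: "the" is followed by "cat" in 2 of its 3 occurrences
```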

cs224n-2019-notes05-LM_RNN

Suggested Readings:

  1. N-gram Language Models (textbook chapter)

Lecture 07: Vanishing Gradients and Fancy RNNs

  • Problems with RNNs and how to fix them

  • More complex RNN variants

cs224n-2019-lecture07-fancy-rnn

  • Vanishing gradients (numerical illustration below)

  • LSTM and GRU
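
A small numerical illustration of the vanishing-gradient bullet above: backpropagation through time multiplies the gradient by the recurrent Jacobian at every step, so when that matrix's largest singular value is below 1 the gradient norm shrinks geometrically (and explodes when it is above 1). The matrix below is random and rescaled purely for illustration; a real RNN also has nonlinearity derivatives in the product.

```python
import numpy as np

np.random.seed(0)
hidden = 50
W = np.random.randn(hidden, hidden)
W *= 0.9 / np.linalg.svd(W, compute_uv=False)[0]   # rescale largest singular value to 0.9

grad = np.ones(hidden)                             # gradient at the last time step
for t in range(1, 51):
    grad = W.T @ grad                              # one step of backprop through time (linear case)
    if t % 10 == 0:
        print(f"after {t:2d} steps, gradient norm = {np.linalg.norm(grad):.2e}")
```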

cs224n-2019-notes05-LM_RNN

Suggested Readings:

  1. Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)

  2. On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)

  3. Vanishing Gradients Jupyter Notebook (demo for feedforward networks)

  4. Understanding LSTM Networks (blog post overview)

Assignment 4

[code] [handout] [Azure Guide] [Practical Guide to VMs]

Lecture 08: Machine Translation, Seq2Seq and Attention

How we can do Neural Machine Translation (NMT) using an RNN-based architecture called sequence-to-sequence with attention.

cs224n-2019-lecture08-nmt

  • Machine translation:

    • 1. 1950s: early systems were rule-based, translating with a dictionary;

    • 2. 1990s-2010s: statistical machine translation (SMT) learns a statistical model from data, using Bayes' rule to balance translation fidelity against the fluency of the output sentence. Alignments may be one-to-many, many-to-one, or many-to-many.

    • 3. 2014-: neural machine translation (NMT) with seq2seq, i.e. two RNNs. Other seq2seq tasks: summarization (long text to short text), dialogue, parsing, and code generation (natural language to code). Decoding: greedy decoding vs. beam search (sketched after this list).

    • Evaluation: BLEU (Bilingual Evaluation Understudy)

    • Open problems: out-of-vocabulary words, domain mismatch, maintaining context over long texts, scarce data for low-resource language pairs, no incorporation of common sense, biases learned from the training data, uninterpretable translations

    • Attention
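
A minimal beam-search decoding sketch, as referenced in the seq2seq bullet above. The `step_logprobs` callback is a hypothetical stand-in for running the decoder RNN (with attention) one step; the token IDs, beam size, and toy "model" below are illustrative.

```python
def beam_search_decode(step_logprobs, beam_size=4, max_len=20, bos=1, eos=2):
    """Keep the `beam_size` highest-scoring partial hypotheses at every step.

    `step_logprobs(prefix)` returns a {token_id: log_prob} dict for the next
    token; in a real NMT system this would run the decoder over the encoder states.
    """
    beams = [([bos], 0.0)]                      # (token prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score / len(prefix)))   # length-normalize
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend((p, s / len(p)) for p, s in beams)           # unfinished fall-back
    return max(finished, key=lambda c: c[1])[0]

# toy usage with a fake next-token distribution: prefers token 3, then ends
fake = lambda prefix: {2: -0.1} if len(prefix) > 3 else {3: -0.5, 4: -1.5}
print(beam_search_decode(fake))
```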

cs224n-2019-notes06-NMT_seq2seq_attention

Suggested Readings:

  1. Statistical Machine Translation (book by Philipp Koehn)

  2. BLEU (original paper)

  3. Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)

  4. Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)

Lecture 09: Practical Tips for Final Projects

  1. Final project types and details; assessment revisited

  2. Finding research topics; a couple of examples

  3. Finding data

  4. Review of gated neural sequence models

  5. A couple of MT topics

  6. Doing your research

  7. Presenting your results and evaluation

cs224n-2019-lecture09-final-projects

final-project-practical-tips

Suggested Readings:

  1. Practical Methodology (Deep Learning book chapter)

Lecture 10: Question Answering and the Default Final Project

  1. Final final project notes, etc.

  2. Motivation/History

  3. The SQuAD dataset

  4. The Stanford Attentive Reader model

  5. BiDAF

  6. Recent, more advanced architectures

  7. ELMo and BERT preview

cs224n-2019-lecture10-QA

  • Two components: first find the documents that might contain the answer (information retrieval), then find the answer within a document or passage (reading comprehension)

  • History of reading comprehension: 2013, MCTest (passage + question → answer); 2015/16, the CNN/DM and SQuAD datasets

  • History of open-domain question answering: 1964, dependency parsing and matching; 1993, online encyclopedias; 1999, the TREC QA track is launched; 2011, IBM's DeepQA system; 2016, neural networks combined with information retrieval (IR)

  • The SQuAD dataset and its evaluation metrics

  • Stanford's simple model, the Attentive Reader, predicts the start and end positions of the answer span (a span-prediction sketch follows below)
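
A sketch of the span-prediction idea behind the Attentive Reader, assuming already-encoded passage and question vectors: a bilinear interaction scores each passage position as the answer start and end. Layer choices, shapes, and names here are illustrative assumptions, not the lecture's exact model.

```python
import torch
import torch.nn as nn

class SpanPredictor(nn.Module):
    """Bilinear start/end scoring in the spirit of the Attentive Reader:
    score(position i as START) = p_i^T W_start q, and likewise for END."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.w_start = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_end = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, passage, question):
        # passage: (seq_len, hidden_dim) encoded passage positions (e.g. BiLSTM outputs)
        # question: (hidden_dim,) pooled question representation
        start_logits = passage @ self.w_start(question)    # (seq_len,)
        end_logits = passage @ self.w_end(question)         # (seq_len,)
        return start_logits.softmax(-1), end_logits.softmax(-1)

passage = torch.randn(30, 128)        # 30 passage positions, hidden size 128
question = torch.randn(128)
start_probs, end_probs = SpanPredictor(128)(passage, question)
answer_span = (start_probs.argmax().item(), end_probs.argmax().item())
```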

Project Proposal

[instructions]

Default Final Project

[handout] [code]

Lecture 11: ConvNets for NLP

  1. Announcements (5 mins)

  2. Intro to CNNs (20 mins)

  3. Simple CNN for Sentence Classification: Yoon (2014) (20 mins)

  4. CNN potpourri (5 mins)

  5. Deep CNN for Sentence Classification: Conneau et al. (2017) (10 mins)

  6. Quasi-recurrent Neural Networks (10 mins)

cs224n-2019-lecture11-convnets

  • CNN

  • Sentence classification (sketch below)
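
A minimal sketch of the convolution-plus-max-pooling recipe for sentence classification covered in this lecture, using a single filter width; the hyperparameters and layer sizes are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """One convolutional filter width over word embeddings, max-pooled over time."""
    def __init__(self, vocab_size, emb_dim=100, n_filters=64, width=3, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=width)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                # tokens: (batch, seq_len) word IDs
        x = self.emb(tokens).transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))          # (batch, n_filters, seq_len - width + 1)
        x = x.max(dim=2).values               # max-over-time pooling
        return self.fc(x)                     # class logits

logits = TextCNN(vocab_size=5000)(torch.randint(0, 5000, (8, 20)))
```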

cs224n-2019-notes08-CNN

Suggested Readings:

Lecture 12: Information from parts of words: Subword Models

  1. A tiny bit of linguistics (10 mins)

  2. Purely character-level models (10 mins)

  3. Subword-models: Byte Pair Encoding and friends (20 mins); a merge-loop sketch follows this list

  4. Hybrid character and word level models (30 mins)

  5. fastText (5 mins)
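
As referenced in item 3 above, here is a minimal sketch of the Byte Pair Encoding merge loop on a toy word-frequency dictionary; the corpus and merge count are made up for illustration, and the end-of-word marker used by standard BPE is omitted for brevity.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}  # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])   # merge the chosen pair
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

# toy corpus: word -> frequency
print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5))
```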

cs224n-2019-lecture12-subwords

Assignment 5

[original code (requires Stanford login) / public version] [handout]

Lecture 13: Modeling contexts of use: Contextual Representations and Pretraining

[slides] [video]

Suggested readings:

  1. Smith, Noah A. Contextual Word Representations: A Contextual Introduction. (Published just in time for this lecture!)

Lecture 14: Transformers and Self-Attention For Generative Models (guest lecture by Ashish Vaswani and Anna Huang)

[slides] [video]

Suggested readings:

Project Milestone

[instructions]

Lecture 15: Natural Language Generation

[slides] [video]

Lecture 16: Reference in Language and Coreference Resolution

[slides] [video]

Lecture 17: Multitask Learning: A general model for NLP? (guest lecture by Richard Socher)

[slides] [video]

Lecture 18: Constituency Parsing and Tree Recursive Neural Networks

[slides] [video] [notes]

Suggested Readings:

Lecture 19: Safety, Bias, and Fairness (guest lecture by Margaret Mitchell)

[slides] [video]

Lecture 20: Future of NLP + Deep Learning

[slides] [video]

Final project poster session [details]

Final Project Report due [instructions]

Project Poster/Video due [instructions]
