CS224N 2019
winter-2019
Course materials:
Video page (Chinese):
Study notes for reference:
Reference books:
Dan Jurafsky and James H. Martin. Speech and Language Processing.
Jacob Eisenstein. Natural Language Processing.
Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning.
Neural network fundamentals:
Michael A. Nielsen. Neural Networks and Deep Learning.
Eugene Charniak. Introduction to Deep Learning.
The course (10 mins)
Human language and word meaning (15 mins)
Word2vec introduction (15 mins)
Word2vec objective function gradients (25 mins) (a worked sketch follows this outline)
Optimization basics (5 mins)
Looking at word vectors (10 mins or less)
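To make the objective concrete: a minimal NumPy sketch of the naive-softmax skip-gram loss J = -log P(o|c) and its gradients for a single (center, context) pair. All matrices are toy random values, not the assignment's starter code.

```python
import numpy as np

np.random.seed(0)
vocab_size, dim = 10, 4
V = np.random.randn(vocab_size, dim) * 0.1   # center-word vectors
U = np.random.randn(vocab_size, dim) * 0.1   # context-word vectors

def skipgram_loss_and_grads(center, context):
    """Naive-softmax loss J = -log P(o|c) and its gradients."""
    v_c = V[center]                      # (dim,)
    scores = U @ v_c                     # u_w . v_c for every word w
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                 # softmax P(w|c)
    loss = -np.log(probs[context])
    # dJ/dv_c = sum_w P(w|c) u_w - u_o
    grad_vc = U.T @ probs - U[context]
    # dJ/dU = (probs - one_hot(o)) outer v_c
    dscore = probs.copy()
    dscore[context] -= 1.0
    grad_U = np.outer(dscore, v_c)
    return loss, grad_vc, grad_U

loss, gvc, gU = skipgram_loss_and_grads(center=2, context=5)
print(loss)
```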
Slides
Suggested Readings
Additional reading
Assignment 1: Exploring Word Vectors
Notes
Finish looking at word vectors and word2vec (12 mins)
Optimization basics (8 mins)
Can we capture this essence more effectively by counting? (15 mins)
The GloVe model of word vectors (10 min)
Evaluating word vectors (15 mins)
Word senses (5 mins)
Slides
Suggested Readings
Additional Readings:
Additional reading
review
GloVe's core idea, a step-by-step breakdown of the algorithm, and code (a GloVe-update sketch follows this list)
Methods for evaluating word vectors
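As a companion to the notes above, a hedged NumPy sketch of GloVe's weighted least-squares objective, J = Σ f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)², trained by SGD on a made-up 3×3 co-occurrence matrix; hyperparameters are illustrative, not the paper's.

```python
import numpy as np

np.random.seed(0)
X = np.array([[0., 2., 1.],    # toy co-occurrence counts X_ij
              [2., 0., 3.],
              [1., 3., 0.]])
n, dim = X.shape[0], 2
W  = np.random.randn(n, dim) * 0.1   # main word vectors
Wc = np.random.randn(n, dim) * 0.1   # context word vectors
b, bc = np.zeros(n), np.zeros(n)

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs, caps frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

lr = 0.05
for _ in range(200):                  # SGD over nonzero co-occurrence pairs
    for i, j in zip(*np.nonzero(X)):
        diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
        g = 2 * f(X[i, j]) * diff
        W[i], Wc[j] = W[i] - lr * g * Wc[j], Wc[j] - lr * g * W[i]
        b[i] -= lr * g
        bc[j] -= lr * g
```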
Course information update (5 mins)
Classification review/introduction (10 mins)
Neural networks introduction (15 mins)
Named Entity Recognition (5 mins)
Binary true vs. corrupted word window classification (15 mins)
Matrix calculus introduction (20 mins)
Slides
Suggested Readings:
Additional Readings:
Assignment 2
review
NER
Gradients (see the window-classifier sketch below)
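A minimal sketch of the binary word-window classifier from lecture (score s = uᵀ tanh(Wx + b) over a concatenated window of embeddings), with the chain-rule gradients spelled out; all arrays are random toy values.

```python
import numpy as np

np.random.seed(0)
dim, window = 4, 2                            # embedding size, words each side
x = np.random.randn((2 * window + 1) * dim)   # concatenated window embeddings
W = np.random.randn(8, x.size) * 0.1
b = np.zeros(8)
u = np.random.randn(8) * 0.1

h = np.tanh(W @ x + b)                  # hidden layer
score = u @ h                           # s = u^T tanh(Wx + b)
prob = 1 / (1 + np.exp(-score))         # P(center word is an entity)

# Gradients via the chain rule (cf. the matrix-calculus part of lecture):
dscore = prob - 1.0                     # for a positive (true-window) example
du = dscore * h
dh = dscore * u
dW = np.outer(dh * (1 - h ** 2), x)     # tanh' = 1 - tanh^2
```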
Matrix gradients for our simple neural net and some tips [15 mins]
Computation graphs and backpropagation [40 mins]
Stuff you should know [15 mins] (a minimal training-loop sketch follows this list)
a. Regularization to prevent overfitting
b. Vectorization
c. Nonlinearities
d. Initialization
e. Optimizers
f. Learning rates
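A minimal PyTorch sketch that touches each point (a)-(f) in one place: dropout and weight decay for regularization, a vectorized batch, a ReLU nonlinearity, Xavier initialization, the Adam optimizer, and an explicit learning rate. Sizes and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),                 # (c) nonlinearity
    nn.Dropout(p=0.5),         # (a) regularization
    nn.Linear(32, 2),
)
for m in model:
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)   # (d) initialization

# (e) optimizer with (f) a learning rate and (a) L2 weight decay
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)   # (b) vectorized: the whole batch at once
    loss.backward()               # backprop over the computation graph
    opt.step()
```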
Slides
Suggested Readings:
Syntactic Structure: Constituency and Dependency (25 mins)
Dependency Grammar and Treebanks (15 mins)
Transition-based dependency parsing (15 mins)
Neural dependency parsing (15 mins)
Phrase structure vs. dependency structure (an arc-standard transition sketch follows)
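To make "transition-based" concrete, a sketch of arc-standard parsing driven by a hand-written oracle on one toy sentence; a neural dependency parser replaces the oracle with a classifier over stack/buffer features. Assumes a valid projective gold tree.

```python
# Transitions: SHIFT moves a word from buffer to stack;
# LEFT-ARC / RIGHT-ARC attach the top two stack items.
def parse(words, gold_head):
    """gold_head[i] = index of word i's head (-1 = root)."""
    stack, buffer, arcs = [], list(range(len(words))), []
    def has_pending_child(h):
        return any(gold_head[d] == h and (h, d) not in arcs
                   for d in range(len(words)))
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s2, s1 = stack[-2], stack[-1]
            if gold_head[s2] == s1:                     # LEFT-ARC
                arcs.append((s1, s2)); stack.pop(-2); continue
            if gold_head[s1] == s2 and not has_pending_child(s1):
                arcs.append((s2, s1)); stack.pop(); continue  # RIGHT-ARC
        stack.append(buffer.pop(0))                     # SHIFT
    return arcs

words = ["I", "ate", "fish"]
print(parse(words, gold_head=[1, -1, 1]))  # [(1, 0), (1, 2)]
```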
Suggested Readings:
Assignment 3
Recurrent Neural Networks (RNNs) and why they’re great for Language Modeling (LM).
Language models
RNN (a single RNN-LM step is sketched below)
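A single vanilla-RNN language-model step in NumPy, h_t = tanh(W_hh h_{t−1} + W_xe x_t + b), followed by a softmax over the vocabulary; shapes and the word-id sequence are toy.

```python
import numpy as np

np.random.seed(0)
vocab, dim, hidden = 10, 4, 8
E   = np.random.randn(vocab, dim) * 0.1    # word embeddings
Whh = np.random.randn(hidden, hidden) * 0.1
Wxe = np.random.randn(hidden, dim) * 0.1
U   = np.random.randn(vocab, hidden) * 0.1 # output projection
b1, b2 = np.zeros(hidden), np.zeros(vocab)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

h = np.zeros(hidden)
for w in [3, 1, 4]:                        # a toy word-id sequence
    h = np.tanh(Whh @ h + Wxe @ E[w] + b1) # same weights reused every step
    p_next = softmax(U @ h + b2)           # P(next word | history)
print(p_next.argmax())
```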
Suggested Readings:
Problems with RNNs and how to fix them
More complex RNN variants
Vanishing gradients
LSTM and GRU (one LSTM step is sketched below)
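One LSTM step in NumPy (biases omitted), showing the point of the gates: the cell state is updated additively through the forget gate, which is what lets gradients survive long time spans. Weight shapes are toy.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
dim, hidden = 4, 8
# One weight matrix per gate, acting on the concatenation [h_prev; x]
Wf, Wi, Wo, Wc = (np.random.randn(hidden, hidden + dim) * 0.1
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z)                 # forget gate
    i = sigmoid(Wi @ z)                 # input gate
    o = sigmoid(Wo @ z)                 # output gate
    c_tilde = np.tanh(Wc @ z)           # new cell candidate
    c = f * c_prev + i * c_tilde        # additive update: f near 1 => gradient flows
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(hidden), np.zeros(hidden)
for x in np.random.randn(5, dim):       # five toy inputs
    h, c = lstm_step(x, h, c)
```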
Suggested Readings:
Assignment 4
How we can do Neural Machine Translation (NMT) using an RNN-based architecture called sequence-to-sequence with attention
Machine translation:
1. 1950s: early systems were rule-based, translating with dictionaries.
2. 1990s-2010s: statistical machine translation (SMT) learns a statistical model from data, using Bayes' rule to factor the problem into a translation model and a language model for fluent sentences. Alignment can be one-to-many, many-to-one, or many-to-many.
3. 2014-: neural machine translation (NMT) with seq2seq, i.e. two RNNs. Other seq2seq tasks: summarization (long text to short text), dialogue, parsing, code generation (natural language to code). Decoding is greedy or beam search (a beam-search sketch follows these notes).
Evaluation: BLEU (Bilingual Evaluation Understudy).
Open problems: out-of-vocabulary words, domain mismatch, maintaining context over longer texts, low-resource language pairs with little data, no common-sense knowledge, biases picked up from training data, uninterpretable translations.
Attention.
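A minimal beam-search decoder to make the decoding notes concrete. `next_log_probs` is a made-up stub standing in for one step of an NMT decoder; a real model would condition on the encoder states.

```python
import numpy as np

VOCAB, EOS = 5, 0

def next_log_probs(prefix):
    """Stub for one decoder step; deterministic per prefix."""
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    return np.log(rng.dirichlet(np.ones(VOCAB)))

def beam_search(k=3, max_len=6):
    beams = [([], 0.0)]                        # (prefix, total log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            lp = next_log_probs(prefix)
            for w in range(VOCAB):
                candidates.append((prefix + [w], score + lp[w]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates:       # keep the k best live hypotheses
            (finished if prefix[-1] == EOS else beams).append((prefix, score))
            if len(beams) == k:
                break
        if not beams:                          # every hypothesis hit EOS
            break
    finished += beams
    # Normalize by length so longer hypotheses aren't unfairly penalized
    return max(finished, key=lambda c: c[1] / len(c[0]))

print(beam_search())
```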
Suggested Readings:
Final project types and details; assessment revisited
Finding research topics; a couple of examples
Finding data
Review of gated neural sequence models
A couple of MT topics
Doing your research
Presenting your results and evaluation
The default project is question answering on SQuAD
Data:
Look at Kaggle, research papers, lists of datasets
Suggested Readings:
Final final project notes, etc.
Motivation/History
The SQuAD dataset
The Stanford Attentive Reader model
BiDAF
Recent, more advanced architectures
ELMo and BERT preview
Two stages: find documents that might contain the answer (information retrieval), then find the answer within a document or passage (reading comprehension)
History of reading comprehension: 2013, MCTest: P + Q → A; 2015/16: the CNN/DM and SQuAD datasets
History of open-domain QA: 1964, dependency parsing and matching; 1993, online encyclopedias; 1999, the TREC QA track is launched; 2011, IBM's DeepQA system; 2016, neural networks combined with information retrieval (IR)
The SQuAD dataset and its evaluation metrics
Stanford's simple model: the Attentive Reader, which predicts the start and end positions of the answer span (a span-scoring sketch follows these notes)
BiDAF
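A sketch of span prediction in the spirit of the Stanford Attentive Reader: bilinear attention between the question vector and each passage position gives start and end distributions, and we pick the best short span. The tensors are random stand-ins for BiLSTM encoder outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

np.random.seed(0)
T, dim = 20, 8                         # passage length, hidden size
P = np.random.randn(T, dim)            # BiLSTM output per passage token
q = np.random.randn(dim)               # question summary vector
Ws = np.random.randn(dim, dim) * 0.1   # bilinear weights for start
We = np.random.randn(dim, dim) * 0.1   # bilinear weights for end

p_start = softmax(P @ (Ws @ q))        # P_start(i) proportional to exp(p_i^T Ws q)
p_end   = softmax(P @ (We @ q))

# Pick the best (start <= end) span, capped at 15 tokens, by joint probability
best = max(((i, j) for i in range(T) for j in range(i, min(i + 15, T))),
           key=lambda ij: p_start[ij[0]] * p_end[ij[1]])
print(best)
```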
Project Proposal
Default Final Project
Announcements (5 mins)
Intro to CNNs (20 mins)
Simple CNN for Sentence Classification: Yoon (2014) (20 mins)
CNN potpourri (5 mins)
Deep CNN for Sentence Classification: Conneau et al. (2017) (10 mins)
Quasi-recurrent Neural Networks (10 mins)
CNN
Sentence classification (a sentence-CNN sketch follows)
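A minimal NumPy sketch of the Kim (2014)-style sentence CNN: convolve filters over windows of word embeddings, max-pool over time, then classify; all sizes are toy.

```python
import numpy as np

np.random.seed(0)
sent_len, dim = 7, 4
X = np.random.randn(sent_len, dim)        # word embeddings for one sentence

n_filters, width = 3, 3                   # each filter spans width x dim
F = np.random.randn(n_filters, width, dim) * 0.1
b = np.zeros(n_filters)

# Convolution over time: one feature per window position per filter
conv = np.array([[np.sum(F[f] * X[t:t + width]) + b[f]
                  for t in range(sent_len - width + 1)]
                 for f in range(n_filters)])
feats = np.tanh(conv).max(axis=1)         # max-over-time pooling -> (n_filters,)

W_out = np.random.randn(2, n_filters) * 0.1
scores = W_out @ feats                    # 2-way sentence classification
print(scores.argmax())
```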
Suggested Readings:
A tiny bit of linguistics (10 mins)
Purely character-level models (10 mins)
Subword models: Byte Pair Encoding and friends (20 mins) (a BPE sketch follows this outline)
Hybrid character and word level models (30 mins)
fastText (5 mins)
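A minimal byte-pair-encoding learner in the spirit of Sennrich et al.: repeatedly merge the most frequent adjacent symbol pair. The tiny corpus is the classic toy example, not real data.

```python
from collections import Counter

# Toy corpus: word -> frequency, words as symbol tuples with an end marker
vocab = {("l","o","w","</w>"): 5, ("l","o","w","e","r","</w>"): 2,
         ("n","e","w","e","s","t","</w>"): 6, ("w","i","d","e","s","t","</w>"): 3}

def learn_bpe(vocab, num_merges=8):
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for word, freq in vocab.items():      # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i+1]) == best:
                    out.append(word[i] + word[i+1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

print(learn_bpe(vocab))   # first merge is ('e', 's'), then ('es', 't'), ...
```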
Suggested readings:
Assignment 5
Suggested readings:
Suggested readings:
Project Milestone
Suggested Readings:
(The blog post has two parts: the skip-gram idea, and improved training methods: subsampling and negative sampling)
(A translation of the article above)
(word2vec applied to recommendations and advertising)
(original word2vec paper) (I didn't fully understand it; will read it again later)
(negative sampling paper)
(recommends some very good resources)
Gensim word vector visualization
(original GloVe paper)
(Very detailed and easy to follow; explains the ideas behind the GloVe model)
Python review
(textbook chapter)
(blog post overview)
(Sections 10.1 and 10.2)
(Sections 10.3, 10.5, 10.7-10.12)
(one of the original vanishing gradient papers)
(proof of vanishing gradient problem)
(demo for feedforward networks)
(blog post overview)
(lectures 2/3/4)
(book by Philipp Koehn)
(original paper)
(original seq2seq NMT paper)
(early seq2seq speech recognition paper)
(original seq2seq+attention paper)
(blog post overview)
(practical advice for hyperparameter choices)
Look at the ACL Anthology for NLP papers:
(Deep Learning book chapter)
Minh-Thang Luong and Christopher Manning.
Smith, Noah A. (Published just in time for this lecture!)
Final project poster session
Final Project Report due
Project Poster/Video due