
CS224N 2019


Winter 2019

Course materials:

  • Course page
  • Video page
  • Video page (Chinese): 可选字幕版 / 纯中文字幕版

Study-notes references:

  • CS224n-2019 学习笔记
  • 斯坦福CS224N深度学习自然语言处理2019冬学习笔记目录

Reference books:

  • Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft)
  • Jacob Eisenstein. Natural Language Processing
  • Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning

Neural network fundamentals:

  • Michael A. Nielsen. Neural Networks and Deep Learning
  • Eugene Charniak. Introduction to Deep Learning

Lecture 01: Introduction and Word Vectors

  1. The course (10 mins)
  2. Human language and word meaning (15 mins)
  3. Word2vec introduction (15 mins)
  4. Word2vec objective function gradients (25 mins)
  5. Optimization basics (5 mins)
  6. Looking at word vectors (10 mins or less)

Slides: cs224n-2019-lecture01-wordvecs1. Notes: cs224n-2019-notes01-wordvecs1.

Suggested Readings:

  • Word2Vec Tutorial - The Skip-Gram Model (the blog post has two parts: the skip-gram idea, and the improved training methods: subsampling and negative sampling)
  • Efficient Estimation of Word Representations in Vector Space (original word2vec paper) (didn't fully understand it; reread later)
  • Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)

Further reading:

  • 理解 Word2Vec 之 Skip-Gram 模型 (a Chinese translation of the tutorial above)
  • Applying word2vec to Recommenders and Advertising (word2vec for recommendation and ads)
  • [NLP] 秒懂词向量Word2vec的本质 (recommends some good resources)

Assignment 1: Exploring Word Vectors (code, preview)

Gensim word vector visualization (code, preview)

My notes:
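
For quick reference, the skip-gram objective from item 4 of the agenda, in the lecture's notation (v for center-word vectors, u for outside-word vectors, window size m):

$$
J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-m \le j \le m \\ j \neq 0}} \log P(w_{t+j}\mid w_t), \qquad
P(o\mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}
$$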

Lecture 02: Word Vectors 2 and Word Senses

  1. Finish looking at word vectors and word2vec (12 mins)
  2. Optimization basics (8 mins)
  3. Can we capture this essence more effectively by counting? (15 mins)
  4. The GloVe model of word vectors (10 mins)
  5. Evaluating word vectors (15 mins)
  6. Word senses (5 mins)

Slides: cs224n-2019-lecture02-wordvecs2. Notes: cs224n-2019-notes02-wordvecs2.

Suggested Readings:

  • GloVe: Global Vectors for Word Representation (original GloVe paper)
  • Improving Distributional Similarity with Lessons Learned from Word Embeddings
  • Evaluation methods for unsupervised word embeddings

Additional Readings:

  • A Latent Variable Model Approach to PMI-based Word Embeddings
  • Linear Algebraic Structure of Word Senses, with Applications to Polysemy
  • On the Dimensionality of Word Embedding

Further reading:

  • 理解GloVe模型(+总结) (clear and detailed; explains the thinking behind the GloVe model)

Python review (slides)

Review:

  • GloVe: the idea, a step-by-step breakdown of the algorithm, and code (the objective is sketched below)
  • Methods for evaluating word vectors
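
The GloVe objective referenced in the review notes above, from the original paper: X_{ij} counts co-occurrences of words i and j, and f is a clipped weighting function.

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
$$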

Lecture 03: Word Window Classification, Neural Networks, and Matrix Calculus

  1. Course information update (5 mins)
  2. Classification review/introduction (10 mins)
  3. Neural networks introduction (15 mins)
  4. Named Entity Recognition (5 mins)
  5. Binary true vs. corrupted word window classification (15 mins)
  6. Matrix calculus introduction (20 mins)

Slides: cs224n-2019-lecture03-neuralnets, matrix calculus notes. Notes: cs224n-2019-notes03-neuralnets.

Suggested Readings:

  • CS231n notes on backprop
  • Review of differential calculus

Additional Readings:

  • Natural Language Processing (Almost) from Scratch

Assignment 2 (code, handout)

Review:

  • NER
  • Gradients (one identity worth memorizing is below)
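
From the matrix-calculus part, the single most-used gradient in these lectures: with logits z, prediction ŷ = softmax(z), one-hot target y, and cross-entropy loss, the gradient at the logits collapses to the error vector.

$$
J = -\sum_{k} y_k \log \hat{y}_k, \qquad \frac{\partial J}{\partial z} = \hat{y} - y
$$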

Lecture 04: Backpropagation and Computation Graphs

  1. Matrix gradients for our simple neural net and some tips (15 mins)
  2. Computation graphs and backpropagation (40 mins)
  3. Stuff you should know (15 mins)
     a. Regularization to prevent overfitting
     b. Vectorization
     c. Nonlinearities
     d. Initialization
     e. Optimizers
     f. Learning rates

Slides: cs224n-2019-lecture04-backprop. Notes: cs224n-2019-notes03-neuralnets.

Suggested Readings:

  • CS231n notes on network architectures
  • Learning Representations by Backpropagating Errors
  • Derivatives, Backpropagation, and Vectorization
  • Yes you should understand backprop
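
A hand-worked toy example of backprop on a computation graph (my own sketch, not course code): the forward pass caches intermediate values, and on the backward pass each node multiplies the incoming gradient by its local gradient.

```python
# Toy computation graph: f(x, y, z) = (x + y) * z
x, y, z = 2.0, -1.0, 3.0

# Forward pass: compute and cache intermediates.
q = x + y            # q = 1.0
f = q * z            # f = 3.0

# Backward pass: chain rule from the output back to the inputs.
df_df = 1.0          # seed gradient at the output
df_dq = z * df_df    # local grad of * w.r.t. q is z  -> 3.0
df_dz = q * df_df    # local grad of * w.r.t. z is q  -> 1.0
df_dx = 1.0 * df_dq  # local grad of + w.r.t. x is 1  -> 3.0
df_dy = 1.0 * df_dq  # local grad of + w.r.t. y is 1  -> 3.0

print(df_dx, df_dy, df_dz)  # 3.0 3.0 1.0
```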

Lecture 05: Linguistic Structure: Dependency Parsing

  1. Syntactic Structure: Consistency and Dependency (25 mins)
  2. Dependency Grammar and Treebanks (15 mins)
  3. Transition-based dependency parsing (15 mins)
  4. Neural dependency parsing (15 mins)

  • Phrase structure vs. dependency structure (a toy transition sequence is sketched below)

Slides: cs224n-2019-lecture05-dep-parsing (plus the scrawled-on slides). Notes: cs224n-2019-notes04-dependencyparsing.

Suggested Readings:

  • Incrementality in Deterministic Dependency Parsing
  • A Fast and Accurate Dependency Parser using Neural Networks
  • Dependency Parsing
  • Globally Normalized Transition-Based Neural Networks
  • Universal Stanford Dependencies: A cross-linguistic typology
  • Universal Dependencies website

Assignment 3 (code, handout)
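
A minimal sketch of the arc-standard transition system from item 3 (toy code of mine; the transition sequence is hand-scripted here, where a real parser would predict each transition with a classifier):

```python
# Arc-standard transitions: SHIFT, LEFT-ARC, RIGHT-ARC.
sentence = ["I", "ate", "fish"]
stack, buffer, arcs = ["ROOT"], list(sentence), []

def shift():
    stack.append(buffer.pop(0))

def left_arc():      # second-from-top becomes a dependent of the top
    dep = stack.pop(-2)
    arcs.append((stack[-1], dep))

def right_arc():     # top becomes a dependent of the second-from-top
    dep = stack.pop()
    arcs.append((stack[-1], dep))

for op in [shift, shift, left_arc, shift, right_arc, right_arc]:
    op()

print(arcs)  # [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```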

Lecture 06: The probability of a sentence? Recurrent Neural Networks and Language Models

Recurrent Neural Networks (RNNs) and why they're great for Language Modeling (LM).

  • Language models
  • RNNs

Slides: cs224n-2019-lecture06-rnnlm. Notes: cs224n-2019-notes05-LM_RNN.

Suggested Readings:

  • N-gram Language Models (textbook chapter)
  • The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
  • Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.1 and 10.2)
  • On Chomsky and the Two Cultures of Statistical Learning
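
The RNN language model in equations, in the notation of the lecture notes: e_t is the embedding of input word x_t, and ŷ_t is the model's distribution over the vocabulary for the next word, i.e. its estimate of P(x_{t+1} | x_1, ..., x_t).

$$
h_t = \sigma\left(W_h h_{t-1} + W_e e_t + b_1\right), \qquad
\hat{y}_t = \operatorname{softmax}\left(U h_t + b_2\right)
$$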

Lecture 07: Vanishing Gradients and Fancy RNNs

  • Problems with RNNs and how to fix them
  • More complex RNN variants
  • Vanishing gradients
  • LSTM and GRU (cell equations below)

Slides: cs224n-2019-lecture07-fancy-rnn. Notes: cs224n-2019-notes05-LM_RNN.

Suggested Readings:

  • Sequence Modeling: Recurrent and Recursive Neural Nets (Sections 10.3, 10.5, 10.7-10.12)
  • Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
  • On the difficulty of training Recurrent Neural Networks (proof of vanishing gradient problem)
  • Vanishing Gradients Jupyter Notebook (demo for feedforward networks)
  • Understanding LSTM Networks (blog post overview)

Assignment 4 (code, handout, Azure Guide, Practical Guide to VMs)
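
The standard LSTM cell, for review: the gates f_t, i_t, o_t control what is kept, written, and exposed, and the additive update of the cell state c_t is what eases the vanishing-gradient problem.

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}; x_t] + b_f), \quad
i_t = \sigma(W_i [h_{t-1}; x_t] + b_i), \quad
o_t = \sigma(W_o [h_{t-1}; x_t] + b_o) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}; x_t] + b_c), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$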

Lecture 08: Machine Translation, Seq2Seq and Attention

How we can do Neural Machine Translation (NMT) using an RNN-based architecture called sequence-to-sequence with attention.

  • Machine translation:
    • 1950s: early systems were rule-based, translating with bilingual dictionaries.
    • 1990s-2010s: statistical machine translation (SMT) learns a statistical model from data, applying Bayes' rule to balance translation fidelity against sentence fluency. Alignments can be one-to-many, many-to-one, or many-to-many.
    • 2014-: neural machine translation (NMT) with seq2seq, i.e., two RNNs. Other seq2seq tasks: summarization (long text to short text), dialogue, parsing, code generation (natural language to code). Decoding: greedy decoding vs. beam search.
    • Evaluation: BLEU (Bilingual Evaluation Understudy).
    • Open problems: out-of-vocabulary words, domain mismatch, keeping context over long text, low-resource language pairs, no common-sense knowledge, biases picked up from the training data, uninterpretable translations.
    • Attention (equations below).

Slides: cs224n-2019-lecture08-nmt. Notes: cs224n-2019-notes06-NMT_seq2seq_attention.

Suggested Readings:

  • Statistical Machine Translation slides, CS224n 2015 (lectures 2/3/4)
  • Statistical Machine Translation (book by Philipp Koehn)
  • BLEU (original paper)
  • Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
  • Sequence Transduction with Recurrent Neural Networks (early seq2seq speech recognition paper)
  • Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
  • Attention and Augmented Recurrent Neural Networks (blog post overview)
  • Massive Exploration of Neural Machine Translation Architectures (practical advice for hyperparameter choices)
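
Dot-product seq2seq attention as defined in the lecture: with encoder hidden states h_1, ..., h_N and decoder state s_t, the attention output a_t is a weighted average of encoder states, and [a_t; s_t] feeds the next-word prediction.

$$
e^t = [s_t^{\top} h_1, \dots, s_t^{\top} h_N], \qquad
\alpha^t = \operatorname{softmax}(e^t), \qquad
a_t = \sum_{i=1}^{N} \alpha_i^t h_i
$$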

Lecture 09: Practical Tips for Final Projects

  1. Final project types and details; assessment revisited
  2. Finding research topics; a couple of examples
  3. Finding data
  4. Review of gated neural sequence models
  5. A couple of MT topics
  6. Doing your research
  7. Presenting your results and evaluation

Slides: cs224n-2019-lecture09-final-projects, final-project-practical-tips.

  • The default project is a SQuAD question-answering system.
  • Data: look at Kaggle, research papers, and lists of datasets, e.g.:
    • https://catalog.ldc.upenn.edu/
    • http://statmt.org
    • https://universaldependencies.org
    • https://machinelearningmastery.com/datasets-natural-languageprocessing/
    • https://github.com/niderhoff/nlp-datasets
  • Look at the ACL anthology for NLP papers: https://aclanthology.info
  • State-of-the-art results: https://paperswithcode.com/sota

Suggested Readings:

  • Practical Methodology (Deep Learning book chapter)

Lecture 10: Question Answering and the Default Final Project

  1. Final final project notes, etc.
  2. Motivation/History
  3. The SQuAD dataset
  4. The Stanford Attentive Reader model
  5. BiDAF
  6. Recent, more advanced architectures
  7. ELMo and BERT preview

  • Two stages: finding documents that might contain an answer (information retrieval), then finding the answer inside a document or passage (reading comprehension).
  • History of reading comprehension: 2013, MCTest (passage + question → answer); 2015/16, the CNN/DM and SQuAD datasets.
  • History of open-domain QA: 1964, dependency parsing and matching; 1993, online encyclopedias; 1999, the TREC QA track launches; 2011, IBM's DeepQA system; 2016, neural networks combined with information retrieval (IR).
  • The SQuAD dataset and its evaluation metrics.
  • Stanford's simple model, the Stanford Attentive Reader: predict the start and end positions of the answer span in the passage (a sketch of the scoring is below).
  • BiDAF

Slides: cs224n-2019-lecture10-QA. Notes: cs224n-2019-notes07-QA.

Project Proposal (instructions)

Default Final Project (handout, code)
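
Roughly how the Attentive Reader family scores a span, in one common bilinear formulation (my paraphrase, not the model's exact parameterization): p_i are passage-token vectors, q is the question vector, and W_s, W_e are learned matrices; the predicted answer is the (start, end) pair maximizing the product of the two distributions.

$$
P_{\text{start}}(i) \propto \exp\left(p_i^{\top} W_s\, q\right), \qquad
P_{\text{end}}(i) \propto \exp\left(p_i^{\top} W_e\, q\right)
$$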

Lecture 11: ConvNets for NLP

  1. Announcements (5 mins)
  2. Intro to CNNs (20 mins)
  3. Simple CNN for Sentence Classification: Yoon Kim (2014) (20 mins)
  4. CNN potpourri (5 mins)
  5. Deep CNN for Sentence Classification: Conneau et al. (2017) (10 mins)
  6. Quasi-recurrent Neural Networks (10 mins)

  • CNNs
  • Sentence classification (a minimal model sketch follows the readings)

Slides: cs224n-2019-lecture11-convnets. Notes: cs224n-2019-notes08-CNN.

Suggested Readings:

  • Convolutional Neural Networks for Sentence Classification
  • A Convolutional Neural Network for Modelling Sentences
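
A minimal TextCNN in the spirit of Yoon Kim (2014), sketched in PyTorch (all names and hyperparameters here are mine, not from the paper): parallel convolutions of widths 3/4/5 over the embedded sentence, max-over-time pooling, then a linear classifier.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, num_filters=100,
                 widths=(3, 4, 5), num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One 1-D convolution per filter width over the time axis.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, w) for w in widths)
        self.fc = nn.Linear(num_filters * len(widths), num_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)   # (batch, emb_dim, seq_len)
        # ReLU after each convolution, then max-over-time pooling.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 20)))  # 4 sentences of 20 tokens
print(logits.shape)                                # torch.Size([4, 2])
```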

Lecture 12: Information from parts of words: Subword Models

  1. A tiny bit of linguistics (10 mins)
  2. Purely character-level models (10 mins)
  3. Subword models: Byte Pair Encoding and friends (20 mins)
  4. Hybrid character and word level models (30 mins)
  5. fastText (5 mins)

Slides: cs224n-2019-lecture12-subwords.

Suggested readings:

  • Minh-Thang Luong and Christopher Manning. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models

Assignment 5 (original code (requires Stanford login) / public version, handout)
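
A compact sketch of the BPE merge-learning loop from item 3 (toy corpus of mine; real tokenizers also use end-of-word markers and frequency thresholds): repeatedly count symbol pairs and merge the most frequent one.

```python
from collections import Counter

# Toy corpus: each word is a tuple of symbols (characters to start with).
vocab = Counter({("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
                 ("n", "e", "w", "e", "s", "t"): 6,
                 ("w", "i", "d", "e", "s", "t"): 3})

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if tuple(word[i:i + 2]) == pair:   # fuse the pair into one symbol
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for step in range(5):                          # learn 5 merges
    pair = most_frequent_pair(vocab)
    vocab = merge(vocab, pair)
    print(step, pair)                          # e.g. 0 ('e', 's')
```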

Lecture 13: Modeling contexts of use: Contextual Representations and Pretraining

Materials: slides, video

Suggested readings:

  • Smith, Noah A. Contextual Word Representations: A Contextual Introduction. (Published just in time for this lecture!)
  • The Illustrated BERT, ELMo, and co.

Lecture 14: Transformers and Self-Attention For Generative Models (guest lecture by Ashish Vaswani and Anna Huang)

Materials: slides, video

Suggested readings:

  • Attention is all you need
  • Image Transformer
  • Music Transformer: Generating music with long-term structure

Project Milestone (instructions)
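
The core equation behind all three readings, scaled dot-product attention from "Attention is all you need" (d_k is the key dimension):

$$
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
$$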

Lecture 15: Natural Language Generation

Materials: slides, video

Lecture 16: Reference in Language and Coreference Resolution

Materials: slides, video

Lecture 17: Multitask Learning: A general model for NLP? (guest lecture by Richard Socher)

Materials: slides, video

Lecture 18: Constituency Parsing and Tree Recursive Neural Networks

Materials: slides, video, notes

Suggested Readings:

  • Parsing with Compositional Vector Grammars
  • Constituency Parsing with a Self-Attentive Encoder

Lecture 19: Safety, Bias, and Fairness (guest lecture by Margaret Mitchell)

Materials: slides, video

Lecture 20: Future of NLP + Deep Learning

Materials: slides, video

Deadlines: final project poster session (details); final project report due (instructions); project poster/video due (instructions)
