Stanford NLP


Resources

Course homepage: https://web.stanford.edu/class/cs224n/

Chinese notes: http://www.hankcs.com/nlp/cs224n-introduction-to-nlp-and-deep-learning.html

Videos: https://www.bilibili.com/video/av30326868/ and http://www.mooc.ai/course/494

Linux or macOS is recommended for the experiments; either of the following setups works:

  • Docker environment: https://github.com/ufoym/deepo
  • Local environment: https://github.com/learning511/cs224n-learning-camp/blob/master/environment.md

Learning camp: https://github.com/learning511/cs224n-learning-camp

Machine reading comprehension papers and datasets curated by the THUNLP lab: https://github.com/thunlp/RCPapers

Some important resources:

  • Stanford deep learning tutorial (UFLDL): http://deeplearning.stanford.edu/wiki/index.php/UFLDL%E6%95%99%E7%A8%8B
  • Liao Xuefeng's Python 3 tutorial: https://www.liaoxuefeng.com/article/001432619295115c918a094d8954bd493037b03d27bf9a9000
  • Git/GitHub tutorial: https://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000
  • Morvan's machine learning tutorials: http://morvanzhou.github.io/tutorials
  • Classic deep learning papers: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
  • Stanford CS229 code (machine learning algorithms implemented from scratch in Python): https://github.com/nsoojin/coursera-ml-py
  • Blog: https://blog.csdn.net/dukuku5038/article/details/82253966

==Math tools==

Stanford materials:

  • Linear algebra: http://web.stanford.edu/class/cs224n/readings/cs229-linalg.pdf
  • Probability: http://101.96.10.44/web.stanford.edu/class/cs224n/readings/cs229-prob.pdf
  • Convex optimization: http://101.96.10.43/web.stanford.edu/class/cs224n/readings/cs229-cvxopt.pdf
  • Stochastic gradient descent: http://cs231n.github.io/optimization-1

Chinese materials:

  • Basic mathematics for machine learning: https://www.cnblogs.com/steven-yang/p/6348112.html
  • Statistical learning methods: http://vdisk.weibo.com/s/vfFpMc1YgPOr
  • University math textbooks (dug out of the old pile ^_^)

==Programming tools==

Stanford materials:

  • Python review: http://web.stanford.edu/class/cs224n/lectures/python-review.pdf
  • TensorFlow tutorial: https://github.com/open-source-for-science/TensorFlow-Course#why-use-tensorflow

Chinese materials:

  • Liao Xuefeng's Python 3 tutorial: https://www.liaoxuefeng.com/article/001432619295115c918a094d8954bd493037b03d27bf9a9000
  • Morvan's TensorFlow tutorials: https://morvanzhou.github.io/tutorials/machine-learning/tensorflow

paper

1. A Simple but Tough-to-beat Baseline for Sentence Embeddings

Sanjeev Arora, Yingyu Liang, Tengyu Ma (Princeton University). ICLR 2017.

2.Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

3. Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013)

4. GloVe: Global Vectors for Word Representation (Pennington et al., 2014)

Word vector analogies: syntactic and semantic examples from http://code.google.com/p/word2vec/source/browse/trunk/questionswords.txt

Word vector distances and their correlation with human judgments. Example dataset: WordSim353, http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/

5. Improving Word Representations Via Global Context And Multiple Word Prototypes (Huang et al., 2012)

  1. Gather fixed size context windows of all occurrences of the word (for instance, 5 before and 5 after)

  2. Each context is represented by a weighted average of the context words’ vectors (using idf-weighting)

  3. Apply spherical k-means to cluster these context representations.

  4. Finally, each word occurrence is re-labeled to its associated cluster and is used to train the word representation for that cluster.
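The four steps above can be sketched roughly as follows (my own simplification, not from the paper or the notes: `word_vecs` and `idf` are assumed to be precomputed, and spherical k-means is approximated by L2-normalizing the context vectors and running ordinary k-means). The returned sense labels would then be used to train one vector per word sense.

```python
import numpy as np
from sklearn.cluster import KMeans

def context_vector(context_words, word_vecs, idf):
    """Step 2: idf-weighted average of the context words' vectors."""
    vecs = [idf.get(w, 1.0) * word_vecs[w] for w in context_words if w in word_vecs]
    v = np.mean(vecs, axis=0)
    return v / (np.linalg.norm(v) + 1e-8)   # unit length, to mimic "spherical" clustering

def sense_labels(occurrence_contexts, word_vecs, idf, n_senses=3):
    """Steps 1, 3 and 4: cluster the contexts of one word, return a sense id per occurrence."""
    X = np.stack([context_vector(c, word_vecs, idf) for c in occurrence_contexts])
    return KMeans(n_clusters=n_senses, n_init=10, random_state=0).fit(X).labels_
```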

6. Bag of Tricks for Efficient Text Classification. Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov (Facebook AI Research)

  • fastText is often on par with deep learning classifiers
  • fastText takes seconds, instead of days
  • It can learn vector representations of words in different languages (with performance better than word2vec)

Overview

NLP levels

There are two input sources: speech and text. So the first level is speech recognition on one side and OCR or word segmentation on the other (in fact, while skipping segmentation obviously rules out syntactic parsing, quite a few applications can still be built directly at the character level). Next comes morphology; quoting the definition from Statistical Natural Language Processing:

Morphology (also called "lexical morphology" or "word formation") is a branch of linguistics that studies the internal structure of words, covering both inflection and word formation. Since words have phonological, syntactic, and semantic features, morphology sits at the intersection of phonology, syntax, and semantics, and is therefore a subject every linguist has to pay attention to [Matthews, 2000].

Then come syntactic analysis and semantic analysis; the final level is discourse processing, which requires interpreting later text in light of the preceding context.

NLP applications

A small subset, from simple to complex:

  • Spell checking, keyword search, ...
  • Text mining (product prices, dates, times, locations, person names, company names)
  • Text classification
  • Machine translation
  • Customer-service systems
  • Complex dialogue systems

In industry, applications range from search and ad targeting to automatic/assisted translation, sentiment and public-opinion analysis, speech recognition, chatbots/virtual assistants, and much more.

Deep learning is a subset of representation learning; it learns multi-layer feature representations of the raw input.

Iteration Based Methods - Word Vector Representations (word2vec)

Two algorithms:

  • Skip-gram (SG): predict context words given the target word (position independent)
  • Continuous Bag of Words (CBOW): predict the target word from a bag-of-words context

Two (moderately efficient) training methods:

  • Hierarchical softmax
  • Negative sampling

Hierarchical softmax tends to be better for infrequent words, while negative sampling works better for frequent words and lower-dimensional vectors.

Main idea:

Iterate over every word in the whole corpus.

Predict each word's context words: $p(o|c)=\frac{\exp(u_o^T v_c)}{\sum_{w=1}^{V}\exp(u_w^T v_c)}$

Then compute the gradients within each window and take an SGD step.
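To make this loop concrete, here is a minimal NumPy sketch (toy sizes and variable names of my own, not from the lecture) of the naive softmax probability and one SGD step for a single (center, outside) pair. Note that the softmax gradient is dense over the whole vocabulary, which motivates the fixes discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50
U = rng.normal(scale=0.1, size=(vocab_size, dim))  # outside ("context") vectors u_w
V = rng.normal(scale=0.1, size=(vocab_size, dim))  # center vectors v_c

def softmax_prob(o, c):
    """p(o|c) = exp(u_o^T v_c) / sum_w exp(u_w^T v_c)."""
    scores = U @ V[c]                      # u_w^T v_c for every w in the vocabulary
    scores -= scores.max()                 # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

def sgd_step(o, c, lr=0.05):
    """One SGD update on -log p(o|c) for a single (outside o, center c) pair."""
    scores = U @ V[c]
    scores -= scores.max()
    probs = np.exp(scores) / np.exp(scores).sum()
    grad_scores = probs.copy()
    grad_scores[o] -= 1.0                  # d(-log p)/d(scores) = probs - onehot(o)
    grad_vc = U.T @ grad_scores            # gradient w.r.t. the center vector v_c
    grad_U = np.outer(grad_scores, V[c])   # gradient w.r.t. U: dense, touches every row
    V[c] -= lr * grad_vc
    U -= lr * grad_U
```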

Problems with word2vec

  1. The gradients in word2vec are sparse.

We can consider updating only the vectors of the words that actually appear in the window.

There are two solutions: one is a sparse-matrix update operation that only touches certain columns of the vector matrix; the other is to keep a hash map from each word to its word vector.

If you have millions of word vectors and do distributed computing, it is important to not have to send gigantic updates around!
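A toy illustration of the hash-map option above (my own, not from the notes): only the words that actually occurred in a window are ever touched, so no gigantic dense update has to be sent anywhere.

```python
import numpy as np

dim = 50
rng = np.random.default_rng(0)
vectors = {}   # word -> vector: a hash map from words to their embeddings

def vec(word):
    """Create a vector lazily the first time a word is seen."""
    if word not in vectors:
        vectors[word] = rng.normal(scale=0.1, size=dim)
    return vectors[word]

def sparse_update(window_words, grads, lr=0.05):
    """Apply gradients only to the words that actually occurred in the window."""
    for w, g in zip(window_words, grads):
        vectors[w] = vec(w) - lr * g
```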

  2. The softmax normalization term in word2vec is far too expensive to compute.

Solution: implement skip-gram with negative sampling. Concretely, for each positive example (the center word together with one word from its context), sample several negative examples (the center word paired with random other words) and train a binary logistic regression (i.e., a binary classifier).

Objective function:

$J_t(\theta)=\log\sigma(u_o^T v_c)+\sum_{i=1}^{k}\mathbb{E}_{j\sim P(w)}\left[\log\sigma(-u_j^T v_c)\right]$

Here $t$ indexes a window, $k$ is the number of negative samples, and $P(w)$ is a unigram distribution.

This objective maximizes the probability that the center word appears with its true context words and minimizes the probability that it appears with the sampled random words.

$P(w)=U(w)^{3/4}/Z$

Raising the unigram distribution to the 3/4 power makes less common words get sampled more often.
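A minimal NumPy sketch of this objective (toy counts, sizes, and names of my own). Negatives are drawn from $P(w)\propto U(w)^{3/4}$, and each step touches only the center vector plus $k+1$ rows of $U$, which is exactly the sparse-update behaviour mentioned earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, k = 1000, 50, 5
U = rng.normal(scale=0.1, size=(vocab_size, dim))        # outside vectors u_w
V = rng.normal(scale=0.1, size=(vocab_size, dim))        # center vectors v_c
unigram_counts = rng.integers(1, 100, size=vocab_size)   # stand-in for real corpus counts
p_neg = unigram_counts ** 0.75
p_neg = p_neg / p_neg.sum()                              # P(w) = U(w)^{3/4} / Z

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(o, c, lr=0.05):
    """SGD step on -J_t = -log sigma(u_o.v_c) - sum_j log sigma(-u_j.v_c), j ~ P(w)."""
    negs = rng.choice(vocab_size, size=k, p=p_neg)       # sample k negatives
    g_pos = sigmoid(U[o] @ V[c]) - 1.0                   # gradient w.r.t. the score u_o.v_c
    grad_vc = g_pos * U[o]
    grad_uo = g_pos * V[c]
    for j in negs:
        g_neg = sigmoid(U[j] @ V[c])                     # gradient w.r.t. the score u_j.v_c
        grad_vc += g_neg * U[j]
        U[j] -= lr * g_neg * V[c]                        # only the k sampled rows change
    U[o] -= lr * grad_uo
    V[c] -= lr * grad_vc
```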

word2vec treats the window as the unit of training: every window (or every few windows) triggers a parameter update. But many word sequences occur very frequently. Can we instead make a single pass over the corpus and get the result quickly?

Long before word2vec there were already many methods for obtaining word vectors, based on co-occurrence count matrices. Counting co-occurrence at the window level captures syntactic and semantic similarity between words; counting at the document level yields similar documents (latent semantic analysis, LSA).

Window-based co-occurrence matrix

A matrix whose entries count how many times two words co-occur within some window.

From this matrix we can indeed read off simple co-occurrence vectors, but they have serious limitations:

  • When a new word appears, even the dimensionality of the old vectors has to change.
  • Very high dimensionality (the size of the vocabulary).
  • Very sparse.

Solution: low-dimensional vectors

Store the important information in dense vectors of 25 to 1000 dimensions. How do we reduce the dimensionality?

Method: SVD. But words like the, he, has occur far too frequently, so some improvements help.

Improvements

  • Cap the counts of high-frequency words, or drop them as stop words.
  • Down-weight counts by distance from the center word.
  • Use Pearson correlation coefficients instead of raw counts.
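A small sketch of this pipeline: build a window-based co-occurrence matrix (with the count-capping improvement above), then reduce it with SVD. The three-sentence corpus is the tiny example often used in the lecture slides; the function names and the cap value are my own assumptions.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=1, cap=100):
    """Count co-occurrences within `window` words, capping very frequent pairs."""
    vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
    X = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    X[vocab[w], vocab[s[j]]] += 1
    return np.minimum(X, cap), vocab

def svd_embeddings(X, dim=2):
    """Reduce the co-occurrence matrix with SVD; O(mn^2), so only fine for toy data."""
    U, S, Vt = np.linalg.svd(X)
    return U[:, :dim] * S[:dim]            # keep the top `dim` singular directions

sentences = [["i", "like", "deep", "learning"],
             ["i", "like", "nlp"],
             ["i", "enjoy", "flying"]]
X, vocab = cooccurrence_matrix(sentences)
emb = svd_embeddings(X)                    # one low-dimensional vector per word
```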

Problems with SVD

  • High computational cost: $O(mn^2)$ for an $n\times m$ matrix.
  • Awkward to incorporate new words or new documents.
  • Its training regime is different from that of other deep learning models.

Count based vs direct prediction

Count-based methods train quickly on small and medium corpora and make efficient use of the statistics, but they are mainly limited to capturing word similarity and give disproportionate importance to large counts.

Prediction-based models such as NNLM, HLBL, RNN, and skip-gram/CBOW have to sweep over every window during training and cannot exploit the global statistics of the corpus efficiently, but they clearly improve downstream NLP tasks and capture patterns that go beyond word similarity.

GloVe combines the advantages of both.

Advanced word vector representations: Global Vectors for Word Representation (GloVe)

The model's objective function is:

$J=\frac{1}{2}\sum_{i,j=1}^{W} f(P_{ij})\left(u_i^T v_j-\log P_{ij}\right)^2$

Here $P_{ij}$ is the co-occurrence count of words $i$ and $j$, and $f$ is a weighting function that caps the contribution of very frequent pairs.

Advantages: training is fast, it scales to very large corpora, and it also works well on small corpora and small vectors.

There are two sets of vectors, $u$ and $v$, and both capture co-occurrence information. What should we do with them? Experiments show that the best option is simply to add them:

$X_{final}=U+V$

Whereas word2vec only looks at co-occurrence inside a window, the name GloVe emphasizes that it uses global statistics (although in my view word2vec, which slides its windows over the entire corpus, is not all that local either, especially with negative sampling).
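A minimal sketch of the GloVe objective above (bias terms omitted, matching the formula as written here; x_max = 100 and alpha = 3/4 are the commonly used values for the weighting function f, and the variable names are my own assumptions).

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """Capped weighting function: (x / x_max)^alpha below x_max, 1 above it."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(U, V, X):
    """J = 1/2 * sum_ij f(X_ij) * (u_i^T v_j - log X_ij)^2 over nonzero X_ij."""
    i_idx, j_idx = np.nonzero(X)                      # GloVe only trains on nonzero counts
    dots = np.einsum("nd,nd->n", U[i_idx], V[j_idx])  # u_i^T v_j for each pair
    err = dots - np.log(X[i_idx, j_idx])
    return 0.5 * np.sum(f(X[i_idx, j_idx]) * err ** 2)

# After training, the two vector sets are simply summed: X_final = U + V.
```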

Evaluation methods

There are two kinds: intrinsic vs. extrinsic.

Intrinsic: design dedicated standalone experiments, comparing the model against human judgments of word or sentence similarity. The advantage is that this is fast to compute, but it is unclear whether it helps real applications; people have spent years pushing up the score on some dataset only to find that the resulting word vectors barely improve a real task.

Extrinsic: measured by the improvement on a real external application. This takes much longer, and one cannot rule out that a gain comes from some accidental fit between the new vectors and the old system, so at least two subsystems should show the improvement together. In this kind of evaluation, pre-trained vectors are often retrained on the corpus of the external task.

Intrinsic word vector evaluation

That is, word vector analogies: "A is to B as C is to which word?" The answer can be found via cosine similarity:

a : b :: c : ?  $\quad d=\operatorname{argmax}_i\frac{(x_b-x_a+x_c)^T x_i}{\lVert x_b-x_a+x_c\rVert}$

When this is visualized, the analogy difference vectors turn out to be roughly parallel.

word2vec can also do syntactic analogies, e.g. comparative forms such as slow : slower : slowest.
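The argmax above translates directly into a few lines of NumPy. This is a sketch with assumed inputs: `E` is a row-normalized embedding matrix, `vocab` maps words to row indices, and `ivocab` maps indices back to words.

```python
import numpy as np

def analogy(a, b, c, E, vocab, ivocab):
    """Return the word d maximizing cosine((x_b - x_a + x_c), x_d)."""
    q = E[vocab[b]] - E[vocab[a]] + E[vocab[c]]
    q /= np.linalg.norm(q)
    scores = E @ q                                    # cosine similarity if rows of E are unit-norm
    scores[[vocab[a], vocab[b], vocab[c]]] = -np.inf  # the usual trick: exclude the query words
    return ivocab[int(np.argmax(scores))]

# e.g. analogy("man", "king", "woman", E, vocab, ivocab) should ideally return "queen"
```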

In the experiments, GloVe performs significantly better. Higher dimensionality is not necessarily better, but more data always helps.

Hyperparameter tuning

Things to tune: whether the window is symmetric (or only looks at preceding words), the vector dimensionality, and the window size.

Around 300 dimensions with a symmetric window of size 8 works well, which is a reasonable cost/quality trade-off.

For GloVe, more training iterations are better, and the results are very stable.

Some tasks suit plain word vectors, such as word classification; others are a poorer fit, such as sentiment analysis.

Word sense disambiguation: the central idea is to cluster the contexts of a word and retrain a separate vector for each cluster.

We have looked at two main classes of methods to find word embeddings.

The first set are count-based and rely on matrix factorization (e.g. LSA, HAL). While these methods effectively leverage global statistical information, they are primarily used to capture word similarities and do poorly on tasks such as word analogy, indicating a sub-optimal vector space structure.

The other set of methods are shallow window-based (e.g. the skip-gram and the CBOW models), which learn word embeddings by making predictions in local context windows. These models demonstrate the capacity to capture complex linguistic patterns beyond word similarity, but fail to make use of the global co-occurrence statistics.

GloVe: use global statistics to predict the probability of word $j$ appearing in the context of word $i$, with a least-squares objective.

1. Co-occurrence matrix

$X_{ij}$: the number of times word $j$ occurs in the context of word $i$.

2. Least-squares objective

$J=\sum_{i=1}^{W}\sum_{j=1}^{W} f(X_{ij})\left(u_j^T v_i-\log X_{ij}\right)^2$

In conclusion, the GloVe model efficiently leverages global statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, and produces a vector space with meaningful sub-structure. It consistently outperforms word2vec on the word analogy task, given the same corpus, vocabulary, window size, and training time. It achieves better results faster, and also obtains the best results irrespective of speed.

Evaluation of Word Vectors

Intrinsic Evaluation

Intrinsic evaluation of word vectors is the evaluation of a set of word vectors generated by an embedding technique (such as Word2Vec or GloVe) on specific intermediate subtasks (such as analogy completion).

Intrinsic evaluation:

  • Evaluation on a specific, intermediate task
  • Fast to compute performance
  • Helps understand the subsystem
  • Needs positive correlation with the real task to determine usefulness

A popular choice for intrinsic evaluation of word vectors is its performance in completing word vector analogies.

Extrinsic Evaluation

Extrinsic evaluation of word vectors is the evaluation of a set of word vectors generated by an embedding technique on the real task at hand.

Extrinsic evaluation:

  • Evaluation on a real task
  • Can be slow to compute performance
  • Unclear whether the subsystem is the problem, other subsystems, or internal interactions
  • If replacing the subsystem improves performance, the change is likely good

Most NLP extrinsic tasks can be formulated as classification tasks.

About Project

project types:

  1. Apply existing neural network model to a new task

  2. Implement a complex neural architecture

  3. Come up with a new neural network model

  4. Theory of deep learning, e.g. optimization

Apply Existing NNets to Tasks

  1. Define Task: • Example: Summarization

  2. Define Dataset

    1. Search for academic datasets • They already have baselines • E.g.: Document Understanding Conference (DUC)

    2. Define your own (harder, need more new baselines) • If you’re a graduate student: connect to your research • Summarization, Wikipedia: Intro paragraph and rest of large article • Be creative: Twitter, Blogs, News

  3. Define your metric • Search online for well established metrics on this task • Summarization: Rouge (Recall-Oriented Understudy for Gisting Evaluation) which defines n-gram overlap to human summaries

  4. Split your dataset! • Train/Dev/Test • Academic dataset often come pre-split • Don’t look at the test split until ~1 week before deadline! (or at most once a week)

  5. Establish a baseline • Implement the simplest model (often logistic regression on unigrams and bigrams) first (see the sketch after this list) • Compute metrics on train AND dev • Analyze errors • If metrics are amazing and no errors: done, problem was too easy, restart :)

  6. Implement existing neural net model • Compute metric on train and dev • Analyze output and errors • Minimum bar for this class

  7. Always be close to your data! • Visualize the dataset • Collect summary statistics • Look at errors • Analyze how different hyperparameters affect performance

  8. Try out different model variants • Soon you will have more options • Word vector averaging model (neural bag of words) • Fixed window neural model • Recurrent neural network • Recursive neural network • Convolutional neural network
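For the baseline suggested in step 5, here is a minimal scikit-learn sketch. The data variables are placeholders for your own train/dev split (step 4), and accuracy stands in for whatever metric you chose in step 3.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# train_texts / train_labels / dev_texts / dev_labels are assumed to come
# from your own train/dev/test split.
baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), min_df=2),   # unigram + bigram counts
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Compute the metric on train AND dev, then go analyze the errors.
print("train acc:", accuracy_score(train_labels, baseline.predict(train_texts)))
print("dev acc:  ", accuracy_score(dev_labels, baseline.predict(dev_texts)))
```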

A New Model -- Advanced Option

• Do all other steps first (Start early!) • Gain intuition of why existing models are flawed • Talk to researcher/mentor, come to project office hours a lot • Implement new models and iterate quickly over ideas • Set up efficient experimental framework • Build simpler new models first • Example Summarization: • Average word vectors per paragraph, then greedy search • Implement language model (introduced later) • Stretch goal: Generate summary with seq2seq!

Project ideas:

  • Summarization
  • NER, like PSet 2 but with larger data: Natural Language Processing (almost) from Scratch, Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa, http://arxiv.org/abs/1103.0398
  • Simple question answering: A Neural Network for Factoid Question Answering over Paragraphs, Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher and Hal Daumé III (EMNLP 2014)
  • Image-to-text mapping or generation: Grounded Compositional Semantics for Finding and Describing Images with Sentences, Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng (TACL 2014), or Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy, Li Fei-Fei
  • Entity-level sentiment
  • Use DL to solve an NLP challenge on Kaggle, e.g. develop a scoring algorithm for student-written short-answer responses: https://www.kaggle.com/c/asap-sas

Another example project: Sentiment • Sentiment on movie reviews: http://nlp.stanford.edu/sentiment/ • Lots of deep learning baselines and methods have been tried

And here are some NLP datasets and task ideas (see https://web.stanford.edu/class/cs224n/project.html):

  • Kaggle Datasets
  • Named Entity Recognition
  • Chunking
  • Dependency Parsing
  • Quora Question Pairs
  • Sentence-Level Sentiment Analysis
  • Document-Level Sentiment Analysis
  • Textual Entailment
  • Machine Translation (Ambitious)
  • Yelp Reviews
  • WikiText Language Modeling
  • Fake News Challenge
  • Toxic Comment Classification

Word Window Classification and Neural Networks

Dependency Parsing

Dependency Grammar and Dependency Structure

Parse trees in NLP, analogous to those in compilers, are used to analyze the syntactic structure of sentences.

There are two main types of structures used - constituency structures and dependency structures.

Constituency Grammar uses phrase structure grammar to organize words into nested constituents.

Dependency structure of sentences shows which words depend on (modify or are arguments of) which other words.

Figure 1: Dependency tree for the sentence "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"

1.1 Dependency Parsing

Dependency parsing is the task of analyzing the syntactic dependency structure of a given input sentence S. The output of a dependency parser is a dependency tree where the words of the input sentence are connected by typed dependency relations.

There are two subproblems in dependency parsing:

  1. Learning: Given a training set D of sentences annotated with dependency graphs, induce a parsing model M that can be used to parse new sentences.

  2. Parsing: Given a parsing model M and a sentence S, derive the optimal dependency graph D for S according to M.

1.2 Transition-Based Dependency Parsing

Transition-based dependency parsing relies on a state machine which defines the possible transitions to create the mapping from the input sentence to the dependency tree.
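To make the state-machine idea concrete, here is a toy sketch of the arc-standard transition system (SHIFT, LEFT-ARC, RIGHT-ARC) commonly used by such parsers. The classifier that picks the next transition, which is what the greedy and neural parsers in 1.3 and 1.4 learn, is passed in as a plain function and is entirely my own placeholder.

```python
def parse(words, next_transition):
    """Toy arc-standard parser. `next_transition(stack, buffer)` stands in for the
    (learned) classifier and must return 'SHIFT', 'LEFT' or 'RIGHT'."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        action = next_transition(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT" and len(stack) >= 2:
            dependent = stack.pop(-2)            # arc: top(stack) -> second(stack)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT" and len(stack) >= 2:
            dependent = stack.pop()              # arc: second(stack) -> top(stack)
            arcs.append((stack[-1], dependent))
        else:
            break                                # illegal transition: give up
    return arcs                                  # list of (head, dependent) pairs
```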

1.3 Greedy Deterministic Transition-Based Parsing

1.4 Neural Dependency Parsing
