Christopher Manning is a professor of computer science and linguistics at Stanford University. His research goal is computers that can intelligently process, understand, and generate human language material. Manning focuses on machine learning approaches to problems in computational linguistics, including syntactic parsing, computational semantics and pragmatics, textual inference, machine translation, and deep learning for NLP. He is an ACM Fellow, a AAAI Fellow, and an ACL Fellow, and has coauthored leading textbooks on statistical natural language processing and information retrieval.
Distributed representations of human language content and structure had a brief boom in the 1980s, but it quickly faded, and the past 20 years have been dominated by continued use of categorical representations of language, despite the use of probabilities or weights over elements of these categorical representations. However, the last five years have seen a resurgence, with highly successful use of distributed vector space representations, often in the context of "neural" or "deep learning" models. One great success has been distributed word representations, and I will look at some of our recent work and that of others on better understanding word representations and how they can be thought of as global matrix factorizations, much more similar to the traditional literature. But we need more than just word representations: We need to understand the larger linguistic units that are made out of words, a problem which has been much less addressed. I will discuss the use of distributed representations in tree-structured recursive neural network models, showing how they can provide sophisticated linguistic models of semantic similarity, sentiment, syntactic parse structure, and logical entailment.
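The "word representations as global matrix factorization" view can be illustrated with a classic construction from the traditional (pre-neural) literature: factorize a PPMI-weighted word-word co-occurrence matrix with a truncated SVD and use the left singular vectors as dense word vectors. The toy corpus, window choice (whole sentences), and dimension k=2 below are illustrative assumptions, not part of the talk; this is a minimal sketch of the idea, not any specific model from the abstract.

```python
import numpy as np
from itertools import combinations

# Toy corpus; each sentence is treated as one co-occurrence window
# (an illustrative simplification -- real systems use fixed-size windows).
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts.
C = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for w1, w2 in combinations(s, 2):
        C[idx[w1], idx[w2]] += 1
        C[idx[w2], idx[w1]] += 1

# Positive pointwise mutual information (PPMI) reweighting.
total = C.sum()
row = C.sum(axis=1, keepdims=True)
col = C.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C * total) / (row * col))
ppmi = np.maximum(pmi, 0.0)
ppmi[~np.isfinite(ppmi)] = 0.0

# Truncated SVD: rows of U[:, :k] * sqrt(S[:k]) serve as k-dim word vectors.
U, S, _ = np.linalg.svd(ppmi)
k = 2
vecs = U[:, :k] * np.sqrt(S[:k])
print(vecs.shape)  # one k-dimensional vector per vocabulary word
```

Methods like GloVe and skip-gram with negative sampling can be related to implicit factorizations of matrices of this kind, which is the connection to the traditional literature that the abstract alludes to.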
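The tree-structured recursive neural network idea can be sketched as a single composition function applied bottom-up over a parse tree: each parent vector is computed from its two children as p = tanh(W[l; r] + b). The tiny dimension, random parameters, and the example tree below are illustrative assumptions; a trained model would learn W and b and attach task-specific classifiers (e.g., a sentiment softmax) to node vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative; real models use hundreds)

# Hypothetical word vectors for the leaf nodes.
vec = {w: rng.normal(size=d) for w in ["the", "movie", "was", "great"]}

W = 0.1 * rng.normal(size=(d, 2 * d))  # shared composition matrix
b = np.zeros(d)

def compose(left, right):
    """Merge two child vectors into a parent vector: p = tanh(W [l; r] + b)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Compose the parse tree ((the movie) (was great)) bottom-up.
np_phrase = compose(vec["the"], vec["movie"])
vp_phrase = compose(vec["was"], vec["great"])
root = compose(np_phrase, vp_phrase)

print(root.shape)  # the whole sentence is now a single d-dim vector
```

Because every node, from word to full sentence, lives in the same vector space, the same representation can feed models of semantic similarity, sentiment, parsing decisions, or entailment, which is the appeal the abstract describes.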