
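(1. School of International Studies, Zhejiang University, Hangzhou, Zhejiang 310058; 2. School of Foreign Languages, Hangzhou Normal University, Hangzhou, Zhejiang 311121)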


Modern Chinese Sentence Extended Pattern Grammar Model for Natural Language Processing
WANG Xiaoying1, FENG Zhiwei2, ZHANG Dan1, QU Yunhua1*

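Phrase structure grammar is Chomsky's grammatical theory for studying natural and artificial languages with mathematical methods. Its basic idea is that a sentence is composed of phrase structures. Phrase structures fall into two major types: nominal phrase structures (NP) and predicative phrase structures (VP). With S denoting the sentence, S = NP + VP. Phrase structure grammar can identify the word order, hierarchy, and part-of-speech information of a sentence. Fang Li et al.[9-11] introduced the application of phrase structure grammar to Chinese. Other scholars have used phrase structure grammar to analyze Chinese: Yao Xiaolie[12] explored the Chinese "de" (的) construction, and Zheng Youjie[13] examined Chinese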
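The rule S = NP + VP can be illustrated with a small top-down parser. The sketch below is only a toy under stated assumptions: the rule set in `GRAMMAR`, the three-word `LEXICON`, and the function names `parse`, `expand`, and `parse_sentence` are all hypothetical and are not taken from the paper or from the cited works. It shows how a phrase structure grammar recovers word order, hierarchy, and word-class information from a pre-segmented sentence.

```python
# Toy phrase structure grammar. Every rule and word here is an
# illustrative assumption, not the grammar used in the paper.
GRAMMAR = {
    "S": [["NP", "VP"]],          # S = NP + VP
    "NP": [["N"]],                # a bare noun as a nominal phrase
    "VP": [["V", "NP"], ["V"]],   # transitive or intransitive predicate
}

# Word classes for a tiny vocabulary: 我 "I", 看 "read", 书 "book".
LEXICON = {"我": "N", "看": "V", "书": "N"}


def parse(symbol, tokens, start):
    """Yield (tree, end) for each way `symbol` can cover tokens[start:end]."""
    if symbol not in GRAMMAR:
        # Word-class symbol: match exactly one token through the lexicon.
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start)


def expand(symbol, rhs, tokens, start):
    """Match the right-hand side `rhs` left to right, keeping all alternatives."""
    partials = [([], start)]
    for child in rhs:
        next_partials = []
        for children, pos in partials:
            for tree, end in parse(child, tokens, pos):
                next_partials.append((children + [tree], end))
        partials = next_partials
    for children, end in partials:
        yield (symbol, *children), end


def parse_sentence(tokens):
    """Return every S tree that covers the whole token list."""
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


if __name__ == "__main__":
    for tree in parse_sentence(["我", "看", "书"]):
        print(tree)
```

The nested tuples record the hierarchy (S over NP and VP), the left-to-right order of the children records the word order, and the innermost labels record the word classes. The toy grammar has no left-recursive rules, so this naive top-down search terminates. A realistic grammar would need a chart parser or similar technique.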

modern Chinese sentence; extended pattern; grammar model; Chinese sentence features; natural language processing

DOI: 10.6043/j.issn.0438-0479.201805007


Existing grammatical analysis systems used in natural language processing cannot accurately reflect the characteristics of Chinese sentences when analyzing Chinese sentence patterns. As a result, research on Chinese sentences themselves has not gone deep enough, which limits the precision and speed of Chinese natural language processing applications. To serve both natural language processing and research on Chinese sentence patterns, we propose a new grammar system for analyzing Chinese sentence patterns: the modern Chinese sentence extended pattern grammar model. The model extends the pattern grammar proposed by Susan Hunston, incorporates the generalized topic theory, and integrates the two to highlight the features of Chinese sentences. It also includes a punctuation sentence analysis module and a topic insufficient sentence analysis module. The model can accurately and comprehensively describe and generalize the rules of Chinese sentence patterns, capture the restrictive relations between function words and content words in those patterns, reflect the linear sequence of sentence patterns, and improve the analytical quality of Chinese flowing sentences.