
YANG Shan,YANG Yating*,WEN Zhengyang,et al.An extractive text summarization method based on semantic space[J].Journal of Xiamen University(Natural Science),2019,58(02):237-242.[doi:10.6043/j.issn.0438-0479.201811014]

An extractive text summarization method based on semantic space

Journal of Xiamen University (Natural Science) [ISSN: 0438-0479 / CN: 35-1070/N]

Volume:
58
Issue:
2019, No. 2
Pages:
237-242
Section:
Computing Methods for Natural Language Processing
Publication date:
2019-03-27

Article Information

Title:
An extractive text summarization method based on semantic space
Article ID:
0438-0479(2019)02-0237-06
Author(s):
YANG Shan(1,2,3), YANG Yating(1,2,3)*, WEN Zhengyang(4), MI Chenggang(1,3)
1. The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China; 4. Urumqi Public Security Bureau Detachment of Network Security, Urumqi 830011, China
Keywords:
text summarization; Word2Vec; TextRank; term frequency-inverse document frequency (TF-IDF); sentence-text similarity; seq2seq
CLC number:
TP 391
DOI:
10.6043/j.issn.0438-0479.201811014
Document code:
A
Abstract:
Current extractive single-document summarization methods do not consider the semantic relevance between sentences and the source text. To address this problem, we propose an extractive single-document summarization method based on semantic space. First, the method trains word vectors with Word2Vec to obtain a semantic space and represents both the sentences and the source text in that space. Then, the similarity between each sentence and the source text is computed by cosine similarity, and the weight of each sentence within the text is computed with the TextRank and term frequency-inverse document frequency (TF-IDF) models. Finally, the final weight of each sentence is obtained by combining its similarity value with the weights computed above. Experimental results demonstrate that the summaries produced by our method are of higher quality than those of a deep-learning-based baseline system.
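
The pipeline described in the abstract can be made concrete with a short sketch. The Python code below is a minimal illustration, not the paper's implementation: it assumes gensim's Word2Vec for the semantic space, networkx's PageRank over a sentence-similarity graph as a stand-in for TextRank, and scikit-learn's TfidfVectorizer for the TF-IDF weight; mean-pooled sentence vectors and the combination weight alpha are illustrative assumptions, since the paper's exact formulas are not reproduced on this page.

import numpy as np
import networkx as nx
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

def sent_vec(model, tokens):
    # Represent a sentence (or the full text) as the mean of its word vectors.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def cos(u, v):
    # Cosine similarity, guarding against zero vectors.
    d = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / d if d else 0.0

def summarize(token_sents, top_k=2, alpha=0.5):
    # token_sents: list of sentences, each a list of tokens.
    # 1) Train Word2Vec on the document to obtain the semantic space.
    model = Word2Vec(token_sents, vector_size=50, min_count=1, seed=1)

    # 2) Sentence-text similarity: cosine between each sentence vector
    #    and the vector of the full text.
    doc_vec = sent_vec(model, [t for s in token_sents for t in s])
    svecs = [sent_vec(model, s) for s in token_sents]
    sims = np.array([cos(v, doc_vec) for v in svecs])

    # 3a) TextRank-style weight: PageRank over a graph whose edge weights
    #     are (clamped, non-negative) sentence-to-sentence similarities.
    adj = np.array([[max(cos(a, b), 0.0) for b in svecs] for a in svecs])
    pr = nx.pagerank(nx.from_numpy_array(adj))
    rank = np.array([pr[i] for i in range(len(svecs))])

    # 3b) TF-IDF weight: mean TF-IDF value of each sentence row,
    #     a crude per-sentence salience score.
    tfidf = TfidfVectorizer(analyzer=lambda s: s).fit_transform(token_sents)
    tf = np.asarray(tfidf.mean(axis=1)).ravel()

    # 4) Final weight: a simple convex combination of the similarity value
    #    and the TextRank/TF-IDF weights (the paper's exact scheme may differ).
    score = alpha * sims + (1 - alpha) * (rank + tf) / 2

    # Extract the top-k sentences, restored to document order.
    idx = sorted(np.argsort(score)[::-1][:top_k])
    return [token_sents[i] for i in idx]

# Toy usage with pre-tokenized sentences.
doc = [["extractive", "summarization", "selects", "salient", "sentences"],
       ["word", "vectors", "map", "words", "into", "a", "semantic", "space"],
       ["cosine", "similarity", "compares", "sentence", "and", "text", "vectors"],
       ["the", "weather", "was", "pleasant", "that", "afternoon"]]
print(summarize(doc, top_k=2))

Because the cosine similarity, TextRank, and TF-IDF scores live on different scales, a real implementation would normalize the three before combining them; the equal split between the TextRank and TF-IDF weights here is a placeholder, not the paper's tuned setting.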

References:

[1] ALLAHYARI M,POURIYEH S,ASSEFI M,et al.Text summarization techniques:a brief survey[J].International Journal of Advanced Computer Science & Applications,2017,8(10):397-405.
[2] LOPYREV K.Generating news headlines with recurrent neural networks[EB/OL].[2018-06-08].https://arxiv.org/pdf/1512.01712.
[3] MA S,SUN X,XU J,et al.Improving semantic relevance for sequence-to-sequence learning of Chinese social media text summarization[EB/OL].[2018-06-08].https://arxiv.org/pdf/1706.02459.
[4] CHOPRA S,AULI M,RUSH A M.Abstractive sentence summarization with attentive recurrent neural networks[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.San Diego:ACL,2016:93-98.
[5] SUTSKEVER I,VINYALS O,LE Q V.Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems.Montréal:[s.n.],2014:3104-3112.
[6] MORATANCH N,CHITRAKALA S.A survey on extractive text summarization[C]//2017 International Conference of Computer,Communication and Signal Processing(ICCCSP).Tamil Nadu:IEEE,2017:1-6.
[7] SEKI Y.Sentence extraction by tf/idf and position weighting from newspaper articles[EB/OL].[2018-06-08].http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings3/NTCIR3-TSC-SekiY.pdf.
[8] MIHALCEA R,TARAU P.TextRank:bringing order into texts[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.Barcelona:[s.n.],2004:404-411.
[9] BRIN S,PAGE L.The anatomy of a large-scale hypertextual web search engine[J].Computer Networks and ISDN Systems,1998,30(1/7):107-117.
[10] RUSH A M,CHOPRA S,WESTON J.A neural attention model for abstractive sentence summarization[EB/OL].[2018-06-08].https://arxiv.org/pdf/1509.00685.
[11] MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[EB/OL].[2018-06-08].https://arxiv.org/pdf/1301.3781.
[12] MANNING C D,RAGHAVAN P,SCHÜTZE H.Introduction to information retrieval[M].Cambridge:Cambridge University Press,2008.
[13] VAN DER MAATEN L.Accelerating t-SNE using tree-based algorithms[J].The Journal of Machine Learning Research,2014,15(1):3221-3245.
[14] XU C Z,LIU D.Chinese text summarization algorithm based on word2vec[C]//Journal of Physics:Conference Series.Boracay:IOP Publishing,2018,976(1):012006.
[15] HU B,CHEN Q,ZHU F.LCSTS:a large scale Chinese short text summarization dataset[EB/OL].[2018-06-08].https://arxiv.org/pdf/1506.05865.
[16] LIN C Y.ROUGE:a package for automatic evaluation of summaries[EB/OL].[2018-06-08].http://www.aclweb.org/anthology/W04-1013.
[17] GU J,LU Z,LI H,et al.Incorporating copying mechanism in sequence-to-sequence learning[EB/OL].[2018-06-08].https://arxiv.org/pdf/1603.06393.

Memo

Received: 2018-11-11; Accepted: 2019-01-12
Foundation items: the National Natural Science Foundation of China (U1703133); the "Light of West China" Program of the Chinese Academy of Sciences (2017-XBQNXZ-A-005); the Youth Innovation Promotion Association, Chinese Academy of Sciences (2017472)
*Corresponding author: yangyt@ms.xjb.ac.cn