#Paper Reading# Multi-Document Summarization via Sentence-Level Semantic Analysis and SMF
2017-03-19 08:55
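Paper title: Multi-Document Summarization via Sentence-Level Semantic Analysis and Symmetric Matrix Factorization
Paper link: http://dl.acm.org/citation.cfm?id=1390387
Published at: SIGIR 2008 (CCF Class-A conference)
Overview:
This paper proposes a multi-document summarization method that combines sentence-level semantic analysis (SLSS) with symmetric non-negative matrix factorization (SNMF). By modeling semantic relationships between sentences rather than keyword overlap alone, it aims to produce better summaries.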
1. Overall pipeline of the method:
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/4c5c4d6b5326af320c6317905667e406)
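2. Constructing the sentence-level similarity matrix (∈ R^(S×S), where S is the number of sentences)
① Split the documents into sentences;
② Split each sentence into frames (each verb together with its neighboring words forms one frame);
③ Label the terms in each frame with their semantic roles, which are used to decide whether two terms are related;
④ For each pair of frames, compute the similarity of each semantic role (accumulating the overlapping terms);
⑤ Compute the frame similarity (sum of the role similarities);
⑥ Compute the sentence similarity (maximum over frame pairs), which yields the sentence-level similarity matrix;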
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/aeb4264590aae41a78de0c14219507e4)
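Steps ④ to ⑥ can be sketched as follows. This is a simplified sketch, not the paper's exact formula: it assumes each sentence has already been run through a semantic role labeler and is represented (hypothetically) as a list of frames, each frame mapping a role label to the set of terms filling it; role similarity is taken to be plain term overlap, with no synonym or WordNet matching.

```python
from itertools import product

import numpy as np

# Hypothetical representation (assumption): a frame maps a semantic role label
# (e.g. "ARG0", "V", "ARG1") to the set of terms filling that role, and a
# sentence is a list of frames, as an SRL tool might produce.
Frame = dict[str, set[str]]
Sentence = list[Frame]


def role_sim(terms_a: set[str], terms_b: set[str]) -> int:
    # Step 4 (simplified): count the terms shared by the same role in two frames.
    return len(terms_a & terms_b)


def frame_sim(fa: Frame, fb: Frame) -> int:
    # Step 5: sum the role similarities over the roles both frames contain.
    return sum(role_sim(fa[r], fb[r]) for r in fa.keys() & fb.keys())


def sentence_sim(sa: Sentence, sb: Sentence) -> int:
    # Step 6: a sentence pair is as similar as its best-matching frame pair.
    if not sa or not sb:
        return 0
    return max(frame_sim(fa, fb) for fa, fb in product(sa, sb))


def similarity_matrix(sentences: list[Sentence]) -> np.ndarray:
    # Build the symmetric S x S sentence-level similarity matrix.
    n = len(sentences)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            W[i, j] = W[j, i] = sentence_sim(sentences[i], sentences[j])
    return W
```

For example, two one-frame sentences that share the verb "arrest" and the object "suspect" but have different subjects get a similarity of 2.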
3. SNMF (symmetric NMF) clustering
① The objective function is:
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/6ad0797db2c7c2ba38b6c3f7dd655b3c)
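Written out, assuming the standard symmetric NMF formulation (the notation in the figure may differ), with W the S×S sentence similarity matrix and H an S×k non-negative cluster indicator matrix for k clusters:

```latex
\min_{H \ge 0} \; J(H) = \left\lVert W - H H^{\top} \right\rVert_F^{2},
\qquad W \in \mathbb{R}^{S \times S}, \quad H \in \mathbb{R}_{+}^{S \times k}
```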
② Forming the Lagrangian, applying the KKT conditions and following the gradient gives the following update rule:
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/fb7e7e56e8f020e7411504ce4b55a374)
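A minimal NumPy sketch of the clustering step, assuming the multiplicative rule commonly used for symmetric NMF, H_ik ← H_ik · (1/2 + (WH)_ik / (2 (HH^T H)_ik)); the rule in the figure may differ in details. The function names, random initialization, iteration count and `eps` are hypothetical choices, and each sentence is assigned to the cluster with the largest entry in its row of H.

```python
import numpy as np


def snmf(W: np.ndarray, k: int, n_iter: int = 500, eps: float = 1e-9,
         seed: int = 0) -> np.ndarray:
    """Approximate W ~ H H^T with H >= 0 via multiplicative updates (sketch)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))  # positive random initialization
    for _ in range(n_iter):
        WH = W @ H
        HHtH = H @ (H.T @ H)
        # Multiplicative update (beta = 1/2): every factor is positive when W
        # is non-negative, so H stays non-negative without explicit projection.
        H *= 0.5 + WH / (2.0 * HHtH + eps)
    return H


def assign_clusters(H: np.ndarray) -> np.ndarray:
    # Each sentence goes to the cluster with the largest indicator value.
    return H.argmax(axis=1)
```

Because the update is multiplicative, no step size needs tuning; the trade-off is that convergence can be slow and only reaches a local minimum, so trying several seeds is a reasonable precaution.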
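4. Properties of SNMF
① Near-orthogonality (the columns of the cluster indicator matrix are close to orthogonal, so each sentence is assigned mainly to a single cluster);
② Equivalent to a form of spectral clustering (spectral clustering maps objects to vertices of an undirected graph, uses the similarity between objects as edge weights, and then designs a suitable graph partitioning algorithm according to some criterion [1]; Normalized Cuts is one such criterion);
③ Equivalent to kernel K-means.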
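The link to kernel K-means can be seen by expanding the objective. This is a short derivation assuming the standard SNMF objective ||W − HH^T||_F^2 with a symmetric similarity matrix W, not a transcription of the paper's proof:

```latex
\lVert W - HH^{\top} \rVert_F^{2}
  = \operatorname{tr}(W^{\top}W) - 2\,\operatorname{tr}(H^{\top} W H)
    + \lVert H^{\top} H \rVert_F^{2}

H^{\top}H = I_k \;\Rightarrow\; \lVert H^{\top} H \rVert_F^{2} = k
  \;\Rightarrow\;
  \min_{H \ge 0} \lVert W - HH^{\top} \rVert_F^{2}
  \;\Longleftrightarrow\;
  \max_{H \ge 0,\; H^{\top}H = I_k} \operatorname{tr}(H^{\top} W H)
```

With H restricted to a normalized cluster indicator (H_ik = 1/√n_k for sentences in cluster k), max tr(H^T W H) is the kernel K-means objective with W as the kernel matrix; dropping non-negativity gives the spectral relaxation. Since SNMF only satisfies H^T H ≈ I approximately (property ①), the equivalence is approximate.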
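5. Score and rank the sentences within each cluster, combining two factors (Mp):
① The average similarity between the sentence and the other sentences in the same cluster (M1);
② The similarity between the sentence and the given topic (M2);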
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/e1002937ba245b6b3c124822d5188738)
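A minimal sketch of the within-cluster scoring, assuming the combined score is the linear interpolation Mp = λ·M1 + (1 − λ)·M2 (the experiments below tune λ, with 0.7 reported best) and that the sentence-topic similarities for M2 are precomputed. The function `rank_cluster` and its arguments are hypothetical names.

```python
import numpy as np


def rank_cluster(W: np.ndarray, labels: np.ndarray, topic_sim: np.ndarray,
                 cluster: int, lam: float = 0.7) -> list[tuple[int, float]]:
    """Score the sentences of one cluster; return (index, score) pairs, best first."""
    members = np.flatnonzero(labels == cluster)
    scored = []
    for i in members:
        others = members[members != i]
        # M1: average similarity to the other sentences in the same cluster
        # (0 for a singleton cluster).
        m1 = W[i, others].mean() if others.size else 0.0
        # M2: similarity between the sentence and the given topic.
        m2 = topic_sim[i]
        # Mp (assumed form): linear interpolation of M1 and M2.
        scored.append((int(i), float(lam * m1 + (1 - lam) * m2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A larger λ favors sentences that are central to their cluster; a smaller λ favors sentences close to the query topic.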
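Experiments
6. Datasets: DUC2005, DUC2006
7. Baselines:
① LeadBase: selects the leading sentences of the documents, i.e., uses the original sentence order without further ranking;
② Random: selects sentences at random;
③ LSA: an LSA-based method proposed in prior work;
④ NMFBase: an NMF-based method proposed in prior work;
8. Comparative experiments, using different methods for each of the three key steps:
① Sentence similarity matrix (SLSS, keyword-based);
② Clustering (SNMF, K-means, NMF);
③ Sentence ranking (Mp, M1, M2);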
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/a498bc784b8a7bed5a237dad62948776)
9. Evaluation metric: ROUGE
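ROUGE-N measures the n-gram recall of a candidate summary against reference summaries. The sketch below is a simplified ROUGE-N recall (lowercasing, whitespace tokenization, no stemming, stopword removal or jackknifing), so its numbers will not match the official ROUGE toolkit used in the DUC evaluations.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    # Count the n-grams of a token list (empty if there are fewer than n tokens).
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, references: list[str], n: int = 2) -> float:
    """Simplified ROUGE-N recall: clipped n-gram matches / reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram counts at most as often as it occurs in the candidate.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For instance, "the cat sat on the mat" against the reference "the cat is on the mat" matches 3 of the reference's 5 bigrams, giving a ROUGE-2 recall of 0.6.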
10. Results
① SLSS vs. keyword-based similarity (SLSS performs better);
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/10c299d58b5a4d56653fb62f1a8d73e3)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/8c60c4b1c806d66583d37eda03719d26)
② Comparison of clustering methods (SNMF performs better);
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/09a8038045473394fdc046593dd8d2b1)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/6e12fed9e5b3b5997a3089a776b59ce8)
③ Comparison of sentence ranking methods (best with the weighting parameter λ = 0.7 in Mp);
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/4746169a203e2b27401677dd034a1c89)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/3720c9c300116088744edae2af247929)
④ Comparison of all method combinations (SLSS + SNMF + Mp performs best);
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/0c074025a65793183e1485285d986465)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201703/ff6ad428c6d6196b37fa69699dad802e)
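References:
[1] http://blog.sciencenet.cn/blog-798994-862473.html
The above reflects only my personal understanding. My knowledge is limited, so please point out any errors or omissions. Thanks!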