Exploring the Power of Links in Data Mining: Notes from Jiawei Han's Talk
2008-01-01 14:04
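Jiawei Han (韩家炜) is a towering figure in data mining whose name everyone in the field knows, and today I was lucky enough to see him in person. My first impression, oddly enough, was that he must have been quite handsome when he was young (embarrassing, I know!). Of course, he is still a sharp and energetic scholar today.
The talk was titled Exploring the Power of Links in Data Mining. It covered four papers, all done by his PhD student Xiaoxin Yin. Most of this work was inspired by link-analysis algorithms such as PageRank and HITS. By exploiting the links among data objects, we can extract the information we care about more effectively. In the comparisons shown against related algorithms, the algorithms proposed in all four papers came out clearly ahead.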
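1. CrossMine: propagation along links is controlled, so relatively weak links are not followed. This keeps accuracy high while greatly reducing running time. The advantage is not obvious when there are few relations, but it becomes substantial when there are many.
2. User-Guided Clustering: similar to semi-supervised learning. The user supplies a feature they consider important, and the data is then clustered. Here an entire feature column is taken as the hint. The user-provided feature is only a soft hint, a point of reference; other factors still have to be considered.
3. LinkClus: from the papers people publish, we can find out how related different conferences are. Conferences in which the same author publishes are strongly linked. Earlier algorithms were very slow; LinkClus exploits the power law distribution of links. It finds the dense links, and because dense links are few, analyzing only those gives a large gain in efficiency. At the same time, most of the information is contained in these dense links, so accuracy stays good as well.
4. Object Distinction: how do we tell apart papers written by different people who share the same name? This is especially a problem for Chinese names, which collide often once they are romanized; for example, there are as many as 14 people named Wei Wang (王伟). The method uses the coauthor information in the papers. It first trains on authors whose names are very unlikely to be shared, treating them as clean data, and then starts from them to classify the rest.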
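To make the coauthor idea in item 4 concrete, here is a minimal sketch. It is not the method from the paper: it assumes each paper by an ambiguous name comes with a set of coauthor names, and it simply merges papers whose coauthor sets overlap enough. The function names, the paper IDs, the coauthor names, and the `threshold` value are all made up for illustration, and the training step on clean data mentioned in the talk is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_papers(coauthors_by_paper, threshold=0.2):
    """Group papers whose coauthor sets overlap, using union-find.

    coauthors_by_paper maps a paper ID to the set of coauthors on it
    (excluding the ambiguous name itself). threshold is a hypothetical
    cutoff chosen for this toy example.
    """
    parent = {p: p for p in coauthors_by_paper}

    def find(p):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge every pair of papers that share enough coauthors.
    for p, q in combinations(coauthors_by_paper, 2):
        if jaccard(coauthors_by_paper[p], coauthors_by_paper[q]) >= threshold:
            parent[find(p)] = find(q)

    # Collect papers by their root; each group is one guessed person.
    groups = {}
    for p in coauthors_by_paper:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Four papers, all signed by the same ambiguous name (invented data).
papers = {
    "p1": {"A. Smith", "B. Lee"},
    "p2": {"B. Lee", "C. Kim"},
    "p3": {"D. Zhao", "E. Chen"},
    "p4": {"E. Chen"},
}
groups = group_papers(papers)
print(groups)
```

With this invented input, p1 and p2 share a coauthor and end up together, as do p3 and p4, so the sketch guesses that two different people are behind the one name. A single shared coauthor is weak evidence in practice, which is presumably why the real method starts from clean data instead of a fixed cutoff.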
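Finally, he talked about Xiaoxin Yin's recent research direction: telling true information from false on web pages. It relies on the assumption that a fact has only one true version, while false versions vary endlessly.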
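One simple way to read that assumption: if false versions scatter while the true one is repeated, the value reported by the most sources is a reasonable first guess. The sketch below is only a plain majority vote under that assumption, not the approach presented in the talk; the site names, the fact key, and the values are invented.

```python
from collections import Counter


def most_supported(claims):
    """Return, for each fact, the value asserted by the most sources.

    claims maps a fact name to a dict of {source: asserted value}.
    Ties are broken by whichever value was seen first.
    """
    result = {}
    for fact, values_by_source in claims.items():
        counts = Counter(values_by_source.values())
        value, _ = counts.most_common(1)[0]
        result[fact] = value
    return result


# Invented example: four sites disagree about one fact.
claims = {
    "book_author": {
        "site_a": "J. Han",
        "site_b": "J. Han",
        "site_c": "J. Hann",
        "site_d": "Jia Han",
    },
}
best = most_supported(claims)
print(best)
```

A plain vote treats every site as equally reliable and breaks down when sites copy each other's mistakes, so it should be taken as a baseline for the idea and nothing more.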
Once more, my respects to a master of the field!
Here are the talk abstract and Professor Han's bio:
ABSTRACT
Algorithms like PageRank and HITS have been developed in late 1990s to
explore links among Web pages to discover authoritative pages and hubs.
Links have also been popularly used in citation analysis and social network
analysis. We show that the power of links can be explored thoroughly at
data mining in classification, clustering, information integration, and
other interesting tasks. Some recent results of our research that explore
the crucial information hidden in links will be introduced, including (1)
multi-relational classification, (2) user-guided clustering, (3) link-based
clustering, and (4) object distinction analysis. The power of links in
other analysis tasks will also be discussed in the talk.
------------------------
Short bio:
Jiawei Han, Professor, Department of Computer Science, University of
Illinois at Urbana-Champaign. He has been working on research into data
mining, data warehousing, database systems, data mining from spatiotemporal
data, multimedia data, stream and RFID data, Web data, social network data,
and biological data, with over 300 journal and conference publications. He
has chaired or served on over 100 program committees of international
conferences and workshops, including PC co-chair of 2005 (IEEE)
International Conference on Data Mining (ICDM), Americas Coordinator of
2006 International Conference on Very Large Data Bases (VLDB). He is also
serving as the founding Editor-In-Chief of ACM Transactions on Knowledge
Discovery from Data. He is an ACM Fellow and has received 2004 ACM SIGKDD
Innovations Award and 2005 IEEE Computer Society Technical Achievement
Award. His book "Data Mining: Concepts and Techniques" (2nd ed., Morgan
Kaufmann, 2006) has been popularly used as a textbook worldwide.
Professor Han's home page:
http://www-faculty.cs.uiuc.edu/~hanj/
The four papers covered in the talk:
CrossMine: Efficient Classification from Multiple Heterogeneous Databases
Cross-Relational Clustering with User's Guidance
LinkClus: Efficient Clustering via Heterogeneous Semantic Links
Object Distinction: Distinguishing Objects with Identical Names by Link Analysis
Notes from another talk he gave:
http://users.ir-lab.org/~bill_lang/blog10/archives/001166.html