
KDD Cup Themes by Year

2016-04-24 11:06
Introduction to KDD Cup



KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining, the leading professional organization of data miners.

(Organized by SIGKDD (ACM Special Interest Group on Knowledge Discovery and Data Mining), the KDD Cup is held once a year alongside the SIGKDD international conference and is open to both academia and industry.)

The KDD Cup Center is here:

http://www.sigkdd.org/kddcup/index.php

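Themes of past KDD Cups:

2015: Use big data to predict whether MOOC learners will drop out ("skip class")

2014: Help a charity website identify especially exciting projects

2013: Determine whether an author has written a given paper

2012: (1) Personalized recommendation in a social network; (2) predicted click-through rate (pCTR) estimation for a search advertising system

2011: (1) Music rating prediction; (2) identifying whether a piece of music was rated by the user

2010: Predict student performance on math problems from logs of interaction between students and an intelligent tutoring system

2009: Telecom operator customer behavior prediction

2008: Early detection of breast cancer

2007: Movie rating prediction

2006: Medical data mining

2005: Internet user search query classification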

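2004: Multiple performance measures for supervised classification

2003: Network mining and usage log analysis

2002: Bioinformatics and text mining (molecular biology domain)

2001: Bioinformatics and drug discovery (predicting bioactivity for drug design; predicting gene/protein function and localization)

2000: Web mining tasks (based on clickstream and transaction data)

1999: Network intrusion detection and reporting

1998: Producing the best direct-mail list

1997: Predicting the most likely charitable donors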

KDD Cup 1997

http://www-aig.jpl.nasa.gov/public/kdd97/kdd_cup.html

Task

Given data on past responders to fund-raising, predict the most likely responders for a new campaign

Dataset

321 fields/variables; significant effort on data preprocessing

Participants

45 companies/institutions participated

16 contestants turned in their results

Shared 1st and 2nd place

Charles Elkan, Ph.D. from University of California, San Diego (BNB, Boosted Naive Bayesian Classifier)

Urban Science Applications, Inc. (Gain, Direct Marketing Selection System)

3rd Place

Silicon Graphics, Inc (MineSet)

KDD Cup 1998

http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html

Task: the goal was to select the best list to mail a solicitation

Dataset: 95412 records and 481 fields

Participants: 21 teams completed the challenge and submitted results

1st place: Urban Science Applications, Inc. (Software GainSmarts)

2nd place: SAS Institute, Inc. (Software Enterprise Miner)

3rd place: Quadstone Limited (Software Decisionhouse)

KDD Cup 1999

1. Classifier learning contest

http://www-cse.ucsd.edu/users/elkan/clresults.html

The goal was to build a predictive model for identifying network intrusions.

24 entries were submitted.

2. Knowledge discovery "report" contest

http://www.cse.ucsd.edu/users/elkan/kdresults.html

The goal was to apply a range of knowledge discovery techniques to the same data used in the 1998 competition, and to discover higher-level knowledge from the data.

Co-winners

J. Georges and A. Milley (SAS)

S. Rosset and A. Inger (Amdocs, Israel).

Honorable mention

Paola Sebastiani, Marco Ramoni, and Alexander Crea of Bayesware Ltd.

KDD Cup 2000

http://www.ecn.purdue.edu/KDDCUP/

Task: The questions related to clickstream and purchase data from an e-tailer. Five questions.

Dataset: Obtained from Gazelle.com, a legwear and legcare Web retailer

Over 150 teams requested data, 30 teams submitted the answers.

Questions 1 & 5 Winner: Amdocs

Exploratory Data Analysis – SAS, S Plus

Classification Tree, Rules Extraction – Amdocs Business Insight Tool

Questions 2 & 3 Winner: Salford Systems

Question 4 Winner: e-steam

KDD Cup 2001

http://www.cs.wisc.edu/~dpage/kddcup2001/

Problems from bioinformatics

Data set 1

Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin (task 1)

Data set 2

Prediction of Gene/Protein Function (task 2) and Localization (task 3)

136 groups, 200 submissions

Task 1 winner (Thrombin)

Jie Cheng (Canadian Imperial Bank of Commerce).

Bayesian network learner and classifier

Task 2 winner (Function)

Mark-A. Krogel (University of Magdeburg).

Inductive logic programming

Task 3 winner (Localization)

Hisashi Hayashi, Jun Sese, and Shinichi Morishita (University of Tokyo).

K nearest neighbor

KDD Cup 2002

http://www.biostat.wisc.edu/~craven/kddcup/

Two tasks from molecular biology domains

Task 1: construct models that can assist genome annotators by automatically extracting information from scientific articles

Task 2: learn models that characterize the behavior of individual genes in a hidden experimental setting.

Task 1 winner

Yizhar Regev and Michal Finkelstein

ClearForest and Celera, USA

Task 2 winner

Adam Kowalczyk and Bhavani Raskutti

Telstra Research Laboratories, Australia

Single Class SVM

KDD Cup 2003

http://www.cs.cornell.edu/projects/kddcup/

Data set

A very large archive of research papers

Citation structure and (partial) data on the downloading of papers by users

Task

Task 1: predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference

Task 2: reconstruct a citation graph of a large subset of the archive from only the LaTeX sources

Task 3: each paper's popularity will be estimated based on partial download logs

Task 4: participants devise their own questions

Task 1 winners:

Claudia Perlich, Foster Provost, Sofus Macskassy

New York University

Task 2 winner:

David Vogel

AI Insight Inc.

Task 3 winners:

Janez Brank and Jure Leskovec

Jozef Stefan Institute, Slovenia

Task 4 winners:

Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen

University of Massachusetts, Amherst, USA

KDD Cup 2004

http://kodiak.cs.cornell.edu/kddcup/

April 28 --- July 14, 2004

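Two problems, with data drawn from:

Bioinformatics

Quantum physics

Data mining problems evaluated under different performance metrics

Registrations came from 49 countries (including .com)

Winners came from China, Germany, India, New Zealand, and the USA

Half of the winners came from companies and half from universities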

Protein Winners:

Bernhard Pfahringer

University of Waikato, Computer Science Department

1st Place Overall

Yan Fu, RuiXiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao

Institute of Computing Technology, Chinese Academy of Sciences

Tied for 1st Place Overall

Honorable Mention for Squared Error

Honorable Mention for Average Precision

David S. Vogel, Eric Gottschalk, and Morgan C. Wang

MEDai / A.I. Insight / University of Central Florida

Tied for 1st Place Overall

Honorable Mention for Top-1 Accuracy

Dirk Dach, Holger Flick, Christophe Foussette, Marcel Gaspar, Daniel Hakenjos, Felix Jungermann, Christian Kullmann, Anna Litvina, Lars Michele, Katharina Morik, Martin Scholz, Siehyun Strobel, Marc Twiehaus, Nazif Veliu

Artificial Intelligence Unit, University of Dortmund, Germany

Honorable Mention for Rank of Last

Source: http://huzhyi21.blog.163.com/blog/static/1007396200981534952541

KDD Cup 2015

http://www.kddcup2015.com/information.html
KDD Cup 2014

https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/
http://www.datapub.cn/d/562defa1e4b05a46eeaad9ce
KDD Cup 2013

http://www.kaggle.com/c/kdd-cup-2013-author-paper-identification-challenge
http://blog.csdn.net/pf1492536/article/details/9162667
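KDD Cup 2012

Track 1 task: personalized recommendation in a social network

Given user attributes on Tencent Weibo (User Profile), SNS social relations, interaction records within the social network (retweet, comment, at), and the item recommendation history of the past 30 days, predict the list of recommended items that each user is most likely to accept next

Track 2 task: predicted click-through rate (pCTR) estimation for a search advertising system

Given users' search queries on Tencent search, the ads displayed (including ad title, description, URL, etc.), each ad's relative position (its rank among multiple ads), user click behavior, and advertiser and user attributes, predict users' clicks on ads in a subsequent period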

Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data

Papers: http://www.kddcup2012.org/workshop

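KDD Cup 2011

Track 1 task: music rating prediction

Given users' historical ratings of items on Yahoo! Music, predict their ratings for other items (including songs, albums, etc.). Predictions are scored by the RMSE (root mean squared error) between predicted and actual ratings. The album, artist, and genre of each song are also provided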
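The RMSE score mentioned above can be sketched in a few lines of Python. This is only an illustration of the metric: the function name `rmse` and the sample ratings are made up for the example and are not taken from the contest data or its official scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings (made-up numbers, not contest data).
predicted_ratings = [80.0, 50.0, 90.0, 30.0]
actual_ratings = [90.0, 50.0, 70.0, 30.0]
print(rmse(predicted_ratings, actual_ratings))
```

A lower RMSE means the predicted ratings sit closer to the true ones; because errors are squared before averaging, a few large misses cost more than many small ones.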

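Track 2 task: identify whether a piece of music was rated by the user

For each user, six candidate songs are provided: three that the user has rated, and three that the user has not rated but that are rated highly by users overall. Song attributes (album, artist, genre, etc.) are also provided. Participants submit a binary (0/1) classification, and the final ranking is computed from overall accuracy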
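One way to turn model scores into the required 0/1 answers is sketched below. It assumes, purely for illustration, that a model produces one numeric score per candidate song and that exactly the three highest-scoring candidates are labelled 1; the function names, scores, and labels are hypothetical, and the contest's own scoring code may differ.

```python
def label_top_three(scores):
    """Label the 3 highest-scoring of 6 candidates as 1 (rated), the rest as 0."""
    if len(scores) != 6:
        raise ValueError("expected exactly 6 candidate scores per user")
    # Candidate indices sorted by score, highest first; ties keep list order.
    ranked = sorted(range(6), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:3])
    return [1 if i in top else 0 for i in range(6)]


def accuracy(predicted, truth):
    """Fraction of candidates whose predicted label matches the true label."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


# Hypothetical scores and true labels for one user's six candidates.
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
truth = [1, 0, 1, 1, 0, 0]
predicted = label_top_three(scores)
print(predicted, accuracy(predicted, truth))
```

Forcing exactly three positive labels per user uses the known 3/3 split, so every mistaken positive is paired with a mistaken negative.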

Dataset: http://kddcup.yahoo.com/datasets.php#

Papers: http://kddcup.yahoo.com/workshop.php

KDD Cup 2010
http://www.datapub.cn/d/55d6bed7e4b022099bb3e532
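KDD Cup 2009

The French telecom operator Orange has accumulated a large volume of customer behavior records in its large-scale data. Contestants need to design a good customer relationship management (CRM) system that quickly and robustly predicts three customer properties: 1. Loyalty: the likelihood that the customer switches operator (Churn); 2. Appetite: the likelihood that the customer buys new services (Appetency); 3. Upsell potential: the likelihood that the customer upgrades or buys additional high-margin products (Up-selling). Results are evaluated by AUC (area under the ROC curve)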
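As a rough illustration of the AUC metric named above, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function name `auc` and the sample scores and labels are made up for the example; this is not the contest's evaluation code.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical churn scores and true labels (1 = churned), not contest data.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; on large datasets a rank-based computation is the usual choice. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.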

Dataset: http://www.sigkdd.org/kddcup/index.php

Papers: http://jmlr.csail.mit.edu/proceedings/papers/v7/

KDD Cup 2008
http://www.kdd.org/kdd-cup/view/kdd-cup-2008/Data