
[Share] Stanford Dataset Collection: Citation networks

2013-08-22 14:55, 465 views
cit-HepPh

The arXiv HEP-PH (high energy physics phenomenology) citation graph comes from the e-print arXiv and covers all citations within a dataset of 34,546 papers, with 421,578 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that citation.
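The edge direction described above (from the citing paper i to the cited paper j) can be illustrated with a minimal sketch. It assumes, hypothetically, that each line holds one edge as two whitespace-separated integer ids and that lines starting with `#` are comments; the sample ids are made up and do not come from the dataset.

```python
from collections import defaultdict

def read_edge_list(lines):
    """Build out-link and in-link maps from 'citing cited' lines."""
    cites = defaultdict(set)     # paper i -> papers that i cites
    cited_by = defaultdict(set)  # paper j -> papers that cite j
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        i, j = line.split()[:2]
        cites[int(i)].add(int(j))      # directed edge i -> j
        cited_by[int(j)].add(int(i))   # the same edge seen from j's side
    return cites, cited_by

# Toy input with made-up paper ids (assumed file layout, not real data).
sample = """# citing cited
1 2
1 3
3 2
"""
cites, cited_by = read_edge_list(sample.splitlines())
print(sorted(cites[1]))     # papers cited by paper 1
print(sorted(cited_by[2]))  # papers that cite paper 2
```

Keeping both maps makes it cheap to ask either "what does this paper cite" or "who cites this paper". Citations that leave the dataset simply never appear as edges.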

The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.

The data was originally released as part of the 2003 KDD Cup.

 
Dataset statistics
Nodes: 34546
Edges: 421578
Nodes in largest WCC: 34401 (0.996)
Edges in largest WCC: 421485 (1.000)
Nodes in largest SCC: 12711 (0.368)
Edges in largest SCC: 139981 (0.332)
Average clustering coefficient: 0.2962
Number of triangles: 1276868
Fraction of closed triangles: 0.1457
Diameter (longest shortest path): 12
90-percentile effective diameter: 5
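The WCC rows count the largest weakly connected component, that is, the largest group of papers linked to one another when edge direction is ignored. The parenthesized values appear to be fractions of the whole graph (34401 / 34546 rounds to 0.996). A minimal union-find sketch on a made-up toy graph, not a reproduction of how the statistics were actually computed:

```python
from collections import Counter

def largest_wcc_size(nodes, edges):
    """Size of the largest weakly connected component (direction ignored)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values())

# Toy graph with made-up ids: {1, 2, 3} and {4, 5} are linked, 6 is isolated.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (3, 2), (4, 5)]
size = largest_wcc_size(nodes, edges)
fraction = size / len(nodes)
print(size, fraction)

# The fraction reported in the table, recomputed from its own counts.
table_fraction = round(34401 / 34546, 3)
print(table_fraction)
```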
 

cit-HepTh

The arXiv HEP-TH (high energy physics theory) citation graph comes from the e-print arXiv and covers all citations within a dataset of 27,770 papers, with 352,807 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that citation.

The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-TH section.

The data was originally released as part of the 2003 KDD Cup.

 
Dataset statistics
Nodes: 27770
Edges: 352807
Nodes in largest WCC: 27400 (0.987)
Edges in largest WCC: 352542 (0.999)
Nodes in largest SCC: 7464 (0.269)
Edges in largest SCC: 116268 (0.330)
Average clustering coefficient: 0.3295
Number of triangles: 1478735
Fraction of closed triangles: 0.1196
Diameter (longest shortest path): 14
90-percentile effective diameter: 5.4
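The SCC rows count the largest strongly connected component: the largest set of papers in which every paper can reach every other one by following citation edges in their stated direction. One way to compute it is Kosaraju's two-pass algorithm. The sketch below runs on a made-up toy graph and is only an illustration, not the procedure used to produce the table.

```python
from collections import defaultdict

def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)  # all neighbours done: node is finished
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse finish order;
    # each sweep collects exactly one strongly connected component.
    assigned, best = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        todo, size = [start], 0
        while todo:
            node = todo.pop()
            size += 1
            for nxt in rev[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    todo.append(nxt)
        best = max(best, size)
    return best

# Toy graph with made-up ids: 1 -> 2 -> 3 -> 1 is a cycle, 4 and 5 hang off it.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
largest = largest_scc_size(nodes, edges)
print(largest)
```

The DFS is written iteratively so that graphs with long citation chains do not hit Python's recursion limit.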
 

cit-Patents

The U.S. patent dataset is maintained by the National Bureau of Economic Research. The data set spans 37 years (January 1, 1963 to December 30, 1999) and includes all the utility patents granted during that period, totaling 3,923,922 patents. The citation graph includes all citations made by patents granted between 1975 and 1999, totaling 16,522,438 citations. In the patents dataset, 1,803,511 nodes have no information about the citations they make (only their in-links are known).

 

Dataset statistics
Nodes: 3774768
Edges: 16518948
Nodes in largest WCC: 3764117 (0.997)
Edges in largest WCC: 16511741 (1.000)
Nodes in largest SCC: 1 (0.000)
Edges in largest SCC: 0 (0.000)
Average clustering coefficient: 0.0919
Number of triangles: 7515023
Fraction of closed triangles: 0.06714
Diameter (longest shortest path): 22
90-percentile effective diameter: 9.4
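A largest SCC of 1 node and 0 edges means that no two patents can reach each other, so the graph has no directed cycles. That would be consistent with patents citing only earlier patents, although the source does not state this. A minimal sketch of such an acyclicity check using Kahn's algorithm, on made-up toy data and assuming every edge endpoint is listed in `nodes`:

```python
from collections import defaultdict, deque

def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)

    # If some nodes were never removed, they sit on or behind a cycle.
    return removed == len(nodes)

# Toy graph with made-up ids: later patents cite earlier ones only.
nodes = [1, 2, 3]
dag_edges = [(3, 1), (3, 2), (2, 1)]
cyclic_edges = dag_edges + [(1, 3)]  # a hypothetical back-citation
print(is_acyclic(nodes, dag_edges))
print(is_acyclic(nodes, cyclic_edges))
```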
 
Datatang offers this data mining dataset as a free download: http://www.datatang.com/data/44126