Network data sets

2013-12-29 20:51

Network data

This page contains links to some network data sets I've compiled over the years. All of these are free for scientific use to the best of my knowledge, meaning that the original authors have already made the data
freely available, or that I have consulted the authors and received permission to post the data here, or that the data are mine. If you make use of any of these data, please cite the original sources.
The data sets are in GML format. For a description of GML see here. GML can be read by many network analysis
packages, including Gephi and Cytoscape. I've written a simple parser in C that will read the files into a data structure.
It's available here. There are many features of GML not supported by this parser, but it will read the files in this repository just fine. There is a Python
parser for GML available as part of the NetworkX package here and another in the igraph
package, which can be used from C, Python, or R. If you know of or develop other software (Java, C++, Perl, R, Matlab, etc.) that reads GML, let me know.
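For readers who want to see roughly what reading one of these files involves, here is a minimal sketch of a GML reader in plain Python. It is not the C parser mentioned above, nor the NetworkX or igraph implementation; the function names (`parse_value`, `parse_list`, `read_gml`) are hypothetical, and it assumes the simple subset of GML used by files like these: a `graph [ ... ]` block containing `node [ ... ]` and `edge [ ... ]` blocks, with edge weights (if any) stored under a `value` key.

```python
import re

# A token is a quoted string, an opening/closing bracket, or a bare word/number.
TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]]+')


def parse_value(tok):
    """Convert a token to str (quoted), int, or float; fall back to the raw text."""
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        try:
            return float(tok)
        except ValueError:
            return tok


def parse_list(tokens, pos):
    """Parse key/value pairs until a closing ']' or the end of input.

    Returns (list of (key, value) pairs, position where parsing stopped).
    """
    items = []
    while pos < len(tokens) and tokens[pos] != ']':
        key, val = tokens[pos], tokens[pos + 1]
        if val == '[':
            sub, pos = parse_list(tokens, pos + 2)
            items.append((key, sub))
            pos += 1  # step over the closing ']'
        else:
            items.append((key, parse_value(val)))
            pos += 2
    return items, pos


def read_gml(text):
    """Return (directed, nodes, edges) from GML text.

    nodes maps node id -> attribute dict; edges is a list of
    (source, target, value) tuples, with value None when absent.
    Assumption: lines starting with '#' are comments and are skipped.
    """
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith('#')]
    tokens = TOKEN.findall('\n'.join(lines))
    top, _ = parse_list(tokens, 0)
    graph = next(v for k, v in top if k == 'graph')

    directed, nodes, edges = False, {}, []
    for key, val in graph:
        if key == 'directed':
            directed = bool(val)
        elif key == 'node':
            attrs = dict(val)
            nodes[attrs['id']] = attrs
        elif key == 'edge':
            attrs = dict(val)
            edges.append((attrs['source'], attrs['target'], attrs.get('value')))
    return directed, nodes, edges
```

Given the contents of a downloaded file (for example a hypothetical local copy named `karate.gml`), `read_gml(open("karate.gml").read())` would return the directedness flag, a dictionary of nodes, and the edge list. Anything beyond this subset of GML is better handled by one of the full parsers named above.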

Data sets

Zachary's karate club: social network of friendships between 34 members of a karate club
at a US university in the 1970s. Please cite W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33,
452-473 (1977).
Les Miserables: coappearance network of characters in the novel Les
Miserables. Please cite D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA (1993).
Word adjacencies: adjacency network of common adjectives and nouns in the novel David
Copperfield by Charles Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74,
036104 (2006).
American College football: network of American football games between Division IA colleges
during regular season Fall 2000. Please cite M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99,
7821-7826 (2002).
Dolphin social network: an undirected social network of frequent associations between
62 dolphins in a community living off Doubtful Sound, New Zealand. Please cite D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54,
396-405 (2003). Thanks to David Lusseau for permission to post these data on this web site.
Political blogs: A directed network of hyperlinks between weblogs on US politics, recorded
in 2005 by Adamic and Glance. Please cite L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US Election", in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005). Thanks to Lada Adamic for permission to post these data
on this web site.
Books about US politics: A network of books about US politics published around the time
of the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent copurchasing of books by the same buyers. The network was compiled by V. Krebs and is unpublished, but can be found on Krebs' web
site. Thanks to Valdis Krebs for permission to post these data on this web site.
Neural network: A directed, weighted network representing the neural network of
C. elegans. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please cite D. J. Watts
and S. H. Strogatz, Nature 393, 440-442 (1998). Original experimental data taken from
J. G. White, E. Southgate, J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314,
1-340 (1986).
Power grid: An undirected, unweighted network representing the topology of the Western States
Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here. Please
cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
Condensed matter collaborations 1999: weighted network of coauthorships between scientists
posting preprints on the Condensed Matter E-Print Archive between Jan 1, 1995 and December 31, 1999. Please cite M. E.
J. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci. USA 98,
404-409 (2001).
Condensed matter collaborations 2003: updated network of coauthorships between scientists
posting preprints on the Condensed Matter E-Print Archive. This version includes all preprints posted between Jan 1,
1995 and June 30, 2003. The largest component of this network, which contains 27519 scientists, has been used by several authors as a test-bed for community-finding algorithms for large networks; see for example J. Duch and A. Arenas, Phys.
Rev. E 72, 027104 (2005). These data can be cited as M. E. J. Newman, Proc. Natl. Acad.
Sci. USA 98, 404-409 (2001).
Condensed matter collaborations 2005: updated network of coauthorships between scientists
posting preprints on the Condensed Matter E-Print Archive. This version includes all preprints posted between Jan 1,
1995 and March 31, 2005. Please cite M. E. J. Newman, Proc. Natl. Acad. Sci. USA 98, 404-409
(2001).
Astrophysics collaborations: weighted network of coauthorships between scientists posting
preprints on the Astrophysics E-Print Archive between Jan 1, 1995 and December 31, 1999. Please cite M. E. J. Newman, Proc.
Natl. Acad. Sci. USA 98, 404-409 (2001).
High-energy theory collaborations: weighted network of coauthorships between scientists
posting preprints on the High-Energy Theory E-Print Archive between Jan 1, 1995 and December 31, 1999. Please cite M. E.
J. Newman, Proc. Natl. Acad. Sci. USA 98, 404-409 (2001).
Coauthorships in network science: coauthorship network of scientists working on network
theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here.
These data can be cited as M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
Internet: a symmetrized snapshot of the structure of the Internet at the level of
autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views Project. This snapshot was created
by Mark Newman from data for July 22, 2006 and has not previously been published.
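Several entries above refer to the largest component of a network. As a rough illustration of what that means in practice, the sketch below finds the largest connected component of a graph given as a list of node ids and a list of `(source, target)` pairs. The function name `largest_component` is hypothetical, and the sketch assumes edges are treated as undirected (direction, if any, is ignored), so it corresponds to weakly connected components for directed data.

```python
from collections import deque


def largest_component(nodes, edges):
    """Return the set of node ids in the largest connected component.

    Edges are treated as undirected. Nodes that appear only in `edges`
    are added automatically; isolated nodes form components of size one.
    """
    # Build an undirected adjacency map.
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best
```

For example, with nodes 1 to 6 and edges `(1, 2)`, `(2, 3)`, `(4, 5)`, the components are {1, 2, 3}, {4, 5}, and {6}, and the function returns the first of these.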

Other sources of network data

There are a number of other pages on the web from which you can download network data. Here are a few that I am aware of:

UCINet data sets: Social network data sets released with the UCINet software by
Steve Borgatti et al.
Pajek data sets: Example data sets released with the Pajek software by Vladimir Batagelj
and Andrej Mrvar.
Indiana University data sets: A set of very large data sets, including some non-network data sets, compiled
by the School of Library and Information Science at Indiana University. Network data sets include the NBER data set of US patent citations and a data set of links between articles in the on-line encyclopedia Wikipedia.
Duncan Watts' data sets: Data compiled by Prof. Duncan Watts and collaborators at Columbia University, including
data on the structure of the Western States Power Grid and the neural network of the worm C. elegans.
Laszlo Barabasi's data sets: Data compiled by Prof. Albert-Laszlo Barabasi and collaborators at the
University of Notre Dame, including web data and biochemical networks.
Alex Arenas's data sets: Data compiled by Prof. Alexandre Arenas and collaborators at Universitat
Rovira i Virgili, including metabolic network data and the network from their study of the collaboration patterns of jazz musicians.
Stanford Large Network Dataset Collection: A substantial collection of data sets describing very large networks,
including social networks, communications networks, and transportation networks.