Knowledge Discovery Laboratory
Dataset: HEP-Th

Data characteristics:
  • over 42,000 objects
  • over 500,000 links
  • 39 object attributes
  • 15 link attributes
Additional information:
The HEP-Th database presents information on papers in theoretical high-energy physics. The data in this dataset were derived from the abstract and citation files provided for the 2003 KDD Cup competition. The original datasets are from arXiv, an electronic archive of research papers in physics and selected other sciences, and the SLAC SPIRES-HEP database, a comprehensive catalog of high-energy particle physics literature compiled by the Stanford Linear Accelerator Center. KDL won first place in the open division of the 2003 KDD Cup competition for its identification and analysis of publication patterns in the data, as presented in "Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics".

See the README for additional information on the HEP-Th database.

Acknowledgments:
Please include the following acknowledgment in all publications that describe work using this database:

The PROXIMITY HEP-Th database is based on data from the arXiv archive and the Stanford Linear Accelerator Center SPIRES-HEP database provided for the 2003 KDD Cup competition with additional preparation performed by the Knowledge Discovery Laboratory, University of Massachusetts Amherst.
Preparation of the PROXIMITY HEP-Th database was supported by Lawrence Livermore National Laboratory and the Department of Energy under contract number W7405-ENG-48.