TimeMines: Constructing Timelines with Statistical Models of Word Usage
R. Swan and D. Jensen (2000). TimeMines: Constructing timelines with statistical models of word usage. Papers of the ACM SIGKDD 2000 Workshop on Text Mining, pp. 73-80. Also appeared as CIIR Technical Report IR-202, University of Massachusetts, Department of Computer Science.
- Abstract
- We present a system, TimeMines, that automatically generates timelines from date-tagged free-text corpora. Our system detects, ranks, and groups semantic features based on their statistical properties. We use these features to discover sets of related stories that deal with a single topic. Our system requires free text with explicit date tags. We have used our system to generate overview timelines, indicating the most important topics in the corpus, how much coverage they receive, and their timespans. Evaluations of TimeMines on two different news corpora show that the patterns found by our system are contained within the data, and that the topics found correspond to the top news stories.
- Text
- A PDF version of this paper is available.