My research interweaves threads of basic science, practical technology, and assessments of the design and impact of those technologies. This page describes the general areas that characterize my research. Three related pages, one per thread, provide more detailed guides. All of these pages link to papers, talks, and other resources that provide more detail.
Relational knowledge discovery
My current research focuses on relational knowledge discovery: constructing useful statistical models from data about complex relationships among people, places, things, and events. New developments in this area are vital because of the growing interest in mining huge data sets drawn from the Web, telecommunications and computer networks, relational databases, object-oriented databases, and other sources of structured and semi-structured data. A good general overview of this work can be found in a paper that Jen Neville and I presented at a 2002 National Academy of Sciences symposium. A good example of applying this work can be found in our winning entry to the 2002 KDD Cup competition, completed by a KDL team led by Amy McGovern. Our research in this area has identified how relational tasks raise new statistical challenges for learning algorithms, as well as new opportunities for building relational models and using collective inference. To address these challenges and opportunities, my research group has built new algorithms for relational knowledge discovery (including relational dependency networks, relational probability trees, relational Bayesian classifiers, and relational multiple instance learning) and complementary tools such as the query language QGraph. To facilitate early research in this area, I organized three research meetings: the 1998 AAAI Fall Symposium on AI and Link Analysis (with Henry Goldberg), the AAAI 2000 Workshop on Learning Statistical Models from Relational Data (with Lise Getoor), and the IJCAI 2003 Workshop on Learning Statistical Models from Relational Data (also with Lise Getoor).
Statistical inference in induction algorithms
Algorithms for knowledge discovery and data mining create unique challenges for accurate statistical inferences about induced models. My most general work in this area concerns the statistical effects of multiple comparison algorithms. Other work has addressed the unique challenges raised by algorithms for relational learning and the remarkable linear relationship between the size of decision trees built by common algorithms and the size of the training set used to build them. Throughout my research career, I have studied randomization tests, an exceedingly general and robust tool for estimating the sampling distribution of statistics used by induction algorithms.
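To make the idea of a randomization test concrete, here is a minimal sketch in Python. It is a generic textbook-style illustration, not the specific procedure used in the papers linked from this page. The function name `randomization_test`, its parameters, and the choice of statistic (absolute difference in group means) are all assumptions made for this example. The test estimates the sampling distribution of the statistic under the null hypothesis by repeatedly shuffling the group labels and recomputing the statistic.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: the null distribution is
    approximated by randomly reassigning observations to the two groups.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data, then split it into two pseudo-groups
        # of the original sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    treated = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    control = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(randomization_test(treated, control, n_permutations=2000))
```

Because the procedure only needs a way to recompute the statistic on shuffled data, the same loop can in principle be reused for other statistics (for example, a score that an induction algorithm uses to choose among candidate models) by swapping out the difference-in-means calculation.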
Evaluation of systems for knowledge discovery and machine learning
Systems for relational knowledge discovery have a wide variety of applications in government, business, and science. The design of such systems, and their performance characteristics, can have a profound effect on their social, political, and institutional impacts. I have given several tutorials on how to evaluate systems for knowledge discovery and machine learning, spoken and written about data mining and counter-terrorism, and assessed how to use AI techniques to detect money laundering. Sound methods for studying and evaluating AI systems are essential for good research and development. Toward that end, Paul Cohen and I created Evaluation of Intelligent Systems (EIS), a website on empirical methods for studying the behavior of AI systems (now somewhat outdated).
Learning in multiagent systems
Increasing numbers of computer systems are being structured as sets of interacting agents, where each agent has some independent capability for learning, reasoning, and acting. This creates the potential for social pathologies: situations in which individually beneficial behavior produces harmful systemwide behavior. Such pathologies result from interdependencies among the actions of individual agents. These interdependencies can be learned, a topic I have explored with several colleagues. See our AAAI-99 paper and a subsequent technical report.
Managing knowledge discovery processes
Knowledge discovery is often a complex, contingent process. A paper I wrote with several students and colleagues addresses how to use a process programming language to coordinate multiple human and automated agents for knowledge discovery. A paper at the 1997 AAAI Spring Symposium on AI in Knowledge Management explored the special challenges of managing inductive knowledge in collaborative environments.