Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge
D. Jensen, A. Fast, B. Taylor, and M. Maier (2008). Automatic identification of quasi-experimental designs for discovering causal knowledge. Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Abstract

Researchers in the social and behavioral sciences routinely rely on quasi-experimental designs to discover knowledge from large databases. Quasi-experimental designs (QEDs) exploit fortuitous circumstances in non-experimental data to identify situations (sometimes called "natural experiments") that provide the equivalent of experimental control and randomization. QEDs allow researchers in domains as diverse as sociology, medicine, and marketing to draw reliable inferences about causal dependencies from non-experimental data. Unfortunately, identifying and exploiting QEDs has remained a painstaking manual activity, requiring researchers to scour available databases and apply substantial knowledge of statistics. However, recent advances in the expressiveness of databases, and increases in their size and complexity, provide the necessary conditions to automatically identify QEDs. In this paper, we describe the first system to discover knowledge by applying quasi-experimental designs that were identified automatically. We demonstrate that QEDs can be identified in a traditional database schema and that such identification requires only a small number of extensions to that schema, knowledge about quasi-experimental design encoded in first-order logic, and a theorem-proving engine. We describe several key innovations necessary to enable this system, including methods for automatically constructing appropriate experimental units and for creating aggregate variables on those units. We show that applying the resulting designs can identify important causal dependencies in real domains, and we provide examples from academic publishing, movie making and marketing, and peer-production systems. Finally, we discuss the integration of QEDs with other approaches to causal discovery, including joint modeling and directed experimentation.
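The abstract names three ingredients: a rule that recognizes when a design applies to a schema, experimental units constructed from relational data, and aggregate variables defined on those units. The sketch below illustrates them with the simplest possible design, a within-unit before/after comparison. It is not the system described in the paper. The schema, the `award`/`paper` relations, the toy data, the window size, and the helper names (`within_unit_design_applies`, `build_units`, `sign_test`) are all hypothetical. A plain Python predicate stands in for first-order logic and a theorem prover, and the exact sign test is a swapped-in choice of significance test.

```python
from dataclasses import dataclass
from math import comb
from statistics import mean


@dataclass(frozen=True)
class Record:
    entity: str  # id of the entity the record is attached to (e.g., an author)
    time: int    # timestamp of the record (e.g., a year)


def within_unit_design_applies(schema, treatment, outcome):
    """Hypothetical identification rule (a plain predicate standing in for
    logic-based reasoning): a before/after comparison is possible when the
    treatment and outcome relations attach to the same entity type and both
    carry timestamps."""
    t, o = schema[treatment], schema[outcome]
    return t["entity"] == o["entity"] and t["timestamped"] and o["timestamped"]


def build_units(treatments, outcomes, window):
    """Construct one experimental unit per treated entity, with two aggregate
    variables: outcome counts in the `window` steps before the entity's first
    treatment and in the `window` steps starting at that treatment."""
    first = {}
    for r in treatments:
        first[r.entity] = min(r.time, first.get(r.entity, r.time))
    units = {}
    for entity, t0 in first.items():
        times = [o.time for o in outcomes if o.entity == entity]
        before = sum(1 for t in times if t0 - window <= t < t0)
        after = sum(1 for t in times if t0 <= t < t0 + window)
        units[entity] = (before, after)
    return units


def sign_test(units):
    """Two-sided exact sign test on per-unit (after - before) differences;
    units whose counts are tied are dropped."""
    diffs = [after - before for before, after in units.values() if after != before]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for d in diffs if d > 0)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# Toy schema and data (hypothetical): authors who receive an award, and their papers.
schema = {
    "award": {"entity": "author", "timestamped": True},
    "paper": {"entity": "author", "timestamped": True},
}
awards = [Record("a", 2000), Record("b", 2001), Record("c", 2002),
          Record("d", 2003), Record("d", 2000)]
papers = [Record("a", 1999), Record("a", 2000), Record("a", 2001), Record("a", 2001),
          Record("b", 2000), Record("b", 2001), Record("b", 2002),
          Record("c", 2000), Record("c", 2001), Record("c", 2003),
          Record("d", 2001), Record("e", 2000)]

if within_unit_design_applies(schema, "award", "paper"):
    units = build_units(awards, papers, window=2)
    print(units)  # {'a': (1, 3), 'b': (1, 2), 'c': (2, 1), 'd': (0, 1)}
    print(mean(after - before for before, after in units.values()))  # 0.75
    print(sign_test(units))  # 0.625
```

Author `e` never receives the treatment, so no unit is built for it. A before/after design with no comparison group cannot rule out threats such as history or maturation (the outcome might have changed anyway), which is why a fuller design would also aggregate outcomes for untreated entities like `e`.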