TY - JOUR

T1 - Statistical inference for exploratory data analysis and model diagnostics

AU - Buja, Andreas

AU - Cook, Dianne

AU - Hofmann, Heike

AU - Lawrence, Michael

AU - Lee, Eun Kyung

AU - Swayne, Deborah F.

AU - Wickham, Hadley

PY - 2009/11/13

Y1 - 2009/11/13

N2 - We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of 'discoveries' is measured by having the human viewer compare the plot of the real dataset with collections of plots of simulated datasets. A simple but rigorous protocol that provides inferential validity is modelled after the 'lineup' popular from criminal legal procedures. Another protocol modelled after the 'Rorschach' inkblot test, well known from (pop-)psychology, will help analysts acclimatize to random variability before being exposed to the plot of the real data. The proposed protocols will be useful for exploratory data analysis, with reference datasets simulated by using a null assumption that structure is absent. The framework is also useful for model diagnostics in which case reference datasets are simulated from the model in question. This latter point follows up on previous proposals. Adopting the protocols will mean an adjustment in working procedures for data analysts, adding more rigour, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking.

AB - We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of 'discoveries' is measured by having the human viewer compare the plot of the real dataset with collections of plots of simulated datasets. A simple but rigorous protocol that provides inferential validity is modelled after the 'lineup' popular from criminal legal procedures. Another protocol modelled after the 'Rorschach' inkblot test, well known from (pop-)psychology, will help analysts acclimatize to random variability before being exposed to the plot of the real data. The proposed protocols will be useful for exploratory data analysis, with reference datasets simulated by using a null assumption that structure is absent. The framework is also useful for model diagnostics in which case reference datasets are simulated from the model in question. This latter point follows up on previous proposals. Adopting the protocols will mean an adjustment in working procedures for data analysts, adding more rigour, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking.

KW - Cognitive perception

KW - Permutation tests

KW - Rotation tests

KW - Simulation

KW - Statistical graphics

KW - Visual data mining

UR - http://www.scopus.com/inward/record.url?scp=73349124770&partnerID=8YFLogxK

U2 - 10.1098/rsta.2009.0120

DO - 10.1098/rsta.2009.0120

M3 - Article

C2 - 19805449

AN - SCOPUS:73349124770

SN - 1364-503X

VL - 367

SP - 4361

EP - 4383

JO - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

IS - 1906

ER -