Advanced data analytics: Statistical methods for robust inferences in psychological research & industry

Renato Frey, University of Zurich

The seminar has ended.

Overview

Drawing robust inferences from data is essential for psychologists working in research and industry (e.g., in jobs at the intersection with data science). Yet several of the statistical methods and practices traditionally employed in psychology have turned out to be problematic in this respect. On the one hand, data analysts may be unaware of their own "researchers' degrees of freedom" and of the consequences of high flexibility when making analytic choices (e.g., applying overly flexible methods to small datasets). On the other hand, some traditional methods are difficult to grasp intuitively because of their complexity or lack of transparency, which leads to fundamental misconceptions about their inner workings (e.g., how to interpret a confidence interval). These and related issues hamper the robustness of the inferences drawn in data analytics and are partly responsible for the replication crisis in psychology. In this course we will discuss and implement conceptual tools and statistical methods aimed at fostering robust and transparent inferences in psychological research and applied jobs, such as preregistered analysis plans, prospective design / parameter recovery analyses, and multiverse analyses.
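To make the point about analytic flexibility concrete, here is a minimal simulation sketch. It is not taken from the course materials. It assumes a true effect of zero, a known standard deviation of 1, and a simple z-test, and the function names (`z_test_p`, `false_positive_rate`) are made up for this illustration. It compares an analyst who tests one outcome with an analyst who measures three independent outcomes and reports whichever one turns out "significant".

```python
import random
from statistics import NormalDist, mean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value of a z-test of the sample mean against 0 (sigma assumed known)."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_sims=4000, n=20, alpha=0.05, seed=1):
    """Share of null simulations in which at least one outcome reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # All data are pure noise: every "significant" result is a false positive.
        p_values = [
            z_test_p([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


rate_one = false_positive_rate(n_outcomes=1)    # a single, prespecified outcome
rate_three = false_positive_rate(n_outcomes=3)  # pick the best of three outcomes
print(f"one outcome: {rate_one:.3f}, best of three: {rate_three:.3f}")
```

With one prespecified outcome, the false-positive rate should stay close to the nominal 5%. With three independent outcomes, the expected rate is 1 - 0.95^3, roughly 14%, even though each individual test is carried out correctly.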

Learning goals

To be able to identify red flags in the analysis pipeline that indicate potential threats to robust inferences.
To know about ways of avoiding such threats upfront (e.g., designing preregistered analysis plans).
To be able to implement advanced statistical methods that identify these threats (e.g., prospective design / parameter recovery analyses) and that avoid them during actual data analysis (e.g., multiverse analyses).
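As a rough illustration of what a parameter recovery analysis involves, the sketch below simulates data from known parameter values, refits the model, and checks how closely the estimates match the generating values. It is a toy example, not the procedure used in the course. It assumes a simple linear model with a single slope parameter in place of a cognitive model, and all names (`simulate`, `fit_slope`, `recovery_error`) are hypothetical.

```python
import random
from statistics import mean


def simulate(slope, n, noise_sd, rng):
    """Generate n observations from y = slope * x + Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_slope(xs, ys):
    """Ordinary least squares estimate of the slope."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def recovery_error(n, n_sims=500, noise_sd=1.0, seed=7):
    """Mean absolute difference between true and recovered slopes."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sims):
        true_slope = rng.uniform(-2, 2)            # known generating value
        xs, ys = simulate(true_slope, n, noise_sd, rng)
        errors.append(abs(fit_slope(xs, ys) - true_slope))
    return mean(errors)


error_small = recovery_error(n=10)    # few observations per simulated dataset
error_large = recovery_error(n=200)   # many observations per simulated dataset
print(f"n=10: {error_small:.3f}, n=200: {error_large:.3f}")
```

Running the same check for different sample sizes shows, before any real data are collected, how many observations are needed for the estimates to be trustworthy. This is also the basic idea behind prospective design analyses.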

Requirements

Regular and active participation
Prior to the sessions: Read the literature and prepare for the discussion
During the sessions: Ask questions and actively engage in the discussion!
Project work and presentation

Sessions

Welcome session

Various:

Preparation:

What do you consider "advanced data analytics"?
Why did you choose this course?
What kind of knowledge and novel skills would you like to have acquired by the end of the semester? Try to define three personal learning goals.

Researchers' degrees of freedom

Literature:

Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411–421.

Flexible models: Overfitting

Literature:

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393

Project work & presentations

Planning and registering the analysis pipeline

Literature:

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
Hardwicke, T. E., & Ioannidis, J. P. A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 1. https://doi.org/10.1038/s41562-018-0444-y
Szollosi, A., Kellen, D., Navarro, D., Shiffrin, R., van Rooij, I., Van Zandt, T., & Donkin, C. (2021). Is preregistration worthwhile? https://doi.org/10.31234/osf.io/x36pz

p < .05? Insights through simulation

Literature:

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://doi.org/10.3758/s13423-013-0572-3

Introduction to Bayesian estimation methods

Literature:

Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300. https://doi.org/10.1016/j.tics.2010.05.001

Prospective design aka "Bayesian power analysis"

Literature:

Chapter 13 in Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS and STAN (2nd ed.). Academic Press.
Frey, R. (2020). Decisions from experience: Competitive search and choice in kind and wicked environments. Judgment and Decision Making, 15(2), 282–303.

Project work & presentations

Into the multiverse!

Literature:

Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712. https://doi.org/10.1177/1745691616658637
Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.

Specification curve analysis

Literature:

Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Specification curve: Descriptive and inferential statistics on all reasonable specifications. SSRN Electronic Journal, 1–18. https://doi.org/10.2139/ssrn.2694998
Frey, R., Richter, D., Schupp, J., Hertwig, R., & Mata, R. (2021). Identifying robust correlates of risk preference: A systematic approach using specification curve analysis. Journal of Personality and Social Psychology, 120(2), 538–557. https://doi.org/10.1037/pspp0000287

Project work & presentations

Parameter recovery analyses

Literature:

van Ravenzwaaij, D., Dutilh, G., & Wagenmakers, E.-J. (2011). Cognitive model decomposition of the BART: Assessment and application. Journal of Mathematical Psychology, 55(1), 94–105. https://doi.org/10.1016/j.jmp.2010.08.010

Presentations of exercises