Advanced data analytics: Statistical methods for robust inferences in psychological research & industry
Renato Frey, University of Zurich
This seminar has ended.
Overview
Drawing robust inferences from data is of paramount importance for psychologists working in research and industry (e.g., in jobs at the intersection with data science). Yet several of the statistical methods and practices traditionally employed in psychology have turned out to be problematic in this respect. On the one hand, data analysts may be unaware of their own "researchers' degrees of freedom" and of the consequences of high flexibility when making analytic choices (e.g., applying overly flexible methods to small datasets). On the other hand, some traditional methods may be difficult to grasp intuitively due to their complexity or lack of transparency, leading to fundamental misconceptions about their inner workings (e.g., how to interpret a confidence interval). These and related issues hamper the robustness of the inferences drawn in data analytics, and are also partly responsible for the replication crisis in psychology. In this course we will discuss and implement conceptual tools and statistical methods such as preregistered analysis plans, prospective design / parameter recovery analyses, and multiverse analyses, all aimed at fostering robust and transparent inferences in psychological research and applied jobs.
Learning goals
- To be able to identify red flags in the analysis pipeline that indicate potential threats to robust inferences.
- To know about ways of avoiding such threats upfront (e.g., designing preregistered analysis plans).
- To be able to implement advanced statistical methods to identify these threats (e.g., prospective design / parameter recovery analyses) and avoid such threats during actual data analysis (e.g., by using multiverse analyses).
Requirements
- Regular and active participation
- Prior to the sessions: Read the literature and prepare for the discussion
- During the sessions: Ask questions and actively engage in the discussion!
- Project work and presentation
Sessions
20.09.2022 - Welcome session
Various:
Preparation:
- What do you consider "advanced data analytics"?
- Why did you choose this course?
- What kind of knowledge and novel skills would you like to have acquired by the end of the semester? Try to define three personal learning goals.
27.09.2022 - Researchers' degrees of freedom
Literature:
- Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411–421.
04.10.2022 - Flexible models: Overfitting
Literature:
- Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
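To make the session topic concrete, the following minimal sketch (not taken from the reading) contrasts a simple and a highly flexible model on a small dataset. The data-generating process, the sample sizes, and the choice of a nearest-neighbour rule as the "flexible" model are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_nearest(x, train_x, train_y):
    """Highly flexible 'model': copy the outcome of the closest training case."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mse(predictions, ys):
    """Mean squared prediction error."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


def simulate(n, rng):
    """Hypothetical data-generating process: y = 0.5 * x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(2022)
train_x, train_y = simulate(20, rng)    # small training sample
test_x, test_y = simulate(1000, rng)    # fresh data from the same process

intercept, slope = fit_line(train_x, train_y)

results = {
    "simple_train": mse([intercept + slope * x for x in train_x], train_y),
    "simple_test": mse([intercept + slope * x for x in test_x], test_y),
    "flexible_train": mse(
        [predict_nearest(x, train_x, train_y) for x in train_x], train_y
    ),
    "flexible_test": mse(
        [predict_nearest(x, train_x, train_y) for x in test_x], test_y
    ),
}

for name, value in results.items():
    print(f"{name:>15}: {value:.3f}")
```

The flexible rule reproduces the training data perfectly, so its in-sample error says nothing about how well it predicts new cases. Comparing the two test errors is the relevant check.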
11.10.2022 - Planning and registering the analysis pipeline
Literature:
- Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
- Hardwicke, T. E., & Ioannidis, J. P. A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 1. https://doi.org/10.1038/s41562-018-0444-y
- Szollosi, A., Kellen, D., Navarro, D., Shiffrin, R., van Rooij, I., Van Zandt, T., & Donkin, C. (2021). Is preregistration worthwhile? https://doi.org/10.31234/osf.io/x36pz
18.10.2022 - p < .05? Insights through simulation
Literature:
- Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://doi.org/10.3758/s13423-013-0572-3
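As an illustration of what a simulation can show here, the sketch below repeatedly draws samples from a population with a known mean and counts how often the 95% confidence interval contains that mean. The population values, the sample size, and the use of a z-interval with a known standard deviation are assumptions chosen to keep the example short; they do not come from the reading.

```python
import random
from statistics import NormalDist, fmean

TRUE_MEAN = 100.0  # hypothetical population mean
SIGMA = 15.0       # population SD, assumed known so the z-interval is exact
N = 25             # observations per simulated study
REPS = 5000        # number of simulated studies

# Half-width of a two-sided 95% z-interval for the mean.
z = NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / N ** 0.5

rng = random.Random(1)
hits = 0
for _ in range(REPS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = fmean(sample)
    # Does this particular interval contain the true mean?
    if m - half_width <= TRUE_MEAN <= m + half_width:
        hits += 1

coverage = hits / REPS
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The "95%" describes the long-run behaviour of the procedure across many repetitions. Each single interval either contains the true mean or it does not.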
25.10.2022 - Introduction to Bayesian estimation methods
Literature:
- Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14(7), 293–300. https://doi.org/10.1016/j.tics.2010.05.001
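A minimal sketch of Bayesian estimation by grid approximation may help with preparation. It estimates a proportion from hypothetical data (7 successes in 10 trials) under a uniform prior; the data, the prior, and the grid resolution are assumptions for illustration.

```python
successes, trials = 7, 10  # hypothetical data

# Candidate values for the unknown proportion theta.
grid = [i / 1000 for i in range(1001)]

# Uniform prior: every candidate value is equally plausible beforehand.
prior = [1.0 for _ in grid]

# Binomial likelihood of the data for each candidate value
# (the binomial coefficient is constant and cancels when normalising).
likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in grid]

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalised = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

post_mean = sum(t * p for t, p in zip(grid, posterior))

# Equal-tailed 95% credible interval from the cumulative posterior.
cumulative = 0.0
lower = upper = None
for t, p in zip(grid, posterior):
    cumulative += p
    if lower is None and cumulative >= 0.025:
        lower = t
    if upper is None and cumulative >= 0.975:
        upper = t

print(f"Posterior mean: {post_mean:.3f}")
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

With a uniform prior the exact posterior is a Beta(8, 4) distribution, whose mean is 8/12, so the grid result can be checked against a known value.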
01.11.2022 - Prospective design aka "Bayesian power analysis"
Literature:
- Chapter 13 in Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
- Frey, R. (2020). Decisions from experience: Competitive search and choice in kind and wicked environments. Judgment and Decision Making, 15(2), 282–303.
08.11.2022 - Presentations of exercises
15.11.2022 - Into the multiverse!
Literature:
- Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712. https://doi.org/10.1177/1745691616658637
- Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.
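The basic logic of a multiverse analysis, running the same analysis under every combination of defensible data-processing choices, can be sketched as follows. The variables (`stress`, `mood`), the simulated data, and the two sets of choices (outlier exclusion and transformation) are hypothetical and serve only to show the mechanics.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def apply_exclusion(xs, ys, cutoff):
    """Drop cases whose x lies more than `cutoff` SDs from the mean of x."""
    if cutoff is None:
        return xs, ys
    n = len(xs)
    mx = sum(xs) / n
    sd = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    kept = [(x, y) for x, y in zip(xs, ys) if abs(x - mx) <= cutoff * sd]
    return [x for x, _ in kept], [y for _, y in kept]


# Hypothetical data: a skewed, strictly positive predictor and a noisy outcome.
rng = random.Random(7)
stress = [rng.lognormvariate(0, 0.5) for _ in range(200)]
mood = [-0.3 * math.log(s) + rng.gauss(0, 1) for s in stress]

# Two analytic decisions, each with several defensible options.
exclusion_rules = {"none": None, "2.5 SD": 2.5, "3 SD": 3.0}
transforms = {"raw": lambda v: v, "log": math.log}

# One "universe" per combination of options.
multiverse = {}
for (ex_name, cutoff), (tr_name, transform) in itertools.product(
    exclusion_rules.items(), transforms.items()
):
    xs, ys = apply_exclusion(stress, mood, cutoff)
    xs = [transform(x) for x in xs]
    multiverse[(ex_name, tr_name)] = pearson_r(xs, ys)

for spec, r in multiverse.items():
    print(f"exclusion={spec[0]:<7} transform={spec[1]:<4} r={r:+.3f}")
```

Reporting all six estimates side by side, rather than a single preferred one, shows how strongly the conclusion depends on choices that could reasonably have been made differently.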
22.11.2022 - Specification curve analysis
Literature:
- Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Specification curve: Descriptive and inferential statistics on all reasonable specifications. SSRN Electronic Journal, 1–18. https://doi.org/10.2139/ssrn.2694998
- Frey, R., Richter, D., Schupp, J., Hertwig, R., & Mata, R. (2021). Identifying robust correlates of risk preference: A systematic approach using specification curve analysis. Journal of Personality and Social Psychology, 120(2), 538–557. https://doi.org/10.1037/pspp0000287
29.11.2022 - Project work
06.12.2022 - Project work
13.12.2022 - Parameter recovery analyses
Literature:
- van Ravenzwaaij, D., Dutilh, G., & Wagenmakers, E.-J. (2011). Cognitive model decomposition of the BART: Assessment and application. Journal of Mathematical Psychology, 55(1), 94–105. https://doi.org/10.1016/j.jmp.2010.08.010
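The general recipe of a parameter recovery analysis is to simulate data with known parameter values, fit the model to the simulated data, and compare the estimates with the known values. The sketch below applies this to a toy logistic choice model with a single sensitivity parameter `theta`. This is not the BART model from the reading; the model, the parameter values, and the grid-search estimator are assumptions for illustration.

```python
import math
import random


def p_choose_a(theta, diff):
    """Probability of choosing option A given the value difference A - B."""
    return 1.0 / (1.0 + math.exp(-theta * diff))


def simulate_choices(theta, diffs, rng):
    """Generate binary choices (1 = A, 0 = B) from the model."""
    return [1 if rng.random() < p_choose_a(theta, d) else 0 for d in diffs]


def log_likelihood(theta, diffs, choices):
    """Log-likelihood of the observed choices under a given theta."""
    ll = 0.0
    for d, c in zip(diffs, choices):
        p = p_choose_a(theta, d)
        ll += math.log(p if c == 1 else 1.0 - p)
    return ll


def estimate_theta(diffs, choices, grid):
    """Maximum-likelihood estimate by exhaustive search over a grid."""
    return max(grid, key=lambda t: log_likelihood(t, diffs, choices))


rng = random.Random(13)
grid = [i / 20 for i in range(1, 101)]              # candidates 0.05 ... 5.00
diffs = [rng.uniform(-2, 2) for _ in range(2000)]   # hypothetical trial design

# Simulate with known parameter values, then try to recover them.
recovery = {}
for true_theta in (0.5, 1.0, 2.0):
    choices = simulate_choices(true_theta, diffs, rng)
    recovery[true_theta] = estimate_theta(diffs, choices, grid)

for true_theta, estimate in recovery.items():
    print(f"true theta = {true_theta:.2f}, recovered = {estimate:.2f}")
```

If the recovered values track the true values closely, the design (here, 2000 trials) is informative enough to estimate the parameter. Repeating the exercise with fewer trials shows how recovery degrades.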
20.12.2022 - Presentations of exercises