Benchmarks and Replicability in Software Engineering Research: Challenges and Opportunities
Benchmarks in software engineering research have propelled technological advances and scientific discoveries and, together with a push towards more transparency and artifact evaluations, they have improved velocity, reproducibility, and comparability. In this talk, I will first discuss the importance and inherent trade-offs of curating and using benchmarks in empirical research, highlighting lessons learned from building and maintaining Defects4J. Then, I will reflect on the progress our research community has made towards shared data, reusable research artifacts, and replicability of scientific results. Finally, I will outline remaining pitfalls and challenges around ever-increasing artifact complexity, artifact verification, underspecified research designs, and the garden of forking paths problem. I will conclude with a discussion on the importance of replicability (in contrast to repeatability/reproducibility, as defined by the ACM) and how applying software engineering principles and best practices to empirical science and research artifacts addresses many of the outlined pitfalls and challenges.
Fri 20 Sep, 09:00 - 10:00 (time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
09:00, 60 min, Keynote: Benchmarks and Replicability in Software Engineering Research: Challenges and Opportunities (Keynotes)