Coursera - Reproducible Research - Week 1 - Concepts and Ideas

Replication - the ultimate standard for strengthening scientific evidence: findings are confirmed by independent investigators working with independent data, analytical methods, laboratories, instruments, etc.

Replication is particularly important in studies that can impact broad or regulatory decisions.

What's wrong with replication?

Some studies cannot be replicated due to lack of

Reproducible research makes the analytic data and code available so that others may reproduce the findings. Why do we need it?

An example of where reproducibility matters is air pollution and health research. Researchers estimate small (but important) health effects in the presence of much stronger signals. The results inform substantial policy decisions and affect many stakeholders; EPA regulations can cost billions of dollars. The complex statistical methods required are subjected to intense scrutiny.

What do we need for reproducible research?

Who are the players in reproducibility?

Authors

Readers

Challenges

In reality ...

Authors

Readers

Literate (Statistical) Programming

Literate programming is a general concept that requires:

  1. a documentation language (human readable)
  2. a programming language (machine readable)
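The two ingredients above can be illustrated with a minimal sketch in Python (not the R-based tooling these notes discuss). It assumes a toy document format in which a code chunk opens with a line `<<>>=` and closes with a line `@`, loosely modeled on noweb-style markers. The names `split_chunks`, `tangle`, `weave`, and `DOC` are hypothetical and chosen for this illustration only. "Tangling" pulls out the machine-readable code, while "weaving" produces the human-readable document with the code's output embedded.

```python
import contextlib
import io
import re

# Assumed chunk markers for this sketch: "<<>>=" opens a code chunk, "@" closes it.
CHUNK = re.compile(r"^<<>>=\n(.*?)^@\n", re.DOTALL | re.MULTILINE)


def split_chunks(source):
    """Return ("text", str) and ("code", str) pieces in document order."""
    pieces = []
    pos = 0
    for match in CHUNK.finditer(source):
        if match.start() > pos:
            pieces.append(("text", source[pos:match.start()]))
        pieces.append(("code", match.group(1)))
        pos = match.end()
    if pos < len(source):
        pieces.append(("text", source[pos:]))
    return pieces


def tangle(source):
    """Extract only the machine-readable code from the document."""
    return "".join(body for kind, body in split_chunks(source) if kind == "code")


def weave(source):
    """Build the human-readable document: text, code, and the code's printed output."""
    env = {}  # shared namespace so later chunks can see earlier variables
    out = []
    for kind, body in split_chunks(source):
        if kind == "text":
            out.append(body)
        else:
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec(body, env)  # run the chunk and capture what it prints
            out.append(body + buf.getvalue())
    return "".join(out)


# A tiny literate document: prose around one code chunk.
DOC = """The mean of the sample is computed below.
<<>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
That value is reported in the text.
"""

if __name__ == "__main__":
    print(weave(DOC))
```

Because the reported number comes from running the chunk rather than from being typed by hand, the text and the analysis cannot silently drift apart, which is the property literate programming is meant to provide.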

Sweave (main website)

Sweave has many limitations, though. knitr is a more recent alternative package.

knitr (main website)

Summary

Published January 18, 2015