Coursera - Reproducible Research - Week 1 - Structure of a Data Analysis

Steps in a data analysis

  1. Define the question
  2. Define the ideal data set
  3. Determine what data you can access
  4. Obtain the data
  5. Clean the data
  6. Exploratory data analysis
  7. Statistical prediction/modeling
  8. Interpret results
  9. Challenge results
  10. Synthesize/write up results
  11. Create reproducible code

You will usually have either too much or too little information to solve your problem. Defining the question as narrowly as possible helps reduce the noise.

For example, start with a general question: Can I automatically detect emails that are spam? Then make it concrete: Can I use quantitative characteristics of the emails to classify them as spam?
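
"Quantitative characteristics" could mean things like word frequencies or the share of capital letters. Here is a minimal sketch in Python of turning one email into numbers. The function name email_features and the particular features are hypothetical choices for illustration, not something these notes specify.

```python
import re

def email_features(text):
    """Turn raw email text into a few numeric features (illustrative choices only)."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_letters = sum(ch.isalpha() for ch in text)
    n_capitals = sum(ch.isupper() for ch in text)
    n_free = sum(w.lower() == "free" for w in words)
    return {
        # Total number of words in the message.
        "n_words": n_words,
        # Percentage of words that are "free" (an assumed spam-like word).
        "pct_free": 100.0 * n_free / n_words if n_words else 0.0,
        # Fraction of letters that are upper case.
        "capital_ratio": n_capitals / n_letters if n_letters else 0.0,
        # Number of exclamation marks.
        "n_exclamations": text.count("!"),
    }

features = email_features("FREE money!!! Click now to claim your free prize")
```

Once every email is reduced to a row of numbers like this, the concrete question becomes whether those numbers separate spam from non-spam.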

Define the ideal data set

Determine what data you can access

Obtain the data

Clean the data

Subsampling our data set
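
One way to subsample, assuming the goal is a random split into a training set and a test set, is sketched below in Python. The function name split_train_test, the 50/50 fraction, and the seed value are all hypothetical. The point is only that fixing the seed makes the split repeatable.

```python
import random

def split_train_test(rows, train_fraction=0.5, seed=42):
    """Randomly split rows into (train, test); the same seed gives the same split."""
    rng = random.Random(seed)   # private generator, so global random state is untouched
    shuffled = list(rows)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

emails = list(range(10))        # stand-in for ten email records
train, test = split_train_test(emails)
```

Calling split_train_test(emails) again with the same seed returns the same two sets, so anyone rerunning the analysis works from the same training data.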

Exploratory data analysis

Statistical prediction/modeling

Interpret results

Challenge results

Synthesize/write up results

Lastly, create reproducible code using Markdown, knitr, and RStudio. Doing so makes the evidence for your conclusions much more convincing.

Published January 18, 2015