New dream? I guess I'll invest in myself.
Today I enrolled in the Coursera Data Science Specialization track. Here are some notes from the course I took today and a few accompanying thoughts.
The key challenge in data science is that you're either in a situation where you:
- don't have enough information to solve your problem and need to seek it out or
- are overwhelmed with a surplus of information and need to filter out the erroneous parts.
According to McKinsey, "Big Data is the next frontier for innovation, competition, and productivity."
statistics - the science of learning from data
To me, Big Data is a buzzword. It's truly the application of statistics to solving complex problems in a complex world by turning information into actionable insights. Something interesting that I found, though, is that there are difference kinds of data science depending on the field you're working in. While this seems intuitive, the labels were helpful. Biostatistics, data science, machine learning, natural language processing, signal processing, business analytics, econometrics, and statistical process control are all branches of the same tree.
They also showed this Venn diagram:
and defined what it is exactly that data scientists do:
- define the question of interest
- define the ideal data set
- get and clean the data
- explore the data (plots, clustering, find patterns)
- perform statistical prediction/modeling
- interpret, challenge, and finally synthesize the results in a reproducible way
- share the results
The primary statistical analysis tool in this track will be R. The lecturer, Dr. Jeff Leek, went over some important R functions using the function
access the help file
search help files
get arguments
see the function body
During this time, I installed R and Rstudio before delving into how to ask questions and find answers the smart way, based on a document by two software engineers. I e-mailed this document to my work account to review in the morning.
After that, he went over the Data Science Specialization track by overviewing:
- Getting Data
- Exploratory Data Analysis (graphing and plotting data in R)
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Building Data Products
Then it was time for the Weekly Quiz, which was pretty straightforward until the last question: What are R packages that provide machine learning functionality? After some quick Googling, I Ctrl+F'd my way to:
Well, that was fun. Looking forward to starting the next course tomorrow.