R for data science
The best place to start learning the tidyverse is R for Data Science (R4DS for short), an O’Reilly book written by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. You can read it online for free, or buy a physical copy.
We highly recommend pairing R4DS with the Posit cheatsheets. These cheatsheets have been carefully designed to pack a lot of information into a small amount of space. You can keep them handy at your desk and quickly jog your memory when you get stuck. Most of the cheatsheets have been translated into multiple languages.
Books
- Statistical Inference via Data Science: A ModernDive into R and the Tidyverse by Chester Ismay and Albert Y. Kim. “Help! I’m new to R and RStudio and I need to learn them! What do I do?” If you’re asking yourself this, this book is for you.
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham. Goes into greater depth on the ggplot2 visualisation system.
- R for Data Science (1e): Exercise Solutions by Jeffrey B. Arnold. Work in progress.
- Data Manipulation in R by Steph Locke. Covers data manipulation in a tidyverse way.
Workshops
- Data Wrangling in the Tidyverse by Jumping Rivers. This course shows you how to use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the tidyverse, what tidy data really is, and how to achieve it in practice with packages such as dplyr, tidyr, lubridate, and forcats.
- Learn R for Data Analysis by Locke Data. Attend this two-day course to get hands-on with the R programming language. Learn how to connect to different data sources, wrangle the data into the shape you need, visualise it, and compile everything into reports.
Teaching materials
Data Science in a Box contains the complete materials for teaching a semester-long, undergraduate-level introductory data science course. The “box” includes slide decks, homework assignments, guided labs, sample exams, and a final project assignment, along with materials for instructors such as pedagogical tips and information on computing infrastructure, technology stack, and course logistics. The website exposes the source materials, which live in a GitHub repository and use datasets from the dsbox package.