
Essentials of relational data [wrangle]

(Builds on: Exploratory data analysis (1D))
(Leads to: dplyr and databases)

It is rare for an analysis to require only a single table of data. Far more often you will need to combine multiple sources of information. Interconnected datasets are often called relational because you need to care about the relationships between the datasets.

Here you’ll first learn about keys, the variables that define the relationship between tables. You’ll then learn about mutating joins, so called because their primary effect is to add new columns, like mutate(). It’s also useful to learn about the filtering joins, semi_join() and anti_join(), which work primarily like filter(), restricting the rows.
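A minimal sketch of these ideas, assuming the dplyr package is installed. The tables `flights_small` and `airlines_small` and their values are hypothetical, made up for illustration. `carrier` plays the role of the key: it uniquely identifies each row of `airlines_small` and links each flight to its airline.

```r
library(dplyr)

# Hypothetical example tables: flights_small refers to airlines through the
# key column `carrier`; airlines_small has one row per carrier.
flights_small <- tibble(
  flight  = c(1, 2, 3, 4),
  carrier = c("AA", "UA", "AA", "ZZ")
)
airlines_small <- tibble(
  carrier = c("AA", "UA", "DL"),
  name    = c("American Airlines Inc.", "United Air Lines Inc.", "Delta Air Lines Inc.")
)

# Check that `carrier` is a primary key of airlines_small:
# no carrier value appears more than once, so this returns zero rows.
airlines_small %>% count(carrier) %>% filter(n > 1)

# Mutating join: keeps every row of flights_small and adds the `name` column.
# Flight 4 (carrier "ZZ") has no matching airline, so its `name` is NA.
flights_named <- flights_small %>% left_join(airlines_small, by = "carrier")

# Filtering joins: keep or drop rows of flights_small; no columns are added.
flights_matched   <- flights_small %>% semi_join(airlines_small, by = "carrier")
flights_unmatched <- flights_small %>% anti_join(airlines_small, by = "carrier")
```

Note the row counts: `left_join()` keeps all four flights and widens the table, while `semi_join()` and `anti_join()` leave the columns unchanged and split the flights into the three with a matching carrier and the one without.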

Readings