(Builds on: Parsing basics)
# Libraries library(tidyverse) library(googlesheets) # Parameters # URL for Gapminder example url_gapminder <- "https://docs.google.com/spreadsheets/d/1BzfL0kZUz1TsI5zxJF1WNF01IxvC67FbOJUiiGMZ_mQ/"
In this reading, we’ll show you how to use the googlesheets package by Jenny Bryan to (surprise!) extract data from Google sheets. Google sheets are a surprisingly useful way to collect and collaboratively work with data, and the googlesheets package makes it easy to pull that data into R. One useful workflow involves using Google forms to collect form data into a Google sheet, and then using the googlesheets package to extract and analyze that data in R.
Each Google sheet has a sheet key. You’ll need this key to load a sheet with googlesheets. Here’s how to get the sheet key from a sheet’s URL.
sheet_key <- extract_key_from_url(url_gapminder) sheet_key
##  "1BzfL0kZUz1TsI5zxJF1WNF01IxvC67FbOJUiiGMZ_mQ"
Once you have the sheet key, you can use it to create a googlesheets object.
gs <- gs_key(sheet_key) class(gs)
##  "googlesheet" "list"
You can also list all the worksheets in a single Google sheet with
##  "Africa" "Americas" "Asia" "Europe" "Oceania"
To read in a worksheet, use
gs_read() takes a
googlesheets object and, optionally, the name of a specific worksheet.
asia <- gs %>% gs_read(ws = "Asia")
## Accessing worksheet titled 'Asia'. ## Parsed with column specification: ## cols( ## country = col_character(), ## continent = col_character(), ## year = col_double(), ## lifeExp = col_double(), ## pop = col_double(), ## gdpPercap = col_double() ## )
## # A tibble: 396 x 6 ## country continent year lifeExp pop gdpPercap ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 386 more rows
Accessing private sheets requires you to authenticate to Google. Authentication is done with this command.
# Give googlesheets permission to access spreadsheet gs_auth()
A page will open in your browser and you’ll be prompted to log into
Google. Once you’ve logged in, googlesheets will create a file called
.httr-oauth in your current directory. NEVER CHECK THIS INTO GIT OR
UPLOAD IT TO GITHUB. To ensure that you don’t actually push your
gs_auth() will add
.httr-oauth to the
file in your current directory.
.httr-oauth file allows you to avoid logging into Google in the
future. You don’t upload this file to GitHub because if someone else had
it, they could use it to access your Google files.
A common error you’ll encounter involves a googlesheets function being
unable find the
.httr-oauth file. If you’re using an RStudio project,
your working directory is usually the top level of the project, and so
gs_auth() will put
.httr-oauth in the top-level of the directory.
However, you’ll often be working with an R Markdown document or R script
located in a subfolder, and so the googlesheets function will look for
.httr-oauth in that subfolder. To avoid this problem, you can copy
.httr-oauth over to your subfolder, so that you have one
the top level of the project and one in your subfolder.
Once authenticated into Google, you now need to find the sheet key for
the relevant Google sheet. Like we did earlier, you can find the sheet’s
URl and then call
extract_key_from_url() like we did earlier.
With private sheets, another option is to use
gs_ls() to see all the
sheets to which you have access.
gs_ls() orders by modification time,
with the most recently modified sheets first.
gs_ls() %>% view()
Locate the relevant sheet and copy its sheet key. Create a variable for
the key in your parameters (e.g.,
sheet_key <- "XdkfjOkdjk"), and then
read in the sheet using
df <- gs_key(sheet_key) %>% gs_read()