Uses the wbstats package to download recent country-level data from the World Bank (https://data.worldbank.org).

download_wbank_data(
  vars = c("SP.POP.TOTL", "AG.LND.TOTL.K2", "EN.POP.DNST", "EN.URB.LCTY",
    "SP.DYN.LE00.IN", "NY.GDP.PCAP.KD"),
  labels = c("population", "land_area_skm", "pop_density", "pop_largest_city",
    "life_expectancy", "gdp_capita"),
  var_def = FALSE,
  silent = FALSE,
  cached = FALSE
)

Arguments

vars

A character vector of World Bank indicator codes specifying the data items that you want to retrieve.

labels

More informative variable names for the output data frame. Must have the same length as vars and contain valid R variable names.
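A minimal sketch of the two documented constraints on `labels`, assuming that "valid variable names" means names that base R's `make.names()` leaves unchanged (the package's own check may differ). `check_wbank_labels()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: checks the documented constraints on `labels`.
# Assumption: a "valid variable name" is one that make.names() leaves unchanged.
check_wbank_labels <- function(vars, labels) {
  if (length(labels) != length(vars)) {
    stop("`labels` must have the same length as `vars`.")
  }
  bad <- labels[make.names(labels) != labels]
  if (length(bad) > 0) {
    stop("Invalid variable names in `labels`: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# A custom selection of two World Bank indicators with matching labels
vars <- c("SP.POP.TOTL", "NY.GDP.PCAP.KD")
labels <- c("population", "gdp_capita")
check_wbank_labels(vars, labels)
```

Passing the same two vectors as `download_wbank_data(vars = vars, labels = labels)` would then request only these two indicators.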

var_def

Do you want to retrieve a data frame containing the World Bank data definitions along with the actual data? Defaults to FALSE.

silent

Whether you want to suppress the status messages that the function sends to the console. Defaults to FALSE, so that messages are shown, which can be informative because downloading will take some time.

cached

Whether you want to download the cached version of the data from the tidycovid19 GitHub repository instead of retrieving the data from the authoritative source. Downloading the cached version is faster, and the cache is updated daily. Defaults to FALSE.

Value

If var_def = FALSE, a data frame containing the data and a timestamp variable indicating the time of data retrieval. Otherwise, a list including the data frame with the data followed by a data frame containing the variable definitions.
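Because the return type depends on var_def, downstream code may want to accept either shape. The sketch below uses a hypothetical helper, `wbank_data_only()`, and small mock objects in place of real downloads; the column names of the mock definitions frame are made up for illustration.

```r
# Hypothetical helper: returns the data frame regardless of whether
# download_wbank_data() was called with var_def = FALSE (a data frame)
# or var_def = TRUE (a list whose first element is the data frame).
wbank_data_only <- function(x) {
  if (is.data.frame(x)) x else x[[1]]
}

# Mock objects standing in for the two documented return shapes
data <- data.frame(
  country = c("A", "B"),
  population = c(10, 20),
  timestamp = Sys.time(),
  stringsAsFactors = FALSE
)
defs <- data.frame(var_name = "population", var_def = "Total population",
                   stringsAsFactors = FALSE)

res_plain <- data             # shape when var_def = FALSE
res_list <- list(data, defs)  # shape when var_def = TRUE

str(wbank_data_only(res_list))
```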

Examples

df <- download_wbank_data(silent = TRUE, cached = TRUE)
df %>%
  dplyr::select(country, population) %>%
  dplyr::arrange(-population)
#> # A tibble: 217 × 2
#>    country            population
#>    <chr>                   <dbl>
#>  1 India              1417173173
#>  2 China              1412175000
#>  3 United States       333287557
#>  4 Indonesia           275501339
#>  5 Pakistan            235824862
#>  6 Nigeria             218541212
#>  7 Brazil              215313498
#>  8 Bangladesh          171186372
#>  9 Russian Federation  144236933
#> 10 Mexico              127504125
#> # ℹ 207 more rows
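For readers not using dplyr, the same ranking can be sketched in base R. To keep the snippet offline and self-contained, it uses three rows taken from the printed output in place of the downloaded data.

```r
# Three rows from the printed output, in scrambled order
df <- data.frame(
  country = c("United States", "India", "China"),
  population = c(333287557, 1417173173, 1412175000),
  stringsAsFactors = FALSE
)

# Base R equivalent of select(country, population) %>% arrange(-population)
top <- df[order(-df$population), c("country", "population")]
rownames(top) <- NULL
top
```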
lst <- download_wbank_data(silent = TRUE, cached = TRUE, var_def = TRUE)
lst[[1]] %>%
  tidyr::pivot_longer(5:10, names_to = "wbank_variable", values_to = "values") %>%
  dplyr::group_by(wbank_variable) %>%
  dplyr::summarise(non_na = sum(!is.na(values)))
#> # A tibble: 6 × 2
#>   wbank_variable   non_na
#>   <chr>             <int>
#> 1 gdp_capita          210
#> 2 land_area_skm       216
#> 3 life_expectancy     210
#> 4 pop_density         216
#> 5 pop_largest_city    153
#> 6 population          217
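The pivot/summarise pipeline counts non-missing values per indicator. A shorter base R sketch of the same count uses `colSums()` on the missingness matrix; the mock data set here is made up so the snippet runs without downloading anything.

```r
# Small mock data set with NA gaps (values are made up)
wb <- data.frame(
  country = c("A", "B", "C"),
  population = c(1e6, 2e6, 3e6),
  gdp_capita = c(1000, NA, 3000),
  pop_largest_city = c(NA, NA, 5e5),
  stringsAsFactors = FALSE
)

# Count non-missing values in every indicator column (all but `country`),
# then sort by column name as group_by() does
non_na <- colSums(!is.na(wb[, -1]))
non_na[order(names(non_na))]
```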