download_wbank_data.Rd
Uses the wbstats package to download recent country-level data from the World Bank (https://data.worldbank.org).
download_wbank_data(
  vars = c(
    "SP.POP.TOTL", "AG.LND.TOTL.K2", "EN.POP.DNST",
    "EN.URB.LCTY", "SP.DYN.LE00.IN", "NY.GDP.PCAP.KD"
  ),
  labels = c(
    "population", "land_area_skm", "pop_density",
    "pop_largest_city", "life_expectancy", "gdp_capita"
  ),
  var_def = FALSE,
  silent = FALSE,
  cached = FALSE
)
Argument | Description |
---|---|
vars | The data items (World Bank indicator codes) that you want to retrieve. |
labels | More informative variable names for the output data frame. Has to match the length of vars. |
var_def | Whether to retrieve a data frame containing the World Bank data definitions along with the actual data. Defaults to FALSE. |
silent | Whether to suppress the status messages that the function sends to the console. The messages can be informative because downloading takes some time, so this defaults to FALSE. |
cached | Whether to download the cached version of the data from the tidycovid19 GitHub repository instead of retrieving the data from the authoritative source. Downloading the cached version is faster, and the cache is updated daily. Defaults to FALSE. |
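As a rough illustration of the vars/labels pairing described above, the following sketch requests only two of the default indicators. The object names `my_vars`, `my_labels`, and `wb` are made up for this example, and the assumption that the function is available as `tidycovid19::download_wbank_data()` is inferred from the cache description rather than stated on this page. The download itself only runs in an interactive session with the package installed, since it needs network access.

```r
# Two indicator codes taken from the default `vars`; the labels are our own choice.
my_vars   <- c("SP.POP.TOTL", "SP.DYN.LE00.IN")
my_labels <- c("population", "life_expectancy")

# `labels` has to match the length of `vars`, so check before downloading.
stopifnot(length(my_vars) == length(my_labels))

# Assumption: the function lives in the tidycovid19 package.
# Guarded so the sketch does not fail without the package or a network connection.
if (interactive() && requireNamespace("tidycovid19", quietly = TRUE)) {
  wb <- tidycovid19::download_wbank_data(
    vars = my_vars,
    labels = my_labels,
    silent = TRUE
  )
  head(wb)
}
```

Checking the lengths up front gives a clearer error than a failure in the middle of a slow download.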
If var_def = FALSE, a data frame containing the data and a timestamp variable indicating the time of data retrieval. Otherwise, a list including the data frame with the data followed by a data frame containing the variable definitions.
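Because the return type depends on var_def, calling code may want to normalise it. The sketch below shows one possible way to do that. `extract_wbank_df()` is a hypothetical helper and not part of the package, and the stand-in objects (including the column names of `fake_defs`) use made-up values purely to mimic the two return shapes without a download.

```r
# Hypothetical helper, not part of tidycovid19: return the data frame
# regardless of whether var_def was TRUE or FALSE.
extract_wbank_df <- function(x) {
  # A data frame is itself a list, so test for it first.
  if (is.data.frame(x)) {
    return(x)
  }
  # Otherwise assume the documented list layout: data first, definitions second.
  x[[1]]
}

# Stand-in objects with made-up values, mimicking the two return shapes.
fake_df <- data.frame(
  country = c("A", "B"),
  population = c(100, 200),
  timestamp = Sys.time()
)
fake_defs <- data.frame(var_name = "population", var_def = "Total population")
fake_list <- list(fake_df, fake_defs)

# Both shapes yield the same data frame.
identical(extract_wbank_df(fake_df), extract_wbank_df(fake_list))
```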
library(tidycovid19)
library(magrittr)  # provides the %>% pipe used below

# Download the cached data and list countries by population, largest first
df <- download_wbank_data(silent = TRUE, cached = TRUE)
df %>%
  dplyr::select(country, population) %>%
  dplyr::arrange(-population)
#> # A tibble: 217 × 2
#>    country            population
#>    <chr>                   <dbl>
#>  1 India              1417173173
#>  2 China              1412175000
#>  3 United States       333287557
#>  4 Indonesia           275501339
#>  5 Pakistan            235824862
#>  6 Nigeria             218541212
#>  7 Brazil              215313498
#>  8 Bangladesh          171186372
#>  9 Russian Federation  144236933
#> 10 Mexico              127504125
#> # ℹ 207 more rows

# Also retrieve the variable definitions, then count non-missing values per variable
lst <- download_wbank_data(silent = TRUE, cached = TRUE, var_def = TRUE)
lst[[1]] %>%
  tidyr::pivot_longer(5:10, names_to = "wbank_variable", values_to = "values") %>%
  dplyr::group_by(wbank_variable) %>%
  dplyr::summarise(non_na = sum(!is.na(values)))
#> # A tibble: 6 × 2
#>   wbank_variable   non_na
#>   <chr>             <int>
#> 1 gdp_capita          210
#> 2 land_area_skm       216
#> 3 life_expectancy     210
#> 4 pop_density         216
#> 5 pop_largest_city    153
#> 6 population          217