Merges data from the Johns Hopkins University CSSE team on the spread of the SARS-CoV-2 virus and the Covid-19 pandemic, case data provided by the ECDC, the ACAPS governmental measures database, the Oxford Covid-19 Government Response Tracker, Mobility Trends Reports provided by Apple related to Covid-19, Google COVID-19 Community Mobility Reports, Google Trends Covid-19 related search volume, data on tests, vaccinations and hospitalizations as collected by the Our World in Data team, and data from the World Bank into a country-day data frame. Variable definitions are provided by the data frame tidycovid19_variable_definitions, which also reports on the current status of the data.

download_merged_data(
  wbank_vars = c("SP.POP.TOTL", "AG.LND.TOTL.K2", "EN.POP.DNST", "EN.URB.LCTY",
    "SP.DYN.LE00.IN", "NY.GDP.PCAP.KD"),
  wbank_labels = c("population", "land_area_skm", "pop_density", "pop_largest_city",
    "life_expectancy", "gdp_capita"),
  search_term = "coronavirus",
  silent = FALSE,
  cached = FALSE
)



wbank_vars: The World Bank data items that you want to retrieve.

wbank_labels: More informative World Bank variable names for the output data frame. Must have the same length as wbank_vars and must contain valid variable names.

search_term: Google Trends search term. Defaults to "coronavirus".

silent: Whether to suppress the status messages that the function sends to the console. The messages can be informative because downloading takes some time, so silent defaults to FALSE.

cached: Whether you want to download the cached version of the data from the tidycovid19 GitHub repository instead of retrieving the data from the authoritative source. Downloading the cached version is faster, and the cache is updated daily. Defaults to FALSE.
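
The two requirements on wbank_labels (same length as wbank_vars, valid variable names) can be checked before calling the function. The following is a minimal sketch in base R. The helper check_wbank_labels() is hypothetical and not part of tidycovid19, and using make.names() as the test for a "valid variable name" is an assumption about what the package expects.

```r
# Hypothetical helper (not part of tidycovid19): returns TRUE when the labels
# could plausibly be used together with the given World Bank indicator codes.
check_wbank_labels <- function(wbank_vars, wbank_labels) {
  # Requirement 1: one label per World Bank indicator
  same_length <- length(wbank_vars) == length(wbank_labels)
  # Requirement 2 (assumed test): make.names() leaves syntactically
  # valid R names unchanged and alters everything else
  valid_names <- all(make.names(wbank_labels) == wbank_labels)
  same_length && valid_names
}

# The defaults from the usage section
wbank_vars <- c("SP.POP.TOTL", "AG.LND.TOTL.K2", "EN.POP.DNST", "EN.URB.LCTY",
                "SP.DYN.LE00.IN", "NY.GDP.PCAP.KD")
wbank_labels <- c("population", "land_area_skm", "pop_density",
                  "pop_largest_city", "life_expectancy", "gdp_capita")

check_wbank_labels(wbank_vars, wbank_labels)
```

A label containing a space, such as "gdp capita", would fail the second check because make.names() rewrites it to "gdp.capita".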


A data frame containing the data, organized by country and date. It includes a timestamp variable indicating the time of data retrieval.
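
"Organized by country and date" means that each combination of country and day should appear at most once. The sketch below illustrates that property on a small made-up data frame, so it runs without a download. The column name date and the helper is_country_day() are assumptions for illustration (iso3c is taken from the example in this page), and the values are toy numbers, not real case counts.

```r
# Toy stand-in for the merged data: two countries, two days each.
# The values are invented for illustration only.
toy <- data.frame(
  iso3c = c("DEU", "DEU", "FRA", "FRA"),
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02")),
  confirmed = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when no (iso3c, date) pair occurs twice.
# anyDuplicated() returns 0 when there are no duplicated rows.
is_country_day <- function(df) {
  anyDuplicated(df[, c("iso3c", "date")]) == 0
}

is_country_day(toy)
```

The same check could be applied to the data frame returned by download_merged_data(), provided its date column is indeed named date.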


See the documentation of the separate download functions of the package for more detail.


library(magrittr)  # provides the %>% pipe used below

# Download the cached merged data without status messages
df <- download_merged_data(silent = TRUE, cached = TRUE)

df %>%
  dplyr::group_by(iso3c) %>%
  # Keep countries with more than 10 million inhabitants
  dplyr::filter(population > 10e6) %>%
  dplyr::summarise(
    cases_per_1e5_pop = max(1e5*(confirmed/population), na.rm = TRUE),
    soc_dist_measures = max(soc_dist, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Keep countries with at least 1,000 confirmed cases per 100,000 inhabitants
  dplyr::filter(cases_per_1e5_pop >= 1000) %>%
  ggplot2::ggplot(mapping = ggplot2::aes(x = cases_per_1e5_pop,
                                         y = soc_dist_measures)) +
  ggplot2::geom_point() +
  ggrepel::geom_label_repel(ggplot2::aes(label = iso3c)) +
  ggplot2::scale_x_continuous(trans = 'log10', labels = scales::comma)