Technical Description
For our project, we will use data on global tourism and related variables and indicators to study the effect tourism has on countries around the world. The main tourism-specific data that we will be using comes from the United Nations World Tourism Organization (UNWTO). We will draw data from their “Tourism Statistics Database”, from which we will use five tables containing tourism indicators for each country. The tables have data for Inbound Tourism, Domestic Tourism, Outbound Tourism, Tourism Industries, and Employment. Inbound, Domestic, and Outbound Tourism tables have data on arrivals, accommodation, expenditure, and departures. We will focus on inbound arrivals and employment. The data provides indicators by country, with values (in thousands) for each indicator from 1995-2022. Countries are listed in the same column as the indicators, with the same indicators for each country following in the rows after the country name. Therefore, to make this data useful for graphing, we will need to do quite a bit of engineering to organize indicators. The data is collected yearly from countries by the UNWTO, with the most recent data update taking place October 2023. However, our data is yearly and thus only goes until 2022, since the full data for 2023 is not yet available. Further, the data requires that each individual country accurately and routinely tracks tourism data and then reports it to the United Nations, and thus there is a large amount of missing data for many countries, especially in earlier years. In fact, the UN did not outline their International Recommendations for Tourism Statistics (IRTS 2008) until 2008, and the World Tourism Organization only became an agency of the UN in 2004, so data was not as regulated before these years.
To study the effects that tourism has on the economy, climate, and happiness, we will use supplemental data. For economic factors, we will study employment from the UNWTO data, and will also use data from the International Monetary Fund (IMF) to analyze the relationship between tourism and GDP. The data is a simple table where rows are countries, columns are the years from 1980 to 2028, with data from 2023 to 2028 being projected values.The values in the table represent actual GDP values for each country and year. The data comes from the IMF’s World Economic Outlook report, which is updated yearly. The data mapper from which we are pulling data only includes some indicators from the entire report. Data is collected as accurately as possible by IMF employees, and historical data is updated as more information becomes available. Though the employees of IMF gather data as meticulously as possible, of course there are still gaps. Similarly to tourism data, much of the data is missing in earlier years. Our downloaded data was updated October 2023.
Our climate data also comes from the International Monetary Fund, from their Climate Change Dashboard, which offers global data on many climate change indicators. We will be studying Climate-related Disasters data. This data is organized similarly to the GDP data, with rows being for countries and columns for years 1980-2022. The values in the table indicate the number of climate-related disasters that occurred in each country for each year. Each country has multiple rows, since for each there is a row for drought, temperature, flood, wildfire, storm, landslide, and total. The data was last updated in April 2023, but data is updated when there is availability of new indicators or methodology, which does not have a set schedule. Again, similar to GDP, data is collected and released by IMF employees to the best of their ability.
Finally, we will use happiness data from the World Happiness Report, which gathers data from the Gallup World Poll surveys from 2020 to 2022. Unfortunately, the average sample size from the poll is 1,000 people per country, and thus the data is far from conclusive. However, the data offers multiple “happiness indicators” from the poll for any participating countries. Some examples are Social support data, a Healthy Life score, a value for “Freedom to make life choices”, and more. Data is quantitative, using an average score from the Gallup poll. The report has been released most years since 2012, but we will focus on the most recent report, with data from the 2023 Gallup poll.
For all data, we will download excel or csv files and read them as data frames. The indicator data (economy, climate, happiness) will not need much engineering beyond missing values. UNWTO data will need to be pivoted. Then, data will be joined by country name to conduct our analysis and create graphs.