An Examination of Wikipedia Documentation Across the Best National Parks in the World

by Katherine Patetta

 

Growing up, I visited a number of national parks across the United States, but as I begin to travel outside the US more, I find myself wanting to visit national parks in other countries. Although I have only been to a few abroad, it is interesting to see how differently they are run compared to those in the US. I hope to bring this experience to this class. Thus, for my final project, I want to analyze how Wikipedia documentation of national parks varies depending on whether a park is located in the United States or in another country.

To compare national parks in different countries, I selected eight of the most popular national parks in the world, four of which are in the US. The four US parks are Yosemite National Park, Glacier National Park, Grand Canyon National Park, and Yellowstone National Park. The remaining four are Kruger National Park in South Africa, Banff National Park in Canada, Torres del Paine National Park in Chile, and Plitvice Lakes National Park in Croatia. The data on each park was pulled from its Wikipedia page. Using my code, I ordered the pages deliberately, with the four US parks first and the four non-US parks second. Within each group, however, the parks are in no particular order.


National Parks - Yosemite, Glacier, Grand Canyon, Yellowstone, Torres del Paine, Plitvice Lakes, Kruger, Banff (from top to bottom, left to right)

Before beginning any analysis of the Wikipedia pages, I hypothesized that the US national parks would have more, and possibly better, documentation, such as more sections, more images, and perhaps older page histories. I based this hypothesis on my experiences visiting national parks both in and outside of the US. The US National Park Service seems to be a very well-run agency that does a good job managing and promoting the parks. In my experience so far, non-US national parks have not had infrastructure as good as that of US parks.

My findings were very different from what I expected. After exploring the results more thoroughly, my hypothesis was quickly disproved, and I was left with many questions. I will begin with my first assumption: that the Wikipedia pages of the US national parks would be longer. Table 1 shows the number of words and sections on each park's most recent (2018) Wikipedia page.


Table 1 - Word and Section Count

I assumed that the Wikipedia pages for the US national parks would be longer, meaning more words and sections, but the results tell a more complicated story. Word counts for the US parks ranged from about 1,500 to 2,200, with the exception of Grand Canyon at only 402 words. The number of sections for the US parks ranged from 15 to 30.

I was very surprised by how low the word count for Grand Canyon was, especially because it is one of the most visited parks in the US. Indeed, the first paragraph of the page even notes how frequently the park is visited: “The park…received more than six million recreational visitors in 2017, which is the second highest count of all American national parks after Great Smoky Mountains National Park.” None of the other three US parks mentioned yearly visitation in their first paragraphs. Perhaps the low word count reflects the fact that the park consists mainly of two rims overlooking a canyon. While I think there is much more to this national park than that, it may not be as diverse in geographic features, plants, or animals as the other parks.

The word counts for the national parks outside of the US also varied, ranging from about 750 to 1,600. While this data set is too small to draw firm conclusions, this range falls below the US range once Grand Canyon is set aside as an outlier. Although this finding supports my hypothesis, it is important to look at the number of sections as well.

Despite their lower average word count, the national parks outside of the US had a large number of sections on their Wikipedia pages: three of the pages had 34 sections, and the fourth had 16. Across all pages, the number of main sections averaged around 11, so the difference seems to lie in the subsections, which the non-US parks had more of. For example, Yellowstone had only 11 main sections and very few subsections, for a total of 19 sections. Its subsections covered a more in-depth history, geology, and biology and ecology. This makes sense from my experience, because the park's geographic features and wildlife are what make it so special. The subsections on the non-US parks' pages, by contrast, were very specific. For example, Kruger had 4 subsections identifying vegetation specific to the park, and Plitvice Lakes had 7 subsections describing how its rock formed. While all of this detailed information appears important to each park, I am surprised that Yellowstone does not have more subsections explaining the science behind its geology, such as the geysers, in more detail. I also checked whether authors had condensed such information into the existing subsections, but they had not.
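The distinction between main sections and subsections comes from Wikipedia's heading markup: in a page's wikitext, a line like `== History ==` starts a main section, while `=== Geology ===` (or a deeper level) starts a subsection. The sketch below is not the code used for this project; it is a minimal illustration, assuming the raw wikitext of a page has already been downloaded, of how both kinds of sections and a rough word count could be tallied. The function names and the sample text are hypothetical, and because the word count does not strip links, templates, or references, it would not reproduce the figures in Table 1 exactly.

```python
import re

# In wikitext, a heading is a line like "== History ==" (level 2, a main section)
# or "=== Native Americans ===" (level 3 or deeper, a subsection).
# The same number of '=' signs must appear on both sides.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_counts(wikitext: str) -> dict:
    """Count main sections (level 2) and subsections (level 3 and deeper)."""
    levels = [len(m.group(1)) for m in HEADING.finditer(wikitext)]
    main = sum(1 for level in levels if level == 2)
    return {"main": main, "sub": len(levels) - main, "total": len(levels)}


def word_count(wikitext: str) -> int:
    """Rough word count: remove heading lines, then split on whitespace.

    Links, templates, and references are NOT stripped, so the result will
    differ from a count of the rendered article text.
    """
    body = HEADING.sub("", wikitext)
    return len(body.split())


# Made-up sample text, not taken from any real park page.
sample = """Yellowstone is a national park.
== History ==
Early text here.
=== Native Americans ===
More text.
== Geology ==
Geysers and hot springs.
"""

print(section_counts(sample))  # {'main': 2, 'sub': 1, 'total': 3}
print(word_count(sample))      # 14
```

Applied to a real page, `main` would roughly correspond to what this essay calls main sections and `total` to the overall section count, though pages also include standard sections such as "References" and "External links" that a more careful count might exclude.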

Another prediction I had concerned page histories. I presumed that the non-US parks' Wikipedia pages would have been created more recently, while the US parks' pages would date back further. For example, I thought that the Torres del Paine page might have first appeared in 2005, whereas the Yosemite page might have first appeared in 2002. Since Wikipedia was launched in 2001, no page could date back further than that. Table 2 shows the year in which each national park's Wikipedia page was first created.


Table 2 - Page Histories

All of the pages were created fairly early, between 2001 and 2003, except for the Torres del Paine National Park page. I originally thought that its later creation might have been due to how secluded the park is. However, this reasoning does not hold for the other parks: Kruger National Park is also somewhat secluded in South Africa, yet its page was created early. The Torres del Paine page had very little information until around 2012, when its character count almost doubled. I am not sure why the page improved so drastically that year, but the number of main sections also increased from 7 to 12.
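The creation years in Table 2 come from each page's revision history. As a hedged sketch, and not necessarily how this project gathered them, the public MediaWiki API can return the oldest stored revision of a page when asked for a single revision listed oldest-first (`rvdir=newer`, `rvlimit=1`). The helper names below are hypothetical, and the sample response is hand-made rather than real data. One caveat: for pages from Wikipedia's first years, the oldest stored revision may not match the true creation date, since some very early edit history was lost or only imported later.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def first_revision_url(title: str) -> str:
    """Build a MediaWiki API query for the oldest stored revision of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,          # return only one revision...
        "rvdir": "newer",      # ...with revisions listed oldest-first
        "rvprop": "timestamp",
        "format": "json",
        "formatversion": 2,    # makes "pages" a list instead of a dict keyed by ID
    }
    return API + "?" + urlencode(params)


def creation_year(response: dict) -> int:
    """Extract the year from the first revision's ISO 8601 timestamp."""
    page = response["query"]["pages"][0]
    return int(page["revisions"][0]["timestamp"][:4])


def fetch_creation_year(title: str) -> int:
    """Query the live API (requires network access)."""
    # Wikimedia asks API clients to identify themselves with a User-Agent.
    req = Request(first_revision_url(title),
                  headers={"User-Agent": "parks-project-sketch/0.1"})
    with urlopen(req) as resp:
        return creation_year(json.load(resp))


# Offline example with a hand-made (not real) API response:
sample_response = {"query": {"pages": [
    {"title": "Example", "revisions": [{"timestamp": "2003-05-01T12:00:00Z"}]}
]}}
print(creation_year(sample_response))  # 2003
```

Calling `fetch_creation_year("Torres del Paine National Park")` would perform the same lookup against the live site; the offline example only checks the parsing step.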

I made another interesting observation about how the first paragraph of the Grand Canyon page changed over time. The first paragraph stayed practically the same from 2002 to 2015, apart from a few added figures describing the area of the park. In 2016, however, a whole sentence about annual visitation was added: “As of 2015, the park received more than five and a half million recreational visitors, which is the second highest count of all U.S. national parks after Great Smoky Mountains National Park.” In each following year, this sentence was edited with the new year and an updated number. I then did a little research outside of Wikipedia. GrandCanyon.com states that Grand Canyon National Park has averaged about 4.5 million recreational visitors per year since 1992, which differs from the figures on the Wikipedia pages. Part of the gap may reflect that GrandCanyon.com gives a long-term average while Wikipedia reports single recent years. Still, although this is only one other source, I find it hard to believe that there should be such a large disparity between it and Wikipedia.

Overall, my findings did not support my hypothesis. While I initially believed that the US national parks would have more and better documentation on Wikipedia, the data did not show trends clear enough to uphold that belief. The number of words varied drastically from page to page, as did the number of sections. Additionally, the page histories and creation dates did not differ between the two groups as I expected. The project also showed me how Wikipedia structures its knowledge. It was interesting to see slight variations across the individual pages I selected: while most sections consisted of paragraphs, others used tables or just bullet points. The pages with the greatest number of sections had not only many subsections but also sub-subsections.