The second aim of this tutorial is to show you a quick way of mining data via APIs (this time the World Bank API) and of using a tool like Sublime Text to clean that data up into a tidy CSV file.
Ok, let's get started!
World Bank Data API: Let's build some data
For some of your Assessment 2 and 3 stories, you can use sources like the World Bank, which maintains a very large open database covering countries around the world. In this tutorial, we build a data set of countries with their names, capitals, income levels, latitudes and longitudes. We then use the data to build our first web-based interactive map. Do the following:
- Go to the World Bank API page. API stands for Application Programming Interface; it lets you request and build data programmatically. Many other data sources have their own APIs. For example, Twitter has its own API, as does Flickr. Some of you may want to experiment with these for stories based on social media data! You can make calls to the API to retrieve data in JSON or XML format. In this instance, we make a direct call from the browser and build up a rather simple data set, non-programmatically.
- On the World Bank API page, to the right, you will see the whole API menu. Click on API: Country Queries. As the name suggests, Country Queries will allow us to build data based on countries.
- On the Country Queries page, let's explore the different ways in which data calls can be made. For example, the request format for a single country (Brazil) is shown as:
- In this instance, we don't want a single country. Instead, we want ALL the countries. So, we copy and paste the following into a browser window (note we have just removed the br):
- You can see the following:
- You can see that there are 304 results in total, but only 50 entries per page. By the way, this is an XML file, but we don't need to worry about that. Just note that the data follows a consistent markup structure, which makes it really easy to clean up and put into the form we want. We now modify our query as:
- Now we have all the 304 results on the one page. If you would like to explore other ways of building queries, start by looking at the API:Documentation and API:Basic Call Structure items on the API menu. They will show you other ways of constructing data calls.
- Now press cmd-A to select all the data on the page, copy it (cmd-C), and paste it into Sublime Text. We will explore the wonders Sublime Text can perform in just a few minutes!
- Select the first few lines as shown and delete them.
- Now open Find (cmd-F) and do a "Find All" on each of the repeated tags or phrases you see. For example, the first one is <wb:country id=
- When you Find All and delete in Sublime, you are in effect "batch deleting" every occurrence at once, so you only need to do a few of these searches and deletes.
- Let's look at the next one, which closes the country id and a few other tags, i.e., ">. When you Find All on this next string, you can also insert a comma in batch mode (remember we are building a CSV file): just delete the selection and then type a comma, which inserts a comma at every selected place. Let's do this interactively.
- Great, so we do this for all the repeated tags that contain the data, and finally, we just insert the following header row for our CSV file at the very top (basically putting back as column names what we just deleted from the markup file):
- Finally, save two versions - one as a txt file, and one as a CSV. Open the CSV file in Excel:
- Hmm, still some cleaning up to do. Some of the results are "aggregates" of countries, so we need to delete these. Some of the countries have names like "Bahamas, The", where the comma inside the name splits it across two columns and leaves an extra latitude hanging out at the end of the row. Basically, we notice and use this pattern: (1) delete every row that doesn't have a lat-long, and (2) amend every row in which a lat is sticking out or sticking in. Let's do this quickly. Finally, also delete the AdminRegion_id and AdminRegion_name columns, since we don't want to deal with lots of missing values at this stage.
- We have our nice clean CSV data file!
- Now, select the longitude and latitude columns, and format them to a number format with 4 decimals, since we would like to have a standard format for these.
- What you have just done is inserted quotes around the A2 entry. Now drag out the little green box to extend this to all rows and columns, except the longitude and latitude columns. Copy the latitude and longitude columns as is (taking care that you have flipped them, so copy the correct ones). Then, use the same Excel trick, and in cell AA2, type the following, and drag it out to apply to all the rows:
- Congratulations! Our data is ready. If the whole process has really irritated you, a simple option would be to write a Python script to do all this automatically. The rule of thumb I follow for data processing, cleaning and building is: (a) if the data set is small and I only have to use it once for one visualisation, hooray for Sublime Text and Excel; (b) if the data is large, or I have to repeat the same process for multiple data sets, write a Python script. Most of the time we do have to build or clean data, since we hardly ever receive it in the format we want. So, knowing how to do this quickly is a valuable skill to have!
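As a rough idea of what such a script could look like, here is a minimal sketch. It is written in JavaScript only to match the language of the map code later in this tutorial; the same logic ports directly to Python. It assumes the API's JSON output rather than the XML cleaned by hand above. The URL layout, the `format=json` parameter, the field names (`capitalCity`, `incomeLevel.id`, and so on) and the CSV column names are all assumptions to check against the World Bank API documentation, and `buildCountryQuery`, `quote` and `toCsv` are hypothetical helper names.

```javascript
// Sketch only: the URL layout and JSON field names are assumptions about the
// World Bank API; check them against the API documentation before use.

// Build the "all countries on one page" query (hypothetical helper).
function buildCountryQuery(perPage) {
  const url = new URL("https://api.worldbank.org/v2/country");
  url.searchParams.set("per_page", String(perPage));
  url.searchParams.set("format", "json");
  return url.toString();
}

// Wrap a text value in double quotes so embedded commas do not split columns.
function quote(text) {
  return '"' + String(text).replace(/"/g, '""') + '"';
}

// Turn an array of country records into CSV text (hypothetical helper).
// Rows without a latitude/longitude (the "aggregates") are dropped.
function toCsv(countries) {
  const header = "id,name,capital,incomeLevel_id,incomeLevel_name,latitude,longitude";
  const rows = countries
    .filter((c) => c.latitude !== "" && c.longitude !== "")
    .map((c) =>
      [
        quote(c.id),
        quote(c.name),
        quote(c.capitalCity),
        quote(c.incomeLevel.id),
        quote(c.incomeLevel.value),
        Number(c.latitude).toFixed(4),   // 4 decimals, as in the Excel step
        Number(c.longitude).toFixed(4),
      ].join(",")
    );
  return [header, ...rows].join("\n");
}

// Two illustrative records in the assumed response shape (not real API output).
const sample = [
  { id: "BHS", name: "Bahamas, The", capitalCity: "Nassau",
    incomeLevel: { id: "HIC", value: "High income" },
    latitude: "25.0661", longitude: "-77.339" },
  { id: "ARB", name: "Arab World", capitalCity: "",
    incomeLevel: { id: "NA", value: "Aggregates" },
    latitude: "", longitude: "" },
];

console.log(buildCountryQuery(304));
console.log(toCsv(sample));
```

Quoting every text field is what keeps a name with an embedded comma in a single column, and the filter step plays the role of deleting every row without a lat-long.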
- We import a base-map layer from Mapbox.
- On top of this base-map layer, we superpose our data. In this case, the data is in the form of latitude and longitude as points.
First, we will need to make a Mapbox account. Click on the link and sign up. It's a simple process. Then, just click on the "Explore Mapbox" link.
Click on Styles and then click on any of the given Styles. The usual ones are "Mapbox Streets", "Mapbox Light", or "Mapbox Dark", depending on your design choice. Here, I have chosen Mapbox Light. The two tabs "Mapbox" and "Leaflet" give us the links to embed in our visualisation HTML file.
We are ready for the final and most exciting part of our experience today. This part of the tutorial draws from the Leaflet quickstart tutorial. Do the following:
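As a minimal sketch of the kind of code the Leaflet quickstart leads to, the function below creates a map and adds a tile layer to it. It assumes Leaflet is already loaded on the page as the global `L` and that the page has a `<div id="map"></div>` with a height set in CSS. `createBaseMap`, `TILE_URL` and `ATTRIBUTION` are hypothetical names, and the two constants are placeholders for the values shown in your own Mapbox "Leaflet" tab.

```javascript
// Sketch only: assumes Leaflet is loaded as the global `L` and the page
// contains <div id="map"></div> with a fixed height set in CSS.
// TILE_URL and ATTRIBUTION are placeholders: paste in the values that your
// Mapbox "Leaflet" tab gives you.
const TILE_URL = "PASTE-YOUR-MAPBOX-LEAFLET-URL-HERE";
const ATTRIBUTION = "PASTE-THE-ATTRIBUTION-TEXT-HERE";

// Create a map in the element with the given id and add the base-map layer.
function createBaseMap(L, elementId) {
  // Centre the view on [latitude, longitude] at a world-scale zoom level.
  const map = L.map(elementId).setView([20, 0], 2);
  L.tileLayer(TILE_URL, { attribution: ATTRIBUTION, maxZoom: 18 }).addTo(map);
  return map;
}

// In the browser: const map = createBaseMap(L, "map");
```

Passing `L` in as an argument is only a convenience for trying the function outside a browser; in the HTML file you can call `L.map` and `L.tileLayer` directly.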
Base Map + Data = Vis
In this final part of the tutorial, we will add the World Bank Data onto the Base Map. The aim of the story or visualisation is to explore the geographic/spatial organisation of rich and poor countries around the world. Our visualisation should finally be able to show whether spatial clusters exist between rich and poor countries: does the world have a clustering of riches? To achieve this, we make the following design decisions:
- To develop a colour based representation for income levels, where each income level category corresponds to a particular colour.
- To choose a visual mapping by shape: i.e., each country is represented as a circle, where the size and colour of each circle correspond to the same variable: the income level. Note that this choice is not unique: you are actually using two features, colour and size, to represent the same data element, the income level. For example, in some sense, you are saying that "bigger and redder" means "richer". However, you could also try other mappings for a richer story. For example, if you added population data to your CSV file, then you could have the size of the circle correspond to population, and the colour correspond to the income level. Then, you can explore the clustering two ways: do countries with high populations also have the lowest income levels? (Visual map: large circles of a light colour, versus small circles of a dark colour.) Do they cluster geographically? So, the choice of this mapping is the most crucial design decision in a visualisation: it can make or break a visualisation! Think hard, and choose wisely!
- Here is a link to what we are going to build today, and this is what it is going to look like:
To start putting data on our base map layer, we do the following steps:
- This is what it looks like after you've pasted it all in:
- Recall that we added commas after the square brackets when we were building our data? This was because when we paste it here, each sub-array is an element of the bigger array. So, don't forget to remove that extra comma from the last country entry, which is Zimbabwe. That is, the end of your array should look like this:
- OK, now for some colour coding of the income levels. This is an example of what we call an "ordinal" scale, where the data goes from lowest to highest in some sense, but the exact numerical differences between levels are not known. Here is the list of income levels from the World Bank website. Each country belongs to one of these. I chose an arbitrary colour code here, going from red (High income) through orange and green to dark blue (Low income), with Not classified as black:
- HIC: High income coded as rgb(255,0,0).
- UMC: Upper middle income coded as rgb(255,165,0).
- MIC: Middle income coded as rgb(180,180,70).
- LMY: Low & middle income coded as rgb(70,180,70).
- LMC: Lower middle income coded as rgb(0,100,0).
- LIC: Low income coded as rgb(0,0,100).
- INX: Not classified coded as rgb(0,0,0).
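One compact way to hold this colour code in JavaScript is a lookup table keyed by the income-level code. This is an alternative to the if/else chain used in this tutorial, not the tutorial's own code, and `INCOME_COLOURS` and `colourFor` are hypothetical names.

```javascript
// Income-level codes mapped to the colours listed above.
const INCOME_COLOURS = {
  HIC: "rgb(255,0,0)",
  UMC: "rgb(255,165,0)",
  MIC: "rgb(180,180,70)",
  LMY: "rgb(70,180,70)",
  LMC: "rgb(0,100,0)",
  LIC: "rgb(0,0,100)",
  INX: "rgb(0,0,0)",
};

// Hypothetical helper: look up a colour, falling back to the
// "Not classified" colour for any code that is not in the table.
function colourFor(code) {
  return Object.prototype.hasOwnProperty.call(INCOME_COLOURS, code)
    ? INCOME_COLOURS[code]
    : INCOME_COLOURS.INX;
}

console.log(colourFor("UMC"));
console.log(colourFor("???"));
```

A table like this keeps the colour choices in one place, so changing the palette later means editing seven strings rather than hunting through branches.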
Now we create three empty variables, (i) for colour, called C, (ii) for the size of the circle, called S, and (iii) for the interactive popup text, called popup.
- Next, make a for loop, and create a "description list" using the HTML markup <dl></dl> to put in the data features we want in our pop up. Here, I have put the country name, the capital city, and the income level. But you can put other features too.
- Next, make a for loop, iterate over each of our array values, and use if else statements to assign a colour and size (radius of circle) to each country, based on its income level:
- We now plot these. Each country is plotted by its latitude and longitude; the plot type is circleMarker, with the properties as shown. Note how we are using the colour C and the size S. Finally, we use two methods: (i) bindPopup, to which we pass the popup variable, and (ii) addTo, which adds all this to the map variable.
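Put together, the popup, colour/size and plotting steps might look like the sketch below. It assumes Leaflet is loaded as the global `L` and that `map` is your base map. The row layout `[name, capital, incomeCode, lat, lng]`, the two sample rows, the radii and the opacity are illustrative assumptions, and `styleFor`, `popupFor` and `plotCountries` are hypothetical helper names.

```javascript
// Sketch only: assumes each row is [name, capital, incomeCode, lat, lng].
// The row layout, the sample rows and the radii are illustrative assumptions.
const countries = [
  ["Australia", "Canberra", "HIC", -35.2820, 149.1286],
  ["Zimbabwe", "Harare", "LIC", -17.8312, 31.0672]   // no comma after the last row
];

// Pick a colour C and a radius S from the income-level code.
function styleFor(incomeCode) {
  let C, S;
  if (incomeCode === "HIC") { C = "rgb(255,0,0)"; S = 10; }
  else if (incomeCode === "UMC") { C = "rgb(255,165,0)"; S = 8; }
  else if (incomeCode === "LMC") { C = "rgb(0,100,0)"; S = 6; }
  else if (incomeCode === "LIC") { C = "rgb(0,0,100)"; S = 4; }
  else { C = "rgb(0,0,0)"; S = 2; }
  return { C, S };
}

// Build the popup text as an HTML description list.
function popupFor(row) {
  return "<dl><dt>Country</dt><dd>" + row[0] + "</dd>" +
         "<dt>Capital</dt><dd>" + row[1] + "</dd>" +
         "<dt>Income level</dt><dd>" + row[2] + "</dd></dl>";
}

// Plot every country as a circle marker with its popup.
function plotCountries(L, map, rows) {
  for (let i = 0; i < rows.length; i++) {
    const { C, S } = styleFor(rows[i][2]);
    L.circleMarker([rows[i][3], rows[i][4]], {
      radius: S, color: C, fillColor: C, fillOpacity: 0.6
    }).bindPopup(popupFor(rows[i])).addTo(map);
  }
}

// In the browser: plotCountries(L, map, countries);
```

Note that `circleMarker` takes the latitude first and the longitude second, which is why the order of those two columns in your data matters.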
- The final step is to add a legend. We use the ideas from this Leaflet tutorial to add a legend. Just for information, this linked tutorial is for generating Choropleth maps, which you could also do if you had geoJSON data to plot polygons (instead of point data as lat-long, as we have here). The idea remains the same: import a base map and add vector data to it. We might have a follow up tutorial on how to produce geoJSON files in coming weeks.
- To add a legend, we first use the method L.control, which is a base method for implementing all controls on a map. Create the control, put it in a variable called legend, and position it at the bottomright of the map.
- Now, to build the legend, we need colour and size definitions. We note from our data set, that only 4 out of the 7 Income level codes have data. You would have noticed by now, that when we plotted our data, we could see circles of only 4 sizes and colours. So, we add these as functions into our code. Just for fun, look at this new "cleaner" (and self-explanatory) way of defining an if-else statement:
- Now, we need to add the legend to the map. For this, we need to create a DOM element that will contain our legend information, and add it to the map. Look carefully at the code below; it needs a bit of careful chunking, thinking and understanding. First up, we use the onAdd method, which Leaflet calls when the control is added to the map and which should return the DOM element for the control. Using this idea, we define a function that takes in our map. Inside the function, we define a div variable, which uses the method DomUtil.create to generate an HTML div with the classes info and legend.
- Now, make an array called country_leg with the names of the Income Levels.
- Now, in a for loop, we add <i></i> tags (an inline HTML element, used here as a coloured swatch) and the names from country_leg to the div. Note that we use the div.innerHTML property to do this, which gets or sets the HTML markup contained inside the div. If this is too confusing, we will take a minute and discuss this in class. When we add the <i></i> tags, we are simply calling the functions getColor and getSize we defined earlier to set the colour and size of the legend items. The text in yellow quotes sets the "style" of the <i></i> elements. Again, if this is too confusing, we will take a minute and dissect this carefully. When learning something new and confusing, one trick I employ is to type everything out myself, bit by bit, instead of copying and pasting it. This helps me to really break down the code, so that the next time I do something similar, with some differences, I am able to do it on my own.
- Ok, almost there! The div is returned, with all the elements and then added onto the map (see code picture above).
- In the end, we need to set styles for our legend entries. So, we go back to the head and style section of our HTML file, and put in the following styles: we will step through these in class one by one.
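The legend steps above can be sketched roughly as follows. This is an outline in the spirit of the Leaflet legend example rather than the exact code from the pictures: it assumes Leaflet is loaded as the global `L`, and the four labels, colours and pixel sizes are illustrative, so use the levels that actually occur in your data. `legendHtml` and `addLegend` are hypothetical helper names.

```javascript
// Sketch only: the four labels, colours and sizes below are illustrative.
function getColor(level) {
  return level === "High income"         ? "rgb(255,0,0)"   :
         level === "Upper middle income" ? "rgb(255,165,0)" :
         level === "Lower middle income" ? "rgb(0,100,0)"   :
                                           "rgb(0,0,100)";
}

function getSize(level) {
  return level === "High income"         ? 20 :
         level === "Upper middle income" ? 16 :
         level === "Lower middle income" ? 12 :
                                           8;
}

// Build the legend's inner HTML: one coloured <i> swatch plus label per level.
function legendHtml(country_leg) {
  let html = "";
  for (let i = 0; i < country_leg.length; i++) {
    const size = getSize(country_leg[i]);
    html += '<i style="background:' + getColor(country_leg[i]) +
            ";width:" + size + "px;height:" + size + 'px"></i> ' +
            country_leg[i] + "<br>";
  }
  return html;
}

// Create the control and tell Leaflet how to build its DOM element.
function addLegend(L, map) {
  const legend = L.control({ position: "bottomright" });
  legend.onAdd = function (map) {
    const div = L.DomUtil.create("div", "info legend");
    div.innerHTML = legendHtml(["High income", "Upper middle income",
                                "Lower middle income", "Low income"]);
    return div;
  };
  legend.addTo(map);
  return legend;
}

// In the browser: addLegend(L, map);
```

An `<i>` element is inline by default and ignores width and height, so the swatches only appear once the page styles make it a floated or block-level element.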
Head off to the HTML file link, and open in Chrome to see the full result! Congratulations on generating your first interactive map!
Sarkar, S. and Hussein, D.A., 2017, D3 Tutorials for Information Visualisation Design Studio, University of Sydney.