The Beef Bowl Map — Part I: GIS Matters
A step-by-step tutorial on mapping beef bowl chain stores in Japan from scratch with Python and GIS
Ever heard of beef bowls (Gyudon, 牛丼)? They are bowls of rice topped with beef (and other ingredients if you want). In Japan, you can easily find stores selling this comfort food, most of which belong to three major chains: Yoshinoya (吉野家), Matsuya (松屋) and Sukiya (すき家).
As a geographer, one thing you might immediately want to ask is “How are they distributed across Japan?” Here’s a map showing the locations of the three major chains in the Kanto and Kansai areas.
This map does not show much beyond tons of points in the metropolitan areas. Sukiya, Yoshinoya and Matsuya have about 1,930, 1,210 and 940 stores in Japan respectively. Since most of the stores are packed inside the metropolitan areas, we see nothing but points squeezed around Tokyo and Osaka. So here is another version of the map. It indicates which beef bowl chain is the major chain* in every 2 km² hexagon, and the size of each point is proportional to the total number of beef bowl stores in the hexagon…
* I will define this term later in this article
If you are only interested in one component, the following two maps break the map down into its two components. This one sizes the points in proportion to the total number of stores…
And this one differentiates the points by colour only, indicating the major chain.
How are these maps made?
Long story short, three parts:
Part 0: Scrape the data from the companies’ websites
Part I: Some basic GIS analysis
Part II: Present the information in a (so-called) visually appealing manner
Note: I assume you have some elementary knowledge of and experience with GIS. I would be more than astonished if you have no GIS experience and are still reading this article (please let me know why).
Chronologically, Part 0 comes first. Yet I prefer to put the emphasis on GIS and cartography, so Part 0 will be an appendix written after Part I and Part II. To be honest, I believe most geographers are more interested in making maps than in looking into computer code. What’s more, who would spend time reading the code when the dataset is provided for you? Yes, I put the dataset online. Go to the GitHub link below and download the three rawdata.csv files to use them for anything you want.
OK. Time to start mapping.
1) Add and play around with the data
Let’s add the XY data of the stores to take a glance at the distribution of these beef bowl chain stores. Add XY Data is the tool you need in ArcMap (in ArcGIS Pro, use XY Table To Point). Choose matsuya_rawdata.csv as the input table, select lon as the X Field and lat as the Y Field, and select WGS 1984 as the Coordinate System. Export the XY data to a shapefile named Matsuya. Repeat the above process for sukiya_rawdata.csv and yoshinoya_rawdata.csv.
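If you prefer scripting the three imports, here is a minimal ArcPy sketch. It assumes ArcGIS Pro’s arcpy (XY Table To Point tool), the three CSV files in one folder, and a separate output folder for the shapefiles. The function names and folder arguments are hypothetical. arcpy is imported inside the function so that the naming helper can be checked without ArcGIS.

```python
from pathlib import Path

# File prefixes of the tutorial's dataset: <chain>_rawdata.csv
CHAINS = ("matsuya", "sukiya", "yoshinoya")


def layer_name(csv_path):
    """Derive the output layer name, e.g. 'matsuya_rawdata.csv' -> 'Matsuya'."""
    return Path(csv_path).name.split("_")[0].capitalize()


def import_store_points(data_dir, out_folder):
    """Hypothetical helper: run XY Table To Point once per chain (needs arcpy)."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    wgs84 = arcpy.SpatialReference(4326)  # WKID 4326 = WGS 1984
    for chain in CHAINS:
        csv_path = Path(data_dir) / f"{chain}_rawdata.csv"
        out_fc = Path(out_folder) / f"{layer_name(csv_path)}.shp"
        arcpy.management.XYTableToPoint(
            str(csv_path),
            str(out_fc),
            x_field="lon",
            y_field="lat",
            coordinate_system=wgs84,
        )
```

For example, calling import_store_points with your own data and output folders should write Matsuya.shp, Sukiya.shp and Yoshinoya.shp.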
I read the blog of John Nelson (a cartographer at Esri) quite often, and I dare say it is a pool of inspiration for brainstorming cartography and GIS projects. It contains several tips like the one below…
Pro Tip: If you have two overlapping dense point layers they will inevitably overlap each other. But if you take the bottom layer, and paste a copy of it on top at 50% opacity, then you get a better, fairer, visualization of the mixing.
From Starbucks vs Dunkin’ Donuts Mega Map Blast
I followed the tip and pasted a copy of the bottom layer on top at 50% opacity. Here is the result:
It’s kind of fun to see how these stores are distributed across Japan. Yet it seems we are showing nothing more meaningful than a population map (more chain stores imply more people living nearby). It would be fun to map the sales per capita of each store, but we do not have data on the sales revenue of each store (lack of data is the second most frustrating issue after unknown runtime errors). So why don’t we aggregate the points within boundaries to see which chain has the most stores in each area?
2) Aggregate the points into bins
Why aggregate points?
Presenting a large number of dots on a map is, in most cases, not a good idea. What you end up with is a pile of small dots trying their best to distract viewers from the story you are mapping. Our eyes cannot process that much information in a short time. So, no, don’t present too much data: it drags your readers away from the information and story you want to tell (CS101: the difference between data and information).
Besides, to conduct any basic analysis, you need a “boundary” or “bin” to quantify area-related ratios and counts. The boundaries could be districts, census tracts or municipalities. If you believe these would not support a good analysis, a “net” of regular bins can be used instead.
I chose hexagonal bins for this map.
Why hexagonal bins?
Square bins (i.e. fishnet grids) may intuitively be the first choice, since we all have experience handling raster grids. Yet bins can take many shapes, and the square is just the one we are most familiar with. In this project I chose hexagonal bins, as they can be a much better choice than a fishnet: all six neighbours of a hexagon share an edge with it and their centres are equally far away, whereas a square’s diagonal neighbours touch it only at a corner. You can read this blog to understand why this matters.
Use the Generate Tessellation tool to create a hexagon feature class as the basis for binning the stores. Set the Extent to the Japan Boundary feature class (available in ArcGIS Online) and the Size to 2 square kilometers.
And… we get tons of hexagons floating on the Pacific Ocean, on Mount Fuji, or somewhere in the middle of nowhere. That’s because the tool creates polygons everywhere in the extent we defined, from the top-left to the bottom-right. Only around 2,000 hexagons contain stores, out of about 686,000(!) hexagons generated by the tessellation.
To avoid wasting computing time searching for points in those useless hexagons, we have to discard the redundant hexagons before summarising the points. Use Select By Location to select the hexagons that will be used. For Relationship, choose Contains. Read this StackExchange question if you do not know how to select by location with multiple source layers. Export the selected hexagons to a new feature class named hex_bins (or something you like).
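The Size parameter of the tessellation is an area, not a width. The sketch below works out what a 2 square kilometre hexagon measures on the ground, using the regular-hexagon area formula A = (3√3/2)s². It also wraps the tool call in a hypothetical helper, make_hex_grid. The sketch assumes the boundary feature class uses a projected coordinate system, so that square kilometres are meaningful. Otherwise, you may need to pass an explicit spatial reference.

```python
import math

HEX_AREA_KM2 = 2.0  # the Size used in this tutorial: 2 km² per hexagon


def hexagon_dimensions(area_km2):
    """Side length and widths (km) of a regular hexagon with the given area.

    Area of a regular hexagon with side s: A = (3 * sqrt(3) / 2) * s**2,
    hence s = sqrt(2 * A / (3 * sqrt(3))).
    """
    side = math.sqrt(2 * area_km2 / (3 * math.sqrt(3)))
    return {
        "side": side,
        "across_flats": math.sqrt(3) * side,  # distance between opposite edges
        "across_corners": 2 * side,           # distance between opposite vertices
    }


def make_hex_grid(boundary_fc, out_fc, area_km2=HEX_AREA_KM2):
    """Hypothetical helper: tessellate the boundary's extent with hexagons."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    extent = arcpy.Describe(boundary_fc).extent
    arcpy.management.GenerateTessellation(
        out_fc, extent, "HEXAGON", f"{area_km2:g} SquareKilometers"
    )
```

With A = 2 km², each hexagon has sides of about 0.88 km. It is about 1.52 km across the flats and 1.75 km corner to corner.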
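One common way to select by location against several source layers is to start a new selection with the first layer and add to it with the others. Here is a minimal ArcPy sketch of that approach, which is not necessarily the answer in the linked question. The function names are hypothetical, and it assumes hex_fc is the tessellation output and the point layers are the ones created in step 1.

```python
def selection_steps(point_layers):
    """Pair each point layer with the selection type to use.

    The first layer starts a new selection and every later layer adds to it,
    so a hexagon is kept if it contains a store of *any* chain.
    """
    return [
        (layer, "NEW_SELECTION" if i == 0 else "ADD_TO_SELECTION")
        for i, layer in enumerate(point_layers)
    ]


def export_bins_with_stores(hex_fc, point_layers, out_fc):
    """Hypothetical helper: keep only hexagons that contain stores (needs arcpy)."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    arcpy.management.MakeFeatureLayer(hex_fc, "hex_layer")
    for layer, selection_type in selection_steps(point_layers):
        arcpy.management.SelectLayerByLocation(
            "hex_layer",
            overlap_type="CONTAINS",
            select_features=layer,
            selection_type=selection_type,
        )
    # Copy Features writes only the currently selected hexagons.
    arcpy.management.CopyFeatures("hex_layer", out_fc)
```

For example, you could call export_bins_with_stores with the tessellation output, ["Matsuya", "Sukiya", "Yoshinoya"] and "hex_bins".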
After that, use Summarize Within to count the number of stores in each hexagon. The Input Polygons are hex_bins (the hexagonal fishnet), and the Input Summary Features are Matsuya (or whatever you named the Matsuya point layer created in step 1). Make sure Keep all input polygons is checked.
You will get a new hexagon layer that looks about the same as the input, except that a new column named Count of Points appears in the attribute table. It shows the number of Matsuya stores within each hexagon. Add a field with Field Name Matsuya and Data Type Short, then copy the count into this field with the Calculate Field tool.
What about Sukiya and Yoshinoya? Repeat the above process for their point layers, each time using the hexagon layer produced in the previous step as the Input Polygons. For the last run, name the output feature class hex_bins_MYS.
In the end, you will get a hexagon feature class with three columns of attributes indicating the number of stores of each brand.
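The three Summarize Within runs can also be chained in a script. Here is a minimal ArcPy sketch with hypothetical function and intermediate names. It assumes it is run from the ArcGIS Pro Python window, where the point layers Matsuya, Sukiya and Yoshinoya can be referenced by name, and that arcpy.env.workspace points to a file geodatabase. It also assumes the count field is named Point_Count (alias Count of Points), so check your attribute table. Deleting Point_Count after each copy is an added step, so that the next run does not clash with it.

```python
CHAINS = ("Matsuya", "Sukiya", "Yoshinoya")  # point layers from step 1
FINAL_NAME = "hex_bins_MYS"


def output_names(chains, final_name=FINAL_NAME):
    """Hypothetical intermediate output names; the last run gets the final name."""
    names = [f"hex_bins_{chain}" for chain in chains]
    if names:
        names[-1] = final_name
    return names


def summarise_chains(hex_bins, chains=CHAINS):
    """Add one SHORT count field per chain to the hexagons (needs arcpy)."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    current = hex_bins
    for chain, out_fc in zip(chains, output_names(chains)):
        # Count this chain's stores per hexagon, keeping all input polygons.
        arcpy.analysis.SummarizeWithin(current, chain, out_fc, "KEEP_ALL")
        # Copy the generic count into a field named after the chain.
        arcpy.management.AddField(out_fc, chain, "SHORT")
        arcpy.management.CalculateField(out_fc, chain, "!Point_Count!", "PYTHON3")
        # Remove the generic count before the next Summarize Within run.
        arcpy.management.DeleteField(out_fc, "Point_Count")
        current = out_fc
    return current
```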
3) Defining the “major” chains
We now have the number of stores in every hexagon. It’s time to think about what information we would (and could) present. In this project, I want to find the dominant chain in each area. In other words, I want to show readers which chain has more stores than the others in every bin.
But how dominant does a chain have to be to count as major? What rule decides the battle in each hexagon? Here, I define the major chain as the chain with the largest number of stores in the area. Imagine there are 12 Matsuya, 2 Yoshinoya and 1 Sukiya in an area. In this case, Matsuya is the major chain.
We do not consider the margin, i.e. how many more stores the major chain has than the others. 1 more? Major. 5 more? Major. 100 more? Still major. If there are 12 Matsuya, 8 Yoshinoya and 13 Sukiya, Sukiya is the major chain, even though it has only one more store than Matsuya.
Some data may (and in most cases will) be lost when turning data into information. This is largely unavoidable; your job is to tell your audience how the conversion was done. If you keep the process in a black box, not only will readers be confused about how you got the result, but you yourself will forget the steps after a few months.
The tricky part is when more than one chain has the maximum number, making all of them major. Say we have 3 Matsuya, 3 Yoshinoya and 1 Sukiya. Who should win the major chain competition? Well… maybe both. Making both chains winners seems the easiest way to resolve this conflict. This definition also makes the result easy to show graphically. Suppose Matsuya is blue, Sukiya is yellow and Yoshinoya is red (it should really be orange). Then a bin where Matsuya and Sukiya tie for the most stores can be coloured green, which is intuitive.
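The definition above, ties included, fits in a few lines of plain Python. This is a minimal sketch with a hypothetical function name, checked against the examples in the text.

```python
def major_chains(counts):
    """Return every chain tied for the largest store count in one bin.

    counts: mapping of chain name -> number of stores in the bin.
    Ties are kept, so two or three chains can be "major" at the same time.
    """
    most = max(counts.values())
    return {chain for chain, n in counts.items() if n == most}
```

The margin is ignored, as described: {"Matsuya": 12, "Yoshinoya": 8, "Sukiya": 13} gives {"Sukiya"}. A bin with zero stores of every chain would make all three chains major. Removing the empty hexagons earlier avoids that case.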
4) Finding the major chains
Let’s start the fight by counting the total number of stores in each hexagon. Add a new column called Total and sum the numbers from the three chains. Remember, Calculate Field is, in most cases, what you need when editing the attribute table. Select Python 3 as the Expression Type and apply the following expression to calculate the total number of chain stores in the hexagon:
!Matsuya! + !Yoshinoya! + !Sukiya!
Then, add a new field MainChain to store the brand name of the chain with the largest number of stores. The tricky thing here is that we want to return the name of the column, not its value. This is a bit much to handle with the Field Calculator alone. Also, the max function in the Field Calculator returns only one value, while there can be two or more major chains. So we need to tweak our workflow a little to get things done…
First of all, get the maximum number of stores in each row: add a column named Max_No to store the largest number of stores. The max function is the one you need. Copy the expression below into the Field Calculator if needed:
max(!Matsuya!,!Sukiya!,!Yoshinoya!)
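Both helper fields can also be filled from a script. The sketch below calls Add Field and Calculate Field through ArcPy with the two expressions shown above. It also includes a rough, hypothetical preview function that substitutes the !Field! tokens with values from a sample row. preview only approximates the substitution for a quick sanity check; it is not how ArcGIS evaluates expressions internally. The function names are hypothetical, and the SHORT field type is an assumption.

```python
import re

# Field Calculator expressions (Python 3) for the two helper fields.
EXPRESSIONS = {
    "Total": "!Matsuya! + !Yoshinoya! + !Sukiya!",
    "Max_No": "max(!Matsuya!,!Sukiya!,!Yoshinoya!)",
}


def preview(expression, row):
    """Approximate the !Field! substitution on a plain dict (illustration only)."""
    code = re.sub(r"!(\w+)!", lambda m: repr(row[m.group(1)]), expression)
    return eval(code, {"__builtins__": {}}, {"max": max})


def add_helper_fields(fc):
    """Hypothetical helper: add and calculate Total and Max_No (needs arcpy)."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    for field, expression in EXPRESSIONS.items():
        arcpy.management.AddField(fc, field, "SHORT")
        arcpy.management.CalculateField(fc, field, expression, "PYTHON3")
```

For a bin with 12 Matsuya, 2 Yoshinoya and 1 Sukiya, the preview gives a Total of 15 and a Max_No of 12.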
Then, we will use the following notations to indicate the major chain(s):
1. “M--”: Matsuya
2. “-Y-”: Yoshinoya
3. “--S”: Sukiya
4. “MY-”: Matsuya and Yoshinoya
5. “M-S”: Matsuya and Sukiya
6. “-YS”: Yoshinoya and Sukiya
7. “MYS”: All of them
We loop over every chain column to check whether that chain’s store count equals the maximum value we found earlier. If it does, we jot down the first letter of the chain’s name; otherwise, we jot down a dash. Think of the pseudocode below:
for row in table:
    main_chain = ''
    for chain in ('Matsuya', 'Yoshinoya', 'Sukiya'):
        if row[chain] == row['Max_No']:
            main_chain += chain[0]  # first character: M, Y or S
        else:
            main_chain += '-'
    row['MainChain'] = main_chain
Following is the Python code. Paste it into the ArcPy window, press Enter, and let the program do the work. It is not nice-looking code, as some values are hard-coded (for example, row[3] stands for “Max_No”).
Paste the script above into the ArcPy window:
After running the script, we get a column with 7 possible values indicating the major chains.
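For reference, here is a minimal sketch of the same loop using arcpy.da.UpdateCursor. It was written independently and is not the original script. The function names are hypothetical, the fields are those created above, and MainChain is assumed to be a text field of length 3, added here if it is missing. The labelling logic sits in a plain function so that it can be checked outside ArcGIS.

```python
# Field order mirrors the notation: position 0 = M, 1 = Y, 2 = S.
CHAIN_FIELDS = ("Matsuya", "Yoshinoya", "Sukiya")


def main_chain_label(counts, max_no):
    """Build the 3-character code, e.g. counts (3, 3, 1) with max 3 -> 'MY-'."""
    return "".join(
        field[0] if count == max_no else "-"
        for field, count in zip(CHAIN_FIELDS, counts)
    )


def fill_main_chain(fc):
    """Hypothetical helper: write the code into MainChain for every hexagon."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    if "MainChain" not in [f.name for f in arcpy.ListFields(fc)]:
        arcpy.management.AddField(fc, "MainChain", "TEXT", field_length=3)

    fields = list(CHAIN_FIELDS) + ["Max_No", "MainChain"]
    with arcpy.da.UpdateCursor(fc, fields) as cursor:
        for row in cursor:
            # row[0:3] are the store counts, row[3] is Max_No, row[4] is MainChain.
            row[4] = main_chain_label(row[0:3], row[3])
            cursor.updateRow(row)
```

A hexagon with no stores at all would be labelled “MYS”, because every count equals the maximum of 0. Removing the empty hexagons earlier keeps that label meaningful.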
5) Adjusting symbology
In the Symbology pane, choose Unique Values as the Primary symbology and select MainChain as Field 1. Play with the colours until you are satisfied with the colour scheme. The count of each category is also available: click More in the top-right corner of the class panel, then select Show Count.
Sukiya is, in most cases, the major chain in the area. This is not surprising, since Sukiya has roughly twice as many stores as Matsuya and about 60% more than Yoshinoya.
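The same counts can also be pulled from a script, along with how often each chain is major, alone or tied. Here is a minimal sketch with hypothetical function names. It reads MainChain with arcpy.da.SearchCursor and tallies the codes in plain Python.

```python
from collections import Counter


def chain_shares(labels):
    """Count MainChain codes, and how often each chain is (co-)major."""
    by_code = Counter(labels)
    by_chain = Counter()
    for code, n in by_code.items():
        for letter in code.replace("-", ""):
            by_chain[letter] += n
    return by_code, by_chain


def read_labels(fc):
    """Hypothetical helper: read the MainChain codes (needs arcpy)."""
    import arcpy  # only available in ArcGIS Pro's Python environment

    with arcpy.da.SearchCursor(fc, ["MainChain"]) as cursor:
        return [row[0] for row in cursor]
```

For example, chain_shares(read_labels("hex_bins_MYS")) returns one tally per code and one per chain.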
After adjusting the symbology, your interface will look something like this.
And zoomed out to southern Japan, it looks like this…
Pit Stop
Finally, we have finished the analysis part of this mini GIS project. In the next story, I will move on to the cartography techniques on how to finalise the map.
Let’s recap what we have done in this tutorial:
1. Import XY table into a point shapefile
2. Create a hexagon fishnet and use it to summarise the points
3. Find the number of stores for every chain in each bin
4. Create a column to identify the major chain(s)
The next tutorial will focus on cartography, on how to present the above analysis results in a graphically-satisfying manner (again, so-called).
My warmest thanks if you’ve read to the end. This is the first time I have written a blog post on GIS, cartography or programming. Forgive my bad English (I know; that’s why I started blogging) and anything unclear.
Feel free to take a look at all the code I wrote for this mini-project if you are interested. Code reviews are more than welcome, and I treasure every opportunity to improve my code.