Simple GIS Data Visualisation in R

Juan Pablo Herrera

2018-08-19

Introduction

Working with spatial data and particulary visualising it can be very useful for a data scientist since it allows you to tell a more compelling story about the problem that you are working on. There are multiple different packages in R that allow you to not only visualise data in a geographical context, but also, to create and manipulate spatial data. In this guide we will focus on the visualisation part, particularly working with shape files and and georeferenced dots.

Setup

Before we start please make sure that all the packages are installed and loaded in R.

Required packages

# Installs the required packages
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("rgdal")
#install.packages("tmap")
#install.packages("ggmap")

# Loads the required required packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(rgdal)
#> Loading required package: sp
#> rgdal: version: 1.3-4, (SVN revision 766)
#>  Geospatial Data Abstraction Library extensions to R successfully loaded
#>  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
#>  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
#>  GDAL binary built with GEOS: FALSE 
#>  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
#>  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
#>  Linking to sp version: 1.3-1
library(tmap)
library(ggmap)

Required files

The required files for this particular guide can be found in data.gov.au, and data.nsw.gov.au. Specifically for this guide we will be using a Shape file containing the administrative boundaries for the NSW Local Government Areas and a data file containing incidents reported between January 2013 and March 2016, where the incident occurred at an outdoor or public place within the Sydney Local Government Area.

# Downloads and unzips the folder containing the shape file with the polygons for the NSW LGAs
download.file("https://data.gov.au/dataset/f6a00643-1842-48cd-9c2f-df23a3a1dc1e/resource/696e169d-5749-4a15-a0a9-79e637e7c391/download/nswlgapolygonshp.zip", destfile="NSWLGA.zip")
unzip("NSWLGA.zip")

# Loads the csv file containing the locations for the outdoor crimes in the sydney area
crime <- read.csv("https://data.nsw.gov.au/data/dataset/608147c1-354e-473f-a35c-8e05b8ed6f84/resource/8fa12ba1-bfba-42d5-bfc7-037604b75b7c/download/gstrategic-policyinformationdata-nsw2016-dataset-uploadsboscarselected-outdoor-crimes-in-sydney-lga-")
head(crime)
#>   FID OBJECTID bcsrgrp                               bcsrcat lganame
#> 1   0        1 Assault Non-domestic violence related assault  Sydney
#> 2   1        2 Assault Non-domestic violence related assault  Sydney
#> 3   2        3 Assault Non-domestic violence related assault  Sydney
#> 4   3        5 Assault Non-domestic violence related assault  Sydney
#> 5   4        6 Assault Non-domestic violence related assault  Sydney
#> 6   5        9 Assault Non-domestic violence related assault  Sydney
#>         locsurb             locprmc1 locpcode bcsrgclat bcsrgclng
#> 1       REDFERN OUTDOOR/PUBLIC PLACE     2016 -33.89239  151.2148
#> 2        SYDNEY OUTDOOR/PUBLIC PLACE     2000 -33.86770  151.2098
#> 3 WOOLLOOMOOLOO OUTDOOR/PUBLIC PLACE     2011 -33.87267  151.2191
#> 4 WOOLLOOMOOLOO OUTDOOR/PUBLIC PLACE     2011 -33.87026  151.2202
#> 5   SURRY HILLS OUTDOOR/PUBLIC PLACE     2010 -33.88007  151.2150
#> 6     HAYMARKET OUTDOOR/PUBLIC PLACE     2000 -33.88243  151.2067
#>   bcsrgccde incyear incmonth  incday incsttm eventyr eventmth poisex
#> 1 Intersect    2012   August  Monday   16:00    2013 February       
#> 2 Intersect    2012  October Tuesday   18:00    2013 February       
#> 3   Address    2013  January Tuesday    1:30    2013  January       
#> 4 Intersect    2013  January Tuesday    3:00    2013  January       
#> 5 Intersect    2013  January Tuesday   12:51    2013  January      M
#> 6  Landmark    2013  January Tuesday    2:00    2013  January       
#>    poi_age uniqueID
#> 1  0.00000 50658277
#> 2  0.00000 53061821
#> 3  0.00000 50001248
#> 4  0.00000 49962948
#> 5 50.33196 49970181
#> 6  0.00000 49591182

Working With Shape Files in R

Loading the Shape file in R

To load the previously downloaded Shape file containing the polygons for the NSW LGAs we will use the function readOGR() that is part of the package rgdal.

# Reads the Shape file and loads it into R
NSWLGA <- readOGR(dsn = "NSW_LGA_POLYGON_shp", layer = "NSW_LGA_POLYGON_shp")
#> OGR data source with driver: ESRI Shapefile 
#> Source: "/Users/juanpablo/UTS/STDS/AT1/NSW_LGA_POLYGON_shp", layer: "NSW_LGA_POLYGON_shp"
#> with 197 features
#> It has 10 fields
NSWLGA_Df <- as.data.frame(NSWLGA@data)
head(NSWLGA_Df)
#>   LG_PLY_PID  DT_CREATE DT_RETIRE LGA_PID NSW_LGA_sh NSW_LGA__1
#> 0        649 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#> 1        650 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#> 2        651 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#> 3        652 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#> 4        653 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#> 5        654 2011/11/29      <NA>  NSW153 2011/11/29       <NA>
#>       NSW_LGA__2     NSW_LGA__3 NSW_LGA__4 NSW_LGA__5
#> 0 UNINCORPORATED UNINCORPORATED       <NA>          1
#> 1 UNINCORPORATED UNINCORPORATED       <NA>          1
#> 2 UNINCORPORATED UNINCORPORATED       <NA>          1
#> 3 UNINCORPORATED UNINCORPORATED       <NA>          1
#> 4 UNINCORPORATED UNINCORPORATED       <NA>          1
#> 5 UNINCORPORATED UNINCORPORATED       <NA>          1

Simple Shape file visualisation

Now that we have our Shape file loaded in R we can start working with it. First, I would like to plot a very simple map of the NSW LGAs using the function plot().

# Plots a simple outline of the map 
## Plot already recognises the object as a map after having installed the rgdal package
plot(NSWLGA) 

Notice that the map plotted is very simple and doesn’t provide any insight or helps us to tell any data story for NSW.

Visualising especific polygons

The next step is to visualise specific LGAs of NSW. This may be useful in the case you want to put special emphasis in one location or area of NSW, for this particular case I am interested in plotting the polygons that correspond to the LGAs in the Sydney metropolitan area.

# Create a logical object with TRUE values for the LGAs in the Sydney metropolitan area 
sydmetro <- NSWLGA$LGA_PID %in% c("NSW335", "NSW283", "NSW268", "NSW264", "NSW272", "NSW282", "NSW332", 
                                "NSW315", "NSW295", "NSW325", "NSW270", "NSW234", "NSW229", "NSW331", 
                                "NSW258", "NSW117", "NSW275", "NSW232", "NSW329", "NSW213", "NSW314", 
                                "NSW217", "NSW216", "NSW181", "NSW196", "NSW298", "NSW200", "NSW175", 
                                "NSW294", "NSW180") 

# Plots a map of the LGAs that are part of the Sydney Metropolitan Area
plot(NSWLGA[sydmetro, ], col = "deepskyblue2", main = "Sydney Metro", sub = "GRS 80")

Lets add some color

Now that we know how to plot shape files and know how to focus on spcific polygons, we can then fill the polygons with different colors to represent for example the quantiles of a specific distribution. In this particular case we will be using the function qtm() that is included in the package tmap to plot a map filled with the quantiles of a normal distribution.

# First we create a new column in our NSWLGA data frame containing values from a normal distribution
set.seed(574)
NSWLGA_Df$Rnorm <- rnorm(nrow(NSWLGA_Df))

# Then we add the new column to the Spatial Polygon Data Frame using the function left_join() from the package dplyr.
NSWLGA@data <- left_join(NSWLGA@data, NSWLGA_Df, by = "LG_PLY_PID")

# Finally we plot the map
qtm(NSWLGA[sydmetro,], "Rnorm", title = "Sydney Metro", sub ="GRS 80")
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
#> Variable "Rnorm" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Visualising Dot Distribution

In this part of the guide we will be focusing in the visual representation of points containing the coordinates for the crimes commited outdoors in the Sydney area.

Dot distribution using plot()

Just as before, first we will first start by plotting a simple representation of our data using the function plot without adding any other feature to our map.

plot(x = crime$bcsrgclng, y = crime$bcsrgclat, col = "coral1", main = "Sydney Outdoor Crimes")

Dot distribution using ggplot()

As you noticed in the map plotted before the inference that we can make using it is not much. Fortunately, using the function ggplot from the package ggplot2 we can add more features, filter the data that we want to represent and more.

crime %>% 
  filter(incyear == 2015) %>%
  ggplot() +
  geom_point(aes(x = bcsrgclng, y = bcsrgclat), color = "purple4", alpha=.03, size=1.1) +
  labs(title = "Outdoor Crimes in Sydney LGA", x = "Longitude", y = "Latitude")

Looks better right? What if now we want to make a different map for each crime group? Using ggplot we can easily do it!!

crime %>% 
  filter(incyear == 2015) %>%
  ggplot() +
  geom_point(aes(x = bcsrgclng, y = bcsrgclat), color = "red", alpha  = .09, size=1.1) +
  facet_wrap(~bcsrgrp, ncol = 2) +
  labs(title = "Outdoor Crimes in Sydney LGA \n by group", x = "Longitude", y = "Latitude")

Adding background for better visualisation

Finally, in order to really have an idea of where in Sydney the crimes are happening more frecuently we need to ad some sort of backgroung or reference, this also helps with the aestethic representation and gives our maps a better look. To do that, we will use the qmap() function that is part of the package ggmap and then we combine with the dots.

# First we need to get the map.
sydney_map <- qmap(location = "Sydney, Australia", zoom = 13, source = "google", type = "Hybrid")
#> Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Sydney,+Australia&zoom=13&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Sydney,%20Australia&sensor=false
#> Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
#> instead
sydney_map
#> Theme element panel.border missing
#> Theme element axis.line.x.bottom missing
#> Theme element axis.ticks.x.bottom missing
#> Theme element axis.line.x.top missing
#> Theme element axis.ticks.x.top missing
#> Theme element axis.line.y.left missing
#> Theme element axis.ticks.y.left missing
#> Theme element axis.line.y.right missing
#> Theme element axis.ticks.y.right missing
#> Theme element plot.title missing
#> Theme element plot.subtitle missing
#> Theme element plot.tag missing
#> Theme element plot.caption missing


# Then we add our point to the map, notice that the sintax is the same as the one used in ggplot.
sydney_map + 
  geom_point(data = crime, aes(x = bcsrgclng, y = bcsrgclat), color = "purple4", alpha=.03, size=1.1) +
  ggtitle("Sydney Outdoor Crime") +
  theme(plot.title = element_text(size = 36))
#> Warning: Removed 166 rows containing missing values (geom_point).
#> Theme element panel.border missing
#> Theme element axis.line.x.bottom missing
#> Theme element axis.ticks.x.bottom missing
#> Theme element axis.line.x.top missing
#> Theme element axis.ticks.x.top missing
#> Theme element axis.line.y.left missing
#> Theme element axis.ticks.y.left missing
#> Theme element axis.line.y.right missing
#> Theme element axis.ticks.y.right missing
#> Theme element plot.subtitle missing
#> Theme element plot.tag missing
#> Theme element plot.caption missing

Final Notes

We have learnt how to plot simple maps to visualise geographic information easily, and now we are able to tell a more interesting story about our data, but it is worth mentioning that the packages used to create this guide are very powerful and offer a lot more funtionality than the ones used here. For example rgdal offers the possibility of estimating areas, merging Shape files, do coordinates projections and so on. With ggplot2 we can plot Shape files too and use all the aestechics that the package has to offer, but it wasn’t included in this guide because the qtm() offered us a more direct and quicker approach.

References

Deparment of Industry, Innovation and Science. (2014, September 9). NSW Local Government Areas - PSMA Administrative Boundaries. Retrieved from https://data.gov.au/dataset/nsw-local-government-areas

Dervieux, C. (2017, July 4) plotting polygon shapefiles. Retreived from https://github.com/tidyverse/ggplot2/wiki/plotting-polygon-shapefiles

Lovelace, R. Introductory tutorial on graphical display of geographical information in R. Retreived from https://github.com/Robinlovelace/Creating-maps-in-R

NSW Bureau of Crime and Statistics and Research. (2017, July 27). Coordinate level data for non-domestic assaults and robberies occurring in Sydney LGA in outdoor and public places. Retreived from https://data.nsw.gov.au/data/dataset/non-domestic-assaults-sydney-lga

Prabhakaran, S. Top 50 ggplot2 Visualizations - The Master List (With Full R Code). Retreived from http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

Sharp Sight. (2015, January 6). Mapping Seattle Crime, Retreived from https://www.sharpsightlabs.com/blog/mapping-seattle-crime/