Working with spatial data and particulary visualising it can be very useful for a data scientist since it allows you to tell a more compelling story about the problem that you are working on. There are multiple different packages in R that allow you to not only visualise data in a geographical context, but also, to create and manipulate spatial data. In this guide we will focus on the visualisation part, particularly working with shape files and and georeferenced dots.
Before we start please make sure that all the packages are installed and loaded in R.
# Installs the required packages
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("rgdal")
#install.packages("tmap")
#install.packages("ggmap")
# Loads the required required packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(rgdal)
#> Loading required package: sp
#> rgdal: version: 1.3-4, (SVN revision 766)
#> Geospatial Data Abstraction Library extensions to R successfully loaded
#> Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
#> Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
#> GDAL binary built with GEOS: FALSE
#> Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
#> Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
#> Linking to sp version: 1.3-1
library(tmap)
library(ggmap)
The required files for this particular guide can be found in data.gov.au, and data.nsw.gov.au. Specifically for this guide we will be using a Shape file containing the administrative boundaries for the NSW Local Government Areas and a data file containing incidents reported between January 2013 and March 2016, where the incident occurred at an outdoor or public place within the Sydney Local Government Area.
# Downloads and unzips the folder containing the shape file with the polygons for the NSW LGAs
download.file("https://data.gov.au/dataset/f6a00643-1842-48cd-9c2f-df23a3a1dc1e/resource/696e169d-5749-4a15-a0a9-79e637e7c391/download/nswlgapolygonshp.zip", destfile="NSWLGA.zip")
unzip("NSWLGA.zip")
# Loads the csv file containing the locations for the outdoor crimes in the sydney area
crime <- read.csv("https://data.nsw.gov.au/data/dataset/608147c1-354e-473f-a35c-8e05b8ed6f84/resource/8fa12ba1-bfba-42d5-bfc7-037604b75b7c/download/gstrategic-policyinformationdata-nsw2016-dataset-uploadsboscarselected-outdoor-crimes-in-sydney-lga-")
head(crime)
#> FID OBJECTID bcsrgrp bcsrcat lganame
#> 1 0 1 Assault Non-domestic violence related assault Sydney
#> 2 1 2 Assault Non-domestic violence related assault Sydney
#> 3 2 3 Assault Non-domestic violence related assault Sydney
#> 4 3 5 Assault Non-domestic violence related assault Sydney
#> 5 4 6 Assault Non-domestic violence related assault Sydney
#> 6 5 9 Assault Non-domestic violence related assault Sydney
#> locsurb locprmc1 locpcode bcsrgclat bcsrgclng
#> 1 REDFERN OUTDOOR/PUBLIC PLACE 2016 -33.89239 151.2148
#> 2 SYDNEY OUTDOOR/PUBLIC PLACE 2000 -33.86770 151.2098
#> 3 WOOLLOOMOOLOO OUTDOOR/PUBLIC PLACE 2011 -33.87267 151.2191
#> 4 WOOLLOOMOOLOO OUTDOOR/PUBLIC PLACE 2011 -33.87026 151.2202
#> 5 SURRY HILLS OUTDOOR/PUBLIC PLACE 2010 -33.88007 151.2150
#> 6 HAYMARKET OUTDOOR/PUBLIC PLACE 2000 -33.88243 151.2067
#> bcsrgccde incyear incmonth incday incsttm eventyr eventmth poisex
#> 1 Intersect 2012 August Monday 16:00 2013 February
#> 2 Intersect 2012 October Tuesday 18:00 2013 February
#> 3 Address 2013 January Tuesday 1:30 2013 January
#> 4 Intersect 2013 January Tuesday 3:00 2013 January
#> 5 Intersect 2013 January Tuesday 12:51 2013 January M
#> 6 Landmark 2013 January Tuesday 2:00 2013 January
#> poi_age uniqueID
#> 1 0.00000 50658277
#> 2 0.00000 53061821
#> 3 0.00000 50001248
#> 4 0.00000 49962948
#> 5 50.33196 49970181
#> 6 0.00000 49591182
To load the previously downloaded Shape file containing the polygons for the NSW LGAs we will use the function readOGR()
that is part of the package rgdal.
# Reads the Shape file and loads it into R
NSWLGA <- readOGR(dsn = "NSW_LGA_POLYGON_shp", layer = "NSW_LGA_POLYGON_shp")
#> OGR data source with driver: ESRI Shapefile
#> Source: "/Users/juanpablo/UTS/STDS/AT1/NSW_LGA_POLYGON_shp", layer: "NSW_LGA_POLYGON_shp"
#> with 197 features
#> It has 10 fields
NSWLGA_Df <- as.data.frame(NSWLGA@data)
head(NSWLGA_Df)
#> LG_PLY_PID DT_CREATE DT_RETIRE LGA_PID NSW_LGA_sh NSW_LGA__1
#> 0 649 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> 1 650 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> 2 651 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> 3 652 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> 4 653 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> 5 654 2011/11/29 <NA> NSW153 2011/11/29 <NA>
#> NSW_LGA__2 NSW_LGA__3 NSW_LGA__4 NSW_LGA__5
#> 0 UNINCORPORATED UNINCORPORATED <NA> 1
#> 1 UNINCORPORATED UNINCORPORATED <NA> 1
#> 2 UNINCORPORATED UNINCORPORATED <NA> 1
#> 3 UNINCORPORATED UNINCORPORATED <NA> 1
#> 4 UNINCORPORATED UNINCORPORATED <NA> 1
#> 5 UNINCORPORATED UNINCORPORATED <NA> 1
Now that we have our Shape file loaded in R we can start working with it. First, I would like to plot a very simple map of the NSW LGAs using the function plot()
.
# Plots a simple outline of the map
## Plot already recognises the object as a map after having installed the rgdal package
plot(NSWLGA)
Notice that the map plotted is very simple and doesn’t provide any insight or helps us to tell any data story for NSW.
The next step is to visualise specific LGAs of NSW. This may be useful in the case you want to put special emphasis in one location or area of NSW, for this particular case I am interested in plotting the polygons that correspond to the LGAs in the Sydney metropolitan area.
# Create a logical object with TRUE values for the LGAs in the Sydney metropolitan area
sydmetro <- NSWLGA$LGA_PID %in% c("NSW335", "NSW283", "NSW268", "NSW264", "NSW272", "NSW282", "NSW332",
"NSW315", "NSW295", "NSW325", "NSW270", "NSW234", "NSW229", "NSW331",
"NSW258", "NSW117", "NSW275", "NSW232", "NSW329", "NSW213", "NSW314",
"NSW217", "NSW216", "NSW181", "NSW196", "NSW298", "NSW200", "NSW175",
"NSW294", "NSW180")
# Plots a map of the LGAs that are part of the Sydney Metropolitan Area
plot(NSWLGA[sydmetro, ], col = "deepskyblue2", main = "Sydney Metro", sub = "GRS 80")
Now that we know how to plot shape files and know how to focus on spcific polygons, we can then fill the polygons with different colors to represent for example the quantiles of a specific distribution. In this particular case we will be using the function qtm()
that is included in the package tmap to plot a map filled with the quantiles of a normal distribution.
# First we create a new column in our NSWLGA data frame containing values from a normal distribution
set.seed(574)
NSWLGA_Df$Rnorm <- rnorm(nrow(NSWLGA_Df))
# Then we add the new column to the Spatial Polygon Data Frame using the function left_join() from the package dplyr.
NSWLGA@data <- left_join(NSWLGA@data, NSWLGA_Df, by = "LG_PLY_PID")
# Finally we plot the map
qtm(NSWLGA[sydmetro,], "Rnorm", title = "Sydney Metro", sub ="GRS 80")
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
#> Variable "Rnorm" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
In this part of the guide we will be focusing in the visual representation of points containing the coordinates for the crimes commited outdoors in the Sydney area.
Just as before, first we will first start by plotting a simple representation of our data using the function plot without adding any other feature to our map.
plot(x = crime$bcsrgclng, y = crime$bcsrgclat, col = "coral1", main = "Sydney Outdoor Crimes")
As you noticed in the map plotted before the inference that we can make using it is not much. Fortunately, using the function ggplot
from the package ggplot2 we can add more features, filter the data that we want to represent and more.
crime %>%
filter(incyear == 2015) %>%
ggplot() +
geom_point(aes(x = bcsrgclng, y = bcsrgclat), color = "purple4", alpha=.03, size=1.1) +
labs(title = "Outdoor Crimes in Sydney LGA", x = "Longitude", y = "Latitude")
Looks better right? What if now we want to make a different map for each crime group? Using ggplot we can easily do it!!
crime %>%
filter(incyear == 2015) %>%
ggplot() +
geom_point(aes(x = bcsrgclng, y = bcsrgclat), color = "red", alpha = .09, size=1.1) +
facet_wrap(~bcsrgrp, ncol = 2) +
labs(title = "Outdoor Crimes in Sydney LGA \n by group", x = "Longitude", y = "Latitude")
Finally, in order to really have an idea of where in Sydney the crimes are happening more frecuently we need to ad some sort of backgroung or reference, this also helps with the aestethic representation and gives our maps a better look. To do that, we will use the qmap()
function that is part of the package ggmap and then we combine with the dots.
# First we need to get the map.
sydney_map <- qmap(location = "Sydney, Australia", zoom = 13, source = "google", type = "Hybrid")
#> Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Sydney,+Australia&zoom=13&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
#> Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Sydney,%20Australia&sensor=false
#> Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
#> instead
sydney_map
#> Theme element panel.border missing
#> Theme element axis.line.x.bottom missing
#> Theme element axis.ticks.x.bottom missing
#> Theme element axis.line.x.top missing
#> Theme element axis.ticks.x.top missing
#> Theme element axis.line.y.left missing
#> Theme element axis.ticks.y.left missing
#> Theme element axis.line.y.right missing
#> Theme element axis.ticks.y.right missing
#> Theme element plot.title missing
#> Theme element plot.subtitle missing
#> Theme element plot.tag missing
#> Theme element plot.caption missing
# Then we add our point to the map, notice that the sintax is the same as the one used in ggplot.
sydney_map +
geom_point(data = crime, aes(x = bcsrgclng, y = bcsrgclat), color = "purple4", alpha=.03, size=1.1) +
ggtitle("Sydney Outdoor Crime") +
theme(plot.title = element_text(size = 36))
#> Warning: Removed 166 rows containing missing values (geom_point).
#> Theme element panel.border missing
#> Theme element axis.line.x.bottom missing
#> Theme element axis.ticks.x.bottom missing
#> Theme element axis.line.x.top missing
#> Theme element axis.ticks.x.top missing
#> Theme element axis.line.y.left missing
#> Theme element axis.ticks.y.left missing
#> Theme element axis.line.y.right missing
#> Theme element axis.ticks.y.right missing
#> Theme element plot.subtitle missing
#> Theme element plot.tag missing
#> Theme element plot.caption missing
We have learnt how to plot simple maps to visualise geographic information easily, and now we are able to tell a more interesting story about our data, but it is worth mentioning that the packages used to create this guide are very powerful and offer a lot more funtionality than the ones used here. For example rgdal offers the possibility of estimating areas, merging Shape files, do coordinates projections and so on. With ggplot2 we can plot Shape files too and use all the aestechics that the package has to offer, but it wasn’t included in this guide because the qtm()
offered us a more direct and quicker approach.
Deparment of Industry, Innovation and Science. (2014, September 9). NSW Local Government Areas - PSMA Administrative Boundaries. Retrieved from https://data.gov.au/dataset/nsw-local-government-areas
Dervieux, C. (2017, July 4) plotting polygon shapefiles. Retreived from https://github.com/tidyverse/ggplot2/wiki/plotting-polygon-shapefiles
Lovelace, R. Introductory tutorial on graphical display of geographical information in R. Retreived from https://github.com/Robinlovelace/Creating-maps-in-R
NSW Bureau of Crime and Statistics and Research. (2017, July 27). Coordinate level data for non-domestic assaults and robberies occurring in Sydney LGA in outdoor and public places. Retreived from https://data.nsw.gov.au/data/dataset/non-domestic-assaults-sydney-lga
Prabhakaran, S. Top 50 ggplot2 Visualizations - The Master List (With Full R Code). Retreived from http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
Sharp Sight. (2015, January 6). Mapping Seattle Crime, Retreived from https://www.sharpsightlabs.com/blog/mapping-seattle-crime/