Approaching geovisualization and remote sensing with GeoViews

Basel

Link: http://bit.ly/geopython-basel

09/05/2018 @ GeoPython

Giacomo Debidda

What is GeoViews?

A Python library that makes it easy to explore and visualize geographical, meteorological, oceanographic datasets.

  • GIS extension for Holoviews
  • Declarative API: annotate your data and let it visualize itself
  • Uses Cartopy for geographic projections
  • Uses Shapely for geometries
  • Allows to create plots from multidimesional dataset (gridded dataset)
  • Makes it easy to overlay layers in a visualization
  • Leverages many Python libraries: Pandas, GeoPandas, Xarray, Datashader, Matplotlib/Bokeh

GeoViews objects

GeoViews objects are just like HoloViews objects...

...with an associated geographic projection based on a Coordinate Reference System defined in cartopy.crs.

GeoViews provides the Feature (cartopy features) and Shape (shapely geometries) types.

Imports

We need shapely for geometries, cartopy and pyproj for projections, geopandas for spatial joins.

In [1]:
import os
import pyproj
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import holoviews as hv
import geoviews as gv
import geoviews.feature as gf
import shapely as shp
import cartopy.io.shapereader as shpreader
import cartopy.crs as ccrs  # Cartopy coordinate reference system
import cartopy.feature as cf
import matplotlib.pyplot as plt
from bokeh.models import WMTSTileSource
from bokeh.tile_providers import STAMEN_TERRAIN, STAMEN_TONER
# import bokeh.sampledata
# bokeh.sampledata.download()  # this will download to /home/jack/.bokeh/data/
from bokeh.sampledata.airport_routes import airports, routes
In [2]:
data_dir = os.path.join(os.getcwd(), 'data')
print(data_dir)
/home/jack/Repos/geopython-basel/data

Choose a plotting backend

GeoViews (and HoloViews) can use either a Matplotlib backend or a Bokeh backend.

Matplotlib
+ supports more projections
- less interactivity
- at the moment, it seems not possible to use Web Map Tile Services (WMTS)

Bokeh
+ better interactivity
+ can use Web Map Tile Services
- supports only the Mercator projection

Note: visual attributes have (slightly) different names in the plotting backends (e.g. facecolor in Matplotlib VS fill_color in Bokeh).

See here to know the projections supported by GeoViews.

In [3]:
hv.notebook_extension('bokeh', 'matplotlib')
%output backend='matplotlib'

opts (OptsMagic)

You can use the %%opts cell magic command to set some options (e.g. change the geographical projection).

opts is not a built-in magic command, it's provided by HoloViews.

See OptsMagic class in holoviews.ipython.magics.py and OptsSpec in holoviews.util.parser or call %%opts? in a notebook cell.

See how to customize plots in the HoloViews documentation.

In [4]:
%%output size=400
%%opts Feature [projection=ccrs.PlateCarree()]
gf.coastline
Out[4]:

Combining geographical features

You can create HoloViews layouts by combining and overlaying several GeoViews Feature elements.

You can use hv.Overlay or the * operator to combine several GeoViews features into a single HoloViews layout.

In [5]:
%%output size=200
%%opts Feature [projection=ccrs.Geostationary()]
hv.Overlay([gf.coastline, gf.borders]) + gf.ocean * gf.rivers * gf.lakes
Out[5]:

GeoViews Feature

A GeoViews Feature is a Cartopy feature with additional properties and methods.

In fact, gf.ocean is just a shortcut for gv.Feature(cf.OCEAN, group='Ocean').

See the docs for the Cartopy Feature interface.

Here are the features currently available in geoviews.feature.py:

borders   = Feature(cf.BORDERS, group='Borders')
coastline = Feature(cf.COASTLINE, group='Coastline')
land      = Feature(cf.LAND, group='Land')
lakes     = Feature(cf.LAKES, group='Lakes')
ocean     = Feature(cf.OCEAN, group='Ocean')
rivers    = Feature(cf.RIVERS, group='Rivers')

Create a GeoViews feature from Natural Earth

cartopy provides an interface to Natural Earth shapefiles.

Natural Earth $\rightarrow$ Cartopy $\rightarrow$ GeoViews

Let's create a feature for the graticules.

First, create a Cartopy feature.

scale is the dataset scale, i.e. one of '10m', '50m', or '110m'.

Corresponding to 1:10,000,000, 1:50,000,000, and 1:110,000,000 respectively.

In [6]:
cartopy_graticules = cf.NaturalEarthFeature(category='physical',
                                            name='graticules_30',
                                            scale='110m')
cartopy_graticules
Out[6]:
<cartopy.feature.NaturalEarthFeature at 0x7fec6c0ed6d8>

Then, use the Cartopy feature to create a GeoViews feature.

In [7]:
%%output size=200
%%opts Feature.Lines (facecolor='none' edgecolor='gray') [projection=ccrs.Robinson()]
graticules = gv.Feature(cartopy_graticules, group='Lines')
graticules
Out[7]:

Create a GeoViews Feature from any shapefile

Cartopy shapereader provides an interface for accessing the contents of a shapefile.

Shapefile $\rightarrow$ Cartopy $\rightarrow$ GeoViews

Shapefile: American Indian/Alaska Native Areas/Hawaiian Home Lands from United Stated Census Bureau.

In [8]:
# Note: this file takes ~10s to be processed
shp_filepath = os.path.join(data_dir, 'tl_2017_us_aiannh', 'tl_2017_us_aiannh.shp')
reader = shpreader.Reader(shp_filepath)
# Note: you need to know the cartopy CRS in which the provided geometries are defined.
cartopy_homelands = cf.ShapelyFeature(geometries=reader.geometries(), crs=ccrs.PlateCarree())
cartopy_homelands
Out[8]:
<cartopy.feature.ShapelyFeature at 0x7fec6bfbc5f8>
In [9]:
%%time
%%output size=400
%%opts Feature.Lines (facecolor='none' edgecolor='black') [projection=ccrs.PlateCarree()]
homelands = gv.Feature(cartopy_homelands)
homelands_style = {'facecolor': 'red', 'alpha': 1.0}
homelands.opts(style=homelands_style)
Out[9]:
CPU times: user 10.6 s, sys: 24 ms, total: 10.6 s
Wall time: 10.6 s

You can also avoid using Cartopy and read a shapefile with GeoPandas, or even directly with gv.Shape (a wrapper for Shapely's Shape).

Shapefile $\rightarrow$ GeoViews

Note: thanks to Shapely you don't need a GeoJSON (or TopoJSON) to build a geometry. You are using binary buffers to read the data.

In [10]:
%%time
ndoverlay = gv.Shape.from_shapefile(shp_filepath)
CPU times: user 8.49 s, sys: 56 ms, total: 8.54 s
Wall time: 8.55 s
In [11]:
%%time
%%output size=200
%%opts Shape (facecolor='red' alpha=1.0)
ndoverlay.relabel('American Indian/Alaska Native Areas/Hawaiian Home Lands')
Out[11]:
CPU times: user 27 s, sys: 96 ms, total: 27.1 s
Wall time: 27.1 s
In [12]:
%%time
%%output size=300
%%opts Feature [projection=ccrs.Robinson()]
layout = hv.Overlay([gf.ocean, gf.land, gf.coastline, graticules, homelands.opts(style=homelands_style)])
layout
Out[12]:
CPU times: user 36.5 s, sys: 32 ms, total: 36.5 s
Wall time: 36.5 s

All flights from Honolulu

Data:

airports and routes from bokeh.sampledata.airport_routes.

Code not showns in slides. See jupyter notebook.

In [15]:
src = airports[airports.City == 'Honolulu'].iloc[0]
src
Out[15]:
AirportID                              3728
Name         Honolulu International Airport
City                               Honolulu
Country                       United States
IATA                                    HNL
ICAO                                   PHNL
Latitude                            21.3187
Longitude                          -157.922
Altitude                                 13
Timezone                                -10
DST                                       N
TZ                         Pacific/Honolulu
Type                                airport
source                          OurAirports
Name: 311, dtype: object
In [19]:
geo_projection = ccrs.Geostationary(central_longitude=src.Longitude)
In [20]:
%%output size=400
%%opts Feature [projection=geo_projection]
%%opts Points (color='red' size=3) [projection=geo_projection]
%%opts Path (color='steelblue') [projection=geo_projection]
points = dataset.to(gv.Points, kdims=['Longitude', 'Latitude'], vdims=['Name'])
(gf.coastline * points * gv.Path(paths)).redim.range(
    Longitude=(src.Longitude-5, -50), Latitude=(src.Latitude-5, 80))
Out[20]:

Gridded Datasets

Data:

netCDF file (Air quality) from NOAA (National Oceanic and Atmospheric Administration).

In [21]:
netcdf_filepath = os.path.join(data_dir, 'air.mon.mean.nc')
ds_air = xr.open_dataset(netcdf_filepath)
ds_air
Out[21]:
<xarray.Dataset>
Dimensions:  (lat: 73, lon: 144, time: 840)
Coordinates:
  * lat      (lat) float32 90.0 87.5 85.0 82.5 80.0 77.5 75.0 72.5 70.0 67.5 ...
  * lon      (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 22.5 ...
  * time     (time) datetime64[ns] 1948-01-01 1948-02-01 1948-03-01 ...
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    description:    Data from NCEP initialized reanalysis (4x/day).  These ar...
    platform:       Model
    Conventions:    COARDS
    NCO:            20121012
    history:        Thu May  4 20:11:16 2000: ncrcat -d time,0,623 /Datasets/...
    title:          monthly mean air.sig995 from the NCEP Reanalysis
    References:     http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reana...
    dataset_title:  NCEP-NCAR Reanalysis 1
In [22]:
dataset = gv.Dataset(ds_air, label=ds_air.attrs['title'])
type(dataset)
Out[22]:
geoviews.element.geo.Dataset
In [23]:
img_air = dataset.to(gv.Image, kdims=['lon', 'lat'], dynamic=True)
type(img_air)
Out[23]:
holoviews.core.spaces.DynamicMap
In [24]:
%%time
%%opts Image [colorbar=True fig_size=400 xaxis=None yaxis=None] (cmap='viridis')
%%opts Feature [projection=ccrs.Robinson()]
dynamic_map = img_air * gf.borders * gf.coastline
dynamic_map
Out[24]:
CPU times: user 5.66 s, sys: 8 ms, total: 5.66 s
Wall time: 5.79 s

Web Map Tile Services (WMTS)

In order to use a map tiles you need to switch to a bokeh plotting backend.

In [25]:
tiles = {
    'OpenMap': WMTSTileSource(url='http://c.tile.openstreetmap.org/{Z}/{X}/{Y}.png'),
    'ESRI': WMTSTileSource(url='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'),
    'Wikipedia': WMTSTileSource(url='https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png'),
    # 'STAMEN_TONER': STAMEN_TONER,
    'STAMEN_TERRAIN': STAMEN_TERRAIN
}

Note: at the moment, bokeh supports only the web Mercator projection. See here.

In [26]:
hv.notebook_extension('bokeh')
In [27]:
%%opts WMTS [width=450 height=250 xaxis=None yaxis=None]
# click on the "Wheel Zoom" tool and zoom out
hv.NdLayout(
    {name: gv.WMTS(wmts, extents=(0, -90, 360, 90), crs=ccrs.PlateCarree()) 
     for name, wmts in tiles.items()}, kdims=['Source']
).cols(2)
Out[27]:

Basel's City Districts, Parks and Cafes

Data:

Basel districts shapefile from OpenStreetMap (raw data, not the .osm file).

Basel parks and cafes shapefile from BBBike.

In [28]:
basel_somerc = gpd.read_file(os.path.join(data_dir, 'WE_StatWohneinteilungen', 'Wohnviertel.shp'))
print(basel_somerc.crs)
basel_somerc.head()
INFO:Fiona:Failed to auto identify EPSG: 7
{'proj': 'somerc', 'lat_0': 46.95240555555556, 'lon_0': 7.439583333333333, 'k_0': 1, 'x_0': 2600000, 'y_0': 1200000, 'ellps': 'bessel', 'units': 'm', 'no_defs': True}
Out[28]:
OBJID OBJECTID TXT ZTXT TYPE geometry
0 17136 1 7 07 Bruderholz POLYGON ((2612555.909 1264547.542, 2612560.552...
1 17139 2 6 06 Gundeldingen POLYGON ((2610887.105 1266551.468, 2610896.202...
2 17142 3 5 05 St. Alban POLYGON ((2612941.971 1267023.045, 2613000.085...
3 17145 4 4 04 Breite POLYGON ((2613683.78 1266891.09, 2613686.423 1...
4 17148 5 8 08 Bachletten POLYGON ((2610560.968 1266791.31, 2610571.054 ...

Convert 21781 (CH1903) to 4326 (WGS84)

You can use pyproj.Proj and reproject manually...

In [29]:
# https://gis.stackexchange.com/a/277488/119309
input_projection = pyproj.Proj(init='EPSG:21781')
output_projection = pyproj.Proj(init='EPSG:4326')
pyproj.transform(input_projection, output_projection, x=basel_somerc.crs['x_0'], y=basel_somerc.crs['y_0'])
Out[29]:
(37.37899200094729, 52.46327489367161)
In [30]:
polygons_somerc = [basel_somerc.loc[i, 'geometry'] 
                   for i in range(basel_somerc.shape[0])]
In [31]:
polygons_wgs84 = []
for poly in polygons_somerc:
    x, y = poly.exterior.coords.xy
    x_new, y_new = pyproj.transform(input_projection, output_projection, x, y)
    poly_new = shp.geometry.Polygon(list(zip(x_new, y_new)))
    polygons_wgs84.append(poly_new)
In [32]:
polygons_somerc[0]
Out[32]:
In [33]:
polygons_wgs84[0]
Out[33]:

...or use GeoPandas and its to_crs method.

In [34]:
basel_somerc.plot()
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fec5ad4def0>
In [35]:
basel_wgs84 = basel_somerc.to_crs(epsg=4326)
basel_wgs84.plot()
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fec5ac16cf8>

Add a centroid for each district

In [36]:
basel_wgs84['CentroidLongitude'] = basel_wgs84['geometry'].apply(lambda poly: poly.centroid.bounds[0])
basel_wgs84['CentroidLatitude'] = basel_wgs84['geometry'].apply(lambda poly: poly.centroid.bounds[1])
basel_wgs84.head()
Out[36]:
OBJID OBJECTID TXT ZTXT TYPE geometry CentroidLongitude CentroidLatitude
0 17136 1 7 07 Bruderholz POLYGON ((7.606348201652279 47.53293165390978,... 7.595473 47.532624
1 17139 2 6 06 Gundeldingen POLYGON ((7.584232669336745 47.55098643373825,... 7.592170 47.545054
2 17142 3 5 05 St. Alban POLYGON ((7.611547896278745 47.55519084800139,... 7.608627 47.546740
3 17145 4 4 04 Breite POLYGON ((7.621400475031203 47.55398888665725,... 7.617374 47.553016
4 17148 5 8 08 Bachletten POLYGON ((7.57990522852878 47.55314909549825, ... 7.570503 47.550043

Basel's parks

Apparently there are some issues with the shapefile I extracted from BBBike. Many shapely records are invalid.

Because of these invalid records, gv.Shape.from_shapefile does not work.

Luckily enough, you can use cartopy.shapereader.Reader and filter out these records with a simple condition.

In [37]:
shp_filepath = os.path.join(data_dir, 'basel_shapefiles', 'natural.shp')
reader = shpreader.Reader(shp_filepath)
In [38]:
parks_generator = (r for r in reader.records() if r.attributes['type'] == 'park')
parks = [p for p in parks_generator if p.geometry.area > 0]
In [39]:
park_geometries = [p.geometry for p in parks]
print(parks[0].attributes)
park_geometries[0]
{'osm_id': 12362783, 'name': 'Kannenfeldpark', 'type': 'park'}
Out[39]:
In [40]:
# some parks have no name. Maybe they are just small (private?) green areas...
', '.join([p.attributes['name'] for p in parks])
Out[40]:
'Kannenfeldpark, , Buremichelskopfanlage, Schützenmattpark, Helvetiaplatz, Benkenpark, Theodorsgraben-Anlage, Rosenfeldpark, Christoph Merian-Park, Dreirosenanlage, Elisabethenanlage, Oekolampadpark, Verkehrspark, , , St.Johanns-Platz, St.Johanns-Platz, St.Johanns-Platz, Totentanz, Petersplatz, Tierpark Lange Erlen, Wegmatten, Brüglingen, Raemelmatte, Park Im Grünen (Grün 80), , Dürrenmatten, , St Alban-Tor Park, Rosentalanlage, , Luftmatt, , , , St.Johanns-Park, , , Pruntrutermatte, , , , , , Forum, , , , , , , , , , , , , Gundeli Park, , , Spielplatz Rehhagstrasse, Holderstüdelipark'
In [41]:
%%opts Shape (fill_color='green')
gv.Shape(park_geometries[0]) * gv.Text(7.5704, 47.5653, parks[0].attributes['name'])
Out[41]:

Basel's Cafes

In [42]:
shp_filepath = os.path.join(data_dir, 'basel_shapefiles', 'points.shp')
reader = shpreader.Reader(shp_filepath)
In [44]:
cafes = [r for r in reader.records() if r.attributes['type'] == 'cafe']
f'There are {len(cafes)} cafes in Basel'
Out[44]:
'There are 74 cafes in Basel'
In [45]:
cafes[0].geometry.bounds
Out[45]:
(7.5938822, 47.5692456, 7.5938822, 47.5692456)
In [46]:
cafes_data = list(map(lambda x: (x.geometry.bounds[0], x.geometry.bounds[1], x.attributes['name']), cafes))
In [47]:
cafes_df = pd.DataFrame(cafes_data, columns=['Longitude', 'Latitude', 'Name'])
cafes_df.head()
Out[47]:
Longitude Latitude Name
0 7.593882 47.569246 Senat Maki's
1 7.588602 47.567340 Per Tutti
2 7.590183 47.567713 Ticino
3 7.588983 47.575878
4 7.592869 47.566456 Da Graziella
In [48]:
cafes_dataset = gv.Dataset(cafes_df, kdims=['Name'])
In [49]:
%%opts WMTS [width=800 height=600]
%%opts Shape (fill_color='green' fill_alpha=0.5)
%%opts Points (color='orange' line_color='black' size=7) [tools=['hover']]
hv.Overlay([
    gv.WMTS(tiles['ESRI'], crs=ccrs.PlateCarree()), 
    hv.NdOverlay({i: gv.Shape(pg, crs=ccrs.PlateCarree()) for i, pg in enumerate(park_geometries)}),
    cafes_dataset.to(gv.Points, kdims=['Longitude', 'Latitude'], vdims=['Name'], crs=ccrs.PlateCarree()),
])
Out[49]:
In [53]:
%%output size=600
hv.Overlay([
    hv.NdOverlay({i: shape_district(poly) for i, poly in enumerate(basel_wgs84.geometry)}),
    hv.NdOverlay({i: shape_park(poly) for i, poly in enumerate(park_geometries)}),
    hv.NdOverlay({i: text_centroid(lon, lat, name) for i, (lon, lat, name) in enumerate(list_centroid)}),
    cafe_points.opts(style={'color': 'orange', 'edgecolor': 'black', 's': 50}),
])
Out[53]:

Districts with the highest number of Cafes?

In [54]:
names = []
points = []
for i, cafe in cafes_df.iterrows():
    names.append(cafe.Name)
    points.append(shp.geometry.Point((cafe.Longitude, cafe.Latitude)))
    
cafes_gdf = gpd.GeoDataFrame({'Cafe': names, 'geometry': points}, crs={'init': 'epsg:4326', 'no_defs': True})
cafes_gdf.head()
Out[54]:
Cafe geometry
0 Senat Maki's POINT (7.5938822 47.5692456)
1 Per Tutti POINT (7.5886017 47.5673398)
2 Ticino POINT (7.5901833 47.567713)
3 POINT (7.5889833 47.5758776)
4 Da Graziella POINT (7.5928691 47.5664559)
In [55]:
districts_and_cafes = gpd.sjoin(basel_wgs84[['TYPE', 'geometry']], cafes_gdf, how="left", op='contains')
# districts_and_cafes.head()
In [56]:
cafes_count = districts_and_cafes.groupby('TYPE')[['Cafe']].count().rename(columns={'Cafe': 'CafeCount'})
cafes_in_basel = cafes_count.sort_values(by='CafeCount', ascending=False).reset_index()
# cafes_in_basel.head()
In [57]:
gdf = pd.merge(basel_wgs84[['TYPE', 'geometry']], cafes_in_basel)
gdf.sort_values(by='CafeCount', ascending=False).head()
Out[57]:
TYPE geometry CafeCount
18 Matthäus POLYGON ((7.598904359545449 47.57318222535969,... 12
5 Vorstädte POLYGON ((7.58481814196309 47.56691635597112, ... 12
1 Gundeldingen POLYGON ((7.584232669336745 47.55098643373825,... 11
10 St. Johann POLYGON ((7.572597585628209 47.57850891520469,... 7
9 Iselin POLYGON ((7.575575941753982 47.56093059715941,... 4
In [58]:
%%output size=400
%%opts Polygons (cmap='YlOrRd') [color_index='CafeCount' colorbar=True]
gv.Polygons(gdf, vdims=['TYPE', 'CafeCount'], label='Cafes in Basel districts')
Out[58]:

Big Data?

Use GeoViews (HoloViews) + Datashader + Matplotlib/Bokeh.

The combination HoloViews + Datashader lets you switch between Datashader and non-Datashader plots generated by Matplotlib or Bokeh.

NYC yellow taxi visualization @ hdf5-pydata-munich

NYC yellow taxi pickups

Datashader

Datashader is a data rasterization pipeline similar to a 3D graphics shading pipeline.

  1. Projection: Data $\rightarrow$ Scene
  2. Aggregation: Scene $\rightarrow$ Aggregates (an aggregate bin ends up in one pixel)
  3. Transformation + Colormapping: Aggregates $\rightarrow$ Image
  4. Embedding: Image $\rightarrow$ Plot

This approach allows you to plot a lot of data, and to avoid plotting pitfalls such as:

  • overplotting
  • oversaturation
  • undersampling
  • undersaturation
  • underutilized range
  • nonuniform colormapping

A few resources on Datashader

How Datashader works

PyViz: how to solve visualization problems with Python tools.

James Bednar's notebooks:

Guillem Duran Ballester's talk @ EuroPython 2017 with AirBnB data.

Johnny Chan's Bokeh app on UK road accidents.

Thank You!

GitHub repo @ http://bit.ly/geopython-basel.

I'm also on Twitter @jackdbd.