From Coal to Solar: Tracking Power With Machine Learning and Satellites

ODSC, September 2020

Future Energy Associates

What I'll cover

  1. The big picture
  2. Growth of geospatial data
  3. Coal -> needs to go
  4. Solar -> needs to be cheaper and better
  5. Geospatial ML pipeline with RasterVision

The Big picture

Photo by Vince Gx on Unsplash

Climate change
and net zero


What does that mean?

Glen Peters, CICERO,

Coal plants worldwide

Coal power for electricity contributed 30% of global CO2 emissions in 2018 (IEA).

Source: Carbon Brief,

Power plant data

The World Resources Institute has a great open dataset

Power plant data

      import pandas as pd
      import matplotlib.pyplot as plt
      power_plants = pd.read_csv("talk_data/global_power_plant_database.csv")
      f, ax = plt.subplots(figsize=(12, 8))
      plants = power_plants.groupby('primary_fuel')\
                            .sort_values('estimated_generation_gwh', ascending=False)\
      sns.barplot(x="estimated_generation_gwh", y="primary_fuel", data=plants,
                  label="Total", color="b")
      ax.set(ylabel="", xlabel="Estimated Generation 2018 in GWh")
      sns.despine(left=True, bottom=True)


Source: Carbon Tracker Initiative

Solar - part of the solution 🌞🌞🌞

Quiz - how much of the world's electricity was:

  • low-carbon
  • renewable
  • what about energy?

Powering the transition

Linking the Power and Transport Sectors—Part 1: The Principle of Sector Coupling, Robinius et al, Energies 2017, 10(7), 956;

Geospatial Analysis for energy

Geospatial terms

Spatial vs geospatial

Source: Dr Joanna Kuleszo @ UCL


e.g. *.tif, *.jpg, *.exif, *.png, *.gif

Geospatial Vector

e.g. *.geojson, *.shp, ...

Source: AutoGIS, University of Helsink, 'These materials are partly based on Shapely-documentation and Westra E. (2013), Chapter 3.'

Coordinate systems

        import geopandas as gpd
        gb_outline = gpd.read_file('greatbritain.shp')

Coordinate systems

        Projected CRS: EPSG:27700
        Name: OSGB 1936 / British National Grid
        Axis Info [cartesian]:
        - E[east]: Easting (metre)
        - N[north]: Northing (metre)
        Area of Use:
        - name: UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E
        - bounds: (-9.2, 49.75, 2.88, 61.14)
        Coordinate Operation:
        - name: British National Grid
        - method: Transverse Mercator
        Datum: OSGB 1936
        - Ellipsoid: Airy 1830
        - Prime Meridian: Greenwich

Coordinate systems


Coordinate systems


        Geographic 2D CRS: +init=epsg:4326 +type=crs
          Name: WGS 84
          Axis Info [ellipsoidal]:
          - lon[east]: Longitude (degree)
          - lat[north]: Latitude (degree)
          Area of Use:
          - name: World
          - bounds: (-180.0, -90.0, 180.0, 90.0)
          Datum: World Geodetic System 1984
          - Ellipsoid: WGS 84
          - Prime Meridian: Greenwich

Web Mercator

General lack of understanding that the Web Mercator differs from standard Mercator usage has caused considerable confusion and misuse. For all these reasons, the United States Department of Defense through the National Geospatial-Intelligence Agency has declared this map projection to be unacceptable for any official use.

from wikipedia Web Mercator projection

Python tools

  • geopandas - geospatial dataframes
  • shapely - manipulation of planar features, geometric objects, geometry operations
  • folium - leaflet.js maps in python
  • rasterio - access to raster data in python


Aerial data

Shot from planes.
Now also from drones.
Very high resolution (less than 0.25m per pixel)
Infrequent creation (ie low revisit rate)

Satellite data

Huge growth in availability

Resolution getting better all the time (less than 3m visual band becoming widely available - but paid, 10m visual through Sentinel)

Multi-spectral thanks to many different satellite sensors: good for SOx, NOx, Methane, CO2, etc

Landsat 8, launched 2013

By Orbital Sciences Corporation -, Public Domain,

Planet's cubesats,

In 2015, Planet had 92 satellites. In 2017 in launched 88 - the 'dove' constellation, promising 1 day revisit rate - global daily coverage!

Photo: Planet Labs CEO Will Marshall, credit: Steve Jurvetson

Satellite sensor bands


Getting Sentinel2 data

One option: Google Earth Engine

            conda update -c conda-forge earthengine-api
            # or 
            pip install earthengine-api --upgrade

            import ee

            def obtain_sentinel_imgs(collection='COPERNICUS/S2', 
                date_range, area, max_cc_pct=None):
            Selection of median, cloud-free image from a 
            collection of images in the Sentinel 2 dataset

            collection: str
            Name of the collection
            date_range: list
            Must be of the following form ['%Y-%m-%d','%Y-%m-%d']
            area: ee.geometry.Geometry
            Area of interest
            reduction: str
            Operation to be carried out over the aggregated image
            max_cc_pct: float
            Maximum cloud cover percentage to be returned

            sentinel_median: ee.image.Image


            col = (ee
            .filterDate(date_range[0], date_range[1])

            if max_cc_pct is not None:
            col = col.filter(

            return col

Working with imagery data


          array = xr.open_mfdataset(f'../data/imgs_eu/', combine='by_coords')

Cloud native imagery?

STAC: SpatioTemporal Asset Catalogs

Cloud Optimized GeoTIFF


Huai'an power station in Hebei province, 660 MW

Zhangjiakou coal power plant in Hebei province, 2.6 GW

Source: via Carbon Tracker via Planet

Matt Gray, Managing Director, Co-Head of Power & Utilities at Carbon Tracker,

Carbon Tracker,

Why do UK solar panels need locating?

Better forecasting

In the UK, better PV forecasts should save £1-10 million per year [1], and about 100,000 tonnes of CO2 per year.

[1]Taylor et al, 2016

UK PV over time


Source: Open Climate Fix

My plan to
locate solar PV

  1. Get image data set of the UK
  2. Use Open Street Map labels for solar panels to create a training set
  3. Use neural networks for semantic segmentation on all images to find the rest of the panels
  4. Write up and relax

Semantic segmentation?

Image source: CSAIL


What data can we use?

Academic license - use Digimap

Aerial Digimap product, licensed from Getmapping plc.

25cm vertical ortho-photography

2TB of imagery data...


Image age

Ok, but what about the labels?

Enter Open Street Map!

At the time, 15000 labels

Now... 240,000+ !

          # #

          solar_query = """
          area["name"="United Kingdom"]->.searchArea;
          out body;
          out skel qt;

        overpass_url = ""
        import requests
        osm_pv = requests.get(overpass_url, params={'data': solar_query})
        df = pd.DataFrame(osm_pv.json()['elements'])

Setting up a pipeline?

Enter RasterVision

An open source framework for deep learning on satellite and aerial imagery.

Thank you Azavea 👏

How to use RasterVision

See my experiment script here:


  • Docker image
  • nvidia-docker requires Linux for GPUs
  • Dependencies are a pain to manage otherwise
  • GPUs mandatory - model runs can easily take 12hrs+ depending on experiment


RasterVision takes on a lot of the work that would be reasonably boilerplate between geospatial analysis projects.

            import rastervision as rv

            # Experiment

            class SolarExperimentSet(rv.ExperimentSet):
            def exp_main(self, test=False):

              # experiment goes here

Experiment Steps

  • TASK


                  # Connect docker filepath mounted to my data directory
                  base_uri = join(raw_uri, '/labels')

                  # Experiment label, used to label config files
                  exp_id = 'pv-detection-1'

                  # Number of times passing a batch of images through the model
                  num_steps = 1e5
                  batch_size = 8
                  # Specify whether or not to make debug chips (a zipped sample of png chips
                  # that you can examine to help debug the chipping process)
                  debug = True

                  # Split the data into training and validation sets:
                  # Randomize the order of all scene ids
                  scene_ids = sorted(scene_ids)

                  # Figure out how many scenes make up 80% of the whole set
                  num_train_ids = round(len(scene_ids) * 0.8)
                  # Split the scene ids into training and validation lists
                  train_ids = scene_ids[0:num_train_ids]
                  val_ids = scene_ids[num_train_ids:]


              # ------------- TASK -------------

              task = rv.TaskConfig.builder(rv.SEMANTIC_SEGMENTATION) \
                .with_chip_size(300) \
                'pv': (1, 'yellow'),
                'background': (2, 'black')
              }) \
                target_count_threshold=1000) \


            # # ------------- BACKEND -------------
            backend = rv.BackendConfig.builder(rv.TF_DEEPLAB) \
              .with_task(task) \
              .with_debug(debug) \
              .with_batch_size(num_steps) \
              .with_num_steps(batch_size) \
              .with_model_defaults(rv.MOBILENET_V2) \

            # Make scenes to pass to the DataSetConfig builder.
            def make_scene(id):

            # Create lists of train and test scene configs
            train_scenes = [make_scene(id) for id in train_ids]
            val_scenes = [make_scene(id) for id in val_ids]


            # ------------- DATASET -------------
            # Construct a DataSet config using the lists of train and
            # validation scenes
            dataset = rv.DatasetConfig.builder() \
              .with_train_scenes(train_scenes) \
              .with_validation_scenes(val_scenes) \


                # ------------- ANALYZE -------------
                # We will need to convert this imagery from uint16 to uint8
                # in order to use it. We specified that this conversion should take place
                # when we built the train raster source but that process will require
                # dataset-level statistics. To get these stats we need to create an
                # analyzer.
                analyzer = rv.AnalyzerConfig.builder(rv.STATS_ANALYZER) \


          # ------------- EXPERIMENT -------------
          experiment = rv.ExperimentConfig.builder() \
            .with_id(exp_id) \
            .with_task(task) \
            .with_backend(backend) \
            .with_analyzer(analyzer) \
            .with_dataset(dataset) \
            .with_root_uri('/opt/data') \

          return experiment

          if __name__ == '__name__':

Run it!

Start up the docker container

          docker run --runtime=nvidia --rm -it -p 6006:6006 \
            -v ${RV_QUICKSTART_CODE_DIR}:/opt/src/code \
            -v ${RV_QUICKSTART_EXP_DIR}:/opt/data \

Fire away

        rastervision run local -p

Method overview

  • Image data from Aerial Digimap
  • Training a model with 9,508 labels from Open Street Map
  • Model is 'off-the-shelf' neural network
  • Analysis pipeline, training and prediction with RasterVision

That would be too easy.

Creating a good training dataset is hard

Take two: train on an existing dataset

Distributed Solar Photovoltaic Array Location and Extent Data Set for Remote Sensing Object Identification

Bradbury, Kyle; Saboo, Raghav; Johnson, Timothy; Malof, Jordan; Devarajan, Arjun; Zhang, Wuming; et al. (2016): Full Collection: Distributed Solar Photovoltaic Array Location and Extent Data Set for Remote Sensing Object Identification. figshare. Collection.

Predict on UK

Predict on UK

Results on a hand-selected UK set

That's nice, I want pictures

Urban predictions

Cambridge - lots of accurate predictions here.


Knowsley - some hit some miss But - some things like conservatories have not come up as false positives.

False positives

Bristol docks confused the model

Alternative methods for geospatial ML

Get involved!

Is your curiosity piqued?
Do you want to contribute to climate related tech projects?

Open Climate Fix

Thank you

I do exciting energy data science projects at

If you do some kind of data science, I'd love to hear from you!

Say hi:

Twitter: @LaurenceWWatson


Data wrangling:

RasterVision scripts:

Don't we know where the PV is already?


Carbon Tracker,