Exercise: Determining the extent of water bodies¶
Overview¶
In this exercise, we will create a new notebook to determine the extent of water bodies using the Water Observation from Space (WOfS) product. The WOfS product uses an automated water mapping algorithm to identify water in Landsat 8 images.
The notebook will include the following steps:
Load the WOfS feature layer product and Landsat 8 data
Identify water pixels from WOfS
Plot a true-colour image using Landsat data
Plot the water body area for the same area using WOfS data
Customise the plots
At the conclusion of this exercise, you will be able to determine the extent of water bodies using the WOfS product.
Set up notebook¶
In your Training folder, create a new Python 3 notebook. Name it Water_extent_exercise.ipynb
. For more instructions on creating a new notebook, see the instructions from Session 2.
Load packages and functions¶
In the first cell, type the following code and then run the cell to import necessary Python dependencies.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import datacube
from deafrica_tools.datahandling import load_ard, wofs_fuser
from deafrica_tools.plotting import display_map, rgb
In this exercise, we import one new function, wofs_fuser
. wofs_fuser
ensures WOfS data between scenes is combined correctly.
Connect to the datacube¶
Enter the following code and run the cell to create our dc
object, which provides access to the datacube.
dc = datacube.Datacube(app="water_extent_exercise")
Select area of interest¶
In the next cell, enter the following code, and then run it to select an area and time. In this exercise, we use a central point and buffer to define our area of interest, similar to what we did in the Session 5 exercise.
The only difference here is that we provide a latitude buffer and a longitude buffer. In this example, they have the same value. However, choosing different buffer values allows you to select rectangular areas of interest, rather than squares.
# Define the area of interest
lat = -6.0873
lon = 35.1817
lat_buffer = 0.2
lon_buffer = 0.2
# Combine central lat, lon with buffer to get area of interest
lat_range = (lat - lat_buffer, lat + lat_buffer)
lon_range = (lon - lon_buffer, lon + lon_buffer)
# Define the year
time = '2018'
Note
Recall code lines beginning with #
are comments. They help explain the code, and can be removed or added without impacting the actual Python scripts.
In the next cell, enter the following code, and then run it to show the area on a map. Since we have defined our area using the variables lon_range
and lat_range
, we can use those instead of typing out (lat - lat_buffer, lat + lat_buffer)
and (lon - lon_buffer, lon + lon_buffer)
again.
display_map(x=lon_range, y=lat_range)
Create query object¶
Notice lat_range
, lon_range
and time
were all defined in the previous cell, so we can use them as variables here. We will use them to create a query.
The query
variable below is a Python dictionary. It can be used to store parameters. Creating an object variable such as query
makes it possible to reuse parameters in various functions that accept the same input parameters.
This is useful to us because we can use it to load the Landsat 8 data, and then use it again to load the WOfS data.
In the next cell, enter the following code, and then run it.
query = {
'x': lon_range,
'y': lat_range,
'resolution': (-30, 30),
'output_crs':'EPSG:6933',
'group_by': 'solar_day',
'time': (time),
}
Note
Notice the structure of the query
dictionary is slightly different from dc.load
or load_ard
. Each parameter name is in quotes ''
and is followed by a colon :
.
Load data¶
In the next cell, we load the Landsat and WOfS datasets, naming them ds_landsat
and ds_wofs
respectively.
In the functions below, we can directly pass the query
object using **query
— this will give all the settings defined in query
to the function.
The main benefit is that we can use the same query
for both Landsat 8 and WOfS, which saves us typing it again and prevents us from making mistakes.
Load Landsat 8¶
For Landsat 8, we can use the load_ard
function.
ds_landsat = load_ard(dc=dc,
products=['ls8_sr'],
measurements=['red', 'green', 'blue'],
**query)
ds_landsat
Note some users will find ds_landsat
loads with 43 timesteps. That will not affect the next steps.
Load WOfS¶
For WOfS, we need to use the dc.load
function.
ds_wofs = dc.load(product="wofs_ls",
fuse_func=wofs_fuser,
**query
)
ds_wofs
Note
As of September 2021, WOfS can best be accessed through the product name wofs_ls
. This replaces the deprecated ga_ls8c_wofs_2
product name. For more information on wofs_ls
, visit the WOfS technical specifications.
Calculating water extent¶
Understanding the WOfS feature layers¶
WOfS feature layers are stored as a binary number, where each digit of the number is independantly set or not based on the presence (1) or absence (0) of a particular feature. Below is a breakdown of which decimal value represent which features.
Attribute |
Decimal value |
---|---|
No data |
1 |
Non contiguous |
2 |
Sea |
4 |
Terrain or low solar angle |
8 |
High slope |
16 |
Cloud shadow |
32 |
Cloud |
64 |
Water |
128 |
For example, a value of 128 indicates that water were observed for the pixel, whereas a value of 32 would indicate cloud shadow.
In the next cell we will extract only the water features from the WOfS data. This is done by finding values where the water
measurement equals 128. In Python, we can find which pixels have a value of 128 by using the ==
expression:
Extract the water pixels¶
ds_valid_water = ds_wofs.water == 128
The ds_valid_water
array does not contain the decimal values of the WOfS feature layers. Instead, it has a value of False
if the pixel was not water, and True
if it was water. You can check this by viewing the ds_valid_water
DataArray.
ds_valid_water
Calculate the area per pixel¶
The number of pixels can be used for the area of the waterbody if the pixel area is known. We can extract the size of a pixel from the resolution
setting in our query
, then divide the area of a single pixel (in square metres) by the number of square meters in a square kilometre.
In Python, number**2
returns the squared value of number
.
pixel_length = query["resolution"][1] # in metres
m_per_km = 1000 # conversion from metres to kilometres
area_per_pixel = pixel_length**2 / m_per_km**2
Calculate area of water pixels¶
Now that we know how much area is covered by one pixel, we can count up the number of water pixels, and multiply it by this value to get the total area covered by water.
As we saw above, the ds_valid_water
array contains True
values for water pixels, and False
otherwise. When we use the .sum
function, it counts True
values as 1, and False
as 0. Therefore, the sum will be equal to the total number of water pixels for that timestep.
Below, we set the dimensions as x
and y
to make sure we sum up all the pixels over the spatial dimensions. This means we get one pixel sum for each timestep. This will let us track how the water area changes over time.
ds_valid_water_pixel_sum = ds_valid_water.sum(dim=['x', 'y'])
ds_valid_water_area = ds_valid_water_pixel_sum * area_per_pixel
Plot time series¶
Now that we have the area of water in each observation, we can plot a time series to help us identify dates where the was more or less water within the area of interest.
Below, there is code to set-up, display and customise the plot. The settings are as follows:
plt.figure(figsize=(18, 4))
: set up a figure object to contain the plot, and make it 18 inches long and 4 inches highds_valid_water_area.plot(marker='o', color='#9467bd')
: plot the water area data with circular markers in purple (HEX colour code#9467bd
)plt.title('Time Series Analysis of water observed area')
: Give the plot a titleplt.xlabel('Dates')
: Label the x-axisplt.ylabel('Waterbody area (km$^2$)')
: Label the y-axis. The$
symbols allow the use of LaTeX, a mathematical typesetting languageplt.tight_layout()
: Formats the image so that all axes can be clearly seen
plt.figure(figsize=(18, 4))
ds_valid_water_area.plot(marker='o', color='#9467bd')
plt.title('Time Series Analysis of water observed area')
plt.xlabel('Dates')
plt.ylabel('Waterbody area (km$^2$)')
plt.tight_layout()
Display of water coverage for a selected timestep¶
From the graph above you can choose any timestep (between 0 and 45) to display the result on the for both WOfS and Landsat 8.
For example, let us look at the fifth timestep, timestep = 4
.
timestep = 4
# Plot water extent
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
#plot the true colour image
ds_nearest_landsat = ds_landsat.sel(time=ds_wofs.time.isel(time=timestep), method='nearest')
rgb(ds_nearest_landsat, ax=ax[0])
# plot the water extent from WOfS
ds_wofs.isel(time=timestep).water.plot.imshow(ax=ax[1], cmap="Blues", add_colorbar=False)
# Titles
ax[0].set_title("Water Extent - Landsat"), ax[0].xaxis.set_visible(False), ax[0].yaxis.set_visible(False)
ax[1].set_title("Water Extent - WOFS"), ax[1].xaxis.set_visible(False), ax[1].yaxis.set_visible(False)
plt.show()
This code uses some additional settings to customise the plot, including allowing to have two plots together. If you want to know more about making this kind of plot, please ask the instructors during a live session.
Try different timestep
values — can you find an image where the lake is dried out?
Conclusion¶
Congratulations! You have made your own water extent notebook. It is comparible to the existing Sandbox water extent notebook.
You’ve now built your second case study! You make like to reflect on what was similar and different between the two. Are there any pieces of code you could reuse for a new analysis? How might you modify your case studies to do more complex analysis?
If you’d like to experiment futher, try running the code with different areas. Did you learn anything interesting to share with us?
Optional activity¶
If you’re curious about how the existing case study works, you can open and run it in the Sandbox:
From the main Sandbox folder, open the Real_world_examples folder
Double-click the Water_extent.ipynb notebook to open it
The notebook has already been run, so you can read through it step by step. However, you may find it valuable to clear the outputs and run each cell step by step to see how it works. You can do this by clicking Kernel -> Restart Kernel and Clear All Outputs. When asked whether you want to restart the kernel, click Restart.
Note
If you want to significantly modify it, we recommend you make a copy, like you did in Session 1.
There are many similarities between the notebook you built in this session, and the existing Sandbox notebook. Maybe make a note of what is similar and what is different. If you have any questions about how the existing notebook works, please ask the instructors during a live session.