Loading data in the Sandbox
In this exercise, we will load data from the datacube. First, we will set up a new notebook to work in. Then, we will load Landsat 8 data for a specific time, and use that data to plot a colour image. Finally, we will show you how to modify the load process to load and plot Sentinel-2 data.
Make a new notebook
Let’s create a new, blank Jupyter notebook for this exercise.
Navigate to the Training folder. The Training folder was created as part of Session 1, for copying and running the Crop Health notebook. If you do not have this folder in the Sandbox, you can create it by following the steps in Running a Notebook.
If Launcher is not the active tab in the main work area (right pane), click the + button at the top of the left sidebar to open the launcher in the right pane.
In the Notebook section of Launcher, select the Python 3 option to create a new notebook in the current directory.
The new notebook will be called Untitled.ipynb, but you can rename it. Right-click the notebook in the file menu and select Rename.
Type in the desired name, then press the Enter key to finish renaming the notebook.
Set up notebook
Load packages and functions
Packages and functions act as the toolbox of Python programming. We will import the ones which will be useful to us.
In the first cell, type the following code and then run the cell.
Run a cell by pressing Shift + Enter on your keyboard.
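The cell itself is not reproduced in this text. A minimal sketch of what it might contain, based on the descriptions in this section, is shown here. The try/except wrapper and the packages_available flag are additions made only so the snippet can also run outside the Sandbox; in a Sandbox notebook you would type just the magic command and the two import lines.

```python
# In a Jupyter notebook, the first line of the cell would be the magic command
#     %matplotlib inline
# It is written as a comment here because it only works inside Jupyter.

try:
    import datacube                          # lets us connect to the datacube
    from deafrica_tools.plotting import rgb  # true-colour (RGB) plotting function
    packages_available = True
except ImportError:
    # Outside the Sandbox these packages may not be installed.
    packages_available = False

print("Sandbox packages available:", packages_available)
```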
%matplotlib inline allows us to plot graphs and maps inside the notebook
datacube is imported to allow us to create an object that can retrieve data from the datacube, which we will do in the next cell we create
deafrica_tools contains several modules which help load, analyse and output data from Digital Earth Africa. Here we call upon the module deafrica_tools.plotting to import the rgb plot function, which allows us to visualise data as true-colour (red-green-blue, or RGB) images
When the cell has finished running, a number in square brackets will appear next to it, and a new blank cell will be generated below it.
As of June 2021, the deafrica_tools package replaces the deprecated sys.path.append('../Scripts') file import. For more information on deafrica_tools, visit the DE Africa Tools module documentation.
Connect to the datacube
The datacube package allows us to access the data in the Sandbox. To use it, we must establish a connection with the datacube. Enter the following code and run the cell.
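As a hedged sketch, the connection step might look like the following. The app name Load_data_exercise is a hypothetical label chosen for illustration, and the function wrapper exists only so the snippet can be read and checked outside the Sandbox.

```python
def connect_to_datacube(app_name="Load_data_exercise"):
    """Return a connection to the datacube.

    app_name is a hypothetical label; use a name based on your notebook file.
    """
    import datacube  # imported here so the sketch can be read outside the Sandbox
    return datacube.Datacube(app=app_name)

# In a Sandbox cell this reduces to a single line:
#     dc = datacube.Datacube(app="Load_data_exercise")
```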
The datacube.Datacube class provides access to the datacube. We usually call objects of this class dc, as we have done here. The app parameter is a unique name for the analysis, which is based on the notebook file name.
When the cell has finished running, a number in square brackets will appear next to it, and a new blank cell will be generated below it.
Load Landsat 8 data
This exercise will load Landsat data for an area of Dar es Salaam, Tanzania. We will use a pair of latitude coordinates (-6.90, -6.70) and a pair of longitude coordinates (39.20, 39.37) to specify the area to load. Data will be loaded for the rectangle defined by these coordinate ranges.
First, we will view this area on a map. This allows us to check we have the correct coordinates. In the new cell below, enter the following code, and then run it to see this area on a map.
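A sketch of such a cell is given here, assuming the display_map function from deafrica_tools.plotting. The variable names, the centre calculation and the show_area wrapper are illustrative additions; the wrapper keeps the Sandbox-only import out of the way so the rest of the snippet runs anywhere.

```python
# Coordinate ranges from the text above (degrees)
lat_range = (-6.90, -6.70)
lon_range = (39.20, 39.37)

# Centre of the rectangle, a quick sanity check on the coordinates
centre_lat = sum(lat_range) / 2
centre_lon = sum(lon_range) / 2
print(f"Centre: {centre_lat:.3f}, {centre_lon:.3f}")

def show_area(x, y):
    """Show the rectangle on an interactive map (Sandbox only)."""
    from deafrica_tools.plotting import display_map
    return display_map(x=x, y=y)

# In the Sandbox:
#     show_area(x=lon_range, y=lat_range)
```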
The output of that cell should look like this.
In the new cell below, enter the following code, and then run it to load Landsat 8 data.
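The load cell is not reproduced in this text, so the following is a sketch built from the parameter values given in this exercise. Two things are assumptions: the group_by value "solar_day" (the value used in the exercise is not shown) and the way the year 2018 is written as a pair of dates. The dictionary and the load_landsat helper are illustrative; in a notebook the same parameters would normally be typed directly inside dc.load().

```python
# Parameters for the load, using the values described in this exercise.
landsat_query = {
    "product": "ls8_sr",                   # Landsat 8 Surface Reflectance
    "x": (39.20, 39.37),                   # longitude range
    "y": (-6.90, -6.70),                   # latitude range
    "time": ("2018-01-01", "2018-12-31"),  # all of 2018
    "output_crs": "EPSG:6933",             # equal-area projection
    "resolution": (-30, 30),               # y, x pixel size in metres
    "group_by": "solar_day",               # assumed value, not given in the text
    "measurements": ["red", "green", "blue"],
}

def load_landsat(dc, query=landsat_query):
    """Load data through an existing datacube connection `dc`."""
    return dc.load(**query)

# In the Sandbox, with dc connected:
#     landsat_ds = load_landsat(dc)
# which is equivalent to typing dc.load(product="ls8_sr", x=(39.20, 39.37), ...)
```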
We load data with the dc.load() function. We have chosen to call the loaded dataset landsat_ds. The items between the brackets of dc.load() are our parameters. We have chosen to put the parameters on separate lines to make the code easier to read (and errors easier to spot). Parameters must be separated by commas.
The product argument is the datacube product to load data from. We want to access the Landsat 8 dataset, which is named ls8_sr. The name ls8_sr uses only lowercase letters, numbers and an underscore. It stands for Landsat 8 Surface Reflectance.
The x and y arguments specify the area to load data for. In this case, they represent longitude and latitude. This defines a rectangle spanning their ranges of coordinate values, as seen in the map above.
The time argument specifies the time range of data to load. We have specified the whole of 2018.
The output_crs argument specifies the Coordinate Reference System (CRS) to load data in. The CRS EPSG:6933 specifies an equal-area projection, in which each pixel covers the same area.
The resolution argument is the y and x resolutions (in that order) in the units of the output CRS, which are metres for EPSG:6933. The first value is typically negative. In this case, a resolution of (-30, 30) is 30 metres per pixel, which is the maximum resolution of Landsat data.
The group_by argument controls how data that is close in time is combined to provide better images.
The measurements argument specifies what bands will be loaded. We will plot a true-colour image of this data later. To do that, we need the red, green, and blue bands.
As of June 2021, DE Africa Landsat data has been upgraded to Collection 2. Datacube names have been updated to ls8_sr. Deprecated naming conventions such as ls8_usgs_sr_scene will no longer work. For more information on Landsat Collection 2, visit the DE Africa Landsat documentation.
Sometimes, typing mistakes can occur. This will produce an error message when you run the cell.
For example, this error is a SyntaxError. It tells us there might be a mistake just before the section of code beginning x=(39.20, 39.39),. Sure enough, this error message was generated when a comma was missing after the product parameter, as shown in the screenshot below.
For illustrative purposes, the point where the comma is missing has been highlighted by a red box, but this will not appear in JupyterLab. You will have to find the error or errors yourself.
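To see the same kind of error without a screenshot, the sketch below stores a deliberately broken cell as a string and asks Python to check its syntax. The variable names are illustrative, and the exact message and line number reported depend on the Python version.

```python
# A cell with the comma missing after the product parameter
broken_cell = '''landsat_ds = dc.load(
    product="ls8_sr"
    x=(39.20, 39.37),
    y=(-6.90, -6.70),
)'''

try:
    # compile() only checks the syntax; it does not run the code
    compile(broken_cell, "<cell>", "exec")
    error_name = None
except SyntaxError as err:
    error_name = type(err).__name__
    print(error_name, "reported near line", err.lineno)
```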
If errors such as SyntaxError appear, they must be resolved before you can continue. Try checking for some of these common issues:
Are all brackets and quotation marks in the right place?
Does every open bracket have a corresponding close bracket?
Do your bracket types match? ( must be closed by ), and [ must be closed by ]. They have different meanings in Python, so they are not interchangeable.
Does every opening quotation mark have a closing quotation mark? You can use either ' or ", but pairs of quotation marks must be the same.
Are there commas (,) between items listed in square brackets?
Is the indentation correct? Press Tab on your keyboard to increase the level of indentation, and press Shift + Tab on your keyboard to decrease the level of indentation.
Is everything spelt correctly?
Once you have made your changes, try executing the cell again by pressing Shift + Enter on your keyboard.
If you get a NameError, it could be because you have not yet imported the required packages and functions. They must be imported every time you start a new server session. To resolve this, follow the instructions in the section above, Load packages and functions.
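The sketch below reproduces this situation in a controlled way. It runs a one-line cell in an empty namespace, which stands in for a fresh session where nothing has been imported. The app name "demo" and the variable names are placeholders.

```python
cell = 'dc = datacube.Datacube(app="demo")'  # "demo" is a placeholder name

try:
    # An empty dictionary stands in for a session with nothing imported yet
    exec(cell, {})
    message = None
except NameError as err:
    message = str(err)
    print("NameError:", message)
```

Running the import cell first, and then the failing cell again, resolves this kind of error.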
An example of a NameError caused by a missing import.
Take your time to type code. If you would like to learn more about Python code syntax, or would like more chances to practise basic Python skills, take a look at the optional extra session Python basics.
When the dc.load() cell successfully executes, it will create a new cell below it. In this new cell, we can enter the name of our dataset and run the cell. This will show the dataset we loaded.
The output of the cell should look similar to this:
The output of dc.load() is an xarray.Dataset object. This type of dataset is a common format for satellite data, and is organised by:
Dimensions: The dimensions of the dataset. For Earth observation data, this is often x, y and time, as seen here. The units for the x and y dimensions are pixels, while time is counted in number of flyovers. In this example we see there were 21 flyovers of our selected location during the year of 2018.
Coordinates: A list of the values of each dimension. spatial_ref refers to the CRS we selected when loading the data.
Data variables: The data values for our chosen measurements. We see red, green and blue are loaded, as we specified in the dc.load() command. This product provides values for surface reflectance, which is unitless.
Attributes: Metadata about this dataset. The CRS is listed again.
Plot a true-colour image
True-colour images are also known as red-green-blue (RGB) images. They are rendered using the image’s ‘natural’ colours and appear how they might be seen by the human eye. As we loaded red, green and blue bands from Landsat 8, we can now plot an RGB image using the data from landsat_ds.
In the next blank cell, enter the following code. Run the cell to generate an RGB image.
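The plotting cell is not reproduced in this text. A hedged sketch, assuming the rgb function from deafrica_tools.plotting and the landsat_ds dataset loaded earlier, could look like this. The size value of 10 is an assumption, and the wrapper function is only there so the snippet can be read outside the Sandbox.

```python
def plot_true_colour(ds, index=0, size=10):
    """Plot one timestep of `ds` as a true-colour image.

    size=10 is an assumed value; the value used in the exercise is not shown.
    """
    from deafrica_tools.plotting import rgb  # available in the Sandbox
    rgb(ds, bands=["red", "green", "blue"], index=index, size=size)

# In the Sandbox, after loading landsat_ds:
#     plot_true_colour(landsat_ds)
# or directly:
#     rgb(landsat_ds, bands=["red", "green", "blue"], index=0, size=10)
```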
The function used here is called rgb().
The first item inside the rgb() brackets is the name of the dataset we are drawing the data from. In this case, we want to pull information from landsat_ds.
bands specifies the names of the data variables in the dataset that correspond to red, green and blue. We saw above that in landsat_ds they are conveniently named red, green and blue.
index refers to the timestep to view. The default is 0. The Python language counts from 0, so index=0 shows the first flyover, and index=1 shows the second.
size is the height of the image.
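Counting from 0 can be seen with an ordinary Python list. The labels below are hypothetical stand-ins for the time steps of a loaded dataset.

```python
# Hypothetical labels standing in for the time steps of a loaded dataset
flyovers = ["first flyover", "second flyover", "third flyover"]

print(flyovers[0])   # the first item
print(flyovers[1])   # the second item
print(flyovers[-1])  # a negative index counts back from the end
```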
The RGB image will look like this:
The title of the image notes that the date for this data is 2018-02-16, or 16 February 2018.
Exercise: Load and plot Sentinel-2 data
Let us repeat the data loading process for Sentinel-2 data. It is a very similar process to loading the Landsat 8 data. We want to load data for the same time and place, so we only have to change the product and resolution parameters.
Let us call our Sentinel-2 dataset sentinel_2_ds. You must name it something different from the Landsat 8 dataset. In a new cell, type the name of the Sentinel-2 dataset.
Again, we will use dc.load() to import Sentinel-2 data. After sentinel_2_ds, type = dc.load(). It should look like sentinel_2_ds = dc.load().
How do we fill out the parameters inside the brackets of dc.load()? We can do this by copying some of the information from the Landsat 8 dc.load() input cell. The first parameter we listed before was product. However, we don’t want to use the Landsat 8 product; we want to select the Sentinel-2 product, s2_l2a.
The name s2_l2a stands for Sentinel-2 Level-2A. The fourth character is a lowercase letter ‘l’. Double-check you have entered the product name correctly to avoid errors.
The resolution parameter will also be different from the Landsat 8 load. For Sentinel-2, it should be (-10, 10), since our Sentinel-2 data has a resolution of 10 metres per pixel.
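To get a feel for what the change in resolution means, the short calculation below compares the ground area covered by one pixel at each resolution. The variable names are illustrative.

```python
landsat_pixel_m = 30    # pixel edge length in metres, from resolution=(-30, 30)
sentinel_pixel_m = 10   # pixel edge length in metres, from resolution=(-10, 10)

landsat_pixel_area = landsat_pixel_m ** 2      # square metres per pixel
sentinel_pixel_area = sentinel_pixel_m ** 2    # square metres per pixel

# Number of Sentinel-2 pixels that fit inside one Landsat 8 pixel
pixels_per_landsat_pixel = landsat_pixel_area // sentinel_pixel_area
print(pixels_per_landsat_pixel)  # 9
```

For the same area, the Sentinel-2 load therefore returns roughly nine times as many pixels per time step as the Landsat 8 load.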
Now, type the rest of the parameters to be the same as they were for the Landsat 8 load. This includes x, y, time, output_crs, group_by and measurements.
As before, watch out for commas, quotation marks, and brackets to avoid error messages when running the cell.
You should end up with a set of parameters that look like this:
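The parameter list is not reproduced in this text. As a sketch under the same assumptions as the Landsat 8 load (in particular, group_by="solar_day" and the date-pair form of the year 2018 are assumed), the Sentinel-2 parameters can be derived by copying the Landsat 8 ones and changing only product and resolution. The dictionary names are illustrative.

```python
# Landsat 8 parameters from earlier in this exercise
# (group_by="solar_day" is an assumed value, not given in the text)
landsat_query = {
    "product": "ls8_sr",
    "x": (39.20, 39.37),
    "y": (-6.90, -6.70),
    "time": ("2018-01-01", "2018-12-31"),
    "output_crs": "EPSG:6933",
    "resolution": (-30, 30),
    "group_by": "solar_day",
    "measurements": ["red", "green", "blue"],
}

# Copy the Landsat 8 parameters, then change only product and resolution
sentinel_2_query = {**landsat_query, "product": "s2_l2a", "resolution": (-10, 10)}

for name, value in sentinel_2_query.items():
    print(f"{name} = {value!r}")

# In the Sandbox, with dc connected:
#     sentinel_2_ds = dc.load(**sentinel_2_query)
```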
Run the cell to load Sentinel-2 data.
In the new cell below that, let us plot an RGB image around the same time as the Landsat 8 RGB image, which was from February 16, 2018.
We must specify the dataset first, followed by the bands, index, and size. In this case, we want to use index=9. Ensure the cell has the following code and then run it.
An RGB image using Sentinel-2 data will be generated.
As Landsat 8 data and Sentinel-2 data come from different satellites, their flyovers are not always at the same time. In this case, the closest date of Sentinel-2 data to the Landsat data is one day before, on February 15, 2018. It is another cloudy scene, like the Landsat 8 one.
You have successfully loaded and plotted data for Landsat 8 and Sentinel-2.
You have also finished the second session of the Digital Earth Africa training course. In this session, you have learned about:
Digital Earth Africa products, including Landsat 8 and Sentinel-2
Visualising data with the Digital Earth Africa Map
Exploring data availability with the Digital Earth Africa Explorer
Loading data in the Sandbox
Generating RGB images