{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction to xarray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Xarray makes datasets easier to interpret when performing data analysis. To demonstrate this, we can start by opening a NetCDF file ( file suffix `.nc`) of the area of Guinea-Bissau we used in the previous [matplotlib lesson](./03_matplotlib.ipynb). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Recap:** In that lesson, we plotted the `.JPG` version of the image using `imread` and `imshow`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Review of Python basics lesson 3: matplotlib\n",
    "\n",
    "%matplotlib inline \n",
    "\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "im = np.copy(plt.imread('Guinea_Bissau.JPG'))\n",
    "plt.imshow(im)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using xarray, we will now read in a `.nc` file of the area. `.nc` NetCDF files are a filetype used for storing scientific array-oriented data. Remembering that image data is an array, this makes both formats almost equivalent.\n",
    "\n",
    "We could also load the `.JPG` as a dataset using xarray, but the NetCDF file contains more metadata so it will be easier to interpret. \n",
    "\n",
    "As usual, we will first import the xarray package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the xarray package as xr\n",
    "\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use xarray to open the data file\n",
    "\n",
    "guinea_bissau = xr.open_dataset('guinea_bissau.nc')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the xarray.DataSet by \n",
    "# typing its name\n",
    "\n",
    "guinea_bissau"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's inspect the data. This image (our dataset) has a height `y` of 501 pixels and a width `x` of 500 pixels. It has 3 `variables`, which correspond to the red, green, and blue bands needed to plot a colour image. Each of these bands has a value at each of the pixels, so we have a total of 751500 values (the result of 501 x 500 x 3). Each of the `x` and `y` values are associated with a longitude and latitude and their values are stored under the `Coordinates` section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `xarray.DataSet` uses numpy arrays in the backend. What we see above is functionally similar to a bunch of numpy arrays holding the image band measurements and the longitude/latitude coordinates, combined with a dictionary of key-value pairs that defines the `Attributes` such as CRS and resolution (`res`). That sounds more complicated, doesn't it? Accessing the data through `xarray` avoids the complications of managing uncorrelated numpy arrays by themselves."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../_static/python_basics/dataset-diagram.png\" alt=\"Xarray dataset breakdown\" width=\"600\" align=\"left\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*A visualisation of an* `xarray.Dataset`*. The coordinates are each a single dimension;* `latitude`*,* `longitude` *and* `time`*. Each* `variable` *(in this case* `temperature` *and* `precipitation`*) holds one value at each of the three coordinate dimensions.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An `xarray.Dataset` can be seen as a dictionary structurefor packing up the data, dimensions and attributes all linked together.\n",
    "\n",
    "As in the NetCDF we loaded above, Digital Earth Africa follows the convention of storing spectral bands as separate variables, with each one as 3-dimensional cubes containing the temporal dimension."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access a single variable we can use the same format as if it were a Python dictionary.\n",
    "\n",
    "```python\n",
    "var = dataset_name['variable_name']\n",
    "```\n",
    "\n",
    "\n",
    "Alternatively, we can use the `.` notation.\n",
    "\n",
    "```python\n",
    "var = dataset_name.variable_name\n",
    "```\n",
    "\n",
    "A single variable pulled from an `xarray.Dataset` is known as an `xarray.DataArray`. In the visualisation image above, the example `xarray.Dataset` is formed out of two `xarray.DataArray`s &mdash; one each for `temperature` and `precipitation`.\n",
    "\n",
    "> **Note:** Variable names are often strings (enclosed in quotation marks). This is true of most Digital Earth Africa datasets, where variables such as satellite bands have names such as `'red'`, `'green'`, `'nir'`. Do not forget the quotation marks in first call method above, or the data will not load."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset we loaded only has 1 timestep. We will now pull just the `'red'` variable data from our `guinea_bissau` `xarray.Dataset`, and then plot it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guinea_bissau['red']\n",
    "\n",
    "# This is equivalent to\n",
    "# guinea_bissau.red\n",
    "# Try it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot using imshow\n",
    "\n",
    "plt.imshow(guinea_bissau['red'], cmap='Reds')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note:** `cmap` stands for 'colour map'. What happens if you remove this argument? `plt.imshow` will then plot all the `red` data values in the default colour scheme, which is a purple-to-yellow sequential colour map known as `viridis`. This is because matplotlib does not know the variable name `red` is associated with the colour red &mdash; it is just a name. You can find out more about matplotlib colourmaps through this [matplotlib tutorial](https://matplotlib.org/tutorials/colors/colormaps.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We stated above that _xarray uses numpy arrays in the backend_. You can access these numpy arrays by adding `.values` after a `DataArray` name. In the example below, `guinea_bissau.red` is the `DataArray` name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guinea_bissau.red.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is possible to index `xarray.DataArrays` by position using the numpy `[:,:]` syntax we introduced in previous lessons, without converting to numpy arrays. However, we can also use the xarray syntaxes, which explictly label coordinates instead of relying on knowing the order of dimensions.\n",
    "\n",
    "Each method is demonstrated below.\n",
    "\n",
    "* `[:,:]`: numpy syntax &mdash; requires knowing order of dimensions and positional indexes\n",
    "* `isel(coordinate_name = coordinate_index)`: index selection &mdash; xarray syntax for selecting data based on its positional index (similar to numpy)\n",
    "* `sel(coordinate_name = coordinate_value)`: value selection &mdash; xarray syntax for selecting data based on its value in that dimension\n",
    "\n",
    "For example, we can call out the value of `green` in the top leftmost pixel of the image dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using numpy syntax\n",
    "\n",
    "guinea_bissau.green[0,0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case our positional indexes for `x` and `y` are the same, but it is important to note `y` is the first dimension, and `x` is the second. This is not immediately obvious and can cause confusion. For this reason, it is recommended to use one of the xarray syntaxes shown below. They explicitly call on the dimension names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using index selection, isel()\n",
    "\n",
    "guinea_bissau.green.isel(y=0, x=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, the measurement for `green` in the top leftmost pixel is `730`. In this example, the _values_ of `x` and `y` are latitude and longitude values, but their _positional indexes_ are `0`. \n",
    "\n",
    "We can call out the value by using the index, as shown below.\n",
    "\n",
    "> **Note:** The CRS we are using has units of metres. This means the `x` and `y` values are measured in metres."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The value of x at the pixel located at index (0,0)\n",
    "\n",
    "guinea_bissau.green.x[0].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The value of y at the pixel located at index (0,0)\n",
    "\n",
    "guinea_bissau.green.y[0].values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intepretation of the cell above is: \n",
    "\n",
    ">Output the **value** of **y** in the **1st position** of the **green** variable from the dataset called **guineau_bissau**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see the first element (index of 0) in `x` has a value of 388020 metres, and the first value of `y` is 1338000 metres."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How about the second method, `sel()`? This method was not available in the numpy arrays we used in previous lessons, and is one of the strengths of xarray as you can use the value of the dimension without knowing its positional index.\n",
    "\n",
    "If we are given `(y, x) = (1338000, 388020)`, we can find the `green` measurement at that point using `sel()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guinea_bissau.green.sel(y=1338000, x=388020)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can select ranges of `DataArrays` with the xarray syntaxes using `slice`. The three cells below all extract the same extent from the `guinea_bissau.green` `DataArray`.\n",
    "\n",
    "The xarray syntaxes for ranges using `slice` are:\n",
    "\n",
    "```python\n",
    "dataarray_name.isel(dimension_name = slice(index_start, index_end))\n",
    "\n",
    "dataarray_name.sel(dimension_name = slice(value_start, value_end))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, it is easier to call upon the dimensions by name rather than using the implicit numpy square bracket syntax, although it still works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Numpy syntax - dimensions are not named\n",
    "# Valid but not recommended\n",
    "\n",
    "guinea_bissau.green[:250, :250]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Index selection slice\n",
    "\n",
    "guinea_bissau.green.isel(x=slice(0,250), y=slice(0,250))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Value selection slice\n",
    "\n",
    "guinea_bissau.green.sel(x=slice(388020,395500), y=slice(1338000,1330530))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can plot the selected area using matplotlib `plt.imshow`. We can also use the xarray `.plot()` function. Plotting is a good way to check the selection is showing the top left section of the image as expected.\n",
    "\n",
    "The syntax of `plt.imshow()` is:\n",
    "\n",
    "```python\n",
    "plt.imshow(data)\n",
    "```\n",
    "\n",
    "Therefore we copy the code into the brackets of `plt.imshow()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(guinea_bissau.green.sel(x=slice(388020,395500), y=slice(1338000,1330530)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">**Note:** We did not specify a colourmap `cmap` so you can see it takes on the default colour scheme. Plotting true-colour (RGB) images is first introduced in [Session 2: Loading data in the Sandbox](https://training.digitalearthafrica.org/en/latest/session_2/04_load_data_exercise.html). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The syntax of the xarray plot function is:\n",
    "\n",
    "```python\n",
    "data.plot()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore we copy the code before we type `.plot()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "guinea_bissau.green.sel(x=slice(388020,395500), y=slice(1338000,1330530)).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The advantage of using xarray's `.plot()` is that it automatically shows the `x` and `y` axes using coordinate values, not their positional index. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.1 Can you access to the `crs` value in the attributes of the `guinea_bissau` `xarray.Dataset`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Hint:** You can call upon `attributes` in the same way you would select a `variable` or `coordinate`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace the ? with the attribute name\n",
    "\n",
    "guinea_bissau.?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2 Select the region of the `blue` variable delimited by these coordinates:\n",
    "* latitude of range [1335000, 1329030]\n",
    "* longitude of range [389520, 395490]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Hint:** Do we want to use `sel()` or `isel()`? Which coordinate is `x` and which is `y`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.3 Plot the selected region using `imshow`, then plot the region using `.plot()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot using plt.imshow\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot using .plot()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Can you change the colour map to `'Blues'`?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
