Theoretical introduction¶

What is Machine Learning?¶

Machine learning (ML) is the study of computer algorithms that improve automatically through experience.
It is seen as a subset of artificial intelligence.
Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.
Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

Source: Wikipedia

What is Machine Learning not?¶

Magic

How is it applied in Remote Sensing?¶

Machine learning is widely used in remote sensing applications. Some examples are…

Crop type classification¶

E.g. Moumni et al. 2021

Image source: ESA SEN2AGRI

Deforestation monitoring¶

E.g. Ortega et al. 2019

Image source: Ecotextile

Cloud classification¶

E.g. Drönner et al. 2019

Image source: EUMETSAT

Tree type classification¶

E.g. Egli et al. 2020

Image source: Egli et al. 2020

Fog detection¶

E.g. Egli et al. 2018

Image source: foto-webcam.eu

Basic concept¶

The scikit-learn library offers a large number of different ML approaches, provided that the data we want to work with can be reduced to or translated into the following structure:

	x_1	x_2	x_3	…	x_n	y
s_1	-	-	-	-	-	-
s_2	-	-	-	-	-	-
s_3	-	-	-	-	-	-
…	-	-	-	-	-	-
s_m	-	-	-	-	-	-

where

x_i are the n independent features,
y is the dependent feature or target,
and s_i are the m samples of our data set.
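In scikit-learn, this structure is conventionally passed as a feature matrix `X` of shape `(n_samples, n_features)` and a target vector `y` with one entry per sample. The following is a minimal sketch with made-up numbers (m = 5 samples, n = 3 features); the values carry no meaning and only illustrate the layout.

```python
import numpy as np

# Hypothetical toy data: each row is one sample s_i, each column one feature x_i.
X = np.array([
    [0.1, 1.2, 3.4],
    [0.5, 0.9, 2.8],
    [0.3, 1.1, 3.0],
    [0.9, 0.4, 1.7],
    [0.7, 0.6, 2.1],
])

# One target value y per sample (made-up class labels).
y = np.array([0, 0, 0, 1, 1])

m, n = X.shape
print(f"{m} samples, {n} features, {y.shape[0]} target values")
```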

To give an example: let’s assume we want to derive people’s favorite colours from different personal characteristics.

We could represent this in the structure above as:

	Body size (cm)	Piercing count (#)	Gender	Hair colour	Nose size (cm)	Favorite colour
Melanie	192	1	w	brown	7.4	blue
Robert	163	0	m	blond	4.1	red
Pete	176	2	m	blond	6.4	blue
Harry	166	0	m	brown	4.9	green
Sally	191	0	w	blond	5.8	orange
…	…	…	…	…	…	…
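Most scikit-learn models expect numeric features, so text columns such as gender and hair colour usually have to be encoded first. The sketch below turns the five rows of the table above into a numeric feature matrix using `OrdinalEncoder`; the variable names are chosen for illustration only, and the choice of encoder is an assumption rather than a recommendation.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Numeric columns: body size (cm), piercing count, nose size (cm).
numeric = np.array([
    [192, 1, 7.4],
    [163, 0, 4.1],
    [176, 2, 6.4],
    [166, 0, 4.9],
    [191, 0, 5.8],
])

# Categorical columns: gender, hair colour.
categorical = np.array([
    ["w", "brown"],
    ["m", "blond"],
    ["m", "blond"],
    ["m", "brown"],
    ["w", "blond"],
])

# Target: favorite colour (string labels are fine for scikit-learn classifiers).
y = np.array(["blue", "red", "blue", "green", "orange"])

# Map each category to a number (categories are sorted alphabetically per column).
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(categorical)

# Combine everything into one feature matrix with 5 samples and 5 features.
X = np.hstack([numeric, encoded])
print(X.shape)
```

An ordinal encoding implies an order between the categories, which may not be meaningful here; a one-hot encoding (`OneHotEncoder`) is a common alternative.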

A machine learning model is built on this data by providing it with a subset of the samples (training data). It then “automatically” learns the connection between the independent features and the target feature. As the example above suggests, some features may be more useful than others for predicting the target. Therefore, a feature selection is often carried out during the training process, and the model hyperparameters are tuned.
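The training step, including feature selection and hyperparameter tuning, could be sketched in scikit-learn as follows. The data is synthetic (generated with `make_classification`), and the random forest model, the univariate selection method and the parameter grid are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data set: 200 samples, 8 features, only 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

# Hold back 30 % of the samples; only the training subset is shown to the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Feature selection and model are chained so both are tuned together.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", RandomForestClassifier(random_state=0)),
])

# Candidate settings: number of selected features and number of trees.
param_grid = {
    "select__k": [3, 5, 8],
    "model__n_estimators": [10, 50],
}

# Try every combination with 3-fold cross-validation on the training data.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_)
```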

After the training process, the model can be used to predict target feature values for as yet unseen samples (test data). In the validation process, these predictions can be compared to the original (“measured”) target feature values to estimate the model’s performance.
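Prediction and validation on held-back samples might look like the sketch below. It again uses synthetic data and an arbitrarily chosen random forest classifier; accuracy is only one of many possible performance measures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data set standing in for real measurements.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The test samples are never shown to the model during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict the target for the unseen samples ...
y_pred = model.predict(X_test)

# ... and compare the predictions with the "measured" target values.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on unseen samples: {accuracy:.2f}")
```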

In remotely sensed optical imagery, pixels are often treated as samples and the band values are interpreted as features. A typical remote sensing example could be:

	Red band	Green band	Blue band	Near infrared band	Far infrared band	Crop yield (kg/m²)
Pixel 1	192	134	78	45	50	1.5
Pixel 2	123	22	213	87	210	2.3
Pixel 3	89	75	12	155	232	1.1
Pixel 4	35	158	86	44	83	1.0
Pixel 5	112	234	11	14	15	0.9
…	…	…	…	…	…	…

where each pixel in a satellite image is treated as one sample with 5 features (= bands with different wavelengths in the electromagnetic spectrum). The target feature is the crop yield of a specific crop type that was measured at the pixel’s location.
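To get from an image to this table, the band stack is typically reshaped so that every pixel becomes one row. The sketch below assumes a hypothetical image array of shape `(bands, rows, cols)` filled with random values, and made-up yield measurements at ten randomly chosen pixels; the random forest regressor is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical image: 5 bands, 4 rows, 6 columns, values 0..255.
image = rng.integers(0, 256, size=(5, 4, 6))
n_bands, n_rows, n_cols = image.shape

# Flatten the spatial dimensions: one row per pixel, one column per band.
X_all = image.reshape(n_bands, -1).T  # shape (24, 5)

# Assume crop yield (kg/m²) was measured at 10 pixels (made-up values).
train_idx = rng.choice(X_all.shape[0], size=10, replace=False)
y_train = rng.uniform(0.5, 2.5, size=10)

model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(X_all[train_idx], y_train)

# Predict every pixel and reshape the result back into a map.
yield_map = model.predict(X_all).reshape(n_rows, n_cols)
print(yield_map.shape)
```

Because the pixels are flattened row by row, reshaping the predictions with `(n_rows, n_cols)` restores the original spatial layout.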

These approaches are called “pixel based” because the model can only learn the connection between the independent and target features from the characteristics of one pixel at a time. In contrast, more sophisticated ML approaches (e.g. convolutional neural networks) can also make use of the relationships between different pixels. We will come back to this later…
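The difference in what the model gets to see can be illustrated with plain NumPy. This is not a convolutional neural network, only a sketch of the input: a pixel-based sample contains a single pixel's value, whereas a neighbourhood-based sample also contains the surrounding pixels. The 3×3 window size and the toy band are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# One hypothetical band with 4 rows and 5 columns.
band = np.arange(20).reshape(4, 5)

# Pixel based: each sample holds the value of exactly one pixel.
pixel_samples = band.reshape(-1, 1)  # shape (20, 1)

# Neighbourhood based: each sample holds a 3x3 window of pixels.
# Border pixels without a full window are dropped in this sketch.
patches = sliding_window_view(band, (3, 3))  # shape (2, 3, 3, 3)
patch_samples = patches.reshape(-1, 9)  # shape (6, 9)

print(pixel_samples.shape, patch_samples.shape)
```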

Aim of the course¶

At the end of this course you will be able to…

create an ML concept suitable to a given remote sensing problem
select an appropriate ML model for a given problem
conduct a feature selection and tune model hyperparameters
train a model
use a model to predict target feature values for whole areas (e.g. a satellite image)
validate the model performance (with focus on the particularities of geographic data)
know about the limitations of ML applications


Winter Semester 21/22