import warnings
warnings.filterwarnings('ignore')

Model generation (regressor)

After classifying pixels as cloud-free or cloud-covered, we need a regressor that predicts cloud altitudes for the cloud-covered pixels. An overview of regression models suitable for this task is available in the scikit-learn documentation.

So let’s first read our merged data table again…

As we only want to predict cloud altitudes for cloudy pixels, we first restrict the table to samples with cloudcover > 1:

data = data[data.cloudcover>1]

Again, we validate the performance of the chosen model architecture with a grouped k-fold cross-validation (GroupKFold), using the station code in the icao column as the group. Let's start with a simple linear model. Since this is a regression problem, we have to switch the scoring to a suitable regression metric; we choose the R² score:

from sklearn.model_selection import cross_validate
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import SGDRegressor

predictors = ["IR_016","IR_039","IR_087","IR_097","IR_108","IR_120","IR_134","VIS006","VIS008","WV_062","WV_073","dem"]
target     = "cloud_altitude"

result = cross_validate(SGDRegressor(), data[predictors], y=data[target], groups=data.icao, scoring="r2", cv=GroupKFold(), n_jobs=-1)

# Average R² score across the folds
result["test_score"].mean()
-1.7919393465108474e+24
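An R² this far below zero means the predictions are vastly worse than simply predicting the mean cloud altitude: R² = 1 - SS_res/SS_tot is 0 for a constant mean prediction and has no lower bound. One plausible cause, not verified on this table, is that SGDRegressor is sensitive to feature scaling (scikit-learn recommends standardizing its inputs), while the MSG channels and dem span very different ranges. The sketch below uses made-up synthetic data (three hypothetical features and twelve hypothetical stations) to show both points: the R² of a mean and of a badly shifted prediction, and a grouped cross-validation of SGDRegressor placed behind a StandardScaler.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600
# Hypothetical features on very different scales: two "channel-like" values
# around 250 and one "elevation-like" value between 0 and 3000.
X = np.column_stack([
    rng.normal(250, 10, n),
    rng.normal(240, 15, n),
    rng.uniform(0, 3000, n),
])
y = 2.0 * (X[:, 0] - 250) - 1.5 * (X[:, 1] - 240) + 0.5 * X[:, 2] + rng.normal(0, 5, n)
groups = np.repeat(np.arange(12), n // 12)  # 12 hypothetical stations, 50 samples each

# R² is 0 for predicting the mean and negative for anything worse.
r2_mean = r2_score(y, np.full_like(y, y.mean()))
r2_shifted = r2_score(y, np.full_like(y, y.mean() + 10 * y.std()))
print(r2_mean, r2_shifted)  # approximately 0.0 and -100.0

# Standardize inside the pipeline so the scaler is fit on the training folds
# only, then run a grouped cross-validation like the one above.
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
result = cross_validate(model, X, y, groups=groups, scoring="r2", cv=GroupKFold(n_splits=4))
cv_r2 = result["test_score"].mean()
print(cv_r2)
```

Shifting every prediction by ten standard deviations already drives R² down to -100, so a value around -1.8e+24 points to predictions that are wildly off in scale. Whether scaling rescues the linear model on the real data would need to be checked with the same cross_validate call.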

Well, that didn’t really work. Let’s try a RandomForestRegressor:

from sklearn.ensemble import RandomForestRegressor

predictors = ["IR_016","IR_039","IR_087","IR_097","IR_108","IR_120","IR_134","VIS006","VIS008","WV_062","WV_073","dem"]
target     = "cloud_altitude"

result = cross_validate(RandomForestRegressor(), data[predictors], y=data[target], groups=data.icao, scoring="r2", cv=GroupKFold(), n_jobs=-1)

# Average R² score across the folds
result["test_score"].mean()
0.5075651931016422

OK, that looks better. With the RandomForestRegressor we get an R² of about 0.51, i.e. the model explains roughly half of the variance in cloud altitude. So let's train the model on the full table and save it:

import joblib

# Train the final model on all cloudy samples and save it to disk
model = RandomForestRegressor(n_estimators=100, n_jobs=-1)
model = model.fit(data[predictors], data[target])
joblib.dump(model, "regressor.model")
['regressor.model']
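In a later step, for example on the pixels flagged as cloudy by the classifier, the saved regressor can be loaded back with joblib.load. The following minimal round-trip sketch uses synthetic data and a temporary directory instead of the real table; the array sizes and values are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic stand-ins for data[predictors] and data[target].
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 1000 * X[:, 0] + rng.normal(scale=50, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regressor.model")
    joblib.dump(model, path)       # same call as above, written to a temp dir
    restored = joblib.load(path)   # read the fitted forest back from disk

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict(X[:10]), restored.predict(X[:10]))
print(same_predictions)  # True
```

Since joblib pickles the model, it should be loaded with the same scikit-learn version that was used to save it.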

Task

Design and train an ML model suitable for predicting cloud altitudes from the provided MSG data. Don't forget to improve your model through hyperparameter tuning and feature selection.
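One possible starting point, sketched on synthetic data rather than the MSG table: compute the GroupKFold splits once, so that, as in the cross-validation above, no group ends up in both a training and a test fold, then reuse them for greedy forward feature selection with scikit-learn's SequentialFeatureSelector and for a grid search over a few RandomForestRegressor hyperparameters. The number of features, the number of features to select, and the parameter grid are illustrative assumptions, not recommended values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic stand-in for data[predictors], data[target] and data.icao.
rng = np.random.default_rng(0)
n, n_features = 400, 6
X = rng.normal(size=(n, n_features))
# Only the first two columns carry signal; the remaining four are noise.
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)
groups = np.repeat(np.arange(10), n // 10)  # 10 hypothetical stations

# Precompute group-aware splits once; a list can be reused by both searches.
splits = list(GroupKFold(n_splits=5).split(X, y, groups))

# 1) Feature selection: greedy forward selection scored by grouped-CV R².
selector = SequentialFeatureSelector(
    RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=2, direction="forward", scoring="r2", cv=splits,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())
print("selected columns:", selected)

# 2) Hyperparameter tuning on the selected columns with the same splits.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "max_depth": [None, 8],
        "min_samples_leaf": [1, 5],
    },
    scoring="r2", cv=splits, n_jobs=-1,
)
grid.fit(X[:, selected], y)
print(grid.best_params_, round(grid.best_score_, 3))
```

On the real table, X would be data[predictors], y would be data[target] and groups would be data.icao; the selected column indices map back to names through the predictors list. Forward selection refits the model many times, so on the full data set a smaller forest or a coarser grid may be needed to keep the runtime manageable.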