{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Model generation (classifier)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Having merged the satellite data with the station measurements, we can now proceed to the machine learning part. First, we want to train a classifier (1: cloud free, 2: cloud contaminated or cloud covered). Classifier models suitable for this task are listed in the [scikit-learn documentation](https://scikit-learn.org/stable/index.html)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So let's first read our merged data table:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import pandas as pd\n", "data = pd.read_csv(\"data/stations/metar_station_measurements_with_MSG.csv\", parse_dates=[\"time\"]).fillna(0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Since we only want to distinguish two classes (cloudy yes/no), we have to adapt the data set slightly:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
| | icao | time | cloudcover | cloud_altitude | x | y | IR_016 | IR_039 | IR_087 | IR_097 | IR_108 | IR_120 | IR_134 | VIS006 | VIS008 | WV_062 | WV_073 | cmask | dem |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EBAW | 2005-01-15 00:00:00 | True | 2400 | 2.920539e+05 | 4.611705e+06 | 0.0 | 263.69100 | 266.75824 | 241.82200 | 270.32462 | 270.10632 | 254.10730 | 0.0 | 0.0 | 229.80460 | 253.90987 | 3.0 | 5.0 |
| 1 | EBBR | 2005-01-15 00:00:00 | True | 3600 | 2.956507e+05 | 4.596070e+06 | 0.0 | 263.69100 | 267.39102 | 242.04547 | 270.97700 | 271.09344 | 255.38649 | 0.0 | 0.0 | 230.68666 | 254.62540 | 3.0 | 37.0 |
| 2 | EBCI | 2005-01-15 00:00:00 | True | 2200 | 2.967015e+05 | 4.571825e+06 | 0.0 | 262.45435 | 266.44186 | 241.59853 | 269.99844 | 270.43536 | 255.38649 | 0.0 | 0.0 | 231.74515 | 255.34093 | 3.0 | 150.0 |
| 3 | EBLG | 2005-01-15 00:00:00 | False | -999 | 3.608589e+05 | 4.580852e+06 | 0.0 | 266.16430 | 268.02380 | 242.04547 | 269.67227 | 269.77728 | 254.61897 | 0.0 | 0.0 | 230.33385 | 253.43285 | 1.0 | 146.0 |
| 4 | EBOS | 2005-01-15 00:00:00 | True | 2600 | 1.875154e+05 | 4.613160e+06 | 0.0 | 263.69100 | 268.02380 | 242.04547 | 271.30316 | 271.42250 | 254.87482 | 0.0 | 0.0 | 230.86308 | 253.43285 | 3.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 52007 | LZKZ | 2006-12-15 21:00:00 | True | 3300 | 1.430620e+06 | 4.434616e+06 | 0.0 | 263.76193 | 264.90475 | 242.95023 | 268.16614 | 268.26227 | 254.06820 | 0.0 | 0.0 | 231.50674 | 251.63990 | 3.0 | 242.0 |
| 52008 | LZPP | 2006-12-15 21:00:00 | True | 3500 | 1.212834e+06 | 4.443047e+06 | 0.0 | 262.61996 | 263.68170 | 241.99338 | 266.59192 | 267.01120 | 252.88803 | 0.0 | 0.0 | 233.18146 | 252.30473 | 3.0 | 170.0 |
| 52009 | LZSL | 2006-12-15 21:00:00 | True | 3300 | 1.296992e+06 | 4.439913e+06 | 0.0 | 262.61996 | 264.29324 | 242.71101 | 267.53647 | 267.94950 | 253.12407 | 0.0 | 0.0 | 232.67905 | 250.08860 | 3.0 | 331.0 |
| 52010 | LZTT | 2006-12-15 21:00:00 | False | -999 | 1.354894e+06 | 4.461921e+06 | 0.0 | 262.61996 | 264.29324 | 242.23259 | 267.22162 | 267.32397 | 252.65201 | 0.0 | 0.0 | 230.83685 | 249.42377 | 1.0 | 779.0 |
| 52011 | LZZI | 2006-12-15 21:00:00 | True | 2400 | 1.246986e+06 | 4.476102e+06 | 0.0 | 262.61996 | 264.29324 | 242.23259 | 267.53647 | 267.94950 | 253.12407 | 0.0 | 0.0 | 232.00916 | 250.75343 | 3.0 | 348.0 |

52012 rows × 19 columns
\n", "