Context and problem statement
Axa Climate is launching a parametric insurance offer that includes drought insurance for European farmers. To provide this insurance, Axa needs to predict field yields, primarily from soil moisture levels. We enrich this data with multiple Open Data sources: rasters, shapefiles, ERA5 (Copernicus), etc.
Goals
For Germany:
Enrich internal data with Open Data
Predict yield from soil moisture, by region and by crop (regression problem)
Detect drought years by region (classification problem)
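
The enrichment goal amounts to attaching region-level Open Data to each insured field. Below is a minimal sketch of that idea, using a pure-Python ray-casting test so it runs without dependencies. In practice a GeoPandas spatial join (`geopandas.sjoin`) covers the same step. The region shapes, field records, names (`enrich_fields`, `region_a`, `soil_moisture`) and values are all hypothetical illustrations, not project data.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (x, y), e.g. longitude and latitude
Polygon = List[Point]         # ring of vertices in order, implicitly closed


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the edges crossed by a horizontal ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def enrich_fields(
    fields: List[dict],
    regions: Dict[str, Polygon],
    weather: Dict[str, dict],
) -> List[dict]:
    """Tag each field with its region, then merge that region's variables."""
    enriched = []
    for field in fields:
        region: Optional[str] = next(
            (name for name, poly in regions.items()
             if point_in_polygon(field["centroid"], poly)),
            None,  # field lies outside every known region
        )
        row = dict(field, region=region)
        row.update(weather.get(region, {}))
        enriched.append(row)
    return enriched


# Hypothetical inputs: two square regions and three field centroids.
regions = {
    "region_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "region_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
weather = {
    "region_a": {"soil_moisture": 0.31},
    "region_b": {"soil_moisture": 0.18},
}
fields = [
    {"field_id": 1, "centroid": (1, 1)},
    {"field_id": 2, "centroid": (3, 1)},
    {"field_id": 3, "centroid": (5, 5)},
]

enriched = enrich_fields(fields, regions, weather)
for row in enriched:
    print(row)
```

Fields that fall outside every region keep `region=None` and receive no weather variables, which makes gaps in coverage easy to spot before modeling.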
Our contribution
2 Data Scientists, 1 Lead Data Scientist
- Data collection from internal sources (Axa business partners)
- Open Data collection: shapefiles, ERA5 (Copernicus), etc.
- Preparation and transformation of multi-source aggregated data
- Dataset rebalancing
- Modeling and benchmarking of neural networks (including LSTM), Random Forest, Gradient Boosting, LightGBM, etc.
- Selection and deployment of the model with the best performance: Random Forest
- Interpretability of the model through heatmaps on geographic maps
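
The dataset rebalancing step can be illustrated with random oversampling, where rows of the rarer class (here, drought years) are duplicated until the classes are the same size. This is only a sketch under assumptions: the source does not say which rebalancing method was used, and the function name `oversample_minority`, the `drought` label key and the toy rows are hypothetical.

```python
import random
from collections import Counter
from typing import List


def oversample_minority(rows: List[dict], label_key: str = "drought",
                        seed: int = 0) -> List[dict]:
    """Duplicate randomly drawn rows of smaller classes up to the largest class size."""
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed keeps the result reproducible

    # Group rows by class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap (k is 0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy data: 8 normal region-years and 2 drought region-years.
rows = [{"year": 2000 + i, "drought": 0} for i in range(8)]
rows += [{"year": 2003, "drought": 1}, {"year": 2018, "drought": 1}]

balanced = oversample_minority(rows)
counts = Counter(row["drought"] for row in balanced)
print(counts)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data and inflate the reported scores.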
Results
Data enriched with variables such as soils (raster), region boundaries (shapefile) and meteorological variables (ERA5)
Regression problem: RMSE ~ 30 for values of around 400 (tonnes/hectare)
Classification problem: accuracy = 0.88 / precision = 0.74 / recall = 0.61
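
To help read these figures, here is a short sketch of how the metrics are defined. An RMSE of about 30 on values around 400 corresponds to a relative error of roughly 30 / 400 = 7.5%. Assuming drought is the positive class, a precision of 0.74 means about 74% of the years flagged as drought were actual droughts, and a recall of 0.61 means about 61% of actual drought years were flagged. The toy arrays below are hypothetical and only demonstrate the formulas. They are not project data.

```python
import math
from typing import Dict, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when nothing is predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Relative error implied by the reported regression figures.
relative_rmse = 30 / 400
print(f"relative RMSE: {relative_rmse:.1%}")

# Toy example: 3 drought years (1) and 7 normal years (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)

toy_rmse = rmse([400, 420, 380], [430, 390, 410])
print(toy_rmse)
```

Because drought years are the minority, accuracy alone can look high even when many droughts are missed, which is why precision and recall are reported alongside it.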
Technical environment
Python – GeoPandas – Rasterio – Shapely – Jupyter – Pyzo
scikit-learn – TensorFlow – Keras
REST API
GCP – AWS