EUMETNET-AI summer workshop
Last week I attended the EUMETNET-AI workshop on AI applied to weather, climate and environmental sciences. The workshop was held at the German Weather Service (DWD) in Offenbach.
EUMETNET-AI is EUMETNET's Programme on Artificial Intelligence and Machine Learning for Weather, Climate and Environmental Applications. If you are interested in using AI in weather and climate, there are a lot of good resources in this repository: https://github.com/eumetnet-e-ai
A good tutorial can be found here. The first day of the workshop was dedicated to this tutorial, which is based on a five-day introductory course given at DWD.
Workshop highlights
Using AI chatbots for automatic forecast reporting
Several weather services are experimenting with setting up AI assistants in production.
The UK Met Office (UKMO) presented an automated forecast-report product built on Amazon services: the report text is generated with Llama 3 and converted to speech using the different voices available in Amazon Polly.
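To make this concrete, here is a minimal sketch of such a text-to-speech reporting pipeline. The Polly part uses the real boto3 API; the Llama 3 step is abstracted behind a hypothetical generate_forecast_text function, since the talk did not go into that level of detail, and the voice and region are just example choices.

```python
# Minimal sketch of an automated "forecast text -> spoken forecast" pipeline,
# loosely following the UKMO setup described in the talk.
import boto3


def generate_forecast_text(model_output: dict) -> str:
    # Hypothetical stand-in for the LLM step (e.g. Llama 3 behind an
    # inference endpoint) that turns structured forecast data into prose.
    return (
        f"Tomorrow will be {model_output['sky']} "
        f"with a high of {model_output['tmax']} degrees."
    )


polly = boto3.client("polly", region_name="eu-west-1")

report = generate_forecast_text({"sky": "partly cloudy", "tmax": 21})

response = polly.synthesize_speech(
    Text=report,
    OutputFormat="mp3",
    VoiceId="Amy",      # one of several voices Polly offers
    Engine="neural",
)

# Polly returns the audio as a streaming body; write it to disk.
with open("forecast.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```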
DWD created an AI assistant called DAWID that has a wide range of functionalities, including a coding agent and weather-report generation. It is not open source yet, but an example featuring the coding agent was presented.
An interesting talk by Météo-France presented a small LLM that generates short reports, mostly marine weather reports, for French Polynesia. They built a custom Fuyu-like model; details can be found in the list of publications in their git repository.
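For readers unfamiliar with Fuyu: its defining trick is that there is no separate vision encoder; image patches are linearly projected straight into the token stream of a decoder-only transformer, alongside the text embeddings. Below is a bare-bones PyTorch sketch of that idea, purely for illustration; it is not Météo-France's actual model, and all dimensions are made up.

```python
# Fuyu-style multimodal input: image patches enter the decoder through a
# single linear projection, concatenated with the text token embeddings.
import torch
import torch.nn as nn

d_model, patch_dim, vocab = 256, 16 * 16 * 3, 1000

patch_proj = nn.Linear(patch_dim, d_model)   # images enter via one linear layer
text_embed = nn.Embedding(vocab, d_model)
decoder = nn.TransformerEncoder(             # decoder-only = encoder + causal mask
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)

patches = torch.randn(1, 36, patch_dim)      # e.g. a weather chart cut into 36 patches
tokens = torch.randint(0, vocab, (1, 20))    # prompt tokens

seq = torch.cat([patch_proj(patches), text_embed(tokens)], dim=1)
mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
out = decoder(seq, mask=mask)                # a head producing next-token logits would follow
```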
A representative from the WMO talked about an early warning system being developed with support from the UN, and how they use LLMs to process complex PDF documents. More details about the early warning project here.
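The PDF-processing part is conceptually simple; a minimal sketch could look like the following, where the text extraction uses the real pypdf API and ask_llm is a hypothetical stand-in for whatever LLM endpoint the project actually uses.

```python
# Extract raw text from a complex PDF and ask an LLM to pull out
# structured warning information.
from pypdf import PdfReader


def extract_text(path: str) -> str:
    # Concatenate the text of all pages; extract_text() can return None
    # on image-only pages, hence the `or ""`.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real call to the LLM
    # endpoint of your choice.
    return "{}"


document_text = extract_text("national_warning_protocol.pdf")
prompt = (
    "Extract hazard type, affected regions and alert thresholds from the "
    "following document, as JSON:\n\n" + document_text[:8000]
)
structured = ask_llm(prompt)
```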
Data engineering topics
Miruna Stoicescu, from Destination Earth, introduced the Destination Earth Data Lake (DEDL): a distributed, seamless platform for accessing and processing Earth observation and digital twin data, supporting AI/ML applications (demonstrators). The platform is accessible here. I am not sure how mature it is; it requires registration to start with, and probably several layers of permissions to use certain data. One cool feature is the possibility of using Jupyter notebooks (via JupyterHub) or a virtual machine to access and process the data. More information on their data portfolio can also be found here.
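Assuming the platform's harmonised data access is exposed as a STAC catalogue (a common pattern for this kind of Earth observation platform, though I have not verified it), programmatic discovery could look roughly like this. The endpoint URL and collection id below are assumptions for illustration, and the authentication step (the registration mentioned above) is omitted.

```python
# Sketch of STAC-based data discovery; endpoint and collection are assumed,
# check the DEDL documentation for the current values.
from pystac_client import Client

catalog = Client.open("https://hda.data.destination-earth.eu/stac")  # assumed endpoint

search = catalog.search(
    collections=["EO.ESA.DAT.SENTINEL-2.MSI.L1C"],  # hypothetical collection id
    bbox=[5.0, 47.0, 15.0, 55.0],                   # roughly Germany
    datetime="2024-06-01/2024-06-30",
    max_items=10,
)

for item in search.items():
    print(item.id, item.datetime)
```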
On the more technical side, Arianna Valmassoi from DWD introduced the concepts of Climate Data Records (CDR) and Near Real Time (NRT) data. CDRs are long-term, consistent, homogeneous time series reprocessed with the latest algorithms, including data from satellites predating NRT availability, and are aimed at studying climate variability and change. In contrast, NRT data prioritize low-latency delivery (under 3 hours) and use provisional auxiliary data, with evolving algorithms and corrections applied only forward in time. Her presentation highlighted challenges such as differences between CDR data used in training and NRT data used at inference, and the blending of CDR and NRT in reanalyses like ERA5 and the future ERA6 (which will start production next year). CDR data can be found on this EUMETSAT server.
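The train/inference mismatch is easy to illustrate with a toy example: even a small systematic offset between the CDR stream a model was trained on and the NRT stream it sees at inference becomes a silent bias. The numbers below are entirely synthetic; in practice you would compare the real time series over an overlap period.

```python
# Toy illustration of the CDR-vs-NRT pitfall: same variable, slightly
# different statistics between the two streams.
import numpy as np

rng = np.random.default_rng(0)

cdr = rng.normal(loc=280.0, scale=5.0, size=10_000)  # reprocessed, homogeneous record
nrt = rng.normal(loc=280.4, scale=5.3, size=10_000)  # provisional calibration, slight drift

print(f"CDR  mean={cdr.mean():.2f}  std={cdr.std():.2f}")
print(f"NRT  mean={nrt.mean():.2f}  std={nrt.std():.2f}")
print(f"Train/inference offset: {nrt.mean() - cdr.mean():.2f} K")
```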
Nowcasting topics
There was an introduction to the MLCast community, a mainly European (but welcoming to everyone who wants to join!) initiative to unify and advance AI nowcasting by integrating radar, satellite, and NWP data with shared tools and models. It aims to build an open-source Python package featuring preprocessing tools, unified ML model interfaces, benchmark datasets, pre-trained models, and verification tools, fostering collaboration across GPU-rich and GPU-limited institutes. The project targets a v1 release by the end of 2025, with shared data infrastructure, collaborative training on GPU resources, and community ownership ensuring equal credit and visibility. Key open questions include dataset focus, model implementation, onboarding simplification, and alignment with existing tools like pySTEPS and py4cast.
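The package is not released yet, so the following is purely my own sketch of what a unified nowcasting model interface could look like: every model, whether optical-flow or deep-learning based, exposes the same fit/predict contract on (time, y, x) precipitation arrays. All names here are invented.

```python
# Hypothetical unified interface for nowcasting models, plus a trivial
# persistence baseline that satisfies it.
from typing import Protocol

import numpy as np


class NowcastModel(Protocol):
    def fit(self, radar_sequence: np.ndarray) -> None:
        """Train or calibrate on a (time, y, x) stack of past radar fields."""
        ...

    def predict(self, radar_sequence: np.ndarray, lead_steps: int) -> np.ndarray:
        """Return a (lead_steps, y, x) array of forecast fields."""
        ...


class PersistenceNowcast:
    """Baseline: repeat the last observed field for every lead time."""

    def fit(self, radar_sequence: np.ndarray) -> None:
        pass  # nothing to learn

    def predict(self, radar_sequence: np.ndarray, lead_steps: int) -> np.ndarray:
        last = radar_sequence[-1]
        return np.repeat(last[None, ...], lead_steps, axis=0)
```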
Another interesting presentation to mention was one by Kelly Stanley from DWD, who described the creation of a high-resolution wind gust climatology for Germany using a Distributional Regression Network (DRN), a neural network-based approach. The DRN model predicts the distribution of maximum hourly wind speeds at ~10 m height with about 1 km spatial resolution, covering 1995 to 2024. It outperforms traditional reanalysis data (COSMO-REA6) by capturing complex, non-linear relationships and providing probabilistic forecasts with improved accuracy (reducing MAE and RMSE by ~50%). The study includes case analyses like Storm Friederike and integrates diverse meteorological, topographic, and radar data. Future work involves extending the dataset and publishing results. A report (in German) can be found here.
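For the curious, here is a minimal sketch of the DRN idea in PyTorch: instead of a point forecast, the network maps predictors (NWP fields, topography, and so on) to the parameters of a gust distribution, trained by minimizing the negative log-likelihood. I use a plain Normal for illustration; the actual DWD setup, predictors, and loss may well differ.

```python
# Distributional Regression Network sketch: predict distribution parameters
# rather than a single value, and train on the negative log-likelihood.
import torch
import torch.nn as nn


class DRN(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, 1)         # location of the gust distribution
        self.log_sigma = nn.Linear(hidden, 1)  # log scale, keeps sigma positive

    def forward(self, x: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(x)
        return torch.distributions.Normal(self.mu(h), self.log_sigma(h).exp())


model = DRN(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, 10)      # synthetic predictors
y = torch.rand(256, 1) * 30   # synthetic observed gusts (m/s)

# One training step: minimize the NLL of the observed gusts.
dist = model(x)
loss = -dist.log_prob(y).mean()
opt.zero_grad()
loss.backward()
opt.step()
```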
There was a ton more interesting stuff presented, but my mind tends to get full after three days of workshops, and this post would drag on forever if I tried to include every detail. Hopefully I have given you some new information on recent European initiatives in ML and AI applied to weather and climate, and piqued your interest to dig into the links provided!