This study introduces a machine learning approach to predicting drought severity across the United States, built with the Scikit-Learn library and centered on the Support Vector Machine (SVM) model. SVM was selected because it outperformed the other candidate models in cross-validation testing. Data for this analysis was acquired through Google Earth Engine from five distinct datasets, then processed and transformed to reduce skewness. The climate data, aggregated monthly and by region (355 climate regions in total), spans January 2010 to December 2023, with additional climate projections extending from January 2024 to December 2050.
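As a rough illustration of the model selection step, the sketch below compares a few regressors by cross-validated RMSE. It is not the study's code: the synthetic skewed data, the choice of competing models (`Ridge`, `RandomForestRegressor`), and the Yeo-Johnson `PowerTransformer` used to reduce skewness are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic, right-skewed features standing in for the climate variables
# (assumption: the real features come from the Earth Engine datasets).
X = rng.gamma(shape=2.0, scale=1.5, size=(200, 6))
y = np.log1p(X[:, 0]) - 0.5 * np.log1p(X[:, 1]) + rng.normal(scale=0.1, size=200)

# Hypothetical candidate set; the source does not list the competing models.
candidates = {
    "svr": SVR(kernel="rbf"),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_rmse = {}
for name, model in candidates.items():
    # PowerTransformer reduces skewness and standardizes each feature;
    # it is fitted inside every fold because it is part of the pipeline.
    pipeline = make_pipeline(PowerTransformer(method="yeo-johnson"), model)
    scores = cross_val_score(
        pipeline, X, y, scoring="neg_root_mean_squared_error", cv=5
    )
    cv_rmse[name] = float(-scores.mean())  # flip the sign back to a plain RMSE

best_model = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse, best_model)
```

Which model wins on this toy data says nothing about the real climate data; the point is only the comparison pattern.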
Key predictive features included minimum and maximum temperature, precipitation, minimum and maximum humidity, and wind speed. To add temporal context, 3-, 6-, and 12-month averages of these variables were calculated for each geographic region. Data from 2011 to 2021 was used for training, while 2022 and 2023 were held out for testing, so the model was evaluated on years it had not seen.
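The multi-month averages and the year-based split can be sketched with pandas as follows. The column names (`region_id`, `date`, `precip`), the helper names, and the use of trailing windows that include the current month are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np
import pandas as pd


def add_rolling_means(df, feature_cols, windows=(3, 6, 12)):
    """Add trailing w-month means per region (assumed window definition)."""
    df = df.sort_values(["region_id", "date"]).reset_index(drop=True)
    for col in feature_cols:
        for w in windows:
            # Rolling is applied within each region so values never mix
            # across regions; rows without a full window stay NaN.
            df[f"{col}_avg_{w}m"] = df.groupby("region_id")[col].transform(
                lambda s, w=w: s.rolling(window=w, min_periods=w).mean()
            )
    return df


def split_by_year(df, train_years=(2011, 2021), test_years=(2022, 2023)):
    """Split on calendar year, matching the periods described above."""
    years = df["date"].dt.year
    train = df[years.between(train_years[0], train_years[1])]
    test = df[years.between(test_years[0], test_years[1])]
    return train, test


# Toy data: two hypothetical regions, monthly from 2010-01 to 2023-12.
dates = pd.date_range("2010-01-01", "2023-12-01", freq="MS")
frames = [
    pd.DataFrame(
        {
            "region_id": region,
            "date": dates,
            "precip": np.arange(len(dates), dtype=float) + 1000.0 * region,
        }
    )
    for region in (0, 1)
]
raw = pd.concat(frames, ignore_index=True)

featured = add_rolling_means(raw, ["precip"])
train, test = split_by_year(featured)
print(train.shape, test.shape)
```

In this sketch the first 11 months of 2010 have no complete 12-month window, so every row from 2011 onward carries all three averages.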
Model performance was primarily assessed using Root Mean Square Error (RMSE), with SVM achieving the lowest RMSE of approximately 1.82 (for reference, the target values span a range of 17.1). Testing across the entire test dataset yielded an RMSE of around 1.13, which decreased to approximately 1.06 following hyperparameter optimization. The SVM model reached an RMSE as low as 0.71 on the test dataset, underscoring its strong fit to recent data.
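To put these figures in context, an RMSE of 1.82 against a target range of 17.1 is roughly 10.6% of the range, and 1.06 is roughly 6.2%. The sketch below shows one way to compute RMSE, scale it by the target range, and tune an SVM regressor with a grid search. The parameter grid, the synthetic data, and the helper names are assumptions; the source does not state which hyperparameters were searched or how.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rmse(y_true, y_pred):
    """Root mean square error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def range_normalized_rmse(rmse_value, target_range):
    """RMSE expressed as a fraction of the target's value range."""
    return rmse_value / target_range


# Figures quoted in the text, relative to the stated target range of 17.1.
print(round(range_normalized_rmse(1.82, 17.1), 3))  # 0.106
print(round(range_normalized_rmse(1.06, 17.1), 3))  # 0.062

# Synthetic stand-in data; the last 60 rows play the role of the test years.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical grid; make_pipeline names the SVR step "svr".
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": ["scale", 0.01],
    "svr__epsilon": [0.1, 0.5],
}
search = GridSearchCV(
    pipeline, param_grid, scoring="neg_root_mean_squared_error", cv=3
)
search.fit(X_train, y_train)

test_rmse = rmse(y_test, search.predict(X_test))
print(search.best_params_, test_rmse)
```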
Feature importance analysis guided adjustments to feature selection that improved the model's performance. These included removing less informative features and adding 18- and 24-month average climate data, which together reduced RMSE to 0.97. Principal Component Analysis (PCA) was also tested for dimensionality reduction, but it did not improve model accuracy.
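One possible way to run this kind of analysis is sketched below. Permutation importance is used here as an assumption, since an RBF-kernel SVM exposes no built-in feature weights and the source does not name the method it used. The synthetic data, the cut-off of six retained features, and the 95% variance threshold for PCA are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(
    n_samples=300, n_features=10, n_informative=4, noise=5.0, random_state=0
)
X_train, y_train = X[:240], y[:240]
X_valid, y_valid = X[240:], y[240:]

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record how much the
# score degrades; larger means indicate more informative features.
result = permutation_importance(
    model,
    X_valid,
    y_valid,
    scoring="neg_root_mean_squared_error",
    n_repeats=5,
    random_state=0,
)
importances = result.importances_mean
ranking = np.argsort(importances)[::-1]  # most important first
keep = ranking[:6]  # hypothetical cut-off for "informative" features


def cv_rmse(estimator, features, target):
    """Mean 5-fold cross-validated RMSE."""
    scores = cross_val_score(
        estimator, features, target, scoring="neg_root_mean_squared_error", cv=5
    )
    return float(-scores.mean())


scores = {
    "all_features": cv_rmse(
        make_pipeline(StandardScaler(), SVR(C=100.0)), X, y
    ),
    "selected_features": cv_rmse(
        make_pipeline(StandardScaler(), SVR(C=100.0)), X[:, keep], y
    ),
    # PCA keeps enough components to explain 95% of the variance.
    "pca": cv_rmse(
        make_pipeline(StandardScaler(), PCA(n_components=0.95), SVR(C=100.0)),
        X,
        y,
    ),
}
print(ranking, scores)
```

For simplicity the features are ranked once on a single validation slice; a stricter setup would repeat the selection inside each training fold.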
A final visual assessment mapped actual and predicted Palmer Drought Severity Index (PDSI) values across the United States for 2022 and 2023 (Figures 1 to 4). In addition, the predicted PDSI values for 2024 to 2050 were animated (Figure 5) to illustrate projected drought trends over time.
These results underscore the potential of machine learning in environmental monitoring, offering an accessible and scalable tool for drought forecasting. The approach has practical implications for proactive water resource management and policy-making, helping decision makers act early to mitigate future drought impacts.
- Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019 (LINK NEEDED)
- https://cartographyvectors.com/map/1290-us-climate-regions
- https://developers.google.com/earth-engine/tutorials/community/time-series-visualization-with-altair
- https://www.kaggle.com/code/akshayasrinivasan2/drought-prediction-using-ml-algorithms
- Regression analysis with Support Vector Machines (LINK NEEDED)
- Visualization on US map (LINK NEEDED)
- Source code is available in the GitHub repository (https://github.com/mertalpaydin/USA-Drought-Projection-with-Scikit-Learn)
- https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_TERRACLIMATE#bands
- https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_GRIDMET
- https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MOD13A2#bands
- https://developers.google.com/earth-engine/datasets/catalog/GRIDMET_DROUGHT#bands
- https://developers.google.com/earth-engine/datasets/catalog/NASA_GDDP-CMIP6#bands
The extracted data is also available in the GitHub repository.