Predicting food crises using news streams

Anticipating food crisis outbreaks is crucial to efficiently allocate emergency relief and reduce human suffering. However, existing predictive models rely on risk measures that are often delayed, outdated, or incomplete. Using the text of 11.2 million news articles focused on food-insecure countries and published between 1980 and 2020, we leverage recent advances in deep learning to extract high-frequency precursors to food crises that are both interpretable and validated by traditional risk indicators.

In this public repository, we outline our methodology for predicting food crises using news streams to ease replication. It describes the features we assembled and how to set up the methodology for your own use cases. You can also report bugs here.

Methods

This repository contains the individual components we used in our research; each can also be used in isolation.

Step 1 - Seeds Selection

Starting with the three manually selected seed phrases "food insecurity," "hunger crisis," and "famine," additional potential seed phrases were sought from all unigrams, bigrams, and trigrams in the dataset of 11.2 million articles.
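As a minimal sketch of the candidate-generation part of this step, the snippet below counts every unigram, bigram, and trigram in a handful of sentences. The sample sentences, the simple regex tokenizer, and the function names are assumptions made for illustration; the criterion used to promote a candidate to a seed phrase is not shown here.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(sentences, max_n=3):
    """Count unigrams, bigrams, and trigrams across all sentences."""
    counts = Counter()
    for sentence in sentences:
        # Simplistic tokenizer (assumption): lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


# Made-up sample sentences, not taken from the news dataset.
sentences = [
    "Famine looms as the hunger crisis deepens.",
    "Aid agencies warn of acute food insecurity.",
    "The hunger crisis follows a failed harvest.",
]

counts = candidate_phrases(sentences)
seeds = ["food insecurity", "hunger crisis", "famine"]
seed_counts = {seed: counts[seed] for seed in seeds}
print(seed_counts)
```

Counting all n-grams up to length three keeps multi-word phrases such as "acute food insecurity" in the candidate pool alongside single words.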

Step 2 - Frame-Semantic Parsing

Causal extraction is the natural language processing task of extracting cause-effect relations from text, in our case from news sentences. We use a frame-semantic parser to extract semantic causes of food insecurity. Our news dataset was obtained through the Factiva API, so we are not able to share it publicly. We have included a sample sentences.txt file that should allow you to reproduce the method on a news text of your choice.
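To make the idea of cause-effect extraction concrete, here is a deliberately simplified stand-in that uses regular-expression cue phrases instead of a frame-semantic parser. It is not the parser used in this repository; the cue patterns, the function name, and the example sentences are all hypothetical.

```python
import re

# Hypothetical cue patterns (assumption). Each captures a cause and an effect.
CUE_PATTERNS = [
    re.compile(r"(?P<effect>.+?) (?:due to|because of|caused by) (?P<cause>.+)"),
    re.compile(r"(?P<cause>.+?) (?:led to|caused|triggered|resulted in) (?P<effect>.+)"),
]

SEEDS = ("food insecurity", "hunger crisis", "famine")


def extract_causal_pair(sentence):
    """Return (cause, effect) if a cue phrase is found, otherwise None."""
    text = sentence.strip().rstrip(".")
    for pattern in CUE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("cause").strip(), match.group("effect").strip()
    return None


def causes_of_food_insecurity(sentences):
    """Keep only causes whose effect span mentions one of the seed phrases."""
    causes = []
    for sentence in sentences:
        pair = extract_causal_pair(sentence)
        if pair and any(seed in pair[1].lower() for seed in SEEDS):
            causes.append(pair[0])
    return causes


# Made-up sentences for illustration.
sample = [
    "Conflict led to a hunger crisis.",
    "Famine spread due to prolonged drought.",
    "Markets reopened on Monday.",
]
print(causes_of_food_insecurity(sample))
```

A frame-semantic parser generalizes far beyond such fixed cue phrases, so this sketch only shows the shape of the output: a cause span linked to an effect span that mentions a seed.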

Step 3 - Keyword Expansion

While the frame-semantic parser allows us to extract text features related to food insecurity, it fails to capture words semantically close to a seed that are also relevant. Therefore, we expand the set of text features with semantically similar key phrases.
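One common way to find semantically similar phrases is to rank candidates by the cosine similarity of their embedding vectors to a seed. The sketch below assumes that approach; the three-dimensional vectors, the 0.9 threshold, and the function names are invented for illustration and do not reflect the embedding model used in this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed, embeddings, threshold=0.9):
    """Return phrases whose similarity to the seed meets the threshold."""
    seed_vec = embeddings[seed]
    scored = [
        (phrase, cosine(seed_vec, vec))
        for phrase, vec in embeddings.items()
        if phrase != seed
    ]
    kept = [(phrase, score) for phrase, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy embeddings (assumption): real vectors would come from a trained model.
EMBEDDINGS = {
    "famine": [0.9, 0.1, 0.0],
    "starvation": [0.85, 0.15, 0.05],
    "malnutrition": [0.8, 0.3, 0.1],
    "football": [0.0, 0.2, 0.95],
}

expanded = expand("famine", EMBEDDINGS)
print([phrase for phrase, _ in expanded])
```

The threshold trades recall for precision: lowering it admits more phrases but also more that are unrelated to food insecurity.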

Step 4 - Validating News Features

After uncovering the text features semantically related to food insecurity in Steps 1 and 2, we cross-reference the extracted features with time-stamped news corpora to obtain a time series for each feature. Then, we use a Granger causality test to discard the time series that are not predictive.
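The filtering idea can be sketched with a hand-rolled Granger-style F-test: compare a regression of the target on its own lag with one that also includes the lagged news feature. The synthetic series, the single lag, the cutoff `F_THRESHOLD`, and all function names below are assumptions for illustration, not the settings used in this repository.

```python
import random


def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f_stat(y, x, lag=1):
    """F-statistic for whether lagged x improves the prediction of y."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    restricted = [[1.0] + [y[t - l] for l in range(1, lag + 1)] for t in times]
    unrestricted = [
        row + [x[t - l] for l in range(1, lag + 1)]
        for row, t in zip(restricted, times)
    ]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data (assumption): the target follows news_signal with a one-step delay.
random.seed(0)
n = 200
news_signal = [random.gauss(0, 1) for _ in range(n)]
unrelated = [random.gauss(0, 1) for _ in range(n)]
target = [0.0] + [
    0.8 * news_signal[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, n)
]

F_THRESHOLD = 10.0  # arbitrary illustrative cutoff, not a calibrated critical value
features = {"news_signal": news_signal, "unrelated": unrelated}
f_stats = {name: granger_f_stat(target, series) for name, series in features.items()}
kept = [name for name, f in f_stats.items() if f > F_THRESHOLD]
print(f_stats, kept)
```

In practice the F-statistic would be converted to a p-value and several lags would be tested; a statistics library can handle both.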

Step 5 - Regression Modelling

The processed time series are added as inputs to our regression model for food insecurity. We employ a Random Forest (RF) regression that takes the news features alongside traditional food insecurity features. Additional external features can be supplied depending on the use case.
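A minimal sketch of this setup, assuming scikit-learn's `RandomForestRegressor` as the RF implementation: news features and one traditional feature are placed side by side in the input matrix. The feature names, the synthetic data, and the hyperparameters are hypothetical and chosen only to make the example run.

```python
import random

from sklearn.ensemble import RandomForestRegressor

random.seed(0)

# Hypothetical feature names: two news features and one traditional feature.
FEATURE_NAMES = ["conflict_mentions", "drought_mentions", "food_price_index"]


def make_row():
    """Generate one synthetic observation and its food insecurity score."""
    conflict = random.random()
    drought = random.random()
    price = random.random()
    score = 1 + 2 * conflict + 1.5 * drought + 0.5 * price + random.gauss(0, 0.1)
    return [conflict, drought, price], score


rows = [make_row() for _ in range(300)]
X = [features for features, _ in rows]
y = [score for _, score in rows]

# Simple ordered split (assumption); real time series need a time-aware split.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Importances show how much each input contributed to the fitted forest.
importances = dict(zip(FEATURE_NAMES, model.feature_importances_))
print(importances)
```

External, use-case-dependent features would enter the same way, as extra columns in `X`.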

Step 6 - Visualizations

The findings of the food insecurity project are visualized through maps and scatter plots. We share code to replicate the visualizations that identify predictive and non-predictive features and episodes.

Data

We utilized two types of data: food insecurity classifications and news articles.

Our dataset on food insecurity comes from FEWS NET (the Famine Early Warning Systems Network) and provides district-level information for 37 countries. Food insecurity is classified into five phases following the Integrated Food Security Phase Classification (IPC) framework: (i) minimal, (ii) stressed, (iii) crisis, (iv) emergency, and (v) famine. More information on the IPC classification is available here.
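The five phases map naturally onto an ordered enumeration. The sketch below encodes them that way; the class and function names are hypothetical, and treating phase 3 and above as "crisis or worse" is an assumption made for illustration.

```python
from enum import IntEnum


class IPCPhase(IntEnum):
    """The five IPC phases, ordered by severity."""
    MINIMAL = 1
    STRESSED = 2
    CRISIS = 3
    EMERGENCY = 4
    FAMINE = 5


def is_crisis_or_worse(phase):
    """Return True for phase 3 (crisis) and above (illustrative threshold)."""
    return IPCPhase(phase) >= IPCPhase.CRISIS


# Made-up district-level observations.
observed = {"district_a": 2, "district_b": 3, "district_c": 5}
flagged = sorted(name for name, phase in observed.items() if is_crisis_or_worse(phase))
print(flagged)
```

Using `IntEnum` keeps the phases comparable as integers while rejecting values outside the range 1 to 5.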

Our dataset of news articles comes from Factiva, a digital archive of global news content. Factiva aggregates more than 33,000 news sources from 200 countries in 28 languages. Each news article is tagged with geographic region codes that indicate its relevance to a specific country. We collect the text of the 11.2 million English-language articles obtained from 5,421 news sources.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find this project useful in your research, please cite the following paper.

@article{balashankar2023predicting,
  author = {Ananth Balashankar and Lakshminarayanan Subramanian and Samuel P. Fraiberger},
  title = {Predicting food crises using news streams},
  journal = {Science Advances},
  volume = {9},
  number = {9},
  pages = {eabm3449},
  year = {2023},
  doi = {10.1126/sciadv.abm3449},
  URL = {https://www.science.org/doi/abs/10.1126/sciadv.abm3449},
}

Authors & Contact

In case of any questions or remarks, please feel free to reach out to us via e-mail.

Samuel Paul Fraiberger - spfraib - Affiliation: The World Bank
Ananth Balashankar - ananthbalashankar - Affiliation: Google
Lakshmi Subramanian - subramal - Affiliation: New York University

Repository: philippzi98/food_insecurity_predictions_nlp
