African Digital News Corpus (adnC)
Building the first multilingual monitor corpus of digital news published online in Africa
The African Digital News Corpus (adnC) is a long-running effort to build a systematic, multilingual collection of online news published across the African continent. Although more and more Africans get their news from digital sources, this content remains difficult to study at scale: much of it is ephemeral, search engines rarely surface African outlets, and commercial full-text databases privilege content from the Global North. The corpus addresses this gap with a custom-built crawler and scraper that collect headlines, full text, and metadata twice daily from hundreds of news websites.
The project pairs this infrastructure with a custom R package that lets researchers query the underlying database and retrieve pre-processed, text-mineable data without raising copyright concerns. It supports comparative research on questions such as sourcing practices, regional imbalances in coverage, and postcolonial legacies in reporting, and is intended to be especially useful to scholars at African universities who often lack access to expensive commercial databases.
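As a rough illustration of the collection step described above (pulling a headline, full text, and basic metadata out of a fetched page), the sketch below parses one HTML document into a single record. The source does not describe how the crawler and scraper are implemented, so everything here is an assumption made for illustration: the `ArticleParser` class, the `ArticleRecord` fields, the `parse_article` helper, and the rule of treating the first `<h1>` as the headline and `<p>` elements as body text. It uses only the Python standard library and makes no network requests.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the text of the first <h1> and of every non-empty <p>."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag currently being captured, if any
        self._buffer = []
        self.headline = None
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only when not already inside an <h1> or <p>.
        if tag in ("h1", "p") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing an inline tag such as </b>; keep capturing
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1" and self.headline is None:
            self.headline = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


@dataclass
class ArticleRecord:
    """Hypothetical record layout; not the corpus's actual schema."""
    url: str
    headline: str
    full_text: str
    retrieved_at: str  # ISO 8601 timestamp


def parse_article(url, html, retrieved_at=None):
    """Parse one HTML page into an ArticleRecord."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    when = retrieved_at or datetime.now(timezone.utc)
    return ArticleRecord(
        url=url,
        headline=parser.headline or "",
        full_text="\n".join(parser.paragraphs),
        retrieved_at=when.isoformat(),
    )


SAMPLE_HTML = """
<html><body>
  <h1>Sample headline</h1>
  <p>First paragraph of the story.</p>
  <p>Second   paragraph, with <b>inline</b> markup.</p>
</body></html>
"""

record = parse_article("https://example.org/news/1", SAMPLE_HTML)
print(record.headline)
print(record.full_text)
```

A real scraper covering hundreds of sites would likely need per-site extraction rules, language detection, and deduplication across the twice-daily runs; none of that is attempted here.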
Research outputs
2023. Madrid-Morales, D., Rodríguez-Amat, J. R., & Lindner, P. A Computational Mapping of Online News Deserts on African News Websites. Media and Communication, 11(3).
2021. Madrid-Morales, D. Who Set the Narrative? Assessing the Influence of Chinese Global Media on the News Content of 30 African Countries. Global Media and China, 6(2), 129–151.
2021. Madrid-Morales, D., Lindner, P., & Periyasamy, M. Corpus of African Digital News from 600 Websites, Formatted for Text Mining / Computational Text Analysis. Harvard Dataverse.
2020. Madrid-Morales, D. Using Computational Text Analysis Tools to Study African Online News Content. African Journalism Studies, 41(4), 68–82. This is the methodological paper describing how the corpus is built.
Dissemination, Knowledge Exchange & Impact
2024. Conference paper: "When the Devil is in the Sample: Benchmarking Data Sources to Analyse African News Content Computationally," ICA Pre-Conference "A Computational Turn in Journalism," National University of Singapore (Singapore).
2023. Invited talk: "A roadmap for building a corpus of African digital news content," Bournemouth University.
2023. Teaching: "Analysing African News Websites with R," DigiMethods Winter School, Rhodes University (South Africa).
2021. Invited talk: "Old Questions, Big Data: Using the African Digital News (ADN) Corpus to Study News Flows," Rhodes University (South Africa).
2021. Invited talk: "adn: African Digital News — Building a Multilingual Monitor Corpus," DH@UH: Building Connections, University of Houston.
2021. Invited talk: "African digital media reporting on COVID-19 and China," Yale University.
Funding
University of Houston — Digital Research Commons, Development Grant (2020–21)
University of Houston — Digital Research Commons, Seed Grant (2020)
Collaborators
Dr. Peggy Lindner — University of Houston (technical lead & co-author)
Madhumitha Periyasamy — University of Houston (research assistant & co-author)