How does ACLED code and review data to ensure quality?

Published On: 1 March 2023 | Last Updated: 1 March 2023

ACLED’s aim is to capture the forms, actors, dates, and locations of political violence and demonstrations as they occur within states. This brief note contains information about how the ACLED team collects, cleans, reviews, and checks event data. ACLED’s coding and review process ensures the dataset is accurate, consistent, comprehensive, and regularly updated. It also ensures that data are directly comparable across time periods, event types, locations, countries, or actors. Data are posted weekly as they complete the coding and review process, although ongoing checks ensure the thoroughness of previously collected events.

ACLED takes various steps to ensure that the data we publish are accurate, thorough, and accessible. The vast majority of ACLED’s resources are spent on three tasks:

Sourcing and reviewing source materials
Collecting and inputting data
Cleaning and reviewing those data and sources

This process is repeated weekly for all regions ACLED covers. The sourcing and coding methodology, including local context decisions, is available in articles on the ACLED Knowledge Base.

Coding and sourcing process

ACLED data are coded by experienced researchers with knowledge of local contexts and languages. They collect information mainly from secondary sources, applying the guidelines outlined in the Codebook and supplemental documentation to extract relevant information.

ACLED data are collected each week after individual researchers have examined the information from structured and regularly reviewed lists of secondary sources. A sourcing platform ensures the same sources are checked each week in a consistent manner.¹

Every event is coded using the same rules on ‘who, what, where, and when’ to maximize accuracy and consistency. Additional information is also provided in each row of data, including: event ID numbers, precision scores for location and time, codes to distinguish between the types of actors, a brief summary of the event, fatality numbers if reported, and additional information for deeper analysis.
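As a rough illustration of the row structure described above, the fields could be modelled as a small record type. This is a sketch only: the class name, field names, and sample values are hypothetical and are not taken from the ACLED Codebook.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EventRecord:
    """One coded event. Field names are illustrative assumptions."""
    event_id: str                     # event ID number
    event_date: date                  # 'when'
    time_precision: int               # precision score for the date
    event_type: str                   # 'what'
    actor: str                        # 'who'
    actor_type_code: int              # code distinguishing the type of actor
    location: str                     # 'where'
    geo_precision: int                # precision score for the location
    notes: str                        # brief summary of the event
    fatalities: Optional[int] = None  # filled in only if reported


# Invented placeholder values, not a real event.
example = EventRecord(
    event_id="EX1",
    event_date=date(2023, 1, 1),
    time_precision=1,
    event_type="Example event type",
    actor="Unidentified Armed Group (Country)",
    actor_type_code=0,
    location="Example Town",
    geo_precision=1,
    notes="Placeholder summary of the event.",
)
```

Keeping fatalities optional mirrors the text: the number is recorded only when a source reports it.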

Throughout the weekly data collection and coding process, Researchers pose questions to team members and their Research Managers to clarify difficult coding decisions or flag potential data collection issues. Researchers use a coding platform to ensure that the coding of actor names, interactions, locations, etc. is consistent with previous iterations of each group and location.

Data review and cleaning process

Following weekly data collection, the data undergo three rounds of review:

First, Researchers review their data to ensure intra-coder reliability.
- Decisions on specific matters – such as a new active group – are flagged for further review.
- After the review, Researchers submit their data and source materials to their Research Manager.
Next, Research Managers review these data for inter-coder reliability across the region.
- Research Managers cross-check the data for general accuracy and consistency, ensuring that events meet the criteria for inclusion and that coding is in line with the methodology and previous local context applications.
Finally, the data are passed to a final reviewer, who checks that the inter-coder standards are met and that the methodology is applied consistently across different regions and contexts.

Once the manual review of these weekly datasets is complete, the data are sent to the Data Management team for automated data cleaning and formatting checks. These data are then uploaded for public use.
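The article does not say which automated checks the Data Management team runs. Purely as an illustration of what a formatting check can look like, the sketch below validates one row against a few invented rules; the field names, the required-field list, and the rules themselves are assumptions, not ACLED's actual checks.

```python
from datetime import date

# Assumed set of fields that must never be empty.
REQUIRED_FIELDS = ("event_id", "event_date", "actor", "location")


def check_row(row: dict) -> list:
    """Return a list of human-readable problems found in one row."""
    problems = []
    # Rule 1: required fields must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    # Rule 2: the date, when present, must parse as YYYY-MM-DD.
    raw_date = row.get("event_date")
    if raw_date:
        try:
            date.fromisoformat(str(raw_date))
        except ValueError:
            problems.append("event_date is not an ISO date (YYYY-MM-DD)")
    # Rule 3: fatalities, when reported, must be a non-negative integer.
    fatalities = row.get("fatalities")
    if fatalities is not None and (
        not isinstance(fatalities, int) or fatalities < 0
    ):
        problems.append("fatalities must be a non-negative integer")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a row at once.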

Following this process, researchers receive detailed feedback to improve existing data and future coding decisions.

The above process is designed to ensure:

Reliability, through intra- and inter-coder checks for consistent application of the sourcing and coding methodology;
Accuracy, by identifying and correcting mistakes as well as ensuring coding decisions accurately reflect the situation on the ground; and
Relevance and completeness, by determining whether each compiled event constitutes an act of political violence or demonstration, or a significant non-violent strategic development.²

ACLED’s Data Management team works with Researchers and Research Managers to maintain a list of conflict actors, locations, and sources. New additions to these lists are reviewed as part of the real-time review process explained above to ensure that actors, sources, and locations do not get ‘double-coded.’ For example, if a group goes by multiple names, it is coded only under one standardized actor name in the data. The same goes for locations and sources.
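The idea of coding a group with several names under one standardized actor name can be sketched with a simple alias table. The table contents and the function name below are hypothetical; how ACLED maintains its actor list internally is not described here.

```python
from typing import Mapping

# Hypothetical alias table: every known variant maps to one standard name.
ACTOR_ALIASES = {
    "group a": "Group A",
    "the a movement": "Group A",
    "a front": "Group A",
}


def standardize_actor(
    raw_name: str, aliases: Mapping[str, str] = ACTOR_ALIASES
) -> str:
    """Map a reported name to its standardized form.

    Unknown names are returned with whitespace tidied, so they can be
    flagged for review instead of being silently added as a new actor.
    """
    tidy = " ".join(raw_name.split())          # collapse stray whitespace
    return aliases.get(tidy.casefold(), tidy)  # case-insensitive lookup
```

For example, `standardize_actor("  The A   Movement ")` returns `"Group A"`, while a name missing from the table comes back unchanged apart from whitespace.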

ACLED also updates the details of events when new information becomes available. Since ACLED collects and codes events in real time, the initial details that emerge about an event are often vague. Uncertainties surrounding the actors, location, or other details of an event are clarified in the ‘Notes’ column of each event. Geographical- and time-precision codes are also included as a measure of how much estimation is used in determining the location and date of each event — higher numbers indicate less certainty.³
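Because higher precision codes indicate less certainty, an analyst who needs well-located, well-dated events can keep only rows at or below a chosen code. The sketch below assumes the codes are integers stored under the keys `geo_precision` and `time_precision`; the sample rows are invented.

```python
def filter_by_precision(events, max_geo, max_time):
    """Keep events whose precision codes are at or below the given limits."""
    return [
        e for e in events
        if e["geo_precision"] <= max_geo and e["time_precision"] <= max_time
    ]


# Invented sample rows for illustration only.
events = [
    {"event_id": "EX1", "geo_precision": 1, "time_precision": 1},
    {"event_id": "EX2", "geo_precision": 3, "time_precision": 1},
    {"event_id": "EX3", "geo_precision": 1, "time_precision": 2},
]

precise = filter_by_precision(events, max_geo=1, max_time=1)
```

Here only `EX1` survives the filter, since `EX2` has a less certain location and `EX3` a less certain date.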

As more information emerges, event details are updated in the dataset and the altered data are then uploaded as part of the weekly process. For example, a group may not claim responsibility for an incident until sometime after the event. When information about a group is not known, ACLED will initially code the group as Unidentified Armed Group (Country). If new information about a group surfaces at a later date – e.g. a group comes forward claiming responsibility for an attack – then the event is updated to reflect the new information. Another example of when the data may be updated is if there are inconsistent reports in the aftermath of an event, especially with regard to the number of fatalities.⁴ Over time, more in-depth reports, such as those by human rights organizations, may surface. The details are updated in the existing events to ensure the most accurate coding is presented.⁵
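The update pattern described above, where an existing event is revised rather than re-entered, can be sketched as follows. The function name, the dictionary layout, and the sample values are hypothetical; only the placeholder actor name comes from the text.

```python
def update_event(events_by_id: dict, event_id: str, **changes) -> dict:
    """Overwrite selected fields of an existing event and return the new row.

    Raises KeyError if the event ID is not already in the dataset, so an
    update can never silently create a second copy of an event.
    """
    current = events_by_id[event_id]
    updated = {**current, **changes}
    events_by_id[event_id] = updated
    return updated


# Invented example: attribution and fatality count become clearer later.
dataset = {
    "EX1": {"actor": "Unidentified Armed Group (Country)", "fatalities": 3},
}
update_event(dataset, "EX1", actor="Group A", fatalities=5)
```

Keying updates on the event ID keeps one row per event, so later reports refine the record instead of duplicating it.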

Data supplementation and quality assurance

In addition to the weekly real-time coding and review, ACLED data are also regularly supplemented, reviewed, and updated as part of two processes to ensure thoroughness and reliability:

Supplemental/back-coding
Quality assurance checks

Supplemental coding is done on a per-project basis to bolster existing data in light of changes to the types or patterns of disorder in a country or region, the availability of sources, or the emergence of new trends in the political landscape. This may involve back-coding a new source before adding it to the real-time data, or the inclusion of new trends.⁶ In certain situations, data supplementation may also be undertaken to expand ACLED’s coverage of a country or region. However, this kind of supplementation is not done often as it requires significant time and resources. The list of countries and time periods covered by the ACLED dataset can be found here.

ACLED also regularly reviews data on emerging or evolving conflicts. Unique trends or actor types may emerge from these situations, and therefore, the research team reviews these contexts and data to ensure that coding decisions remain consistent with the ACLED methodology. ACLED also adapts its sources and/or sourcing processes as part of such reviews to ensure that the data continue to reflect reality as things change on the ground.⁷ Additional clarification on how ACLED deals with specific conflict and disorder landscapes is available in region- or conflict-specific methodology primers in the Knowledge Base.

Additionally, systematic reviews of the data are carried out to ensure reliability. These reviews are generally part of routine quality assurance checks, but may also be prompted by new methodological insight (e.g. upon further investigation by the research team or upon consultation with external experts and/or partners). As these reviews are carried out, changes are made to the dataset available on the ACLED website.
