Election violence automated tags
What do these tags cover?
ACLED uses two automatically generated tags for tracking violence associated with elections: “Poll-related violence” and “Election-related violence.” A tag is applied if the notes of an event indicate that the given type of violence occurred. The scope of each tag is briefly described below.
1. Poll-related violence
This tag is applied to any event indicating that the voting process was directly affected or targeted. It can include events before, during, and after the elections. It includes any of the following events:
- Violence that occurs at a polling center
- Violence targeting voters
- Violence that targets members of an election commission or its administrative buildings
- Violence that targets electoral materials (e.g., ballots, ballot boxes)
- Violence that targets polling staff
- Measures inhibiting the voting process due to threats or violence (e.g., canceling the installation of voting booths)
2. Election-related violence
This tag excludes events tagged under poll-related violence and captures events related to elections and campaigns more broadly. It includes any of the following events:
- A candidate for the upcoming elections, or their family, was targeted (this also applies if the same candidate is targeted after the elections)
- Former candidates were targeted
- Property of a candidate or their party was targeted (e.g., campaign vehicles, billboards, candidate’s home, party headquarters)
- Members of a party or campaign staff were targeted due to the upcoming elections
- Members and/or supporters of parties clashed due to upcoming elections
- Candidates were arrested ahead of the elections
- Local administrators were targeted due to their involvement with candidates for the upcoming elections
- Civilians (e.g., voters, activists, journalists) were targeted (including arrests) due to the upcoming elections
- Rioters targeted candidates for the upcoming elections or denounced election results
- Police clashed among themselves due to the upcoming elections
- Security measures were deployed in response to election-related violence
Which events could receive these tags?
ACLED applies the election violence tags to events with an event date (event_date column) in January 2020 or later, across all countries we cover, that fall under the event types “Battles,” “Explosions/Remote violence,” “Violence against civilians,” and “Riots,” as well as the sub-event types “Excessive force against protesters” (“Protests” events) and “Looting/property destruction” (“Strategic developments” events).
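The eligibility criteria above amount to a simple filter on the event date and the event or sub-event type. A minimal sketch (the event dictionary shape here is an assumption, not ACLED's actual data schema):

```python
from datetime import date

# Event types and sub-event types eligible for the election violence tags,
# per the criteria described above.
ELIGIBLE_EVENT_TYPES = {
    "Battles",
    "Explosions/Remote violence",
    "Violence against civilians",
    "Riots",
}
ELIGIBLE_SUB_EVENT_TYPES = {
    "Excessive force against protesters",  # under "Protests"
    "Looting/property destruction",        # under "Strategic developments"
}
CUTOFF = date(2020, 1, 1)  # tags apply to events dated January 2020 or later

def is_eligible(event: dict) -> bool:
    """Return True if an event meets the date and type criteria for tagging."""
    if event["event_date"] < CUTOFF:
        return False
    return (
        event["event_type"] in ELIGIBLE_EVENT_TYPES
        or event["sub_event_type"] in ELIGIBLE_SUB_EVENT_TYPES
    )
```

Note that passing this filter only makes an event a candidate for tagging; whether a tag is actually applied depends on the classification model described below.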
How do these tags differ from ACLED’s “Violence targeting local officials” tag?
ACLED also produces a manually coded tag that identifies events of violence targeting local government officials around the world. There is some overlap between the local officials tag and the poll-/election-related violence tags, as they both capture cases where elected officials, elections staff, and poll workers are targeted with violence. However, the local officials tag has a broader scope than just events targeting election-related officials, as it also covers civil servants and local officials operating outside of election contexts. Similarly, the poll/election tags capture a broader range of election-related events beyond those involving local officials, including violence targeting voters, the destruction of ballot boxes, violence targeting campaign staff, and clashes among supporters.
How do these automatically generated tags differ from ACLED’s other tags?
The election violence tags are part of ACLED’s initiative to create tags using a large language model (LLM), while standard ACLED tags are coded manually by researchers. See ACLED’s data on attacks on Ukrainian infrastructure for another set of automatically generated tags.
An LLM is a type of artificial intelligence that processes and, in the case of certain LLMs, generates text based on patterns learned from vast amounts of data. In the context of classification models, like those used here, an LLM can analyze and categorize text — like identifying whether a text is about a certain topic — by recognizing keywords, phrases, and context.
Using the notes column, ACLED fine-tuned an LLM specifically for classifying poll-/election-related violence among ACLED events. ACLED manually collated training data for the model, which was iteratively improved until its performance was deemed satisfactory. The model was applied to various time periods and countries, and eventually globally. Before rolling out the new tags, ACLED performed several checks, in particular cross-checks against national and subnational elections and manual review of sampled tagged data.
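ACLED's actual pipeline fine-tunes an LLM; as a rough illustration of the same train-then-classify workflow, the sketch below substitutes a toy word-count classifier. All notes and labels here are hypothetical, and a real implementation would fine-tune a transformer model instead:

```python
from collections import Counter

def train(notes_and_labels):
    """Count word frequencies per class from labeled notes
    (a toy stand-in for fine-tuning an LLM on the notes column)."""
    counts = {0: Counter(), 1: Counter()}
    for note, label in notes_and_labels:
        counts[label].update(note.lower().split())
    return counts

def classify(counts, note):
    """Label a note by which class saw its words more often in training."""
    words = note.lower().split()
    score = lambda label: sum(counts[label][w] for w in words)
    return 1 if score(1) > score(0) else 0

# Hypothetical training notes: 1 = poll-related violence, 0 = unrelated.
training = [
    ("Gunmen attacked a polling station and burned ballot boxes", 1),
    ("Voters were assaulted outside a polling center", 1),
    ("Clashes erupted between herders and farmers over land", 0),
    ("Protesters demonstrated against fuel price increases", 0),
]
model = train(training)
print(classify(model, "Armed men stormed a polling station"))  # prints 1
```

Unlike this toy example, a fine-tuned LLM weighs words in context rather than in isolation, which is what allows it to distinguish, say, an attack at a polling station from an unrelated event that merely mentions one.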
How accurate are these LLM-based tags?
The final fine-tuned version of the model was trained on data from events across the globe covering the 2018-2024 period; its accuracy was then measured on data set aside for testing. The test data was balanced across relevant and irrelevant events. When applying the model to this data, both the poll-related and election-related violence tags achieved accuracy levels over 95%, calculated as the share of events tagged correctly.
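The accuracy metric described above is simply the fraction of test events whose predicted tag matches the true label, computed on a balanced held-out set. A minimal sketch with hypothetical labels:

```python
def accuracy(predicted, actual):
    """Share of events tagged correctly: relevant events that were tagged,
    plus irrelevant events that were correctly left untagged."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical balanced test set: half relevant (1), half irrelevant (0).
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0]  # one relevant event missed
print(accuracy(predicted, actual))  # prints 0.875
```

Balancing the test set matters here: because election-related violence is rare among all ACLED events, an unbalanced test set would let a model score high accuracy simply by tagging nothing.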
Is there any manual supervision of these tags?
ACLED updates the tags on a weekly basis, and the data is manually monitored for false positives and false negatives, with entries periodically corrected where contextual experts deem it necessary. The model is also periodically retrained as more data becomes available to improve performance.
How is this classification model superior to a keyword search?
Text classification models trained on the notes column offer several advantages over simpler keyword searches for identifying poll- and election-related violence. For example, subnational elections can easily be missed with a keyword search because of variation in local names (e.g., union parishad elections in Bangladesh). Complex keyword searches would also be required to identify the targeting of election-related personnel, candidates, or violence at polling stations without erroneously identifying many false positives that include relevant keywords but are nonetheless not related to recent elections. By contrast, the model-based approach employed here considers the broader context of words and sentences within the notes, which helps minimize issues related to false positives and negatives that plague keyword searches.
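The failure modes of a keyword search described above are easy to demonstrate. In the sketch below (the keyword list and notes are hypothetical), the same matcher produces both a false positive and a false negative that a context-aware model could avoid:

```python
# Hypothetical keyword list a naive search might use.
KEYWORDS = {"election", "ballot", "polling", "candidate"}

def keyword_flag(note: str) -> bool:
    """Flag any note containing an election-related keyword."""
    return any(k in note.lower() for k in KEYWORDS)

# False positive: a keyword appears, but the violence is unrelated
# to any recent election.
note = ("A former election commissioner was killed in a land dispute "
        "unrelated to any recent vote")
print(keyword_flag(note))  # prints True despite being irrelevant

# False negative: a subnational poll named only by its local term
# (e.g., union parishad elections in Bangladesh) matches no keyword.
note2 = "Supporters of rival union parishad chairmen clashed during the vote"
print(keyword_flag(note2))  # prints False despite being poll-related
```

Expanding the keyword list to catch the second case would inflate false positives like the first, which is precisely the trade-off a model trained on full-sentence context avoids.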
What is the step-by-step approach for fine-tuning such a model and ensuring that rare events or edge cases are caught?
For more information on the modeling approach and best practices for text classification using the ACLED notes column, visit ACLED’s Knowledge Base.