Source: Humanities and Social Sciences Communications | 18 September 2023
Authors: Clionadh Raleigh, Roudabeh Kishi, and Andrew Linke
Abstract: Conflict event datasets are used widely in academic, policymaking, and public spheres. Accounting for political violence across the world requires detailing conflict types, agents, characteristics, and source information. The public and policymaking communities may underestimate the impact of data collection decisions across global, real-time conflict event datasets. Here, we consider four widely used public datasets with global coverage and demonstrate how they differ in their definitions of conflict and in which aspects of the information-sourcing process they prioritize. First, we identify considerable disparities between automated conflict coding projects and researcher-led projects, largely resulting from few inclusion barriers and no data oversight. Second, we compare researcher-led datasets in greater detail. At the crux of their differences is whether a dataset prioritizes and mandates internal reliability by imposing initial conflict definitions on present events, or whether a dataset’s agenda is to capture an externally valid and comprehensive assessment of present violence patterns. Prioritizing reliability privileges specific forms of violence, despite the possibility that other forms actually occur, and leads to reliance on international and English-language information sources. Privileging validity requires a wide definition of the forms of political violence, as well as diverse, multi-lingual, and local sources. These conceptual, coding, and sourcing variations have significant implications for the use of these data in academic analysis and for practitioner responses to crisis and instability. These foundational differences mean that answers to “which country is most violent?”, “where are civilians most at risk?”, and “is the frequency of conflict increasing or decreasing?” vary across datasets that all purport to capture the same phenomenon of political violence.