According to the Ministry of Justice of the Netherlands, 45% of the Dutch population have experienced domestic violence at some time in their lives, while 27% suffer it on a weekly or even daily basis. Domestic violence here refers to all forms of physical violence committed by someone in the victim's home environment, including all current and former partners, family members, extended family, and friends of the family.
In the first half of the 2000s, the police force of Amsterdam and Amstelland made effective response to domestic violence one of its priorities. Accurate detection and classification of reported incidents was an essential part of this response.
Under current regulations, when an offence is reported in Amsterdam, the police officer must determine whether it is a domestic violence case. However, police do not always recognise domestic violence incidents as such, and large-scale reviews of police report databases have revealed many cases of mislabelling, according to Dmitry Ignatov's paper "Using Emergent Self-organising Maps and Multidimensional Scaling to Analyse Police Reports", presented at the Fourth International Conference on Integrated Models and Soft Computing in Artificial Intelligence.
To cut down on the number of undetected domestic violence cases, an automated sorting system was set up to select suspicious reports for manual analysis. But even this system proved inaccurate: in 2007, a manual check, which took the reviewer about five minutes per report, confirmed only 20% of the 1,091 automatically selected suspicious cases as domestic violence.
According to the authors of the study, text mining offers a promising approach to processing large amounts of textual data, but previous projects using text mining to identify domestic violence cases had failed for the following reasons, among others:
The researchers used multidimensional scaling (MDS) and emergent self-organising maps (ESOM) to improve the accuracy of detection of domestic violence cases in police reports.
MDS uses the similarities and differences between pairs of objects to visualise them in a lower-dimensional space. In this case, a classical metric MDS algorithm was used to project police report data onto a two-dimensional surface, where reports with similar scores were located close to one another.
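To make the idea concrete, here is a minimal sketch of classical metric MDS in plain Python. It is an illustration only, not the researchers' implementation: the function name classical_mds, the power-iteration eigen-solver, and the toy distance matrix are all assumptions, and the sketch presumes symmetric, roughly Euclidean distances between reports.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Place n objects in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Start from a random unit vector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        # Power iteration: repeated multiplication by B converges to the
        # eigenvector with the largest eigenvalue (assumed non-negative here).
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        # Coordinates along axis k are the eigenvector scaled by sqrt(eigenvalue).
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Hypothetical dissimilarities between three reports (0 means identical).
reports = [[0.0, 1.0, 4.0],
           [1.0, 0.0, 3.5],
           [4.0, 3.5, 0.0]]
for point in classical_mds(reports):
    print([round(c, 2) for c in point])
```

In this toy example the first two reports, which are only 1.0 apart, land close together on the plane, while the third sits well away from both.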
A self-organising map (SOM) is a data visualisation technique that reduces multidimensional data to a lower dimension, usually 2D, by means of a self-organising neural network. An emergent self-organising map (ESOM) has many more neurons than a conventional SOM.
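The mechanics of a self-organising map can be sketched in a few lines of plain Python. This is a conventional small SOM written for illustration, not the ESOM software used in the study; the function names, the 4 x 4 grid, the learning-rate and neighbourhood schedules, and the toy term-indicator vectors are all assumptions. An ESOM would apply the same kind of update to a much larger grid.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(weights,
               key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised weight vector per neuron on the grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Neurons near the winner on the grid are pulled hardest.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical reports encoded as term indicators (1 = term present).
domestic = [1.0, 1.0, 0.0, 0.0]
other = [0.0, 0.0, 1.0, 1.0]
som = train_som([domestic, other] * 3, rows=4, cols=4)
print(best_matching_unit(som, domestic), best_matching_unit(som, other))
```

Each report is represented on the map by its best-matching neuron, so reports with similar vocabulary end up in the same region of the grid.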
The original data set included 4,814 police reports from 2007 describing violent offences, each containing the victim's statement and other police data. Just 1,657 of these reports were labelled as domestic violence cases.
The control sample consisted of 4,378 reports from 2006 (including 1,734 labelled by police as domestic violence cases). Back in 2006, the sorting system had selected 1,157 reports for manual review, resulting in 318 reports labelled as domestic violence cases and 839 labelled as non-domestic violence cases.
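A quick calculation puts the figures for the old sorting system in perspective. The short script below uses only numbers quoted in the article; the variable names and the term "precision" (the share of selected reports confirmed as domestic violence) are introduced here for illustration, and the 2007 count is approximate because the 20% figure is itself rounded.

```python
# Figures quoted in the article.
selected_2007 = 1091          # reports flagged by the sorting system in 2007
confirmed_share_2007 = 0.20   # share confirmed as domestic violence
minutes_per_report = 5        # manual review time per report
selected_2006 = 1157          # reports flagged in 2006
confirmed_2006 = 318          # of those, labelled domestic violence

# Approximate number of confirmed cases and total review effort in 2007.
confirmed_2007 = round(selected_2007 * confirmed_share_2007)
review_hours_2007 = selected_2007 * minutes_per_report / 60

# Share of flagged 2006 reports that turned out to be domestic violence.
precision_2006 = confirmed_2006 / selected_2006

print(f"2007: about {confirmed_2007} confirmed cases "
      f"for roughly {review_hours_2007:.0f} hours of review")
print(f"2006: precision of {precision_2006:.1%}")
```

In both years, then, well under a third of the reports flagged for review turned out to be domestic violence cases.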
After a detailed manual review, the researchers found that only a relatively small proportion of the police reports had been mislabelled, mainly because they involved concepts and terms missing from the original definition of domestic violence, such as homosexual relationships, premarital and extramarital relationships, sexual abuse, etc.
After several rounds of adjusting the terminology, training a new map, and analysing the resulting ESOM, the researchers created a new thesaurus containing more than 800 terms on the subject, along with their combinations and clusters. Minimum redundancy maximum relevance (mRMR) feature selection was applied before the data were input, producing a ranked list of the most relevant features; once added to the thesaurus, these significantly improved labelling accuracy.
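The principle behind mRMR can be illustrated with a small greedy selector. The sketch below is an assumption-laden toy, not the procedure used in the study: it uses the common "relevance minus mean redundancy" variant with mutual information on discrete values, and the function names, term names, and data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr(features, labels, k):
    """Greedily rank k features: high relevance to labels, low redundancy."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean mutual information with the features already chosen.
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical term indicators for eight reports (1 = term present).
labels = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = domestic violence
terms = {
    "partner":      [1, 1, 1, 0, 0, 0, 0, 0],
    "spouse":       [1, 1, 1, 0, 0, 0, 0, 0],  # duplicates "partner"
    "ex_boyfriend": [0, 0, 0, 1, 0, 0, 0, 0],
}
print(mrmr(terms, labels, 2))
```

Here "spouse" is as relevant as "partner" but adds nothing new once "partner" has been chosen, so the selector prefers "ex_boyfriend", which flags a case the first term misses.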
At the project's final stage, the researchers developed an automated classification method based on ESOM and MDS that can predict the class of new cases identified by automated preselection.
Police officers who tested both approaches said they liked the ESOM interface and found it more convenient than MDS for analysing large numbers of police reports. Moreover, ESOM helped to identify two important additional data clusters that MDS had missed. A quantitative comparison, however, showed MDS to be somewhat superior. Beyond extracting important data with ESOM and MDS, the study compared the two tools and showed that the new automated classification model was highly accurate, at 89%.
In contrast to previously developed methods for automated domestic violence data analysis, most of which do not provide for the operator’s involvement, the new method, notes Ignatov, involves a subject matter expert in the search and thus allows for in-depth understanding of the data. Further research will focus on ESOM application to other types of crimes, and on building a system for their classification.
*The following people took part in the project:
Poelmans J., Ph.D., former postdoctoral researcher at Katholieke Universiteit Leuven;
Marc M. Van Hulle, Ph.D., Professor at Katholieke Universiteit Leuven;
Stijn Viaene, Ph.D., Vlerick Management School;
Guido Dedene, Ph.D., Professor at Katholieke Universiteit Leuven and Universiteit van Amsterdam;
Paul Elzinga, Ph.D., officer at Amsterdam-Amstelland Police