Human trafficking, or trafficking in people for their subsequent exploitation, is the fastest growing criminal industry worldwide; according to the UN, its turnover reached nine billion dollars in 2004. In Europe, the most likely victims of trafficking are young women from Moldova, Bulgaria, Romania, and Hungary who are transported to other countries for forced prostitution; each year, up to two million women fall victim to trafficking.
Human trafficking is a major problem for the Netherlands, and the Amsterdam police requested a software product for automated processing of large numbers of police reports to detect trafficking networks. In addition to selecting suspicious cases, the system should be able to identify patterns and to establish potential suspects who may be involved in this criminal business.
The lead developers of the software product were Paul Elzinga of the Amsterdam Police and Jonas Poelmans, then a researcher at the Leuven Catholic University, together with Professor Guido Dedene, academic co-director of the project. Russian mathematicians from the HSE's Department of Data Analysis and Artificial Intelligence, Professor Sergei Kuznetsov and his colleagues, Associate Professors Dmitry Ignatov and Alexei Neznanov, joined the project in 2011 at the invitation of Dedene and Poelmans.
«The idea is to create a good system for analysis and visual representation of police report data,» explains Ignatov. «Formal concept analysis is perhaps the best method to use for that.» Proposed by the German mathematician Professor Rudolf Wille in the 1980s, formal concept analysis enables visual representation of object-attribute correspondences by constructing formal concept lattices, or Galois lattices. This approach makes it possible to construct a complete lattice from any binary relation and provides a mathematical description of a concept as a pair: a set of objects and the set of attributes that all of those objects share. In this case, the objects are police reports while the attributes are the data they contain, such as keywords, dates, and people mentioned.
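To make the idea of a formal concept concrete, here is a minimal Python sketch. It is not the project's software: the four reports and their indicators are invented, and the function names (`common_attributes`, `reports_having`, `formal_concepts`) are hypothetical. It enumerates every pair of a report set and an indicator set that determine each other, which is what the nodes of a concept lattice represent.

```python
from itertools import combinations

# Hypothetical toy context: report id -> indicators found in that report.
# These rows are invented for illustration and are not real police data.
CONTEXT = {
    "r1": {"expensive car", "id problems", "red light district"},
    "r2": {"expensive car", "red light district"},
    "r3": {"id problems", "large sum of money"},
    "r4": {"red light district"},
}

ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(reports):
    """Derivation operator: attributes shared by every report in the set."""
    if not reports:
        # By convention, the empty set of reports shares all attributes.
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[r] for r in reports))


def reports_having(attributes):
    """Derivation operator: reports that contain every given attribute."""
    return {r for r, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Brute force and exponential in the number of attributes, so suitable
    for a toy context only.
    """
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            extent = reports_having(set(subset))
            intent = common_attributes(extent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (many reports) to the most specific.
    for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering these pairs by inclusion of their report sets gives the lattice. Real data sets need a dedicated algorithm rather than this exhaustive search.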
«We had never collaborated with police or helped them find criminals before, but we had analysed data, such as texts,» Ignatov explains. «We had used formal concept analysis (FCA) to find duplicate documents as part of the Internet Mathematics grant competition run by Yandex. My colleagues had previously used FCA to predict the toxic properties of chemical compounds. Whenever data is presented as a set of objects (documents, criminals, etc.) and their attributes, we deal with object-attribute data lattices.»
The researchers analysed some seventy thousand police reports filed since 2008. These were mainly reports by patrol officers who inspected vehicles or patrolled the streets of Amsterdam. In only a thousand or so reports did police officers know that the case had something to do with human trafficking. A typical police report would read as follows: «At night on March 23, 2008, I stopped a Mercedes in De Wallen (a red light district in Amsterdam). I noticed two well-dressed young women in the back seat. Neither of them spoke English or Dutch. The driver had the women's papers and explained that they had come to the Netherlands for a vacation.»
Police officers found it extremely difficult to establish involvement in trafficking by random observations in the streets and inspection of cars. However, their reports helped researchers to identify a few indicators – attributes suggesting that people mentioned in a police report may be involved in human trafficking.
Table 1: Sample data of police reports
Prostitution | Pimp | Violence | Expensive car | Large sum of money | Bulgarians
Report 1: June 13, 2007 x x x
Report 2: July 26, 2008 x x x
Report 3: September 28, 2008 x x x x x
Report 4: February 5, 2009 x
Report 5: February 22, 2009 x x
All indicators (searched automatically in the text) were categorised into groups:
In addition, indicators were classified into early and late, i.e. possible and obvious, respectively.
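The automatic search for indicators can be pictured as a lookup of search terms in the report text. The sketch below is an assumption about how such a search might work, not the project's implementation: the `THESAURUS` table, its search terms, and the early/late labels assigned to each indicator are all invented for illustration.

```python
import re

# Hypothetical thesaurus: indicator -> (stage, search terms).
# The indicator names, stages and terms are illustrative assumptions;
# the actual term lists used by the police are not given in this article.
THESAURUS = {
    "expensive car": ("early", ["mercedes", "bmw"]),
    "red light district": ("early", ["de wallen", "red light district"]),
    "id problems": ("early", ["had the women's papers", "no passport"]),
    "violence": ("late", ["beaten", "threatened"]),
}


def find_indicators(report_text):
    """Return {indicator: stage} for every indicator whose terms occur in the text."""
    text = report_text.lower()
    found = {}
    for indicator, (stage, terms) in THESAURUS.items():
        # Match whole words or phrases only, case-insensitively.
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
            found[indicator] = stage
    return found


if __name__ == "__main__":
    sample = (
        "At night on March 23, 2008, I stopped a Mercedes in De Wallen. "
        "The driver had the women's papers and explained that they had "
        "come to the Netherlands for a vacation."
    )
    print(find_indicators(sample))
```

Each report then becomes one row of an object-attribute table, with a mark in the column of every indicator found.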
Indicators found in police reports were used to create lattices. By looking at a lattice, it was possible to see suspicious indicators in a particular report; e.g., Report 1 mentioned Bulgarians (police have found that Bulgarians have often been involved in trafficking). The report also mentioned indicators such as an expensive car, problems with identity papers, and the red light district.
Reports containing indicator words merited particular attention, and the police analysed formal concepts in an attempt to identify suspects involved in trafficking.
This type of analysis is carried out in three stages:
The tools developed by the researchers allow the police to use formal concept lattices to find relevant indicators and identify potential suspects. For example, a Bulgarian who had problems with identity papers, was carrying a large amount of cash, and had been seen in the red light district became a trafficking suspect. Automated analysis of police reports reveals where, when, and in what circumstances certain suspicious indicators have been observed.
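A simplified way to picture this step is to pool the indicators from all reports that mention the same person and then ask who matches a given combination. The sketch below rests on invented data: the person names, dates, places and the `REPORTS` structure are hypothetical, and the functions `profiles` and `candidates` are illustrative rather than part of the police system.

```python
from collections import defaultdict

# Hypothetical reports: names, dates, places and indicators are placeholders.
REPORTS = [
    {"id": "r1", "date": "2008-03-23", "place": "De Wallen",
     "persons": {"person A", "person B"},
     "indicators": {"red light district", "id problems"}},
    {"id": "r2", "date": "2008-07-26", "place": "De Wallen",
     "persons": {"person A"},
     "indicators": {"large sum of money", "red light district"}},
    {"id": "r3", "date": "2009-02-05", "place": "city centre",
     "persons": {"person C"},
     "indicators": {"expensive car"}},
]


def profiles(reports):
    """Per person, map each indicator to the (date, place, report id) sightings."""
    seen = defaultdict(lambda: defaultdict(list))
    for report in reports:
        for person in report["persons"]:
            for indicator in report["indicators"]:
                seen[person][indicator].append(
                    (report["date"], report["place"], report["id"])
                )
    return seen


def candidates(reports, required):
    """Persons whose combined reports cover every indicator in `required`."""
    return sorted(
        person
        for person, indicators in profiles(reports).items()
        if required <= set(indicators)
    )


if __name__ == "__main__":
    wanted = {"id problems", "large sum of money", "red light district"}
    for person in candidates(REPORTS, wanted):
        print(person, dict(profiles(REPORTS)[person]))
```

A match of this kind is only a lead for investigators to follow up, not evidence in itself.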
Graph 1. A Concept Lattice Diagram
Next, the system was used to analyse and visualise suspects’ social networks: the program showed who the suspect had dealt with and under what circumstances – factors which suggested who else could be involved in the criminal gang.
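The article does not say how these networks were built. One common approach, assumed here purely for illustration, is to link two people whenever they are named in the same report and to count how often that happens. The `MENTIONS` data and both function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: report id -> persons named in it (placeholder names).
MENTIONS = {
    "r1": {"person A", "person B"},
    "r2": {"person A", "person B", "person C"},
    "r3": {"person C", "person D"},
}


def co_occurrence_edges(mentions):
    """Count, for each pair of persons, the number of reports naming both."""
    edges = Counter()
    for persons in mentions.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(persons), 2):
            edges[pair] += 1
    return edges


def contacts_of(person, edges):
    """Contacts of one person, strongest link first, then alphabetically."""
    contacts = {}
    for (a, b), weight in edges.items():
        if person == a:
            contacts[b] = weight
        elif person == b:
            contacts[a] = weight
    return sorted(contacts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    edges = co_occurrence_edges(MENTIONS)
    print(contacts_of("person A", edges))
```

The link weights give a rough measure of how closely two people are associated in the reports.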
The results of this collaboration between researchers and police have been presented in a series of papers at conferences on data mining and formal concept analysis and in academic journals. For example, the paper «Semi-automated knowledge discovery: identifying and profiling human trafficking», published in the International Journal of General Systems, provides a detailed methodology of the analysis, illustrated by six cases in which formal concept analysis helped to detect trafficking incidents and identify suspects and criminal networks. Police investigations into these cases have led to the arrest of suspects and the closure of brothels.
The HSE's Laboratory of Intelligent Systems and Structural Analysis, led by Professor Sergei Kuznetsov, is now working on FCART, a software product for formal concept analysis of textual information; a demo of the product is available online.