“Big data” help doctors choose a treatment method

Over the 20 years since contemporary medicine began its transition to a digital format, a vast amount of largely unused data has amassed. Analyzing these data and extracting new management logic from them is one of the most popular areas of focus in applied mathematics, Oleg Pianykh, a Professor in HSE’s Department of Data Analysis and Artificial Intelligence and an Associate Professor at Harvard Medical School, said in a report. His report, “Big Data in Medicine: How to Make them Work,” was presented at HSE’s academic seminar “Mathematical Models of Information Technologies.”

It is believed that the term “big data” was first used in 2008 by the British journal Nature, which published a special edition dedicated to the explosive growth in the volume and diversity of processed data. Over the next three years, big data became one of the dominant trends in information technology infrastructure.

It is assumed that working with big data will in the long term have the biggest effect on production, public administration, trade and medicine.

Medicine will “lean on” mathematics more and more, Oleg Pianykh believes. From the moment medicine transitioned into the digital world, terabytes of information have been amassing in the databases of various institutions. This information includes scans, patient records, laboratory results, insurance data, etc. Apart from their immediate use in the “look and diagnose” method, these data lie dormant, and it is this dormant store that constitutes big data. “This is an ideal task for mathematicians because everything is already digital, [and] it is not necessary to scan anything [or] discern handwritten text. For that reason, digital medicine has become an excellent area of concentration for applied mathematics,” he said.

Decision Tree

One possible aspect of working with big data is the standardization of healthcare. The analysis of such data could be used to work out the optimal algorithm for making medical decisions by reducing the subjective human factor to a minimum.

“Let us assume that a lump was found in a patient’s lungs with a diameter of 1 centimeter. The patient could go to various doctors and get completely different answers. One will say that it’s necessary to remove it, the other will say to look again in half a year, and someone will say a biopsy needs to be done,” Professor Pianykh explained. Even the same doctor, in relying on the same data, could propose one solution today and another in a few weeks or a month. In addition, a variety of factors could affect the doctor’s decision, including his or her mood.

Comparing analyzed, accumulated statistical material (“big data”) with data on a specific patient allows the best decision to be made with the least amount of subjectivity. In other words, if the patient has a growth of a particular diameter and if additional information is known – age, whether the patient smokes, whether close relatives have a history of lung cancer, etc. – then it is possible to predict the best course of action “purely statistically.”

Practicing doctors in the U.S., Pianykh says, fairly often use a “decision-making tree,” which is a logical and objective guide in the form of charts or programs based on the analysis of “big data.” As soon as the doctor receives the necessary information about the patient – tests, x-rays, medical history, etc. – he or she can rely on this “decision tree,” obtain a mathematically predictable result, and understand what to do with the patient.
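To make the idea concrete, here is a minimal sketch of what such a tree might look like as a program. It is not one of the trees used by American doctors: the `Patient` fields, the `recommend` function, and every threshold in it are invented placeholders, and the three possible answers are simply the ones from the lung lump example above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    nodule_diameter_mm: float
    age: int
    smoker: bool
    family_history: bool  # close relatives with a history of lung cancer


def recommend(p: Patient) -> str:
    """Walk a tiny hand-written decision tree and return a next step.

    Every threshold below is a hypothetical placeholder chosen for
    illustration; none of them comes from real clinical data.
    """
    # Count how many of the three (hypothetical) risk factors are present.
    risk_factors = int(p.smoker) + int(p.family_history) + int(p.age >= 60)

    if p.nodule_diameter_mm >= 20:
        return "remove"
    if risk_factors >= 2:
        return "biopsy"
    return "look again in half a year"


patient = Patient(nodule_diameter_mm=10, age=45, smoker=False, family_history=False)
print(recommend(patient))
```

The point of the sketch is not the particular rules but the property the article describes: the same patient data always lead to the same answer, whoever runs the program and whenever they run it.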

Various American medical associations are already working on creating “decision trees” on the basis of statistically accumulated experience.

When more is not better

Other uses of big data in medicine include optimizing the work of clinics, balancing the workload of medical personnel, and lowering waiting times.

An important factor in optimization is the abandonment of “pointless projects,” that is, doubtful administrative decisions. “The situation when people say, let’s earn twice as much by taking in twice as many patients… But if no one is seeing if it’s possible to do this purely mathematically, whether it’s possible to squeeze in twice as much into this tube in the same unit of time. This ends in pointless projects and total stress,” Pianykh explains.

Even when theoretical calculations based on, for example, queueing theory (QT), also known as the theory of mass service, show that the flow of patients can be increased, it is better not to do this. (Queueing theory deals with working out optimal service systems based on baseline requirements and allows for the forecasting of a client’s wait time, the length of lines, etc.) According to QT, if a doctor services patients faster than they arrive, then there should not be a line; practice, however, disproves this.

The intensity of flow depends on many factors that the people writing the equations do not control. In addition, deviations from the mean play a very significant role. “A line can grow very rapidly because one person was looking for their money in their pockets. This is a non-trivial lesson that is not understood by those that say: let’s provide services to twice as many [individuals]. It turns out that no… if one person fumbles in this process, a huge line grows,” Oleg Pianykh says.
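The effect of a single delay can be sketched with a standard recursion for a one-doctor line, often called the Lindley recursion: each patient waits as long as the previous one did, plus the previous patient's service time, minus the gap between their arrivals, and never less than zero. The numbers below (a patient every 10 minutes, 9 minutes per visit, one visit that takes 40 minutes) are made up for illustration and are not taken from the report.

```python
def waiting_times(interarrival, service):
    """Waiting time of each patient in a single-server line.

    interarrival[i] is the gap between the arrivals of patient i and
    patient i + 1; service[i] is how long patient i spends with the doctor.
    The first patient never waits.
    """
    waits = [0.0]
    for gap, busy in zip(interarrival, service):
        # The next patient inherits the backlog left over after the gap.
        waits.append(max(0.0, waits[-1] + busy - gap))
    return waits


n = 12
gaps = [10.0] * (n - 1)   # a patient arrives every 10 minutes
smooth = [9.0] * n        # every visit takes 9 minutes
one_slow = smooth.copy()
one_slow[2] = 40.0        # the third patient "fumbles" for 40 minutes

print(waiting_times(gaps, smooth))    # nobody waits
print(waiting_times(gaps, one_slow))  # everyone after the slow visit waits
```

In this sketch the doctor is on average faster than the arrivals in both cases, yet after the one slow visit the line shrinks by only a minute per patient, so the delay is still being felt nine patients later.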

Another example is when bureaucratic agencies “that allocate the quotas to clinics” decide that the working capacity of medical equipment should be, for example, at least 75%. “Let the equipment be at 90%, then we will definitely ‘process’ all patients,” Pianykh said, describing the logic of these agencies.

Mathematical calculations and practice show that if workload exceeds 80%, then the smallest of deviations in the work schedule can bring about very serious and negative consequences. For example, a small increase in patient inflow would turn into an “explosive surge” in wait time. At such a workload, all reserves for increasing the intensity of work have already been exhausted.
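One way to see where such a surge comes from is the simplest textbook queueing model, a single server with random arrivals and random service times (the M/M/1 model). This is a swapped-in illustration, not the calculation presented in the report. In that model the average wait in line, measured in units of one average service time, equals the utilization divided by one minus the utilization. The function name below is hypothetical.

```python
def mean_wait_in_service_times(utilization: float) -> float:
    """Average wait in line for an M/M/1 queue, in units of mean service time.

    Assumes one server, Poisson arrivals and exponential service times;
    real clinics will deviate from all three assumptions.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return utilization / (1 - utilization)


for load in (0.50, 0.75, 0.80, 0.90, 0.95):
    wait = mean_wait_in_service_times(load)
    print(f"workload {load:.0%}: average wait is about {wait:.1f} service times")
```

Under these assumptions the wait is about 1 service time at 50% workload, 4 at 80%, 9 at 90% and 19 at 95%, so each additional percentage point of workload costs far more waiting time above 80% than below it.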

Lords of the queues

Working with “big data” allows for the optimization of operations between institutions by relying not on theoretical models, but on empirical material.

Oleg Pianykh uses the example of an American clinic that specialized in x-ray examinations. The clinic had eight offices that carried out the examinations and two dedicated employees who distributed patients among the eight rooms. The situation was complicated by the fact that there was no notification system to tell these employees whether a room was vacant or occupied. X-ray technicians would simply leave the door open to rooms with free x-ray machines. The layout of the location, however, did not allow the two workers to keep track of all the doors at once, and they had to walk down the hallway to see whether a room was vacant. Patients were therefore often brought into offices located close to the entrance, which brought about an uneven distribution of work and upset the technicians.

The group of researchers led by Oleg Pianykh was tasked with answering the question of whether it was possible to control a patient’s waiting time in a line and improve the service process.

Calculations were initially performed based on classical models, namely queueing theory (QT). The author commented on the results: “When we calculated every moment of people’s [time spent] in a line throughout the week and compared this with the dynamic that results from the theory of service, a negative correlation resulted. The theory does not at all predict what happens in reality.”

The presence of a number of unpredictable factors in medicine makes classical service theories of little use. It is necessary to begin with the simple, yet reliable empirical models, such as linear regression, that are based on accumulated “big data.” Simpler models are, however, limited in accuracy and are only handy for simple processes that lack unforeseen situations. The logic of working with “big data,” Pianykh says, must include the elimination of anomalies such as chaos, randomness and untimeliness. This means the goal is not to speed up the service time, but to change the service strategy to lower randomness.
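As an illustration of what such a simple empirical model can look like, the sketch below fits a straight line to observed waits by ordinary least squares. The choice of predictor (how many patients were already in line) and all the numbers are synthetic assumptions made for the example; they are not data from the clinic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic records: patients already in line vs. observed wait in minutes.
patients_ahead = [0, 1, 2, 3, 4, 5]
observed_wait = [2, 9, 15, 24, 30, 37]

slope, intercept = fit_line(patients_ahead, observed_wait)
predicted = slope * 3 + intercept
print(f"each patient ahead adds about {slope:.1f} minutes")
print(f"predicted wait with 3 patients ahead: {predicted:.0f} minutes")
```

A model like this has no theory of how the line forms; it only summarizes what the accumulated records show, which is both its strength and the source of the accuracy limits mentioned above.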

As a result, a special electronic queue model was developed for the clinic that displayed how long a given office had been available and how long a given x-ray examination lasted. All data were displayed on tablet-like electronic devices, which were given to the employees who distributed patients among the rooms.

A service strategy was also developed for the clinic: never begin the day with a long x-ray examination; alternate long and short exams (this is the least predictable part, and breaking the rule could collapse the entire schedule); and schedule examinations in proportion to their actual length.
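The first two rules can be sketched as a simple ordering of the day's examinations. This is one possible reading of the strategy, not the clinic's actual scheduling software: the function name is hypothetical, and treating the shorter half of the exams as “short” and the longer half as “long” is an assumption made for the example.

```python
def alternate_short_long(durations):
    """Order exams short, long, short, long, ..., starting with a short one.

    "Short" and "long" are defined by splitting the sorted durations in
    half, which is an assumption made for this illustration.
    """
    ordered = sorted(durations)
    half = (len(ordered) + 1) // 2
    short, long_ = ordered[:half], ordered[half:]

    schedule = []
    for i, exam in enumerate(short):
        schedule.append(exam)
        if i < len(long_):
            schedule.append(long_[i])
    return schedule


exam_minutes = [45, 10, 30, 15, 60, 20]
print(alternate_short_long(exam_minutes))  # [10, 30, 15, 45, 20, 60]
```

With this ordering the day opens with the shortest exam and two long exams never follow one another, so a delay in one long exam can be partly absorbed by the short exam that comes after it.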

Large piles of garbage

Creating an algorithm does not, however, mean that the task is solved, Oleg Pianykh stresses. It is critical to integrate new opportunities into hospital processes, and personnel must be trained and the human factor considered. Any large-scale adoption of an algorithm, the author says, leads to people trying to cheat it.

Pianykh’s conclusions are as follows: it is absolutely crucial to analyze and use “big data” in medicine to optimize any process, but there should be no illusions – “big data” alone do not guarantee a “big idea.” They can consist of large piles of trash: accidental events, mistakes, and useless or incomplete information. Extracting useful, meaningful information is one of the main goals of working with “big data.”

Author: Vladislav Vladimirovich Grinkevich, June 03, 2014
