Problem: Many people in Russia believe that they had COVID-19 as early as December 2019 or January 2020. Is it possible to find out when the epidemic really started in Russia and where it came from? Bioinformatics provides an answer.
Solution: Scholars compared mutations in 211 coronavirus genomes from patients in 25 Russian regions and built the virus’s evolutionary trees from this data. They discovered that SARS-CoV-2 was brought into Russia from Europe between late February and early March. The first case of its transmission inside the country took place no earlier than March 11, 2020.
A research team from HSE University and SkolTech, together with experts from the Smorodintsev Research Institute of Influenza in St. Petersburg and the RAS Kharkevich Institute for Information Transmission Problems (IITP), discovered that the SARS-CoV-2 virus independently entered Russia at least 67 times, mostly at the end of February and beginning of March 2020. The vast majority of introductions came from European countries. No cases of introduction from China were registered, which is likely due to the timely closure of borders with the country. Currently, nine local virus lineages that are not present elsewhere in the world are circulating in Russia. Although Russia was actively ‘importing’ the virus from abroad, the researchers have not detected any cases of SARS-CoV-2 lineages being ‘exported’ to other countries. Complete results of the study are available in the preprint on the medRxiv database.
In the first two weeks of May, Russia became one of the world leaders in the number of SARS-CoV-2 infections. As of today, according to official data reported by the Russian government, 777,486 cases have been detected since the start of the epidemic. That said, the date of its start was unknown until recently. The first two cases were detected in January, and both were linked to direct introduction from China, after which the Russian authorities closed the borders with the country. Neither of those cases, however, led to transmission to other people or to new cases inside Russia.
COVID-19 was next detected only on March 2, 2020, in a woman who had returned from a trip to Italy. A steady increase in incidence followed, and the reproductive ratio (Rt) of the virus approached 2. This led the country to implement progressively stricter non-pharmaceutical interventions (quarantine under observation for people arriving from abroad, a self-isolation regime, etc.).
Rt, or the reproductive ratio of the virus, is the average number of people infected by one patient before isolation. It is calculated from the number of new cases over the prior eight days using the formula Rt = (X8 + X7 + X6 + X5) / (X1 + X2 + X3 + X4), where X1...X8 are the numbers of coronavirus cases registered in the region on each of those days.
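The eight-day formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counts are hypothetical, and the code assumes that X1 is the oldest day in the window and X8 the most recent one, which the text does not state explicitly.

```python
from typing import Sequence


def reproductive_ratio(daily_cases: Sequence[float]) -> float:
    """Estimate Rt from daily case counts ordered oldest to newest.

    Uses the last eight values as X1..X8 (assumption: X8 is the latest day)
    and returns (X5 + X6 + X7 + X8) / (X1 + X2 + X3 + X4).
    """
    if len(daily_cases) < 8:
        raise ValueError("need at least eight days of case counts")
    window = list(daily_cases[-8:])
    earlier = sum(window[:4])  # X1 + X2 + X3 + X4
    later = sum(window[4:])    # X5 + X6 + X7 + X8
    if earlier == 0:
        raise ValueError("no cases in the first four days; Rt is undefined")
    return later / earlier


# Hypothetical counts for illustration only, not real data.
example_counts = [10, 12, 15, 18, 20, 24, 30, 36]
print(reproductive_ratio(example_counts))  # 110 / 55 = 2.0
```

A value above 1 means that the last four days saw more new cases than the four days before them, so the outbreak is growing; a value below 1 means it is shrinking.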
Despite the authorities’ efforts, the implemented measures did not work, and a full-scale epidemic broke out in the country. Biologists and mathematicians from HSE University, SkolTech, and IITP decided to learn where and how the epidemic started, from which countries SARS-CoV-2 was brought, and how it mutated and changed in Russia.
The most precise data on how the virus spreads and how effective particular measures are can be obtained through phylogenetic analysis of virus samples. This approach is called ‘genomic epidemiology’. It relies on the fact that viruses mutate quickly: coronaviruses, for example, accumulate an average of one mutation in the genome every two weeks, and new lineages evolve as a result. By tracking when lineages first appear, how they diverge from one another, and when they are registered in various regions, we can reconstruct the evolution and spread of the virus in a population.
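The ‘one mutation every two weeks’ figure works like a rough clock. The sketch below only shows the arithmetic, assuming a constant rate along a single line of descent; real analyses use statistical models, and the function names here are hypothetical.

```python
from datetime import date

# Rough rate quoted in the article: about one mutation every two weeks.
MUTATIONS_PER_DAY = 1 / 14


def expected_mutations(start: date, end: date,
                       rate_per_day: float = MUTATIONS_PER_DAY) -> float:
    """Expected number of mutations accumulated between two dates."""
    return (end - start).days * rate_per_day


def days_of_divergence(n_mutations: int,
                       rate_per_day: float = MUTATIONS_PER_DAY) -> float:
    """Rough time needed to accumulate a given number of mutations."""
    return n_mutations / rate_per_day


# Span of the Russian sampling period mentioned in the article (43 days).
print(round(expected_mutations(date(2020, 3, 11), date(2020, 4, 23)), 1))
# Time needed for three mutations at the assumed rate, in days.
print(round(days_of_divergence(3)))
```

The same arithmetic explains why early circulation would leave traces: a lineage that had been spreading locally for months would carry noticeably more mutations of its own than one that arrived a few weeks before sampling.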
To analyse and build phylogenetic trees, the researchers used 211 virus genomes sequenced at the Smorodintsev Research Institute of Influenza. All the genomes had been obtained from patients in 25 Russian regions (including the Republic of Crimea) between March 11 and April 23, 2020. The resulting dataset reflected the situation at the early stages of the epidemic in Russia.
Genome sequencing establishes the order of nucleotides, the separate ‘building blocks’ of the genetic code, in a DNA or (as in the case of coronaviruses) RNA molecule. When two sequences are compared, researchers can see the positions where they do not match. These mismatches are mutations, and they indicate a new lineage of the virus that may function differently from its predecessors.
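Comparing two sequences position by position can be illustrated with a short Python sketch. It assumes the two sequences are already aligned to the same length and written in the usual A/C/G/T alphabet (sequence files conventionally use T even for RNA viruses); the toy sequences and the function name are hypothetical, and real pipelines rely on dedicated alignment tools.

```python
from typing import List

# Symbols that carry no information: unknown base and alignment gap.
UNINFORMATIVE = {"N", "-"}


def find_mutations(reference: str, sample: str) -> List[str]:
    """List substitutions between two aligned sequences.

    Each mutation is written as <reference base><1-based position><sample base>.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if ref_base in UNINFORMATIVE or sample_base in UNINFORMATIVE:
            continue  # skip positions that were not reliably sequenced
        if ref_base != sample_base:
            mutations.append(f"{ref_base}{position}{sample_base}")
    return mutations


# Toy ten-letter sequences, not real genome fragments.
print(find_mutations("ATGGCTAGCT", "ATGGTTAGCA"))  # ['C5T', 'T10A']
```

Genomes that share the same set of mutations are likely to belong to the same lineage, which is the raw material for building a phylogenetic tree.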
In addition, to establish the full scope of phylogenetic links, the scholars used a dataset of 19,623 SARS-CoV-2 genomes from the rest of the world available on May 26, 2020. The analysis also extensively used data about people’s transit across Russian borders and inside the country.
The phylogenetic trees built by the researchers showed that in the early stages of the epidemic, almost all SARS-CoV-2 lineages that had evolved by that time were circulating in Russia. A vast majority of the genomes belonged to lineages prevailing in Europe. Only 2% of them were Asian, whereas Asian lineages made up about 50% of those in China at that time. This means that the main channel of coronavirus ‘import’ was passenger traffic from European countries. The Chinese lineages probably also came via Europe, since the borders with China had been closed long before that.
The researchers believe that this high diversity resulted from at least 67 independent introductions of the virus into different Russian cities at the end of February and beginning of March. Nine of these introductions gave rise to local virus lineages, which circulate only in Russia. Georgii Bazykin, the lead researcher and a SkolTech professor, said that these figures are far from final. There are likely to be many more ‘imported’ and ‘local’ lineages of the virus, which will be revealed as the researchers obtain data from newly sequenced genomes.
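One simple way to think about counting independent introductions is to look for branches of the tree that contain only Russian samples. The Python sketch below is a toy illustration under that assumption and is not the procedure used in the study; the tree, the country labels and the function name are all hypothetical.

```python
from typing import Tuple, Union

# A toy tree: a leaf is a country label, an inner node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def count_introductions(node: Tree, country: str = "RU") -> Tuple[bool, int]:
    """Return (subtree contains only `country` samples, number of introductions).

    Every maximal subtree made up solely of samples from `country`
    is counted as one independent introduction.
    """
    if isinstance(node, str):
        is_local = node == country
        return is_local, 1 if is_local else 0
    results = [count_introductions(child, country) for child in node]
    if all(is_local for is_local, _ in results):
        return True, 1  # the whole branch is local: one introduction
    return False, sum(count for _, count in results)


# Hypothetical tree: Russian samples appear in three separate branches.
toy_tree = ((("IT", "RU"), "IT"), (("RU", "RU"), "CH"), "RU")
print(count_introductions(toy_tree)[1])  # 3
```

In this toy tree the pair of Russian samples that sit together on one branch counts as a single introduction followed by local spread, while the other two Russian samples, each nested among foreign samples, count as separate introductions.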
One of the most interesting results is that the virus that launched the epidemic appeared in Russia quite late. Most of the people who brought the novel coronavirus to Russia arrived in late February – early March. There were no traces of earlier evolution of specific lineages in the country.
In many cases, the complete chain of virus transmission can be traced. For example, a person from the Chechen Republic performed the hajj to Mecca, and his virus lineage belongs to the same clade (phylogenetic tree branch) as other viruses in Saudi Arabia. Top executives of YATEC JSC arrived by air from Switzerland, and the genomes of viruses in Yakutia belong to the same clade as those in Switzerland. There are also some interesting anomalies: the virus brought by a Russian person from Egypt is inconsistent with the genomes of virus lineages present in that region.
SARS-CoV-2 came to Russia to stay. There are no signs of Russian variants of the virus being ‘exported’ abroad (at least to the countries with data available). This sets the country apart from global super-spreaders such as Great Britain, or Italy and Spain, which laid the foundation for the European epidemic. Inside Russia, the virus started spreading almost immediately after it was imported. The first in-Russia transmission happened as early as March 11.
The researchers paid particular attention to the virus outbreak at one of the non-infectious disease hospitals in St. Petersburg, the Vreden Hospital, where over 700 people were locked down in quarantine for a month and over 400 of them were infected. According to the city governor, Alexander Beglov, the virus was brought there by an employee who came back from a vacation in Turkey. However, analysis of several dozen genomes from the institution showed that the virus was introduced there between two and four times. Each introduction led to a separate outbreak, and these merged and looked like one big outbreak.
As an isolated community, the Vreden Hospital resembles the notorious outbreak aboard the Diamond Princess cruise ship at the beginning of the pandemic. Such closed populations are usually infected through a single introduction, and there had been no reason to believe that the situation in St. Petersburg was different. Only genome analysis revealed the real dynamics of infection in the hospital, with multiple repeated virus introductions.
At the beginning of the outbreak, SARS-CoV-2 was spreading very rapidly, with Rt at about 3.7. Then, after some of the medics and patients had recovered from the disease and hospital departments were isolated from each other, Rt fell to 1.4. Interestingly, the quarantine did not keep the virus inside the Vreden Hospital: several patients who were not directly connected to the hospital were infected with the lineages that circulated inside it.
Only genomic epidemiology is able to dispel the rumours and myths about an ‘early epidemic’ in Russia from November 2019 to January 2020, which are persistently discussed on social media. It also demonstrates the effectiveness of state restrictive policies. For example, the timely closure of borders with China prevented the novel coronavirus from being introduced from there into Russia in January and February 2020.
At the same time, the measures to control passenger traffic from Europe and other countries in February and March 2020 turned out to be insufficient. This traffic was the source of the SARS-CoV-2 that provoked several local outbreaks and led to a full-scale epidemic.
Today, in the context of the planned further lifting of limitations and opening of the borders, genomic epidemiology is also able to help detect the sources of new outbreaks, regardless of whether they were caused by local lineages or by the virus again being brought into the country from abroad.
The study demonstrates a way to objectively investigate how a virus spreads through a population. Researchers are now in a unique situation: we are getting data, including genomic data, almost in real time. ‘Thanks’ to the pandemic, we are learning a lot about epidemics more broadly, which will help in predicting and controlling them in the future.
In addition, genomic epidemiology is a good practical tool for epidemiologists, doctors and vaccine developers. The more time passes since the beginning of the pandemic, the more sequenced virus genomes become available from all parts of the planet. Different lineages will diverge further, which will help us understand the general evolutionary direction of SARS-CoV-2, how it adapts to humans and whether it shows signs of attenuation (a decrease in virulence and in its harmful effect on the human body). It is quite probable that vaccines created for certain lineages in one region will not be suitable for preventing the virus from spreading in other regions. That is why accumulating and analysing SARS-CoV-2 genomes is so important for further efforts to combat the virus.