
‘It Used to Be Difficult without an Electronic Archive’

Anastasia Smirnova explains how HSE folklore researchers digitise and store data

From the archives of HSE folklore expeditions

Storing the data collected during folklore expeditions in a convenient and accessible way is not an easy task. As a solution, HSE researchers studying folk traditions have created an Electronic Folklore Archive. Anastasia Smirnova, research assistant and, until recently, staff member of the HSE Faculty of Humanities Laboratory for Theoretical and Field Folklore Studies, where the electronic archive was designed and developed, tells us how it came about and why the skills of collecting and digitising folklore are so important, particularly at the outset of one's academic career.


Anastasia Smirnova,
bachelor's student of Fundamental
and Computational Linguistics,
research assistant at the Arctic Social
Sciences and Humanities Laboratory;
in 2019–2020, research assistant
at the HSE Laboratory for Theoretical
and Field Folklore Studies

— What was life like for HSE folklorists without an electronic archive and why did they need it?

— It was very difficult without an electronic archive. The data collected by folklore expeditions in the field used to be stored in cloud accounts accessible only to the expedition members. While this was better than manually searching a paper-based card catalogue, finding what you needed was still time-consuming — e.g. you had to read through all the expedition notes in order to select information for your paper.

An electronic archive saves time and improves efficiency significantly, allowing researchers to focus on the actual research. Our goal was not only to create a convenient research tool but also to place the data in a storage service accessible to everyone, including academics from other cities and countries.

— When did you start working on the electronic archive?

— This work was led by Andrey Moroz and Yulia Kuvshinskaya, and has been underway since 2019, but the idea emerged much earlier. The HSE Laboratory for Theoretical and Field Folklore Studies has undertaken numerous folklore expeditions since 2016 in order to study local traditions, to interview older residents about their lives in the past and today, to collect handwritten documents, to search local archives and museums, and to observe folk traditions live (e.g. by making videos of local festivals).

Over time, the folklorists have gathered an impressive amount of data. After years of searching for a good way to store it, they found a solution when the HSE Fund for Educational Innovation launched its Rediscovering Russia programme (the Post-Production Project using data collected in the field).


Rediscovering Russia is an HSE programme designed to support undergraduate field expeditions. Led by faculty members, students immerse themselves in the reality of life in Russian villages and small towns and learn to gather and analyse folklore.

Thanks to the funding made available under the programme, a quality database was created; the data from folklore expeditions no longer sits in private cloud storage but is digitised and placed in open access.

— What can one find in the archive?

— Today, the archive stores materials from expeditions led by Andrey Moroz and Yulia Kuvshinskaya. These are two different research groups. Moroz leads the research team from the School of Philological Studies and Kuvshinskaya leads the one from the School of Linguistics.

Since 2016, the former group has been collecting folklore and studying folk beliefs in communities living in the border area between Belarus and Russia. The latter group focuses more on verbal folklore, in particular on the mythological representation of history (since 2017, field trips have also been made to the Tver and Ivanovo regions).

Both groups share the objective of documenting the current state of traditional culture. One way to do this is through semi-structured interviews with local residents, i.e. engaging them in conversation on certain topics based on predetermined questions.

The transcripts of such interviews are then uploaded to the electronic archive. In addition to texts, the archive stores media files, such as photographs of churches and houses, woodcarving and embroidery, sculptures, examples of sacred geography such as worship crosses, chapels and stones; videos of local people telling folk tales, performing songs and engaging in conversation — all of this provides an important resource for folk culture and anthropological research.

— How can one navigate the archive to find what they need?

— One can search by keywords, including some 1200 search terms at the moment; by genre, such as jokes, epics, toasts, ceremonies, songs, sayings, etc. — more than 70 in total; by year and place of data collection (region, district, community); by informant profile (gender, year of birth, place of residence); and by questionnaire, where one can retrieve a page of questions asked during a particular expedition.

The archive's website was created mainly by students and faculty from the School of Linguistics, led by Boris Orekhov. A linguistic search based on an abbreviated version of Timofey Arkhangelsky's Tsakorpus platform is therefore also available, making it possible to search texts by lemma (the dictionary form of a word), by the exact form of a word or by a word combination.
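The search options described above can be illustrated with a minimal Python sketch. It is not the archive's actual code and does not use the Tsakorpus API: the `Record` class, the `search` function and the two sample records are hypothetical, and the lemmas are written by hand, whereas a real corpus would obtain them from a morphological analyser.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical archive entry: one transcribed fragment plus its metadata."""
    text: str
    genre: str
    year: int
    region: str
    keywords: set = field(default_factory=set)
    # (word form, lemma) pairs; hand-written here purely for illustration
    tokens: list = field(default_factory=list)


def search(records, *, genre=None, year=None, region=None,
           keyword=None, lemma=None, form=None):
    """Return the records that satisfy every filter that was supplied."""
    results = []
    for rec in records:
        if genre is not None and rec.genre != genre:
            continue
        if year is not None and rec.year != year:
            continue
        if region is not None and rec.region != region:
            continue
        if keyword is not None and keyword not in rec.keywords:
            continue
        # Lemma search matches any inflected form of the word
        if lemma is not None and all(l != lemma for _, l in rec.tokens):
            continue
        # Exact-form search matches only the word as it was written
        if form is not None and all(f != form for f, _ in rec.tokens):
            continue
        results.append(rec)
    return results


# Invented toy data, not taken from the archive
records = [
    Record("She sang old songs", "song", 2017, "Tver", {"wedding"},
           [("she", "she"), ("sang", "sing"), ("old", "old"), ("songs", "song")]),
    Record("They sing at the feast", "ceremony", 2018, "Ivanovo", {"feast"},
           [("they", "they"), ("sing", "sing"), ("at", "at"),
            ("the", "the"), ("feast", "feast")]),
]

by_lemma = search(records, lemma="sing")            # both records
by_form = search(records, form="sing")              # second record only
combined = search(records, lemma="sing", region="Tver")  # first record only
```

The difference between `by_lemma` and `by_form` shows why a lemma search is useful for folklore texts: a single query retrieves every inflected form of a word, while the exact-form search stays literal.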

— Is the HSE's Electronic Folklore Archive unique or are there other similar resources available?

— Many Russian academic centres have been exploring approaches to a corpus-based study of folklore, and a few electronic databases have emerged as part of such efforts. There have been attempts to create a folklore subcorpus of the Russian National Corpus (RNC). There is a folklore archive set up and maintained by the Institute of Linguistics, Literature and History of the RAS Karelian Research Centre, which makes it possible for users to read or listen to songs, folk tales, legends and incantations of Karelia's indigenous people. There is also a very extensive digital archive of Latvian folklore, which contains digitised collections from the Latvian Folklore Archives.

— What is special about the HSE's project?

— In addition to the diversity of formats, from text to multimedia, as I previously mentioned, the archive stores materials gathered during field expeditions by two different research groups using different approaches to data collection and processing.

— How are they integrated into one database?

— All recorded conversations get transcribed. The researchers working to document the system of folk beliefs break down the transcribed text into thematic blocks according to study questions and mark up the keywords. The other group studying verbal folklore marks up the transcribed texts by keywords and genres. The resulting transcripts are segmented into files and uploaded to the archive.

— Is it difficult to enter folklore into the system?

— There are quite a number of challenges related to both science and methodology, such as the form in which texts should be presented and the extent of detail in the transcripts. When our research focus is on a dialect, we need to enter a phonetic transcription using Praat, a speech analysis program. In contrast, for ethnolinguistic research, a simplified transcription is sufficient, omitting any hesitation pauses, slip-ups and repetitions. When the focus is on folklore and anthropology, the traditional method of folklore transcription is suitable, which is designed to convey the idea, story and details but not the speech. Another important factor is the type of markup to be used: by genre, action or character.

— Now that you have the structure of your archive in place, what’s next?

— Developing it further, filling it with new data and expanding the media collection, of course. With most of the technical aspects taken care of, we can now focus on making the system even better: enhancing the website, working on the code architecture and design, a user-friendly interface and process automation. There are some research ideas as well, such as analysing the archive content using Digital Humanities methods and automated text-processing tools.

— What about your personal research plans?

— Working with folklore gave my research a good start. Currently, I study social and linguistic anthropology with the Arctic Social Sciences and Humanities Laboratory. My main interest is in studies of the North, which I plan to pursue further, alongside studies in language policy, after I complete my bachelor's degree.

— Do you still take part in field expeditions?

— Yes, this summer I travelled to Karelia with the third expedition in the 'Ethno-linguistic Landscape of the Russian North' cycle, and then we embarked on a study of a new geographical area for us, the Izhemsky district of the Komi Republic. In these expeditions, our main focus has been on the region's linguistic situation and ethnocultural landscape.

— Have your perceptions of Russia changed since you switched from the lecture hall to the field?

— Absolutely, because I find it difficult to discuss things which I have not experienced first-hand. For example, I struggled with studying the linguistic situation of local communities in the Izhemsky district relying only on information from the internet, open academic sources and literature. This was until I travelled to the Izhemsky district to experience things first-hand by talking to the local people. Now, with all this knowledge, my work just flows. Generally, field expeditions give us a wonderful experience and an opportunity to see first-hand how people live and how amazing they are, as well as contributing to scientific study.

Author: Svetlana Saltanova, October 15, 2021