October 10, 2022
Russian Web Tables corpus paper accepted & presented at DAMDID/RCDL'22
On October 5, our colleague Alexey Mironov presented the article "Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia" at the DAMDID/RCDL'22 conference. The article presents the first Russian-language corpus of Wikipedia tables, along with a toolkit for building it. The corpus will support further machine learning research on tasks such as column semantic type detection, automatic filling of missing values, and knowledge extraction.

The talk was well received, and as a result the conference chairs decided to publish the article in a Springer journal. The conference proceedings have not been published yet, but a preprint is available at the link below.

The DAMDID/RCDL conference is one of the largest and oldest Russian conferences on information management. This year it was held in St. Petersburg, at ITMO University.

Conference website: https://damdid2022.frccsc.ru/
About the conference: https://synthesis.frccsc.ru/rcdl.html
Preprint of the article: https://arxiv.org/abs/2210.06353
Corpus: https://gitlab.com/unidata-labs/ru-wiki-tables-dataset