On October 5, our colleague Alexey Mironov presented the article "Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia" at the DAMDID/RCDL'22 conference. The article presents the first Russian-language corpus of Wikipedia tables, as well as a toolkit for its construction. This corpus will enable further research on machine learning topics such as determining the semantic type of a column, automatically filling in missing values, extracting knowledge, and many more.
The talk was well received, and as a result the conference chairs decided to publish the article in a Springer journal. The conference proceedings have not yet been published, but a preprint is available at the link below.
The DAMDID/RCDL conference is one of the largest and oldest Russian conferences on information management. This year it was held in St. Petersburg at ITMO University.
Links
Conference website: https://damdid2022.frccsc.ru/
About the conference: https://synthesis.frccsc.ru/rcdl.html
Preprint of the article: https://arxiv.org/abs/2210.06353
Corpus: https://gitlab.com/unidata-labs/ru-wiki-tables-dataset