Universe
Research Laboratory

We use our own innovation strategies and implement new technologies to solve data management problems.

Scientific interests

Machine learning for data quality and data management

Machine learning is now pervasive in virtually all areas, but textual data is its foothold. Novel approaches that emerged in the last decade have enabled significant progress in areas such as natural language processing, data mining, and knowledge discovery. In our laboratory, we apply existing machine learning methods to pressing real-life industrial problems, develop dedicated approaches to novel problems, and explore promising applications.

Information integration and data quality for big data applications

The era of big data is here, which means that everything must scale to enormous volumes, including information systems that draw on multiple data sources. The impact of this shift on federated systems has made it necessary to rethink approaches to classic data quality problems such as entity resolution, outlier detection, and missing value completion. These problems are the primary research directions of our laboratory.

Search: retrieval models, implementation issues, and evaluation

Members of our laboratory are also well-versed in system aspects of information retrieval technologies. Our expertise is centered around (but not limited to) search engine architectures, evaluation metrics, test collections, and experimental design. Overall, in our laboratory we both develop novel types of systems and tune existing ones.

Query engines: architectures, benchmarking & performance

A query engine is the backbone of almost every process that involves data management. Efficient query execution is therefore crucial to the overall quality of such applications. We have expertise in building core components from scratch and in modifying existing ones, for classic RDBMSes as well as NoSQL, graph-oriented, and other types of systems. We can handle system aspects such as distributed processing, optimization, indexing, and physical design tuning.

Data cleaning, data discovery, and data exploration

In the last decade, the amount of data available to users has grown explosively. Consequently, new challenges related to understanding and explaining the data at hand have emerged. We address them by drawing on our extensive mathematical background and our knowledge of the cutting-edge methods developed by the ACM SIGMOD and SIGKDD communities.

Dataset Analysis Service

1. Searching for Patterns in Datasets:
   - Exact and approximate functional dependencies
   - Conditional functional dependencies
   - Association rules
2. Data Error Detection
3. Performing Feature Engineering
4. Searching for Record Identification Keys
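
To make the pattern types above concrete, here is a minimal Python sketch of what checking an exact or approximate functional dependency, and testing a candidate record identification key, can look like. It is an illustration only and does not use or reproduce the Desbordante API. The function names (g3_error, fd_holds, is_key), the sample rows, and the choice of the commonly used g3 measure (the fraction of rows that must be removed for the dependency to hold exactly) are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows to remove so that lhs -> rhs holds exactly (g3 measure)."""
    if not rows:
        return 0.0
    # For every distinct lhs value, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # In each group we can keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


def fd_holds(rows, lhs, rhs, max_error=0.0):
    """Exact FD when max_error is 0.0, approximate FD otherwise."""
    return g3_error(rows, lhs, rhs) <= max_error


def is_key(rows, attrs):
    """True if the attribute combination identifies every row uniquely."""
    seen = {tuple(row[attr] for attr in attrs) for row in rows}
    return len(seen) == len(rows)


# Hypothetical sample data; the last row contains a misspelled city name.
rows = [
    {"name": "Ann", "zip": "10001", "city": "New York"},
    {"name": "Bob", "zip": "10001", "city": "New York"},
    {"name": "Cy", "zip": "94105", "city": "San Francisco"},
    {"name": "Dee", "zip": "94105", "city": "San Fransisco"},
]

error = g3_error(rows, ["zip"], "city")
exact = fd_holds(rows, ["zip"], "city")
approximate = fd_holds(rows, ["zip"], "city", max_error=0.25)
name_is_key = is_key(rows, ["name"])
zip_is_key = is_key(rows, ["zip"])
```

In this sample, the dependency zip -> city fails as an exact dependency because of the single misspelled row, but holds as an approximate one once one row in four is allowed to violate it. Rows that break an otherwise well-supported dependency are natural candidates for data error detection. A production profiler has to search over all attribute combinations efficiently, which this brute-force sketch does not attempt.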

Desbordante is an open-source project; see GitHub.

See it in action (beta version)

News