Managing Large-scale Probabilistic Databases

Subscribers:
344,000
Published on ● Video Link: https://www.youtube.com/watch?v=53rSLXGeslc



Duration: 1:06:51
231 views
2


For the next generation of data-management applications, such as sensor-based monitoring, data integration, and information extraction, data processing is the dominant cost. Often, the data driving these applications are uncertain, for example, due to missed, inconsistent, or imprecise sensor readings. Unfortunately, traditional data-management systems provide little or no support for managing uncertainty. To remedy this, my dissertation advocates an approach for data management in which uncertainty is modeled using probabilities. The cost of modeling imprecision using probabilities is that basic data-management tasks, such as querying, become theoretically and practically more difficult. Thus, the key challenge in managing large-scale probabilistic data is efficiency. In this talk, I will discuss the fundamental techniques that I developed in my dissertation to build a probabilistic database capable of handling large, imprecise datasets: these techniques include top-k processing with probabilities, materialized views, approximate lineage, and extensional processing for complex analytic queries. This work resulted in two systems: Mystiq, the first system to support complex queries on gigabytes of probabilistic relational data, and Lahar, the first system to support rich event-style queries on large, probabilistic streams.







Tags:
microsoft research