Continuous Queries over Data Streams
Continuous queries are a common interface for monitoring dynamically changing data, including data streams. Applications include financial trend tracking, network health monitoring, and sensor deployments. In the STREAM project at Stanford, we have built a comprehensive prototype system that supports rich, declarative continuous queries over data streams. In this talk I will focus on continuous aggregation queries and address three problem settings. (1) A large number of queries: the primary challenge is to share resources (e.g., space and computation) across different queries. (2) Limited memory: the challenge is to design algorithms that maintain approximate statistics while making the best use of available memory. (3) Distributed systems: the primary challenge is to minimize communication while correlating events across distributed streams. I will conclude with a brief summary of my other work, including the design and semantics of continuous query languages and the characterization of memory requirements for continuous queries.
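The abstract does not describe how STREAM shares resources across queries, so the following is only a hypothetical sketch of the idea behind setting (1). It assumes several continuous SUM queries over count-based sliding windows of different sizes. All of them share one buffer holding the most recent max(W) values, and each query keeps just a single running sum. The class name `SharedWindowSums` and the count-based window model are illustrative assumptions, not STREAM's design.

```python
from collections import deque


class SharedWindowSums:
    """Hypothetical sketch: continuous SUM queries over count-based sliding
    windows of different sizes, sharing one buffer of recent stream values."""

    def __init__(self, window_sizes):
        self.window_sizes = sorted(set(window_sizes))
        self.max_w = self.window_sizes[-1]
        # Shared state: only the last max_w values are kept, whatever the number of queries.
        self.buffer = deque(maxlen=self.max_w)
        # Per-query state: one running sum per distinct window size.
        self.sums = {w: 0 for w in self.window_sizes}

    def insert(self, value):
        n = len(self.buffer)
        for w in self.window_sizes:
            if n >= w:
                # Window w currently covers buffer[n - w .. n - 1];
                # its oldest value falls out when the new value arrives.
                self.sums[w] -= self.buffer[n - w]
            self.sums[w] += value
        # With maxlen set, append evicts buffer[0] once the buffer is full.
        self.buffer.append(value)

    def answer(self, w):
        """Current result of the continuous SUM query with window size w."""
        return self.sums[w]


if __name__ == "__main__":
    q = SharedWindowSums([2, 3])
    for x in [1, 2, 3, 4]:
        q.insert(x)
    print(q.answer(2), q.answer(3))  # 7 9
```

The buffer grows with the largest window rather than with the number of queries, which is the kind of space sharing setting (1) refers to. Indexing into the middle of a `deque` is linear-time, so a real system would more likely use a ring buffer.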