Title: Querying XML Peers
Speaker: Emiran Curtmola (UCSD)
Abstract: As the web evolves, it is becoming easier to form communities based on shared interests, and to create and publish data on a wide variety of topics. With this democratization of information creation comes the natural desire to make one's data accessible for querying within the community, and also to query the global collection that is the union of the local data collections of all others in the community. Given the large number of potential publishers, the dynamic nature of published data, and publishers' need to maintain complete control over who accesses their data, centralized approaches (e.g., search engines, hosted online communities) that disintermediate publishers from data consumers are not appropriate. In this paper, we consider a distributed approach, where data reside only with their publishers, who thereby maintain control over who can access their data and how their data are advertised. Given the virtual nature of the global data collection, we study the challenging problem of efficiently locating the publishers in the community that hold data items matching a specified query. We propose a distributed index structure, UQDT, that is organized as a union of Query Dissemination Trees (QDTs) and realized on an overlay (i.e., logical) network infrastructure. Each QDT has data publishers as its leaf nodes and overlay network nodes as its internal nodes; each internal node routes queries to publishers based on a summary of the data advertised by the publishers in its subtree. We experimentally evaluate design trade-offs, study the efficiency of UQDT using real data sets, and demonstrate that UQDT can maximize throughput by preventing any overlay network node from becoming a bottleneck.
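
To make the routing idea concrete, here is a minimal sketch of a single QDT in Python. It is an illustration under stated assumptions, not the UQDT implementation: the abstract does not say how summaries are represented, so the sketch assumes a summary is simply the set of terms advertised in a subtree, and that a query is a set of terms that a publisher must all advertise. The names `Publisher`, `OverlayNode`, `summary`, `route`, and `build_example_qdt` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Leaf node: a publisher advertising a set of terms (assumed summary form)."""
    name: str
    advertised: frozenset

    def summary(self):
        return self.advertised

    def route(self, query_terms):
        # A publisher is returned only if it advertises every query term.
        return [self.name] if query_terms <= self.advertised else []


@dataclass
class OverlayNode:
    """Internal node: forwards a query only into subtrees whose summary may match."""
    children: list = field(default_factory=list)

    def summary(self):
        # Assumption: the summary is the union of everything advertised below.
        result = frozenset()
        for child in self.children:
            result |= child.summary()
        return result

    def route(self, query_terms):
        matches = []
        for child in self.children:
            # Prune any subtree whose summary lacks one of the query terms.
            if query_terms <= child.summary():
                matches.extend(child.route(query_terms))
        return matches


def build_example_qdt():
    """Build a small two-level tree with three hypothetical publishers."""
    alice = Publisher("alice", frozenset({"book", "author"}))
    bob = Publisher("bob", frozenset({"movie", "director"}))
    carol = Publisher("carol", frozenset({"book", "price"}))
    return OverlayNode([OverlayNode([alice, bob]), OverlayNode([carol])])
```

Calling `build_example_qdt().route(frozenset({"book"}))` visits both subtrees and returns `["alice", "carol"]`. A query for `{"book", "director"}` is forwarded into the first subtree, because its merged summary contains both terms, but no single publisher there matches, so the result is empty. This shows that a union-style summary can admit false positives while never missing a matching publisher. Recomputing summaries on every call keeps the sketch short; a real index would presumably cache them and update them as publishers change their advertisements.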