XKeyword Homepage


Main Idea People Publications Presentations Demo

Main Idea

Keyword search is the most popular information discovery method because the user does not need to know either a query language or the underlying structure of the data. The search engines available today provide keyword search on top of sets of documents. In addition to documents, a huge amount of information is stored in databases, but no simple way is available to discover information in them, except by using structured languages (such as XQuery or SQL) where the type(s) of each keyword must be specified, as well as the types of connections between the objects. The result of a keyword query is a list of trees of nodes from the database, which is viewed as a labeled graph. For example, a keyword query “Smith, Miller” may return that “Smith” is the name of a sales agent who took an order from customer “Miller”, while a second solution may indicate that “Smith” and “Miller” are customers served by the “San Diego” sales unit.

As another example, consider a bibliographic database (see demo below), with publications, authors and conferences. Assume we want to know how author “Widom” is connected to author “Ullman”. Using a language like SQL, this is expressed as a set of queries of the type: the two authors have co-authored a paper, or the first references a paper of the second, or the second references a paper of the first, or they published in the same conference and so on. The same query, using XKeyword, is simply expressed as “Widom Ullman”.

XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the schema of the database. In particular, XKeyword consists of two stages. In the preprocessing stage a set of keyword indices are built along with indexed path relations that describe particular patterns of paths in the graph. In the query processing stage plans are developed that use a near optimal set of path relations to efficiently locate the keyword query results.


bullet Yannis Papakonstantinou, Professor
bullet Andrey Balmin, PhD student
bullet Vagelis Hristidis, PhD student
bullet Tem Wang, MS student
bullet Patrick Lightbody, undergraduate student

Related Publications

bullet Vagelis Hristidis, Luis Gravano, Yannis Papakonstantinou: Efficient IR-Style Keyword Search over Relational Databases. VLDB, 2003
bullet Andrey Balmin, Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou, Divesh Srivastava, Tianqiu Wang: A System for Keyword Search on XML Databases. Demo at VLDB, 2003
bullet Vagelis Hristidis, Yannis Papakonstantinou, Andrey Balmin: Keyword Proximity Search on XML Graphs (extended version). ICDE, 2003
bullet Vagelis Hristidis, Yannis Papakonstantinou: DISCOVER: Keyword Search in Relational Databases. VLDB, 2002


bullet Efficient IR-style Keyword Search over Relational Databases, at VLDB 2003, Berlin
bullet XKeyword's presentation, at IEEE ICDE 2003, Bangalore, India
bullet DISCOVER's presentation, at VLDB 2002, Hong Kong.

Demo description

The XKeyword demo is operating on a subset of the DBLP database containing the  12 major conferences including VLDB, SIGMOD, ICDE, PODS, up to year 2001. The schema of the database is the following

The user inputs a set of keywords, which may belong to any type (relation), and the maximum Candidate Network size, which is the maximum size of the result trees in number of edges. For example, for the keyword query "Widom Ullman" the results include:

Result Size
author["Widom"] paper["TSIMMIS System"] → author["Ullman"] 2
author["Widom"] paper["Production Rules"] ← paper["Constraint Checking"] → author["Ullman"] 3
author["Widom"] paper["LORE"] → paper["TSIMMIS System"]  → author["Ullman"] 3

A thread is generated for every Candidate Network (CN), that is for every possible join tree on the schema containing all keywords. These threads execute in parallel and output results as they come. The threads of smaller CNs execute faster since they correspond to simpler queries. Hence the smaller results, which are intuitively the most relevant, are output first.



Go to XKeyword Demo


Inquiries for commercial use of this software should be directed to invent@ucsd.edu.