Title: Making Text Search More Powerful by Supporting Fuzzy String Matching
Speaker: Chen Li (Associate Professor, UC Irvine)
Abstract: In text search systems such as search engines and databases, users are often frustrated when they cannot find the entities they are looking for (such as records, documents, and emails). One common reason is a discrepancy between the user's query and the representation of the entity in the information repository, caused by the user's limited knowledge about the entity, careless typos, or errors in the entity representation. As an example, how many of us can correctly spell the last name of our governor in "Caliphoneya" without using a spell checker or the "Did you mean" feature of a search engine? What if we want to find a person or restaurant whose name we remember only roughly? For these reasons, we want to support *approximate* keyword queries to make text search more powerful and user friendly. The required techniques apply to many applications, such as spellchecking, online query relaxation, and optical character recognition (OCR). In this talk we will present some recent research results from my team. We will focus on a specific problem: given a large collection of strings such as person names, how can we efficiently find those that are similar to a given string, based on functions such as edit distance? We will present a new technique, called VGRAM. It improves algorithms that use fixed-length grams, which are substrings of a string used as signatures to identify similar strings. A primary advantage of the technique is that it can be adopted by a plethora of approximate string algorithms without the need to modify them substantially. We present extensive experiments on real data sets to evaluate the technique, and show significant performance improvements on three existing algorithms. Related results have appeared in our recent VLDB/ICDE papers.
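For readers unfamiliar with gram-based matching, the sketch below illustrates the fixed-length-gram baseline that the abstract says VGRAM improves on. It is not VGRAM itself and is not taken from the speaker's papers. It assumes 2-grams with '#' and '$' padding, an inverted index from grams to string ids, and the standard count filter (two strings within edit distance k share at least max(|s|, |t|) + q - 1 - k*q grams). All function names here are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Return the overlapping length-q grams of s, padded at both ends."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_index(strings, q=2):
    """Inverted index: gram -> list of ids of strings containing it."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].append(sid)
    return index


def search(query, strings, index, k, q=2):
    """Return the strings within edit distance k of query."""
    # Count gram matches per string id. Repeated grams may be overcounted,
    # which only weakens the filter; it never drops a true answer.
    counts = defaultdict(int)
    for g in qgrams(query, q):
        for sid in index.get(g, ()):
            counts[sid] += 1

    results = []
    # For simplicity this loops over every string; a real system would
    # visit only the ids found on the inverted lists where possible.
    for sid, s in enumerate(strings):
        threshold = max(len(query), len(s)) + q - 1 - k * q
        # Count filter: skip strings that share too few grams.
        # When threshold <= 0 the filter cannot prune, so we verify.
        if threshold > 0 and counts.get(sid, 0) < threshold:
            continue
        if edit_distance(query, s) <= k:
            results.append(s)
    return results


names = ["schwarzenegger", "shwarzeneger", "schwartz", "california"]
index = build_index(names)
matches = search("shwarzenegger", names, index, k=2)
```

The gram index prunes most strings cheaply, and the more expensive edit-distance check runs only on the survivors. The gram length q is fixed here for every string, which is the design choice the abstract's variable-length grams are meant to relax.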
Bio: Chen Li is an associate professor in the Department of Computer Science at the University of California, Irvine. He received his Ph.D. degree in Computer Science from Stanford University in 2001, and his M.S. and B.S. in Computer Science from Tsinghua University, China, in 1996 and 1994, respectively. He received a National Science Foundation CAREER Award in 2003 and a few other NSF grants. He was once a part-time Visiting Research Scientist at Google. His research interests are in the fields of database and information systems, including text search, data cleansing, data integration, data warehousing, and data privacy.