ChemXSeer: A Digital Library for Chemical Kinetics Data & Scientific Literature

Prasenjit Mitra

Scientists have digital documents and experimental data that they want to publish, link, and share. ChemXSeer is an ongoing NSF-funded project that aims to establish a digital library for documents and data related to chemical kinetics. This talk will introduce the architecture and algorithms deployed for the following components:

(a) Data extraction:
(i) TableSeer: This tool automatically identifies tables in digital documents and extracts the contents of their cells. The contents are stored in a queryable table in a database. TableSeer also extracts table metadata and uses a novel ranking function to search for tables relevant to user queries.
(ii) Extraction of data from two-dimensional plots in figures in digital documents, using image processing techniques.
(b) Chemical Entity Search: We seek to improve search capabilities for chemists; such tools are vital for giving chemists easy access to information. Our tool identifies chemical formulae and chemical names, uses hierarchical Conditional Random Fields to distinguish them from general terms, and tags them. We also introduce novel similarity scores, ranking functions, and search methods for chemical entities.

I will also briefly discuss our efforts to populate chemical kinetics databases with both extracted data and data submitted by domain scientists. Our tools can process, store, and link data in multiple formats, such as Excel, XML, Gaussian, and CHARMM. A metadata add-on helps annotate the data and link multiple datasets. The metadata is then used to link the data to published articles, allowing end users to search for relevant data and examine the articles that describe it.

Biography: Prasenjit Mitra is an assistant professor at the College of Information Sciences and Technology at the Pennsylvania State University.
He received his Ph.D. in Electrical Engineering from Stanford University in 2004. Before that, he received a Master of Science degree in Computer Science from The University of Texas at Austin in December 1994, and a Bachelor of Technology (with Honors) degree in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur, in May 1993. Starting in 1995, he spent five years as a senior member of the technical staff in the Server Technologies division of Oracle Corporation in Redwood Shores, CA, developing database management system software. He also worked as a senior engineer at Narus, Inc. and DBWizards on database applications. His primary research interest is information extraction from the World Wide Web; he has also pursued research on data mining, digital libraries, database systems, information retrieval, and artificial intelligence. He has published around 50 scholarly articles in peer-reviewed journals, conferences, and workshops, including Science, WWW, WebDB, SIGIR, AAAI, TKDE, PODS, and EDBT. He has served on the program committees of AAAI, WWW, CIKM, ICDM, and other prestigious conferences.