As increasing volumes of RDF data are being produced and analyzed, many massively distributed architectures have been proposed for storing and querying this data. These architectures are characterized, first, by their RDF partitioning and storage method and, second, by their approach to distributed query optimization, i.e., determining which operations to execute on each node in order to compute the query answers. We present CliqueSquare, a novel optimization approach for evaluating conjunctive RDF queries in a massively parallel environment. We focus on reducing query response time, and thus seek to build flat plans, where the number of joins encountered on a root-to-leaf path in the plan is minimized. We present a family of optimization algorithms that rely on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possible plans. We have deployed our algorithms in a MapReduce-based RDF platform and experimentally demonstrate the benefit of the flat plans built by our best algorithms.
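To make the notion of plan flatness concrete, the following minimal sketch measures the largest number of joins on any root-to-leaf path of a plan. The names `Scan`, `Join`, and `height` are hypothetical and chosen for illustration only; they are not taken from CliqueSquare. The example assumes four triple patterns that all share the variable `?x`, and compares a left-deep plan of binary joins with a single n-ary star join.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    """Leaf of a plan: reads the matches of one triple pattern (illustrative)."""
    pattern: str


@dataclass
class Join:
    """Equality join over two or more child plans (illustrative)."""
    children: List["Plan"]


Plan = Union[Scan, Join]


def height(plan: Plan) -> int:
    """Largest number of joins found on any root-to-leaf path of the plan."""
    if isinstance(plan, Scan):
        return 0
    return 1 + max(height(child) for child in plan.children)


# Four triple patterns that all share the variable ?x (assumed example query).
t1 = Scan("?x :p1 ?a")
t2 = Scan("?x :p2 ?b")
t3 = Scan("?x :p3 ?c")
t4 = Scan("?x :p4 ?d")

# Left-deep plan built from binary joins only.
left_deep = Join([Join([Join([t1, t2]), t3]), t4])

# One n-ary star join on ?x.
star = Join([t1, t2, t3, t4])

print(height(left_deep), height(star))  # 3 1
```

Under these assumptions, the binary left-deep plan places three joins on its longest path, while the star join places one, which is the kind of difference a flat plan aims for.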
By: Stamatis Zampetakis (INRIA)
If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein