• A Unified Framework for Polyglot Factorized Machine Learning - David Justo


  • Abstract:
    Data Science workflows have become increasingly polyglot. For example, data loading, exploration, and pre-processing may occur in a specialized language like R while model training could occur in Python, where premier Machine Learning support abounds. Finally, trained models are often deployed as part of JavaScript applications for the Web.
    In this setting, a unified runtime environment for all these systems may unlock exciting new opportunities for developer tooling and optimizations. The GraalVM platform is one such system, offering a unified virtual machine supporting a variety of languages including Python, R, JavaScript, Java, Kotlin, and many more. Using GraalVM, we explore the space of multi-language optimizations by extending ADA Lab’s Morpheus system, a framework to optimize machine learning models when operating over multi-table datasets, to this setting.
    In this work, we develop a host-language-agnostic implementation of Morpheus and embed it within GraalVM’s language interpreter service. By doing this, we are able to re-use a single specification of the Morpheus system to optimize many GraalVM languages. Furthermore, we show that casting Morpheus as an interpreter-level service in GraalVM enables us to optimize polyglot systems as well. This means, for instance, executing a Python ML algorithm on data represented as an R matrix while optimizing it all via the Morpheus service!
    Come learn about GraalVM’s unified runtime, Futamura projections, programming language implementation, and the surprising benefits unlocked by lifting matrices to become first-class datatypes of a polyglot language interpreter. No prior experience required!
     
    Bio:
    David Justo is an MS student advised by Nadia Polikarpova and Arun Kumar at UC San Diego. His interests lie at the intersection of Programming Languages, Databases, and Computational Logic; he strives to develop useful abstractions to optimize program execution and increase programmer productivity.