• Compiling Classical ML Pipelines into Tensor Computations for One-size-fits-all Prediction Serving - Supun Nakandala


  • Abstract:
Recent advances in Deep Neural Networks (DNNs) and the subsequent explosion of DNN frameworks have fostered the creation of a new class of systems. ONNX, TVM, and TensorRT are notable examples of such systems: they share the same goal of providing a runtime for DNN model inference with state-of-the-art performance, ease of deployment on hardware accelerators (e.g., GPUs), and portability across platforms and devices. Yet, in the enterprise space data is mostly tabular, and classical Machine Learning (ML) techniques such as tree methods are frequently used, often within complex pipelines composed of data featurizers and feature selection operators. Unfortunately, no unified inference serving system exists in the classical ML space. Developers are therefore forced to resort to bespoke solutions or to accept subpar performance. In this talk I will present HUMMINGBIRD: a system able to compile classical ML pipelines end-to-end into tensor computations. It thereby seamlessly leverages the features provided by DNN inference systems, e.g., ease of deployment, operator optimizations, and GPU support. I will discuss the challenges, the initial system prototype, and promising initial empirical results.
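    To make the core idea concrete, the following is a minimal NumPy sketch of how a decision tree, a quintessential classical ML operator, can be evaluated purely with tensor operations (matrix multiplies and elementwise comparisons), so that it can run on any DNN runtime. The toy tree, the specific matrices, and the `predict` function are illustrative assumptions for this sketch, not HUMMINGBIRD's actual API.

    ```python
    import numpy as np

    # Toy tree (hypothetical, for illustration only):
    #   node 0: x0 < 0.5 ? -> left = leaf 0 (class 0), right -> node 1
    #   node 1: x1 < 0.3 ? -> left = leaf 1 (class 1), right = leaf 2 (class 0)

    A = np.array([[1., 0.],      # feature 0 is tested at internal node 0
                  [0., 1.]])     # feature 1 is tested at internal node 1
    B = np.array([0.5, 0.3])     # threshold of each internal node

    # C[n, l] = +1 if leaf l lies in the left subtree of node n,
    #           -1 if it lies in the right subtree, 0 if n is not on l's path.
    C = np.array([[ 1., -1., -1.],
                  [ 0.,  1., -1.]])
    D = np.array([1., 1., 0.])   # per leaf: number of "go left" decisions on its path

    E = np.array([[1., 0.],      # leaf 0 -> class 0
                  [0., 1.],      # leaf 1 -> class 1
                  [1., 0.]])     # leaf 2 -> class 0

    def predict(X):
        S = (X @ A < B).astype(np.float64)   # per-node left/right decisions
        L = (S @ C == D).astype(np.float64)  # one-hot leaf membership per sample
        return np.argmax(L @ E, axis=1)      # leaf -> class scores -> label

    X = np.array([[0.2, 0.9],   # left at node 0            -> leaf 0 -> class 0
                  [0.8, 0.1],   # right at node 0, left at 1 -> leaf 1 -> class 1
                  [0.8, 0.9]])  # right, right               -> leaf 2 -> class 0
    print(predict(X))           # -> [0 1 0]
    ```

    Because the entire prediction is just two matrix multiplications plus comparisons, it maps directly onto the batched, hardware-accelerated kernels that DNN inference runtimes already optimize.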
     
    Bio:
    Supun Nakandala is a third-year Ph.D. student advised by Prof. Arun Kumar at UC San Diego. His research interest lies broadly in the intersection of Systems and Machine Learning, an emerging area increasingly referred to as Systems for ML.