SoCC'20 | InferLine: latency-aware provisioning and scaling for prediction serving pipelines
This paper introduces InferLine, a system that provisions and manages each stage of a prediction pipeline to meet end-to-end tail latency constraints while minimizing cost.
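
To make the problem statement concrete, the sketch below frames per-stage provisioning as a small constrained search: choose a replica count for each stage so that an estimated end-to-end latency stays under a target while total cost is as low as possible. This is a toy illustration, not InferLine's algorithm. The `Stage` fields, the `1 / (1 - utilization)` latency estimate, the exhaustive search, and all numbers are assumptions made for this example, and the estimate is an average-style approximation that does not model tail latency.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one pipeline stage (not from the paper)."""
    name: str
    service_time_s: float    # time for one replica to serve one query
    cost_per_replica: float  # arbitrary cost unit, e.g. dollars per hour


def stage_latency(stage: Stage, replicas: int, arrival_qps: float) -> float:
    """Assumed latency model: service time inflated by 1 / (1 - utilization)."""
    capacity_qps = replicas / stage.service_time_s
    utilization = arrival_qps / capacity_qps
    if utilization >= 1.0:
        return float("inf")  # the stage cannot keep up with the arrival rate
    return stage.service_time_s / (1.0 - utilization)


def cheapest_plan(
    stages: Sequence[Stage],
    arrival_qps: float,
    slo_s: float,
    max_replicas: int = 8,
) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Exhaustively search replica counts for a linear pipeline.

    Returns (total_cost, replicas_per_stage) for the cheapest configuration
    whose summed stage latency is within slo_s, or None if none qualifies.
    """
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for counts in product(range(1, max_replicas + 1), repeat=len(stages)):
        latency = sum(
            stage_latency(s, n, arrival_qps) for s, n in zip(stages, counts)
        )
        if latency > slo_s:
            continue  # violates the end-to-end latency target
        cost = sum(s.cost_per_replica * n for s, n in zip(stages, counts))
        if best is None or cost < best[0]:
            best = (cost, counts)
    return best


if __name__ == "__main__":
    pipeline = [
        Stage("preprocess", service_time_s=0.01, cost_per_replica=1.0),
        Stage("model", service_time_s=0.05, cost_per_replica=4.0),
    ]
    print(cheapest_plan(pipeline, arrival_qps=50.0, slo_s=0.2))
```

With these made-up numbers the search returns `(17.0, (1, 4))`: one replica of the cheap stage and four of the expensive one. The point of the example is that the latency budget is shared across stages, so each stage's replica count has to be chosen jointly with the others. Exhaustive search is only workable for tiny pipelines and is used here purely for clarity.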