All Categories - 冬青's Personal Homepage

All Categories

Blog

LLM Reasoning: Prompting, Multi-Path Search, and Iterative Self-Improvement

RLHF and Test-Time Compute: Reinforcement Learning and Inference-Time Optimization for LLMs

LLM Fundamentals: Pre-training, Prompting, Fine-tuning, and Reinforcement Learning

Variants of Attention

Computation and Communication Modeling of LLM Serving Systems


Paper Notes

SoCC'20 | InferLine: latency-aware provisioning and scaling for prediction serving pipelines

MobiSys'21 | nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices