From CoT and analogical prompting to self-consistency, ORM/PRM verification, tree-of-thoughts, multi-round self-reflection, and token budget allocation, with the Bitter Lesson in mind.
From reward design, policy gradient, and PPO to RLHF/RLVR, then inference-time sampling and verification, Archon architecture search, and when to use RL vs test-time scaling.
An overview of core methods for training and using large language models: compute and scaling, prompting, fine-tuning, and reinforcement learning.
This article analyzes computational and communication overhead patterns in LLM serving systems under different parallelism strategies.
In this paper, the authors introduce a new work scheduler that improves both work efficiency and parallelism for Single-Source Shortest Path (SSSP) search.
This paper shows that, using the execution statistics of standalone workloads together with the fairness measured when those workloads are co-run with three representative microbenchmarks, a reasonably accurate fairness prediction can be made.