A systematic review of Tool Agents, covering execution paradigms (Toolkens, CodeAct), tool induction and self-learning (Toolformer), and environment representation and exploration (Set-of-Mark, BAGEL).
A comprehensive overview of Coding Agents, detailing fine-grained evaluation benchmarks (SWE-bench, LiveCodeBench), agentic frameworks (SWE-agent vs. Agentless), and the mechanisms of code localization and safety.
From the definition of Agents and Language Agents and their three generations, through memory (episodic/semantic/procedural, RAG, HippoRAG), reasoning (ReAct, which interleaves reasoning with action), and planning (reactive, tree search, world models, WebDreamer), to a unified picture and the Bitter Lesson.
From CoT and analogical prompting to self-consistency, ORM/PRM verification, tree-of-thoughts, multi-round self-reflection and token budget allocation, with the Bitter Lesson in mind.
From reward design, policy gradient, and PPO to RLHF/RLVR, then inference-time sampling and verification, Archon architecture search, and when to use RL vs. test-time scaling.
An overview of core methods for training and using large language models: compute and scaling, prompting, fine-tuning, and reinforcement learning.