Blog - Category - Weile Luo's homepage

Blog

2026

Disaggregated LLM Serving: From PD Disaggregation to Attention Offloading 06-25

Multimodal Agents: When LLMs Open Their Eyes to the World 06-23

Deep Research: How LLMs Evolve into Full-Stack AI Scientists 04-29

Breaking GPU Hardware Limits: Micro-benchmark Methodology, PTX Assembly, and Hopper Architecture 04-27

CUDA Performance Profiling Cornerstone: Toolchains, Warp Scheduling, and Nsight Compute 04-26

Math Agents: Mathematical Reasoning and Formal Proofs in LLMs 04-08

Tool Agents: Empowering LLMs to Use Tools and Explore Environments 04-08

Coding Agents: Evaluation, Frameworks, and Code LLMs 04-08

When LLMs Learn Memory, Reasoning, and Planning: The Three Core Capabilities of Language Agents 03-12

LLM Reasoning: Prompting, Multi-Path Search, and Iterative Self-Improvement 03-08

RLHF and Test-Time Compute: Reinforcement Learning and Inference-Time Optimization for LLMs 03-08

LLM Basics: Pretraining, Prompting, Fine-tuning and Reinforcement Learning 03-08

2025

The Evolution of Attention: From MHA to MLA and KV Cache Optimization 12-30

Computational and Communication Modeling of LLM Serving Systems 11-18

2021

Docker Containers and Images 12-22