Blog
2026
Breaking GPU Hardware Limits: Micro-benchmark Methodology, PTX Assembly, and Hopper Architecture
04-27
CUDA Performance Profiling Cornerstone: Toolchains, Warp Scheduling, and Nsight Compute
04-26
Math Agents: Mathematical Reasoning and Formal Proofs in LLMs
04-08
Tool Agents: Empowering LLMs to Use Tools and Explore Environments
04-08
Coding Agents: Evaluation, Frameworks, and Code LLMs
04-08
When LLMs Learn Memory, Reasoning, and Planning: The Three Core Capabilities of Language Agents
03-12
LLM Reasoning: Prompting, Multi-Path Search, and Iterative Self-Improvement
03-08
RLHF and Test-Time Compute: Reinforcement Learning and Inference-Time Optimization for LLMs
03-08
LLM Basics: Pretraining, Prompting, Fine-tuning and Reinforcement Learning
03-08
2025
The Evolution of Attention: From MHA to MLA and KV Cache Optimization
12-30
Computational and Communication Modeling of LLM Serving Systems
11-18
2021
Docker Containers and Images
12-22