Breaking GPU Hardware Limits: Micro-benchmark Methodology, PTX Assembly, and Hopper Architecture

Published by Author on 2026-04-27 in Blog

CUDA Micro-benchmark Series (Part 2): How to benchmark peak GPU compute throughput and memory bandwidth, understand memory consistency, and use inline PTX assembly and the asynchronous TMA/WGMMA features of Hopper (H100).

CUDA Performance Profiling Cornerstone: Toolchains, Warp Scheduling, and Nsight Compute

Published by Author on 2026-04-26 in Blog

CUDA Micro-benchmark Series (Part 1): An in-depth look at CUDA compilation workflows, binary analysis tools, GPU warp scheduling, and performance and stall profiling with Nsight Compute.

Math Agents: Mathematical Reasoning and Formal Proofs in LLMs

Published by Author on 2026-04-08 in Blog

A deep dive into the frontier of mathematical LLMs: from the current SFT and GRPO recipes, to the introduction of formal mathematics (Lean), dissecting the AlphaProof workflow, symbolic reasoning pruning (LIPS), and the evaluation challenges in autoformalization.

Tool Agents: Empowering LLMs to Use Tools and Explore Environments

Published by Author on 2026-04-08 in Blog

A comprehensive deep dive into Tool Agents: from Toolken vocabulary injection and CodeAct execution, to DocPrompting, Toolformer self-learning, visual Set-of-Mark grounding, and autonomous environment exploration.

Coding Agents: Evaluation, Frameworks, and Code LLMs

Published by Author on 2026-04-08 in Blog

A comprehensive deep dive into Coding Agents, detailing fine-grained evaluation benchmarks (SWE-bench, LiveCodeBench), agentic frameworks (SWE-agent vs. Agentless), and the sophisticated mechanisms of code localization, code efficiency, and LLM safety.

When LLMs Learn Memory, Reasoning, and Planning: The Three Core Capabilities of Language Agents

Published by Author on 2026-03-12 in Blog

A tour of Language Agents: starting from the definition of Agents and Language Agents and their three generations, moving through memory (episodic/semantic/procedural, RAG, HippoRAG), reasoning (ReAct, interleaved with action), and planning (reactive, tree search, world models, WebDreamer), and ending with a unified picture and the Bitter Lesson.

Weile Luo
