agentic-batching-gpu-utilization

Does batching agent tool calls into one GPU pass actually improve utilization and throughput, or is that serving-stack folklore that doesn't hold at agent-workload scale?

A Claw Learns research experiment. It is the companion to a CPU-only proxy run through the automated experiment pipeline, which has no GPU access; this notebook follows up on real hardware.

What this tests

Multi-agent systems make many small, independent calls into a shared model: embedding lookups for memory and retrieval, small classifiers for routing, rerankers. Production LLM serving addresses GPU under-utilization for exactly this pattern with continuous batching. This notebook checks whether the benefit holds at a realistic agent tool-call scale, not only at large production traffic volumes:

Embeds N synthetic agent-style queries (all-MiniLM-L6-v2) one at a time vs. batched
Measures wall-clock throughput and samples GPU utilization (torch.cuda.utilization()) during each mode
Prints one JSON result — same shape Claw Learns uses for its automated CPU experiments
Includes an optional cost-per-million-calls cell — deliberately unpriced by default; plug in a current, sourced GPU on-demand rate rather than trust a hardcoded number
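
The measurement loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: compare_modes, sample_while, cost_per_million_calls, fake_encode and fake_util are all hypothetical names, and the stand-in encoder only simulates a fixed per-call overhead so the sketch runs without a GPU. On real hardware the two stand-ins would presumably be replaced by SentenceTransformer("all-MiniLM-L6-v2").encode and torch.cuda.utilization (which may need the pynvml package installed).

```python
import json
import threading
import time
from statistics import mean


def sample_while(run, read_util, interval_s=0.05):
    """Run `run()` while polling `read_util()` from a background thread."""
    samples, stop = [], threading.Event()

    def poll():
        while not stop.is_set():
            samples.append(read_util())
            stop.wait(interval_s)

    thread = threading.Thread(target=poll, daemon=True)
    thread.start()
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    stop.set()
    thread.join()
    return elapsed, samples


def compare_modes(encode, queries, batch_size, read_util):
    """Time one-at-a-time vs. batched encoding of the same queries."""
    def one_at_a_time():
        for q in queries:
            encode([q])

    def batched():
        for i in range(0, len(queries), batch_size):
            encode(queries[i:i + batch_size])

    result = {"n_queries": len(queries), "batch_size": batch_size}
    for name, run in (("sequential", one_at_a_time), ("batched", batched)):
        elapsed, samples = sample_while(run, read_util)
        result[name] = {
            "seconds": elapsed,
            "queries_per_second": len(queries) / elapsed,
            "mean_gpu_util_pct": mean(samples) if samples else None,
        }
    result["speedup"] = result["sequential"]["seconds"] / result["batched"]["seconds"]
    return result


def cost_per_million_calls(queries_per_second, usd_per_gpu_hour=None):
    """USD per million calls; stays None until a sourced hourly rate is given."""
    if usd_per_gpu_hour is None:
        return None
    return usd_per_gpu_hour / 3600.0 / queries_per_second * 1_000_000


# Hypothetical stand-ins (assumptions, not real measurements).
def fake_encode(batch):
    time.sleep(0.002 + 0.0001 * len(batch))  # fixed per-call cost dominates
    return [[0.0] * 4 for _ in batch]


def fake_util():
    return 0  # placeholder for a real GPU utilization reading


queries = [f"agent query {i}" for i in range(64)]
report = compare_modes(fake_encode, queries, batch_size=16, read_util=fake_util)
print(json.dumps(report, indent=2))
```

Passing the encoder and the utilization reader in as callables keeps the timing logic separate from the model, so the same harness can run against the CPU proxy and the GPU. When timing real GPU work, the encoder should return host-side results (or synchronize the device) before the timer stops; otherwise asynchronous kernel launches can make a mode look faster than it is.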

Run it

A free Colab T4 GPU is enough. Click the badge above, or:

Runtime → Change runtime type → T4 GPU → Run all

Status

Methodology complete, not yet run. Real numbers and the full writeup will land at adityabiswas.com once the notebook has been executed.

