Fix use-after-free in async host ledger reads on shutdown#8003
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a shutdown-only use-after-free risk in the host’s async ledger read path (uv_work_t threadpool jobs) by ensuring worker callbacks never dereference a Ledger* after the owning asynchost::Ledger has been destroyed. It does this by introducing a shared control block that outlives the Ledger and by having ~Ledger() wait for in-flight worker callbacks to finish before allowing member destruction.
Changes:
- Added a
shared_ptr-ownedAsyncReadState(mutex + condition variable +active_workers+ledger_alive) shared betweenLedgerand eachAsyncLedgerGet. - Updated
on_ledger_get_asyncto (a) skip reads once shutdown begins and (b) register/deregister active worker callbacks so~Ledger()can safely drain them. - Added a CHANGELOG entry documenting the shutdown use-after-free fix.
Custom instructions used:
.github/copilot-instructions.md.github/instructions/reviewing.instructions.md
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/host/ledger.h |
Introduces AsyncReadState and shutdown coordination so async libuv workers don’t access a destroyed Ledger during/after shutdown. |
CHANGELOG.md |
Documents the shutdown UAF fix with a PR reference. |
|
No blocking issues found. I reviewed PR #8003 against main, focusing on Correctness:
Non-blocking suggestions:
CI note: At the time of review, the main VM/ACI checks for the current head were still queued or in progress; ASAN/TSAN and long jobs were skipped by workflow configuration. Custom instructions used:
|
Make the active worker accounting exception-safe and clarify that AsyncReadState protects worker access to Ledger while completion callbacks are Ledger-independent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Async
uv_work_tledger reads insrc/host/ledger.hcould have their worker and completion callbacks run after the hostLedgerwas destroyed during shutdown (uv_stopends the main loop, then the post-loop flushing iterations run afterLedgerhas left scope inrun_main_loop). The worker dereferenceddata->ledger, a use-after-free. Per @eddyashton's analysis, the other flagged sites (timer.h,process_launcher.h,Signal,HTTPParser) are already safe via libuv proxy/close pointers orrun-scoped lifetimes.Changes (all in
src/host/ledger.h)AsyncReadStatecontrol block -shared_ptr-owned byLedgerand copied into every outstandingAsyncLedgerGet. Holds a mutex, condition variable,active_workerscount, andledger_aliveflag. It outlives theLedger, so worker callbacks can safely decide whether they may still access it during shutdown.on_ledger_get_async- registers as an active worker under the lock; ifledger_aliveis already false it skips the read entirely (logged at DEBUG). The rawLedger*is only dereferenced after observingledger_alive == trueand incrementing the count, which forces the destructor to wait. An RAII guard decrementsactive_workersand notifies waiters on every exit path from the worker.~Ledger()- setsledger_alive = falseand blocks untilactive_workers == 0, draining in-flight reads before member destruction.Ledger-independent, writing back through a capturedto_enclaveshared_ptr.The lock ordering is safe: a worker either registers before the destructor checks (destructor waits,
Ledgerstays alive through the read) or observesledger_alive == falseand never touches theLedger. TheAsyncReadStatestays alive via the worker's ownshared_ptr, so member-destruction order is irrelevant.Scope
Shutdown-only; running nodes are unaffected. A
CHANGELOG.mdentry is included.