HDDS-15059. Shift streaming write sortDatanodes logic to OM #10633
chihsuan wants to merge 10 commits into
@ivandika3 Is there any risk that a block ends up allocated on a suboptimal datanode or pipeline because OM's cached topology is stale?
@peterxcli thanks for checking.
It should affect performance but not correctness, since the Streaming Write Pipeline should be able to handle an arbitrary topology. The data path (streaming WriteChunk data), which can be sent to any primary (first node), is separated from the metadata path (PutBlock), which is sent to the DN leader. The impact of a suboptimal streaming write pipeline topology should be worse write latency (e.g. if the topology picks the farthest node as the primary node). But this possible performance penalty also applies to the read path (i.e. where the farther node is read first), so I think it should be acceptable. Please let me know if I missed something.
@chihsuan Thanks for the patch, I'll review this soon.
What changes were proposed in this pull request?
SCM sorts the write pipeline (nearest datanode first) on every allocateBlock call, on its block-allocation hot path. OM already caches the cluster topology (HDDS-9343) and sorts reads locally, so this PR moves the write sort to OM.

OMKeyRequest.allocateBlock sends an empty clientMachine to SCM (so SCM skips sorting) and sorts each pipeline locally via a new KeyManager.sortDatanodesForWrite; the result is cached per pipeline.

… nodeManager.getNode did. UserInfo.remoteAddress is always an IP), so a client co-located on a datanode is recognized even with hdds.datanode.use.datanode.hostname enabled.

SCMBlockProtocolServer is unchanged for rolling-upgrade safety: an old OM still gets SCM-side sorting, and a new OM's empty address is a no-op for SCM. There is no Protobuf/RPC change.

Note: SCM's ALLOCATE_BLOCK audit now logs client="" for OM-originated writes; the per-client audit stays at OM.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15059
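The OM-side write sort described under "What changes were proposed in this pull request?" can be sketched as follows. This is a minimal model under stated assumptions, not the patch's KeyManager.sortDatanodesForWrite: Node, WriteSorter, the hostToRack map and the 0/2/4 distances (same node / same rack / off rack) are all hypothetical. It matches the client against datanodes by IP or hostname, preserves the original order for an empty or unresolved client, and caches one sorted result per pipeline ID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical datanode view: address, hostname and rack location. */
record Node(String ip, String hostname, String rack) { }

/**
 * Hypothetical OM-side write sorter (illustrative, not Ozone's API).
 * Assumes one instance per allocateBlock request, so the cache key can be
 * the pipeline ID alone (the client is the same for the whole request).
 */
class WriteSorter {
  // Assumed distances for a two-level (rack/node) topology.
  static final int SAME_NODE = 0;
  static final int SAME_RACK = 2;
  static final int OFF_RACK = 4;

  private final Map<String, String> hostToRack;  // assumed cached topology
  private final Map<String, List<Node>> cache = new HashMap<>();
  int sortCalls = 0;                              // for illustration only

  WriteSorter(Map<String, String> hostToRack) {
    this.hostToRack = hostToRack;
  }

  /** Returns nodes nearest-first, sorting each pipeline at most once. */
  List<Node> sortForWrite(String pipelineId, List<Node> nodes, String client) {
    return cache.computeIfAbsent(pipelineId, id -> sort(nodes, client));
  }

  private List<Node> sort(List<Node> nodes, String client) {
    sortCalls++;
    if (client == null || client.isEmpty()) {
      return List.copyOf(nodes);  // empty client: preserve order
    }
    // A co-located client takes its datanode's rack; otherwise ask the topology.
    String clientRack = nodes.stream()
        .filter(n -> matches(n, client))
        .map(Node::rack)
        .findFirst()
        .orElse(hostToRack.get(client));
    if (clientRack == null) {
      return List.copyOf(nodes);  // unresolved client: preserve order
    }
    List<Node> sorted = new ArrayList<>(nodes);
    // Stable sort: equidistant nodes keep their original pipeline order.
    sorted.sort(Comparator.comparingInt(n -> distance(n, client, clientRack)));
    return List.copyOf(sorted);
  }

  private static boolean matches(Node n, String client) {
    // Match by IP or hostname, so either form of the client address works.
    return client.equals(n.ip()) || client.equals(n.hostname());
  }

  private static int distance(Node n, String client, String clientRack) {
    if (matches(n, client)) {
      return SAME_NODE;
    }
    return n.rack().equals(clientRack) ? SAME_RACK : OFF_RACK;
  }
}
```

For example, with nodes a (rack /r1), b and c (both /r2) and a client at dn-c's hostname, the result is c, b, a; a second call for the same pipeline returns the cached list without sorting again.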
How was this patch tested?
… (TestOMAllocateBlockRequest): SCM receives an empty clientMachine; the sorted order is applied to every block; a shared pipeline is sorted once.
… (TestOMSortDatanodes): the nearest datanode is first for writes, including for RPC-deserialized (protobuf round-tripped) pipeline nodes; the order is preserved for an empty or unresolved client; the client is matched by both IP and hostname.
build-branch CI: https://github.com/chihsuan/ozone/actions/runs/28455187435
Generated-by: Claude Code (Claude Opus 4.8)
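The three TestOMAllocateBlockRequest properties listed above (empty clientMachine sent to SCM, sorted order applied to every block, shared pipeline sorted once) can be modeled in a small self-contained sketch. Everything here is hypothetical: Block, StubScm and AllocateFlow only mimic the shape of the flow and are not the patch's classes, and the sorter is passed in as a plain function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical allocated block: ID, pipeline ID and datanode order. */
record Block(long id, String pipelineId, List<String> nodes) { }

/** Hypothetical SCM stub that records the client address it was given. */
class StubScm {
  String lastClientMachine;

  /** Returns {@code count} blocks that all share pipeline "p1", unsorted. */
  List<Block> allocateBlock(int count, String clientMachine) {
    lastClientMachine = clientMachine;
    List<Block> blocks = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      blocks.add(new Block(i, "p1", List.of("dn1", "dn2", "dn3")));
    }
    return blocks;
  }
}

/** Hypothetical OM-side flow: ask SCM without a client, then sort on OM. */
class AllocateFlow {
  int sorts = 0;

  List<Block> allocate(StubScm scm, int count, String client,
                       BiFunction<List<String>, String, List<String>> sorter) {
    // An empty client address tells SCM to skip its own sorting.
    List<Block> fromScm = scm.allocateBlock(count, "");
    Map<String, List<String>> sortedByPipeline = new HashMap<>();
    List<Block> result = new ArrayList<>();
    for (Block b : fromScm) {
      // Each distinct pipeline is sorted once; later blocks reuse the result.
      List<String> sorted = sortedByPipeline.computeIfAbsent(b.pipelineId(), id -> {
        sorts++;
        return sorter.apply(b.nodes(), client);
      });
      result.add(new Block(b.id(), b.pipelineId(), sorted));
    }
    return result;
  }
}
```

A test against this model would allocate several blocks on the shared pipeline, then check that the stub saw "" as the client, that the sorter ran once, and that every returned block carries the sorted order.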