Use procedures for region create/delete to fix the delete-region ghost task by CRZbulabula · Pull Request #18033 · apache/iotdb

CRZbulabula · 2026-06-25T10:19:21Z

Problem

The ConfigNode-leader periodically drained a persistent RegionMaintainer queue (PartitionManager.maintainRegionReplicas, every 10s) by sending synchronous deleteRegion RPCs to DataNodes.

For a data region this deletion can be slow (removing TsFiles + the consensus peer) and may outlive the RPC timeout. When it does:

The ConfigNode times out and retries the same task 10s later.
On the DataNode the second call re-runs the deletion, which now returns SUCCESS because the region is already gone (ConsensusGroupNotExistException is swallowed and deleting an already-removed region is a no-op).
The ConfigNode treats that SUCCESS as completion, polls the task off the queue, and forgets it — while the first call is still running on the DataNode.

The result is a ghost task: the cluster keeps executing a deletion the coordinator has already discarded.

Approach

Replace the whole RegionMaintainer queue with Procedures, mirroring the existing AddRegionPeerProcedure / RemoveRegionPeerProcedure pattern (RegionOperationProcedure<State> + submit*Task(procId, …) + waitTaskFinish polling). Both the CREATE and DELETE paths move to dedicated procedures, gaining persistence and retry across leader changes.

DataNode — async delete

New deleteRegionAsync(TMaintainPeerReq) RPC on the internal service.
New DeleteRegionTask in RegionMigrateService: runs the deletion in the background and records its terminal state in the existing taskResultMap, so the ConfigNode polls getRegionMaintainResult for progress (PROCESSING / SUCCESS / FAIL / TASK_NOT_EXIST) instead of trusting a single RPC response. The task returns immediately after taskFail and never falls through to taskSucceed, so a failed deletion can never be reported as success. Retried submits are deduped by taskId.

ConfigNode — procedures

DeleteRegionProcedure: async submit + waitTaskFinish poll, taskId = procId, idempotent on retry / leader change (submit only on first entry; the DataNode dedups by taskId).
CreateRegionProcedure: region creation already returns a final, unambiguous status from a single RPC (no ghost-task race), so it stays synchronous and the procedure only adds persistence and retry.
DeleteDatabaseProcedure and CreateRegionGroupsProcedure now spawn these as child procedures instead of offering to the queue, and every completed maintain task is logged (region, DataNode, duration).
PartitionManager.maintainRegionReplicas and the RegionCleaner scheduler are removed.

Upgrade safety (deprecate, don't delete)

The OfferRegionMaintainTasks / PollRegionMaintainTask / PollSpecificRegionMaintainTask plan types and classes are kept (@Deprecated, still deserialized and applied as no-ops) so an old ConfigNode consensus log still replays.
The PartitionInfo snapshot has no version header, so the trailing RegionMaintainer block is still consumed on load — its tasks are discarded (with a WARN naming them so any loss is observable) — and written as empty on save, keeping the byte layout unchanged.

Tests

DeleteRegionProcedureTest / CreateRegionProcedureTest — serialization round-trip through ProcedureFactory (exercises the new ProcedureType codes + factory registration).
PartitionInfoTest.testLoadLegacySnapshotWithRegionMaintainTasksIsDrained — a snapshot written with pending tasks loads cleanly under the new code and ends with an empty maintain list.
Existing ConfigPhysicalPlanSerDeTest for the retained plan types stays green (proves consensus-log replay compatibility).

All confignode + datanode unit tests above pass; spotless:check and checkstyle are clean.

@deprecated

…t task The ConfigNode-leader periodically drained a persistent RegionMaintainer queue (PartitionManager.maintainRegionReplicas) by sending synchronous deleteRegion RPCs to DataNodes. A slow data-region deletion could outlive the RPC timeout; the ConfigNode retried 10s later, and the DataNode's second call returned SUCCESS (the region was already gone and ConsensusGroupNotExistException was swallowed). The ConfigNode then polled the task off the queue and believed it was done, while the first call was still running on the DataNode - a "ghost task". Replace the whole RegionMaintainer queue with Procedures that mirror the existing AddRegionPeerProcedure/RemoveRegionPeerProcedure pattern: - DataNode: new deleteRegionAsync(TMaintainPeerReq) RPC plus a DeleteRegionTask in RegionMigrateService. The deletion runs in the background and records its terminal state in taskResultMap, so the ConfigNode polls getRegionMaintainResult for progress (PROCESSING/SUCCESS/FAIL/TASK_NOT_EXIST) instead of trusting a single RPC response. The task returns immediately after taskFail and never falls through to taskSucceed, so a failed deletion can never be reported as success. - ConfigNode: new DeleteRegionProcedure (async submit + waitTaskFinish poll, taskId = procId, idempotent on retry/leader-change) and CreateRegionProcedure (the create RPC already returns a final status, so it stays synchronous and the procedure only adds persistence and retry). DeleteDatabaseProcedure and CreateRegionGroupsProcedure now spawn these as child procedures instead of offering to the queue, and every completed task is logged. - Upgrade safety: the OfferRegionMaintainTasks/PollRegionMaintainTask/ PollSpecificRegionMaintainTask plan types and classes are kept (@deprecated, still deserialized and applied as no-ops) so an old consensus log still replays. The PartitionInfo snapshot has no version header, so the trailing RegionMaintainer block is still consumed on load (and the tasks discarded with a WARN) and written as empty on save, keeping the byte layout unchanged. Add DeleteRegionProcedureTest/CreateRegionProcedureTest and a PartitionInfo test that loads a legacy snapshot containing pending tasks.

Address review feedback: DeleteRegionProcedure now reuses the existing deleteOldRegionPeer async plumbing (submitDeleteOldRegionPeerTask -> DeleteOldRegionPeerTask), which already runs deleteLocalPeer + delete region data on a background thread keyed by taskId and is polled via getRegionMaintainResult. This removes the duplicate deleteRegionAsync RPC, the DeleteRegionTask, submitDeleteRegionTask (both the service and handler layers), the CnToDnSyncRequestType.DELETE_REGION_ASYNC entry, and the related i18n constant. Also fix a latent bug in DeleteOldRegionPeerTask.run() that this reuse exposes: on a deletePeer/deleteRegion failure it called taskFail but did NOT return, then fell through to taskSucceed - so a failed deletion was reported as success. It now returns after taskFail. deletePeer also treats ConsensusGroupNotExistException as success (the peer may already be gone when deleting a whole region replica), so a retry converges cleanly.

CRZbulabula · 2026-06-25T10:36:47Z

Follow-up (6306e96): simplified the DataNode side. Instead of adding a new deleteRegionAsync RPC + DeleteRegionTask, DeleteRegionProcedure now reuses the existing deleteOldRegionPeer plumbing (submitDeleteOldRegionPeerTask → DeleteOldRegionPeerTask), which already does deleteLocalPeer + delete-region-data on a background thread keyed by taskId and is polled via getRegionMaintainResult — exactly what region-replica deletion needs.

This removes the duplicated deleteRegionAsync RPC, DeleteRegionTask, submitDeleteRegionTask (service + handler), the DELETE_REGION_ASYNC request type and its i18n constant (net −254 lines on the DataNode side).

It also fixes a latent bug in DeleteOldRegionPeerTask.run() that the reuse exposes: on a deletePeer/deleteRegion failure it called taskFail but did not return, then fell through to taskSucceed, so a failed deletion was reported as success. It now returns after taskFail, and deletePeer treats ConsensusGroupNotExistException as success so a retry converges cleanly. This also hardens the existing RemoveRegionPeer/migration path that already uses this task.

The ConfigNode-side CreateRegionProcedure/DeleteRegionProcedure remain separate procedures: Add/RemoveRegionPeerProcedure model peer-membership changes (coordinator, consensus pipes, transfer-leader, resetPeerList rollback, partition-table edits), which is a different concern from region-group create/delete, so reusing them there would drag in irrelevant/harmful steps.

codecov · 2026-06-25T11:23:38Z

Codecov Report

❌ Patch coverage is 23.33333% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.44%. Comparing base (1d893f1) to head (35a3150).
⚠️ Report is 11 commits behind head on master.

Files with missing lines	Patch %	Lines
...cedure/impl/region/RemoveRegionGroupProcedure.java	23.95%	73 Missing ⚠️
...edure/impl/region/CreateRegionGroupsProcedure.java	0.00%	7 Missing ⚠️
...procedure/impl/schema/DeleteDatabaseProcedure.java	0.00%	4 Missing ⚠️
.../apache/iotdb/db/service/RegionMigrateService.java	0.00%	4 Missing ⚠️
...ignode/procedure/state/RemoveRegionGroupState.java	0.00%	2 Missing ⚠️
...b/confignode/procedure/store/ProcedureFactory.java	50.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18033      +/-   ##
============================================
+ Coverage     41.26%   41.44%   +0.17%     
  Complexity      318      318              
============================================
  Files          5273     5284      +11     
  Lines        368462   369313     +851     
  Branches      47692    47789      +97     
============================================
+ Hits         152040   153052    +1012     
+ Misses       216422   216261     -161

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…roups via procedure When CreateRegionGroupsProcedure creates a RegionGroup and some replicas fail but the group still reaches a serving quorum, the missing replicas are offered to the RegionMaintainer queue (RegionCreateTask) and recreated by the RegionCleaner. When the group cannot reach a quorum, its created replicas are torn down by a single child RemoveRegionGroupProcedure. DeleteDatabaseProcedure deletes each region group through a child RemoveRegionGroupProcedure, which removes every replica's peer and data with deleteOldRegionPeer (no quorum required) and polls the DataNode for the async result, so the DatabasePartitionTable is dropped only after the regions are gone. Drop CreateRegionProcedure and DeleteRegionProcedure; RemoveRegionGroupProcedure reuses procedure type code 207.

sonarqubecloud · 2026-06-26T12:49:23Z

Quality Gate passed

Issues
9 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

CRZbulabula added 2 commits June 25, 2026 18:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use procedures for region create/delete to fix the delete-region ghost task#18033

Use procedures for region create/delete to fix the delete-region ghost task#18033
CRZbulabula wants to merge 3 commits into
masterfrom
improve-region-maintain-async

CRZbulabula commented Jun 25, 2026

Uh oh!

CRZbulabula commented Jun 25, 2026

Uh oh!

codecov Bot commented Jun 25, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

CRZbulabula commented Jun 25, 2026

Problem

Approach

DataNode — async delete

ConfigNode — procedures

Upgrade safety (deprecate, don't delete)

Tests

Uh oh!

CRZbulabula commented Jun 25, 2026

Uh oh!

codecov Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sonarqubecloud Bot commented Jun 26, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented Jun 25, 2026 •

edited

Loading