pd: document CPU-aware hot region scheduling #23140

Open: lhy1024 wants to merge 2 commits into pingcap:master from lhy1024:docs/cpu-aware-hot-region-scheduling-v857

lhy1024 (Member) commented on Jun 24, 2026

What is changed, added or deleted? (Required)

Document CPU-aware Hot Region scheduling for read hotspots introduced in v8.5.7:

  • Add flow_cpu and cpu-read-rate output descriptions for pd-ctl hot.
  • Add min-hot-cpu-rate, cpu-rate-rank-step-ratio, and CPU-aware read-priorities behavior for balance-hot-region-scheduler.
  • Add troubleshooting guidance for CPU-aware read hotspot scheduling.
  • Add the Store read cpu panel description in the PD Grafana dashboard doc.

Which TiDB version(s) do your changes apply to? (Required)

  • master (the latest development version)
  • v8.5 (TiDB 8.5 versions)

This PR needs to be cherry-picked to release-8.5 because the feature is requested for v8.5.7.

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Add a release note
  • Add or modify documentation for a feature

Which TiDB components does this PR affect?

  • TiDB
  • PD
  • TiKV

How do you ensure that your changes are correct? (Required)

  • Checked local /home/lhy1024/pd and /home/lhy1024/tikv implementation for fields, defaults, and version fallback behavior.
  • Ran ./scripts/markdownlint pd-control.md troubleshoot-hot-spot-issues.md grafana-pd-dashboard.md.
  • Ran git diff --check.

ti-chi-bot (Bot) commented on Jun 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lance6716 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the missing-translation-status This PR does not have translation status info. label Jun 24, 2026
@lhy1024 lhy1024 added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Jun 24, 2026
@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 24, 2026

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request updates the documentation across grafana-pd-dashboard.md, pd-control.md, and troubleshoot-hot-spot-issues.md to introduce CPU-aware Hot Region scheduling for read hotspots starting from v8.5.7. The changes document new metrics, fields, configuration parameters, and scheduling priorities. The reviewer's feedback focuses on improving readability, simplifying sentence structures, and adhering to the style guide by replacing passive voice with active voice.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread grafana-pd-dashboard.md Outdated
- Total read bytes on hot peer Regions: The total read bytes of peers that have become read hotspots on each TiKV instance
- Store read rate bytes: The total read bytes of each TiKV instance
- Store read rate keys: The total read keys of each TiKV instance
- Store read cpu: The read CPU usage of each TiKV instance, which PD uses for CPU-aware read hotspot scheduling starting from v8.5.7

low

To improve readability and flow, split this into two sentences and avoid the relative clause.

Suggested change
- Store read cpu: The read CPU usage of each TiKV instance, which PD uses for CPU-aware read hotspot scheduling starting from v8.5.7
- Store read cpu: The read CPU usage of each TiKV instance. PD uses this metric for CPU-aware read hotspot scheduling starting from v8.5.7.

Comment thread pd-control.md Outdated
}
```

Starting from v8.5.7, `hot read` and `hot history` include `flow_cpu`, and `hot store` includes `cpu-read-rate`. These fields show read CPU usage for CPU-aware read hotspot scheduling.

low

Clarify that these are commands to make the instructions more precise for users.

Suggested change
Starting from v8.5.7, `hot read` and `hot history` include `flow_cpu`, and `hot store` includes `cpu-read-rate`. These fields show read CPU usage for CPU-aware read hotspot scheduling.
Starting from v8.5.7, the `hot read` and `hot history` commands include `flow_cpu`, and the `hot store` command includes `cpu-read-rate`. These fields show the read CPU usage for CPU-aware read hotspot scheduling.

Comment thread pd-control.md Outdated
scheduler config balance-hot-region-scheduler set min-hot-query-rate 10
```

- `min-hot-cpu-rate` means the smallest CPU usage of read requests to be counted, which is usually 10.

low

Avoid passive voice ('to be counted') by using active voice ('to count') per the style guide.

Suggested change
- `min-hot-cpu-rate` means the smallest CPU usage of read requests to be counted, which is usually 10.
- `min-hot-cpu-rate` specifies the minimum CPU usage of read requests to count, which is usually 10.
References
  1. Avoid passive voice overuse. For example, rewrite 'to be counted' to 'to count'. (link)
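
As a rough illustration of what a minimum-rate threshold such as `min-hot-cpu-rate` does, the sketch below drops Regions whose read CPU usage is below the threshold before they are treated as hot. The `RegionStat` type, the `filter_hot_by_cpu` helper, the sample values, the unit of `read_cpu`, and the inclusive comparison are all assumptions made for illustration; none of them are taken from the PD implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionStat:
    region_id: int
    read_cpu: float  # hypothetical unit; the real unit is not stated in this PR


def filter_hot_by_cpu(stats: List[RegionStat],
                      min_hot_cpu_rate: float = 10.0) -> List[RegionStat]:
    """Keep only Regions whose read CPU reaches the threshold (assumed inclusive)."""
    return [s for s in stats if s.read_cpu >= min_hot_cpu_rate]


# Hypothetical sample data: Region 1 is below the threshold, Regions 2 and 3 are not.
stats = [RegionStat(1, 3.0), RegionStat(2, 10.0), RegionStat(3, 42.5)]
hot_ids = [s.region_id for s in filter_hot_by_cpu(stats)]
print(hot_ids)
```

Raising the threshold in this sketch makes fewer Regions count as read hotspots on the CPU dimension; lowering it makes more of them count.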

Comment thread pd-control.md Outdated
```

- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, and `count-rank-step-ratio` respectively mean the step ranks of byte, key, query, and count. The rank-step-ratio decides the step when the rank is calculated. `great-dec-ratio` and `minor-dec-ratio` are used to determine the `dec` rank. Usually, you do not need to modify these items.
- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, `cpu-rate-rank-step-ratio`, and `count-rank-step-ratio` respectively mean the step ranks of byte, key, query, CPU, and count. The rank-step-ratio decides the step when the rank is calculated. `great-dec-ratio` and `minor-dec-ratio` are used to determine the `dec` rank. Usually, you do not need to modify these items.

low

Avoid passive voice ('is calculated', 'are used') and improve sentence structure to make it more active and concise.

Suggested change
- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, `cpu-rate-rank-step-ratio`, and `count-rank-step-ratio` respectively mean the step ranks of byte, key, query, CPU, and count. The rank-step-ratio decides the step when the rank is calculated. `great-dec-ratio` and `minor-dec-ratio` are used to determine the `dec` rank. Usually, you do not need to modify these items.
- `byte-rate-rank-step-ratio`, `key-rate-rank-step-ratio`, `query-rate-rank-step-ratio`, `cpu-rate-rank-step-ratio`, and `count-rank-step-ratio` represent the step ranks of byte, key, query, CPU, and count, respectively. The rank-step-ratio decides the step when calculating the rank. PD uses `great-dec-ratio` and `minor-dec-ratio` to determine the `dec` rank. Usually, you do not need to modify these items.
References
  1. Avoid passive voice overuse. For example, rewrite 'when the rank is calculated' to 'when calculating the rank'. (link)

Comment thread pd-control.md Outdated
> If a cluster component is earlier than v5.2, the configuration of `query` dimension does not take effect. If some components are upgraded to v5.2 or later, the `byte` and `key` dimensions still by default have the priority for hot Region scheduling. After all components of the cluster are upgraded to v5.2 or later, such a configuration still takes effect for compatibility. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.
> If a cluster component is earlier than v5.2, the configuration of the `query` dimension does not take effect. If some components are upgraded to v5.2 or later, the `byte` and `key` dimensions still by default have the priority for hot Region scheduling. After all components of the cluster are upgraded to v5.2 or later, such a configuration still takes effect for compatibility.
>
> Starting from v8.5.7, TiKV reports read CPU usage for hot Region scheduling. For clusters that support read CPU reporting, the default `read-priorities` value is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the `query` dimension is also unsupported. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.

low

Avoid passive voice ('is also unsupported') by using active voice ('does not support... either') per the style guide.

Suggested change
> Starting from v8.5.7, TiKV reports read CPU usage for hot Region scheduling. For clusters that support read CPU reporting, the default `read-priorities` value is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the `query` dimension is also unsupported. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.
> Starting from v8.5.7, TiKV reports read CPU usage for hot Region scheduling. For clusters that support read CPU reporting, the default `read-priorities` value is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the cluster does not support the `query` dimension either. You can view the real-time configuration using the `pd-ctl` command. Usually, you do not need to modify these configurations.
References
  1. Avoid passive voice overuse. For example, rewrite 'is also unsupported' to 'does not support... either'. (link)
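
The fallback order described in this thread can be sketched as a small decision function. This is only a model of the documented default behavior, assuming two boolean capability flags; the function name `default_read_priorities` and both parameters are hypothetical and do not correspond to PD code, and a user-configured `read-priorities` value is not modeled.

```python
from typing import List


def default_read_priorities(supports_cpu: bool, supports_query: bool) -> List[str]:
    """Return the default read-priorities for a cluster, per the documented fallback."""
    if supports_cpu:
        # Cluster supports read CPU reporting (v8.5.7 or later per this PR).
        return ["cpu", "byte"]
    if supports_query:
        # No read CPU reporting, but the query dimension is supported.
        return ["query", "byte"]
    # Neither the cpu nor the query dimension is supported.
    return ["byte", "key"]


for flags in [(True, True), (False, True), (False, False)]:
    print(flags, ",".join(default_read_priorities(*flags)))
```

The real value in effect is the one reported by `pd-ctl`, which this sketch does not replace.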

Comment thread troubleshoot-hot-spot-issues.md Outdated
Comment on lines +188 to +190
Starting from v8.5.7, PD supports CPU-aware Hot Region scheduling for read hotspots. TiKV reports per-Region read CPU usage in store heartbeats, and PD can use CPU usage as a scheduling dimension. This helps PD identify read hotspots whose QPS or byte throughput looks balanced but whose TiKV CPU usage is still uneven, such as workloads that contain queries with different CPU costs or clusters that contain TiKV nodes with different performance profiles.

For clusters whose TiKV version supports read CPU reporting, the default `read-priorities` value of `balance-hot-region-scheduler` is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the `query` dimension is also unsupported. To view or adjust the scheduling dimensions, use [`pd-ctl scheduler config balance-hot-region-scheduler`](/pd-control.md#scheduler-config-balance-hot-region-scheduler).

low

Simplify the wording to avoid unnecessary relative clauses and rewrite passive voice ('is also unsupported') to active voice per the style guide.

Suggested change
Starting from v8.5.7, PD supports CPU-aware Hot Region scheduling for read hotspots. TiKV reports per-Region read CPU usage in store heartbeats, and PD can use CPU usage as a scheduling dimension. This helps PD identify read hotspots whose QPS or byte throughput looks balanced but whose TiKV CPU usage is still uneven, such as workloads that contain queries with different CPU costs or clusters that contain TiKV nodes with different performance profiles.
For clusters whose TiKV version supports read CPU reporting, the default `read-priorities` value of `balance-hot-region-scheduler` is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the `query` dimension is also unsupported. To view or adjust the scheduling dimensions, use [`pd-ctl scheduler config balance-hot-region-scheduler`](/pd-control.md#scheduler-config-balance-hot-region-scheduler).
Starting from v8.5.7, PD supports CPU-aware Hot Region scheduling for read hotspots. TiKV reports per-Region read CPU usage in store heartbeats, and PD can use CPU usage as a scheduling dimension. This helps PD identify read hotspots whose QPS or byte throughput looks balanced but whose TiKV CPU usage is still uneven, such as workloads with different CPU-cost queries or clusters with different TiKV performance profiles.
For clusters that support read CPU reporting, the default `read-priorities` value of `balance-hot-region-scheduler` is `cpu,byte`. For clusters that do not support read CPU reporting, PD automatically falls back to `query,byte`, or to `byte,key` if the cluster does not support the `query` dimension either. To view or adjust the scheduling dimensions, use [`pd-ctl scheduler config balance-hot-region-scheduler`](/pd-control.md#scheduler-config-balance-hot-region-scheduler).
References
  1. Avoid passive voice overuse. For example, rewrite 'is also unsupported' to 'does not support... either'. (link)
  2. Avoid unnecessary words and repetition. (link)
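
To make the motivation in this thread concrete, the following sketch shows two stores with identical read QPS but different per-query CPU cost. All numbers, store names, and the `read_cpu_percent` helper are hypothetical; the calculation is plain arithmetic, not how TiKV measures read CPU.

```python
def read_cpu_percent(qps: int, cpu_us_per_query: int) -> float:
    """CPU time consumed per second, as a percentage of one core.

    qps * cpu_us_per_query is CPU microseconds per wall-clock second;
    dividing by 1_000_000 gives cores, and multiplying by 100 gives percent.
    """
    return qps * cpu_us_per_query / 10_000


# Hypothetical workload: same QPS on both stores, different query cost.
stores = {
    "store-1": (1000, 200),   # 1000 QPS, 200 us of CPU per query
    "store-2": (1000, 1000),  # 1000 QPS, 1000 us of CPU per query
}
usage = {name: read_cpu_percent(qps, cost) for name, (qps, cost) in stores.items()}
print(usage)
```

In this made-up case a QPS-based view sees the two stores as balanced, while the CPU view shows one store using five times as much read CPU as the other, which is the kind of gap a `cpu` scheduling dimension is meant to expose.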

@qiancai qiancai self-assigned this Jun 25, 2026
@qiancai qiancai added v9.0-beta.3 This PR/issue applies to TiDB v9.0-beta.3. translation/doing This PR's assignee is translating this PR. labels Jun 25, 2026
@ti-chi-bot ti-chi-bot Bot removed the missing-translation-status This PR does not have translation status info. label Jun 25, 2026
github-actions Bot pushed a commit to doc-claw-bot/pingcap-docsite-preview that referenced this pull request Jun 27, 2026
Labels

needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. translation/doing This PR's assignee is translating this PR. v9.0-beta.3 This PR/issue applies to TiDB v9.0-beta.3.
