
i18n(ja): standardize SQL type names to English #23145

Open
yahonda wants to merge 13 commits into pingcap:i18n-ja-release-8.5 from yahonda:i18n-ja-fix-type-names

Conversation

@yahonda yahonda commented Jun 25, 2026

Member

What is changed, added or deleted? (Required)

Standardize SQL and programming type names from Japanese transliterations to canonical English forms across 14 files (~213 changes).

All SQL type identifiers in mapping tables are now in English (matching the EN source), while table column headers are kept in Japanese for readability. See the commit message for the full mapping of 30+ type name replacements.

This covers:

  • TiCDC protocol docs (5): ticdc-avro-protocol.md, ticdc-canal-json.md, ticdc-csv.md, ticdc-open-protocol.md, ticdc-simple-protocol.md — Avro/SQL type mapping tables
  • Type mapping tables (2): develop/serverless-driver.md, tidb-cloud/serverless-export.md
  • Parquet import tables (2): tidb-cloud/import-parquet-files-serverless.md, tidb-cloud/import-parquet-files.md
  • Schema examples (2): develop/dev-guide-bookshop-schema-design.md, develop/dev-guide-unique-serial-number-generation.md
  • Variable type descriptions (2): system-variables.md, tidb-configuration-file.md (タイプ: フロート → タイプ: float)
  • Audit log (1): tidb-cloud/tidb-cloud-auditing.md

Not changed: prose uses of ダブルクリック (double click), ダブルクォート (double quote), バイト配列 (byte array) in running text.

Which TiDB version(s) do your changes apply to? (Required)

  • master (the latest development version)
  • v8.5 (TiDB 8.5 versions)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot ti-chi-bot Bot added the area/develop label (This PR relates to the area of TiDB App development.) Jun 25, 2026
ti-chi-bot Bot commented Jun 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hfxsd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the missing-translation-status label (This PR does not have translation status info.) and the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) Jun 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request standardizes database type names (such as SMALLINT, BIGINT, float, double, and VARCHAR) across several Japanese documentation files, replacing their Japanese transliterations with standard English technical terms. The review feedback identifies three issues: a duplicate table header row in ticdc-avro-protocol.md, a typo (VARCHAR々) in ticdc-canal-json.md that should be corrected to VARCHAR, and an incorrect uppercase Golang type (FLOAT64) in ticdc-simple-protocol.md that should be changed to lowercase float64 for technical accuracy.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ticdc/ticdc-avro-protocol.md
Comment thread ticdc/ticdc-canal-json.md Outdated
Comment thread ticdc/ticdc-simple-protocol.md Outdated
yahonda added a commit to yahonda/docs that referenced this pull request Jun 25, 2026
…lapping files)

Cherry-picked commit bdbb6f2 but excluded the 8 files that
PR pingcap#23145 (i18n-ja-fix-type-names) handles independently.

Kept changes to: data-type-json.md, ticdc/ticdc-debezium.md,
tidb-cloud/data-service-app-config-files.md,
tidb-cloud/tidb-cloud-console-auditing.md,
ai/integrations/vector-search-integrate-with-langchain.md,
functions-and-operators/numeric-functions-and-operators.md
@ti-chi-bot ti-chi-bot Bot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) and removed the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) Jun 25, 2026
@yahonda yahonda force-pushed the i18n-ja-fix-type-names branch 6 times, most recently from ff8ff2a to 1775cf7 Compare June 25, 2026 05:23
@qiancai qiancai added the translation/no-need label (No need to translate this PR.) and removed the missing-translation-status label (This PR does not have translation status info.) Jun 25, 2026
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm label (Indicates a PR needs 1 more LGTM.) Jun 25, 2026
ti-chi-bot Bot commented Jun 25, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-06-25 05:44:04.103209084 +0000 UTC m=+185527.766434617: ☑️ agreed by qiancai.

qiancai commented Jun 25, 2026

Collaborator

Hi @yahonda, would you please resolve the conflicts in this PR? Thanks.

yahonda added 10 commits June 25, 2026 14:48
Replace Japanese transliterations/katakana of SQL/programming type
identifiers with canonical English names in all documentation files
where they appeared as type labels in mapping tables:

  SQL TYPE column (uppercase, matching EN source)
  - スモールイント → SMALLINT
  - ミディアムミント → MEDIUMINT
  - ビッグイント → BIGINT
  - フロート → FLOAT
  - ダブル → DOUBLE
  - タイニーイント → TINYINT
  - 十進数 → DECIMAL
  - チャー / チャール → CHAR
  - バイナリ → BINARY
  - 二進法 → VARBINARY
  - タイニーブロブ → TINYBLOB
  - ミディアムブロブ → MEDIUMBLOB
  - ロングブロブ → LONGBLOB
  - 小さなテキスト / 小さな文字 → TINYTEXT
  - 中テキスト → MEDIUMTEXT
  - 長文 → LONGTEXT
  - ヴァルチャー → VARCHAR
  - 可変長文字 → VARCHAR
  - 列挙型 → ENUM
  - タイムスタンプ → TIMESTAMP
  - 日付 → DATE
  - 日時 → DATETIME
  - 時間 → TIME
  - 年 → YEAR
  - 少し → BIT
  - ブール / ブール値 → BOOL / BOOLEAN
  - 署名なし / 未署名 / 符号なし → UNSIGNED

  JAVASCRIPT TYPE column (lowercase, matching EN source)
  - 番号 → number
  - 文字列 → string
  - ヌル → null
  - 整数 → int
  - 長さ → long
  - バイト → bytes

  PARQUET TYPE column
  - バイト配列 → BYTE_ARRAY
  - 固定長バイト配列 → FIXED_LEN_BYTE_ARRAY
  - タイムスタンプマイクロ → TIMESTAMP_MICROS

  Also fixed column headers to Japanese where tables were fully
  replaced (e.g. 'TiDB Cloud Serverlessの型 / JavaScriptの型',
  'Parquet プリミティブ型 / Parquet 論理型 / TiDBまたはMySQLの型').

Affected files: ticdc-avro-protocol, ticdc-canal-json, ticdc-csv,
ticdc-open-protocol, ticdc-simple-protocol, serverless-driver.md,
serverless-export.md, import-parquet-files*.md,
bookshop-schema-design.md, unique-serial-number-generation.md,
system-variables.md (タイプ: フロート → タイプ: float),
tidb-configuration-file.md, tidb-cloud-auditing.md

This completes the standardization of SQL/configuration type names
across the entire i18n-ja-release-8.5 branch.
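
The mapping above is applied to type labels in table cells, while prose such as ダブルクリック is left alone. One way to get that behavior is to replace a cell only when the whole cell equals a known transliteration. The following is a minimal sketch under that assumption; `TYPE_NAMES` holds a small subset of the mapping, `fix_table_row` is a hypothetical helper, and none of this is the tooling actually used for the PR.

```python
# Hypothetical subset of the mapping listed above (illustration only).
TYPE_NAMES = {
    "スモールイント": "SMALLINT",
    "ビッグイント": "BIGINT",
    "ダブル": "DOUBLE",
    "日付": "DATE",
}


def fix_table_row(line: str) -> str:
    """Replace a table cell only when the entire cell is a known transliteration."""
    stripped = line.strip()
    if not (stripped.startswith("|") and stripped.endswith("|")):
        return line  # not a table row, so prose is returned unchanged
    # Naive split: cells containing escaped pipes (\|) are not handled here.
    cells = stripped[1:-1].split("|")
    fixed = [f" {TYPE_NAMES.get(cell.strip(), cell.strip())} " for cell in cells]
    return "|" + "|".join(fixed) + "|"
```

Because the lookup is on the whole cell, a row like `| ダブル | 8 |` becomes `| DOUBLE | 8 |`, while a sentence or a cell containing ダブルクリック is not modified. The sketch normalizes cell padding to one space, which may or may not be desirable in a real docs repository.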
- ticdc-avro-protocol: remove duplicate table header row
- ticdc-canal-json: fix VARCHARル leftover typo
- ticdc-simple-protocol: fix FLOAT64 → float64 (Go convention)
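
A leftover such as the stray character after VARCHAR is easy to miss in review. As a rough illustration, a heuristic check could flag table cells that consist of an uppercase ASCII identifier followed directly by Japanese characters. The sketch below assumes pipe-delimited Markdown rows; `suspect_cells` and the character ranges are illustrative choices, not part of this PR, and the check can produce false positives.

```python
import re

# A whole cell that is an uppercase ASCII identifier followed only by kana,
# kanji, or the iteration mark 々 (for example a stray character after VARCHAR).
SUSPECT_CELL = re.compile(r"[A-Z][A-Z0-9_]*[\u3005\u3040-\u30ff\u4e00-\u9fff]+")


def suspect_cells(row: str) -> list[str]:
    """Return the cells of one table row that look like a type name with leftovers."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return [cell for cell in cells if SUSPECT_CELL.fullmatch(cell)]
```

Matching the whole cell rather than searching inside it keeps legitimate headers such as `TiDBまたはMySQLの型` from being reported, since they contain lowercase letters and mixed text.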
…ma tables

Fix 6 schema description tables in dev-guide-bookshop-schema-design.md:
- Column header: タイプ → 型
- Field names: restore the English names that had been translated into Japanese
  (e.g., タイトル → title, ストック → stock, 名前 → name, etc.)
- Type values: replace katakana and translated forms with canonical SQL types
  (e.g., ビギント → BIGINT, 小さな整数 → TINYINT, etc.)
- Descriptions: kept in Japanese as-is

i18n(ja): fix 整数 → int in sequence table

i18n(ja): フィールドタイプ → フィールドの型
i18n(ja): fix remaining type names in ticdc-canal-json type tables

- First table (MySQL Type mapping): binary, varbinary, text variants,
  blob variants, date/time types, SET, BIT, TiDBVectorFloat32
- Second table (Integer types): SMALLINT, MEDIUMINT, INT, BIGINT,
  UNSIGNED variants
- Third table (Java SQL Type): INTEGER, REAL, VARCHAR, CLOB, BIT,
  DATE, TIME, TIMESTAMP, BLOB
i18n(ja): fix remaining integer type names in canal-json

- tinyint unsigned → TINYINT UNSIGNED
- mediumint unsigned → MEDIUMINT UNSIGNED
- 整数 → INT
- [128、255] → [128, 255] (Japanese comma → ASCII comma)

i18n(ja): fix column type code table in ticdc-open-protocol

- Header: タイプ → 型
- ヌル → NULL, タイムスタンプ → TIMESTAMP
- 日付 → DATE, 時間 → TIME, 日時 → DATETIME, 年 → YEAR
- ブール値 → BOOLEAN, 少し → BIT
- 列挙型 → ENUM, セット → SET, 幾何学 → GEOMETRY
- 文字/バイナリ → CHAR/BINARY
- TiDBベクトルFLOAT32 → TiDBVectorFloat32
- 10月14日 → 10/14 (MT mistakenly translated the code as a date)

i18n(ja): fix 少し → Bit in bit flags table header
- Header: mysqlタイプ → mysqlType, and all column headers to EN
- All Japanese type names → canonical SQL types (lowercase like EN)
- 長さ → long, 弦 → string, バイト → bytes, FLOAT → float, DOUBLE → double
- 少し → BIT, ブール → BOOL, 列挙型 → ENUM, etc.
- TiDBベクトルFLOAT32 → TiDBVectorFloat32
Fix 3 audit log field tables in tidb-cloud-auditing.md:
- Field names restored to EN source (EVENT_CLASS, COST_TIME, etc.)
- Type names were already INTEGER/VARCHAR/TIMESTAMP/FLOAT
- Descriptions kept in Japanese as-is
- Additional CONNECTION and TABLE_ACCESS/GENERAL tables also fixed

i18n(ja): 社内使用 → 内部使用 for 'internal use'

i18n(ja): fix bit flags table - 価値→Value, name column to English
i18n(ja): Others → その他 (label, not a type name)
- data-type-json.md: JSON value type table (タイプ→型, type values to EN)
- data-type-date-and-time.md: zero value date type names to EN
- tidb-limitations.md: CHAR/BINARY/VARCHAR/BLOB type names to EN
- tidb-cloud/tidb-cloud-console-auditing.md: audit event field and type names to EN
- data-type-numeric.md: UNSIGNED/ZEROFILL syntax elements to EN
- develop/dev-guide-create-secondary-indexes.md: bookshop schema table (same pattern)

i18n(ja): fix programming type names in protocol field tables

Change Japanese programming type names to English in protocol field
definition tables across 6 files:

- 弦 → string
- 番号 → number
- 物体 → object
- ブール値/ブール → boolean (JavaScript/JSON types)
- 整数 → integer (config param types)

Affected: ticdc-simple-protocol, ticdc-canal-json, ticdc-open-protocol,
ticdc-debezium, develop/serverless-driver (config table),
tidb-cloud/data-service-app-config-files (config table)

i18n(ja): 関数 → function in config table type column

i18n(ja): タイプ → 型 in SQL level options table header
i18n(ja): タイプ → 型 in system-variables with English type values

- タイプ: ブール値 → 型: Boolean (161)
- タイプ: 列挙型 → 型: Enumeration (31)
- タイプ: 時間 → 型: Time (6)
- タイプ: float → 型: Float (40)
- タイプ:期間 → 型: Duration (5)
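
Occurrence counts like the ones above can be cross-checked by tallying the label values in the file. The sketch below assumes each label sits on its own list line in the form `- タイプ: 値` or `- 型: Value`, which is a guess about the layout of system-variables.md rather than something confirmed here; `tally_type_labels` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed line shape: "- タイプ: X" or "- 型: X", with an ASCII or full-width colon.
LABEL = re.compile(r"^- (?:タイプ|型)[ \t]*[::][ \t]*(.+?)[ \t]*$", re.MULTILINE)


def tally_type_labels(markdown: str) -> Counter:
    """Count how often each type label value appears in the given Markdown text."""
    return Counter(LABEL.findall(markdown))
```

Any key in the resulting counter that is not pure ASCII (checked with `str.isascii()`) would point at a label value that has not yet been converted to English.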

i18n(ja): タイプ: float → 型: Float in tidb-configuration-file

i18n(ja): 型: 整数 → 型: Integer in tidb-configuration-file
@yahonda yahonda force-pushed the i18n-ja-fix-type-names branch 2 times, most recently from adc6300 to f88e990 Compare June 25, 2026 05:52
yahonda added a commit to yahonda/docs that referenced this pull request Jun 26, 2026
Reverted files that had overlapping 非表示→不可視 changes:
- releases/release-5.0.0-rc.md
- releases/release-8.0.0.md
- sql-statement-alter-index.md
- sql-statement-create-index.md
- best-practices/index-management-best-practices.md

Preserved unique PR pingcap#23145 changes (タイプ→型) in system-variables.md
@yahonda yahonda force-pushed the i18n-ja-fix-type-names branch 4 times, most recently from 987bca8 to e5f1141 Compare June 26, 2026 06:12
yahonda added 2 commits June 26, 2026 15:27
Standardize the N/A notation across the 4 files modified in this PR
that used the Japanese '該当なし' for null/empty table cells

i18n(ja): 分野 → フィールド名 (fix field mistranslation)

分野 means 'academic field/discipline' - the correct translation for
'database/protocol field' is フィールド名. Fixed 11 table headers
across ticdc-canal-json, ticdc-debezium, and ticdc-simple-protocol.
Also fixed remaining タイプ/Type → 型 in simple-protocol headers.

i18n(ja): fix remaining field name translations and type values

- canal-json: sqlタイプ/mysqlタイプ → sqlType/mysqlType (protocol field names)
- create-secondary-indexes: 分野の説明 → フィールドの説明
- data-service-app-config-files: 分野 → フィールド名, タイプ → 型
- system-variables: 型: 文字列 → 型: String (26 occurrences)
- debezium: ペイロード → payload (protocol field paths)
- debezium: ソース.コミット_ts → source.commit_ts
- debezium: payload後 → payload.after

i18n(ja): fix remaining protocol field paths and type names

- debezium: スキーマ名 → schema.name, スキーマ.オプション → schema.optional,
  スキーマタイプ → schema.type
- canal-json: 配列 → Array
- simple-protocol: 配列 → Array (2 occurrences)

i18n(ja): 非表示のインデックス → 不可視インデックス
@yahonda yahonda force-pushed the i18n-ja-fix-type-names branch from ab74dad to a5cfcd2 Compare June 26, 2026 06:27
- Remove duplicated SET/RESOURCE_GROUP/CREATE/ADMIN/AS/VEC_COSINE_DISTANCE
  words that were left over from English sentence structure

Labels

  • area/develop: This PR relates to the area of TiDB App development.
  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • translation/no-need: No need to translate this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants