30 changes: 20 additions & 10 deletions peps/pep-0545.rst
@@ -187,15 +187,25 @@ Language Tag
''''''''''''

A common notation for language tags is the :rfc:`IETF Language Tag <5646>`
-[4]_ based on ISO 639, although gettext uses ISO 639 tags with
-underscores (ex: ``pt_BR``) instead of dashes to join tags [5]_
-(ex: ``pt-BR``). Examples of IETF Language Tags: ``fr`` (French),
-``ja`` (Japanese), ``pt-BR`` (Orthographic formulation of 1943 -
-Official in Brazil).
+[4]_ (BCP 47, RFC 5646), which is based on ISO 639 for language codes,
+ISO 15924 for script codes, and ISO 3166 for region codes. Gettext uses
+ISO 639 tags with underscores (e.g. ``pt_BR``), whereas IETF tags use
+hyphens to join subtags [5]_ (e.g. ``pt-BR``).

-It is more common to see dashes instead of underscores in URLs [6]_,
-so we should use IETF language tags, even if sphinx uses gettext
-internally: URLs are not meant to leak the underlying implementation.
+Examples of IETF Language Tags:
+
+* ``fr`` (French),
+* ``ja`` (Japanese),
+* ``pt-br`` (Portuguese as spoken in Brazil),
+* ``pa-guru`` (Punjabi written in Gurmukhi script)
+
+The ``script`` subtag is used when a language can be written in multiple
+writing systems. For example, Punjabi can be written in Gurmukhi (``pa-guru``)
+or Shahmukhi (``pa-arab``).
+
+It is more common to see hyphens instead of underscores in URLs [6]_,
+so we should use IETF language tags in URL paths, even if Sphinx or Gettext use
+different internal conventions. URLs should not leak implementation details.

It's uncommon to see capitalized letters in URLs, and docs.python.org
doesn't use any, so it may hurt readability by attracting the eye on it,
@@ -206,10 +216,10 @@ states that tags are not case sensitive. As the RFC allows lower case,
and it enhances readability, we should use lowercased tags like
``pt-br``.
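
As a rough illustration of the convention described above, the following sketch maps a gettext-style locale name to a lowercased, hyphen-separated tag suitable for a URL path. The function name ``gettext_locale_to_url_tag`` is hypothetical and is not part of Sphinx, gettext, or the PEP; it only assumes the locale looks like ``ll_CC`` with an optional ``.encoding`` or ``@modifier`` suffix.

```python
def gettext_locale_to_url_tag(locale: str) -> str:
    """Convert a gettext-style locale (e.g. 'pt_BR') to a lowercased tag (e.g. 'pt-br').

    Hypothetical helper for illustration only.
    """
    # Drop any encoding or modifier suffix such as '.UTF-8' or '@latin'.
    base = locale.split(".", 1)[0].split("@", 1)[0]
    # Join subtags with hyphens instead of underscores, then lowercase,
    # which is allowed because language tags are not case sensitive.
    return base.replace("_", "-").lower()


print(gettext_locale_to_url_tag("pt_BR"))
```

The conversion is deliberately one-way: the URL exposes only the lowercased tag, while the build tooling can keep using its own internal locale names.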

-We may drop the region subtag when it does not add distinguishing
+We may drop the subtag when it does not add distinguishing
information, for example: "de-DE" or "fr-FR". (Although it might
make sense, respectively meaning "German as spoken in Germany"
-and "French as spoken in France"). But when the region subtag
+and "French as spoken in France"). But when the subtag
actually adds information, for example "pt-BR" ("Portuguese as
spoken in Brazil"), it should be kept.
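
The rule of dropping a subtag that adds no distinguishing information could be sketched as follows. This is only an assumption about how such a rule might be implemented: the ``REDUNDANT_SUBTAG`` table and the ``simplify_tag`` function are hypothetical, and the table holds just the two examples named in the text.

```python
# Hypothetical table: for each language, the subtag considered redundant.
# Only the two examples from the text are listed here.
REDUNDANT_SUBTAG = {"de": "de", "fr": "fr"}


def simplify_tag(tag: str) -> str:
    """Lowercase a tag and drop its second subtag when it is listed as redundant."""
    subtags = tag.lower().split("-")
    if len(subtags) == 2 and REDUNDANT_SUBTAG.get(subtags[0]) == subtags[1]:
        return subtags[0]
    # Otherwise keep every subtag, since it carries information.
    return "-".join(subtags)


for example in ("de-DE", "fr-FR", "pt-BR", "pa-guru"):
    print(example, "->", simplify_tag(example))
```

An explicit table keeps the decision editorial rather than automatic: a subtag is dropped only where someone has judged it redundant.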
