RegEx Patterns

Jangle uses a variety of regular expressions for validating and parsing language tags. All rules from the RFC 5646 ABNF syntax definition have now been implemented!

Note

Some patterns are case-sensitive on their own, which does not comply with RFC 5646 section 2.1, where language tags are case-insensitive. The patterns in RULES are compiled with the re.IGNORECASE (re.I) flag; always pass this flag if you compile the patterns elsewhere.
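The effect of the flag shows up in the `irregular` pattern listed below, whose literal subtags are written in their conventional case. This is a standalone sketch that uses only Python's `re` module (it does not import Jangle), with the pattern copied from this page:

```python
import re

# The `irregular` pattern from this page, split across two raw strings.
IRREGULAR = (
    r"en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|"
    r"navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)"
)

case_sensitive = re.compile(IRREGULAR)
case_insensitive = re.compile(IRREGULAR, re.IGNORECASE)  # same flag as RULES

tag = "en-gb-oed"  # same tag as en-GB-oed, since tags are case-insensitive
print(case_sensitive.fullmatch(tag))            # None: "gb" does not match "GB"
print(case_insensitive.fullmatch(tag).group())  # en-gb-oed
```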

Language-Tag:

(?P<grandfathered>(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang)|(?P<irregular>en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)))|(?P<private_tag>x(?:\-[A-Za-z\d]{1,8})+)|(?P<langtag>(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?)
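The top-level pattern is an alternation of the `grandfathered`, `private_tag` and `langtag` branches, so the named groups of a match show which branch was taken. The sketch below is standalone Python using only `re`; the pattern is copied verbatim from above, and `classify` is a hypothetical helper, not part of Jangle. It uses `fullmatch` so that the whole string has to be a tag:

```python
import re
from typing import Optional

# The Language-Tag pattern from this page, split into raw-string pieces.
LANGUAGE_TAG = re.compile(
    r"(?P<grandfathered>(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|"
    r"zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang)|"
    r"(?P<irregular>en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|"
    r"lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)))"
    r"|(?P<private_tag>x(?:\-[A-Za-z\d]{1,8})+)"
    r"|(?P<langtag>(?P<iso_639>[A-Za-z]{2,3})"
    r"(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})"
    r"(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])"
    r"(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?"
    r"(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?"
    r"(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)"
    r"(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])"
    r"(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)"
    r"(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?)",
    re.IGNORECASE,
)


def classify(tag: str) -> Optional[str]:
    """Hypothetical helper: name the top-level branch that matched, or None."""
    m = LANGUAGE_TAG.fullmatch(tag)
    if m is None:
        return None
    for branch in ("grandfathered", "private_tag", "langtag"):
        if m.group(branch) is not None:
            return branch
    return None  # not reached: a successful match always takes one branch


print(classify("i-klingon"))   # grandfathered
print(classify("x-whatever"))  # private_tag
print(classify("sr-Latn-RS"))  # langtag
print(classify("en-"))         # None
```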

langtag:

(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?
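The named groups split a well-formed tag into its subtags, and the `(?![^\-\s])` lookaheads require the language, script and region subtags to end at a hyphen, whitespace or the end of the string. A standalone sketch, with the pattern copied from above and `langtag_parts` as a hypothetical helper that is not part of Jangle:

```python
import re

# The `langtag` pattern from this page, split into raw-string pieces.
LANGTAG = re.compile(
    r"(?P<iso_639>[A-Za-z]{2,3})"
    r"(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})"
    r"(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])"
    r"(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?"
    r"(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?"
    r"(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)"
    r"(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])"
    r"(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)"
    r"(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?",
    re.IGNORECASE,
)


def langtag_parts(tag: str) -> dict[str, str]:
    """Return the non-empty named groups of a full langtag match ({} if none)."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return {}
    return {name: value for name, value in m.groupdict().items() if value}


parts = langtag_parts("de-DE-1901-u-co-phonebk")
print(parts["iso_639"], parts["region"], parts["variants"])  # de DE -1901
print(parts["singleton"], parts["ext_text"])                 # u -co-phonebk
```

Because `singleton` and `ext_text` sit inside a repeated group, Python's `re` keeps only the last repetition: for `en-a-bbb-b-ccc`, `singleton` is `b` while `extensions` holds the full `-a-bbb-b-ccc`.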

language:

(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?
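In the `language` pattern, an extlang follows the primary language subtag, and `extlang_reserved` captures the up to two further three-letter subtags that the ABNF allows but permanently reserves. A standalone sketch with the pattern copied from above:

```python
import re

# The `language` pattern from this page, split into raw-string pieces.
LANGUAGE = re.compile(
    r"(?P<iso_639>[A-Za-z]{2,3})"
    r"(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})"
    r"(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?",
    re.IGNORECASE,
)

m = LANGUAGE.fullmatch("zh-yue")
print(m.group("iso_639"), m.group("extlang"))  # zh yue

m = LANGUAGE.fullmatch("zh-yue-abc-def")
print(m.group("extlang_iso_639"), m.group("extlang_reserved"))  # yue -abc-def

# At most two reserved extlang subtags are allowed.
print(LANGUAGE.fullmatch("zh-yue-abc-def-ghi"))  # None
```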

extlang:

(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})

script:

[A-Za-z]{4}

region:

(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3})
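A region subtag is either a two-letter ISO 3166 code or a three-digit UN M.49 code such as `419` (as in `es-419`), and the two named groups tell them apart. A standalone sketch; `region_kind` is a hypothetical helper, not part of Jangle:

```python
import re
from typing import Optional

REGION = re.compile(r"(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3})", re.IGNORECASE)


def region_kind(subtag: str) -> Optional[str]:
    """Hypothetical helper: name the alternative that matched, or None."""
    m = REGION.fullmatch(subtag)
    if m is None:
        return None
    return "iso_3166" if m.group("iso_3166") is not None else "un_m49"


print(region_kind("US"))   # iso_3166
print(region_kind("419"))  # un_m49
print(region_kind("U1"))   # None
```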

variant:

[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}
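A variant is either five to eight alphanumerics, or exactly four characters starting with a digit; the digit requirement keeps a four-character variant from being confused with a four-letter script subtag. A standalone check with the pattern copied from above:

```python
import re

VARIANT = re.compile(r"[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}", re.IGNORECASE)

for subtag in ("rozaj", "1901", "fonipa", "abcd", "123456789"):
    print(subtag, VARIANT.fullmatch(subtag) is not None)
# rozaj True, 1901 True, fonipa True, abcd False, 123456789 False
```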

extension:

(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+)
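An extension is a singleton followed by one or more subtags of two to eight characters; `ext_text` keeps the leading hyphen, so splitting it yields the individual subtags. A standalone sketch with the pattern copied from above:

```python
import re

EXTENSION = re.compile(
    r"(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+)",
    re.IGNORECASE,
)

m = EXTENSION.fullmatch("u-co-phonebk")
singleton = m.group("singleton")                      # 'u'
subtags = m.group("ext_text").lstrip("-").split("-")  # ['co', 'phonebk']
print(singleton, subtags)

# Extension subtags need at least two characters, so this is rejected.
print(EXTENSION.fullmatch("u-c"))  # None
```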

singleton:

[A-WY-Za-wy-z\d]
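The singleton class leaves out `x`, which is reserved for private use; even with re.IGNORECASE, neither `x` nor `X` matches. A quick standalone check over all ASCII letters and digits:

```python
import re
import string

SINGLETON = re.compile(r"[A-WY-Za-wy-z\d]", re.IGNORECASE)

# Collect every ASCII letter and digit that the pattern does not accept.
rejected = [c for c in string.ascii_letters + string.digits if not SINGLETON.fullmatch(c)]
print(rejected)  # ['x', 'X']
```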

privateuse:

x(?:\-[A-Za-z\d]{1,8})+
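Unlike extension subtags, private-use subtags may be a single character, but each is still limited to eight. A standalone check with the pattern copied from above:

```python
import re

PRIVATEUSE = re.compile(r"x(?:\-[A-Za-z\d]{1,8})+", re.IGNORECASE)

for tag in ("x-internal-v2", "X-A", "x-toolongsubtag", "x"):
    print(tag, PRIVATEUSE.fullmatch(tag) is not None)
# x-internal-v2 True, X-A True, x-toolongsubtag False, x False
```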

grandfathered:

(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang)|(?P<irregular>en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE))

regular:

art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang
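Because `zh\-min` comes before `zh\-min\-nan` in the alternation, a match that is anchored only at the start stops at the shorter tag. `re.fullmatch` backtracks into the next alternative and returns the whole tag. A standalone illustration with the pattern copied from above:

```python
import re

REGULAR = re.compile(
    r"art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|"
    r"zh\-min|zh\-min\-nan|zh\-xiang",
    re.IGNORECASE,
)

tag = "zh-min-nan"
print(REGULAR.match(tag).group())      # zh-min  (first matching alternative wins)
print(REGULAR.fullmatch(tag).group())  # zh-min-nan
```

When validating a whole tag with these patterns, a full match (or explicit anchors) avoids accepting a prefix.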

irregular:

en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)

alphanum:

[A-Za-z\d]

jangle.patterns.RULES: dict[str, re.Pattern[str]]

Compiled RegEx patterns for the rules of the RFC 5646 ABNF syntax definition, keyed by rule name (as listed above) and compiled with re.IGNORECASE. See https://www.rfc-editor.org/rfc/rfc5646.html#section-2.1.
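The shape of this mapping can be mirrored in a few lines. The sketch below is a hypothetical, trimmed-down stand-in with three of the rules copied from this page; `is_rule` is an illustrative helper that full-matches a string against a named rule, and it is not Jangle's match_rule:

```python
import re

# Hypothetical stand-in for RULES: rule name -> pattern compiled with re.IGNORECASE.
RULES: dict[str, re.Pattern[str]] = {
    "script": re.compile(r"[A-Za-z]{4}", re.IGNORECASE),
    "region": re.compile(r"(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3})", re.IGNORECASE),
    "variant": re.compile(r"[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}", re.IGNORECASE),
}


def is_rule(rule: str, string: str) -> bool:
    """Hypothetical helper: True if the whole string matches the named rule."""
    return RULES[rule].fullmatch(string) is not None  # KeyError for unknown rules


print(is_rule("script", "Latn"))   # True
print(is_rule("region", "419"))    # True
print(is_rule("variant", "abcd"))  # False
```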

jangle.patterns.match_rule(rule: str, string: str) -> Match[str]
jangle.patterns.rules_rst() -> str

Saves a temporary reStructuredText file documenting all rule patterns; the file is used in docs/source/patterns.rst.