RegEx Patterns
Jangle uses a variety of regular expressions for validating and parsing language tags. All rules from the RFC 5646 ABNF syntax definition have now been implemented!
Note
Some patterns are case-sensitive as written, which does not comply with RFC 5646 section 2.1. The patterns in RULES are compiled with the I flag (re.IGNORECASE); always pass this flag if you compile the pattern strings elsewhere.
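As a minimal sketch of why the flag matters, the snippet below compiles the privateuse pattern string from this page twice, once without flags and once with re.IGNORECASE. The names PRIVATEUSE, strict and relaxed are illustrative only and are not part of Jangle.

```python
import re

# Pattern string copied from the "privateuse" rule on this page.
PRIVATEUSE = r"x(?:\-[A-Za-z\d]{1,8})+"

# Without the flag, the leading "x" is matched case-sensitively.
strict = re.compile(PRIVATEUSE)
# With the flag, as RULES is documented to be compiled.
relaxed = re.compile(PRIVATEUSE, re.IGNORECASE)

tag = "X-Custom"
strict_ok = strict.fullmatch(tag) is not None
relaxed_ok = relaxed.fullmatch(tag) is not None
print(strict_ok, relaxed_ok)  # False True
```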
Language-Tag:
(?P<grandfathered>(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang)|(?P<irregular>en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)))|(?P<private_tag>x(?:\-[A-Za-z\d]{1,8})+)|(?P<langtag>(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?)
langtag:
(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?(?![^\-\s])(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?
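The named groups in the langtag pattern can be read back with groupdict(). The following is a sketch only: it copies the pattern string above (split across adjacent string literals for readability) and compiles it locally with re.IGNORECASE. The helper name parse_langtag is hypothetical and is not a Jangle function.

```python
import re

# The "langtag" pattern from this page, split into adjacent raw-string
# literals; Python concatenates them into the same single pattern.
LANGTAG = re.compile(
    r"(?P<iso_639>[A-Za-z]{2,3})"
    r"(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?"
    r"(?![^\-\s])"
    r"(?:\-(?P<script>[A-Za-z]{4})(?![^\-\s]))?"
    r"(?:\-(?P<region>(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3}))(?![^\-\s]))?"
    r"(?P<variants>(?:\-(?:[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}))*)"
    r"(?P<extensions>(?:\-(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+))*)"
    r"(?:\-(?P<private_subtag>x(?:\-[A-Za-z\d]{1,8})+))?",
    re.IGNORECASE,
)


def parse_langtag(tag):
    """Return the non-empty named groups for a tag, or None if it does not match."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    # Drop groups that did not participate (None) or matched nothing ("").
    return {name: value for name, value in match.groupdict().items() if value}


print(parse_langtag("zh-Hant-TW"))
print(parse_langtag("de-CH-1901"))
print(parse_langtag("english"))
```

fullmatch is used here so that trailing text is rejected rather than silently ignored.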
language:
(?P<iso_639>[A-Za-z]{2,3})(?:\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})))?
extlang:
(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\-[A-Za-z]{3}){0,2})
script:
[A-Za-z]{4}
region:
(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3})
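Because each alternative of the region pattern has its own named group, the match object's lastgroup attribute tells which kind of region code was found. This is a small sketch that compiles the pattern string locally; region_kind is a hypothetical helper, not part of Jangle.

```python
import re

# Pattern string copied from the "region" rule on this page.
REGION = re.compile(r"(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\d{3})", re.IGNORECASE)


def region_kind(subtag):
    """Return 'iso_3166', 'un_m49', or None when the subtag is not a region."""
    match = REGION.fullmatch(subtag)
    return match.lastgroup if match else None


print(region_kind("US"))   # iso_3166
print(region_kind("419"))  # un_m49
print(region_kind("USA"))  # None
```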
variant:
[A-Za-z\d]{5,8}|\d[A-Za-z\d]{3}
extension:
(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+)
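The extension pattern captures the singleton and the text that follows it, including the leading hyphens. A sketch of splitting that text into individual subtags, assuming the pattern is compiled locally with re.IGNORECASE (the variable names are illustrative):

```python
import re

# Pattern string copied from the "extension" rule on this page.
EXTENSION = re.compile(
    r"(?P<singleton>[A-WY-Za-wy-z\d])(?P<ext_text>(?:\-[A-Za-z\d]{2,8})+)",
    re.IGNORECASE,
)

match = EXTENSION.fullmatch("u-co-phonebk")
singleton = match["singleton"]
# ext_text starts with a hyphen, so strip it before splitting.
subtags = match["ext_text"].lstrip("-").split("-")
print(singleton, subtags)  # u ['co', 'phonebk']

# "x" is excluded from the singleton class, since it introduces private use.
x_is_extension = EXTENSION.fullmatch("x-foo") is not None
print(x_is_extension)  # False
```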
singleton:
[A-WY-Za-wy-z\d]
privateuse:
x(?:\-[A-Za-z\d]{1,8})+
grandfathered:
(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang)|(?P<irregular>en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE))
regular:
art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka|zh\-min|zh\-min\-nan|zh\-xiang
irregular:
en\-GB\-oed|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\-(?:BE\-FR|BE\-NL|CH\-DE)
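The grandfathered pattern lists zh\-min before zh\-min\-nan, so an unanchored match() stops at the shorter alternative. The sketch below compiles the pattern string locally and uses fullmatch() instead. classify_grandfathered is a hypothetical helper, not a Jangle function.

```python
import re

# Pattern string copied from the "grandfathered" rule on this page.
GRANDFATHERED = re.compile(
    r"(?P<regular>art\-lojban|cel\-gaulish|no\-bok|no\-nyn|zh\-guoyo|zh\-hakka"
    r"|zh\-min|zh\-min\-nan|zh\-xiang)"
    r"|(?P<irregular>en\-GB\-oed"
    r"|i\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)"
    r"|sgn\-(?:BE\-FR|BE\-NL|CH\-DE))",
    re.IGNORECASE,
)


def classify_grandfathered(tag):
    """Return 'regular', 'irregular', or None for non-grandfathered tags."""
    match = GRANDFATHERED.fullmatch(tag)
    return match.lastgroup if match else None


print(classify_grandfathered("zh-min-nan"))  # regular
print(classify_grandfathered("I-Klingon"))   # irregular
print(classify_grandfathered("en-US"))       # None

# match() is satisfied by the first alternative that fits a prefix.
prefix = GRANDFATHERED.match("zh-min-nan").group()
print(prefix)  # zh-min
```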
alphanum:
[A-Za-z\d]
- jangle.patterns.RULES: dict[str, re.Pattern[str]] = {'Language-Tag': re.compile('(?P<grandfathered>(?P<regular>art\\-lojban|cel\\-gaulish|no\\-bok|no\\-nyn|zh\\-guoyo|zh\\-hakka|zh\\-min|zh\\-min\\-nan|zh\\-xiang)|(?P<irregular>en\\-GB\\-oed|i\\-(?:ami|bnn|default|enochian|hak|kl, re.IGNORECASE), 'alphanum': re.compile('[A-Za-z\\d]', re.IGNORECASE), 'extension': re.compile('(?P<singleton>[A-WY-Za-wy-z\\d])(?P<ext_text>(?:\\-[A-Za-z\\d]{2,8})+)', re.IGNORECASE), 'extlang': re.compile('(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\\-[A-Za-z]{3}){0,2})', re.IGNORECASE), 'grandfathered': re.compile('(?P<regular>art\\-lojban|cel\\-gaulish|no\\-bok|no\\-nyn|zh\\-guoyo|zh\\-hakka|zh\\-min|zh\\-min\\-nan|zh\\-xiang)|(?P<irregular>en\\-GB\\-oed|i\\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|na, re.IGNORECASE), 'irregular': re.compile('en\\-GB\\-oed|i\\-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|tao|tay|tsu)|sgn\\-(?:BE\\-FR|BE\\-NL|CH\\-DE)', re.IGNORECASE), 'langtag': re.compile('(?P<iso_639>[A-Za-z]{2,3})(?:\\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\\-[A-Za-z]{3}){0,2})))?(?![^\\-\\s])(?:\\-(?P<script>[A-Za-z]{4})(?![^\\-\\s]))?(?:\\-(?P<region>(, re.IGNORECASE), 'language': re.compile('(?P<iso_639>[A-Za-z]{2,3})(?:\\-(?P<extlang>(?P<extlang_iso_639>[A-Za-z]{3})(?P<extlang_reserved>(?:\\-[A-Za-z]{3}){0,2})))?', re.IGNORECASE), 'privateuse': re.compile('x(?:\\-[A-Za-z\\d]{1,8})+', re.IGNORECASE), 'region': re.compile('(?P<iso_3166>[A-Za-z]{2})|(?P<un_m49>\\d{3})', re.IGNORECASE), 'regular': re.compile('art\\-lojban|cel\\-gaulish|no\\-bok|no\\-nyn|zh\\-guoyo|zh\\-hakka|zh\\-min|zh\\-min\\-nan|zh\\-xiang', re.IGNORECASE), 'script': re.compile('[A-Za-z]{4}', re.IGNORECASE), 'singleton': re.compile('[A-WY-Za-wy-z\\d]', re.IGNORECASE), 'variant': re.compile('[A-Za-z\\d]{5,8}|\\d[A-Za-z\\d]{3}', re.IGNORECASE)}
RegEx patterns for rules from the RFC 5646 ABNF syntax definition. See https://www.rfc-editor.org/rfc/rfc5646.html#section-2.1.