Module sets


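The functions in this module return a CodePointSetData containing the set of characters with a particular Unicode property. The exception is basic_emoji, whose property also covers character sequences, so its data is a UnicodeSetData.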

The descriptions of most properties are taken from TR44, the documentation for the Unicode Character Database. Some properties are instead defined in TR18, the documentation for Unicode regular expressions. In particular, Annex C of TR18 defines properties for POSIX compatibility.

Structs

CodePointSetData
A wrapper around code point set data. It is returned by APIs that return Unicode property data in a set-like form, e.g. a set of code points sharing the same value for a Unicode property. Access its data via the borrowed version, CodePointSetDataBorrowed.
CodePointSetDataBorrowed
A borrowed wrapper around code point set data, returned by CodePointSetData::as_borrowed(). More efficient to query.
UnicodeSetData
A wrapper around UnicodeSet data (characters and strings).
UnicodeSetDataBorrowed
A borrowed wrapper around UnicodeSet data, returned by UnicodeSetData::as_borrowed(). More efficient to query.

Functions

alnum
Characters with the Alphabetic or Decimal_Number property. This is defined for POSIX compatibility.
alphabetic
Alphabetic characters
ascii_hex_digit
ASCII characters commonly used for the representation of hexadecimal numbers
basic_emoji
Characters and character sequences intended for general-purpose, independent, direct input. See Unicode Technical Standard #51 for more details.
bidi_control
Format control characters which have specific functions in the Unicode Bidirectional Algorithm
bidi_mirrored
Characters that are mirrored in bidirectional text
blank
Horizontal whitespace characters
case_ignorable
Characters which are ignored for casing purposes
case_sensitive
Characters that are either the source of a case mapping or part of the target of a case mapping
cased
Uppercase, lowercase, and titlecase characters
changes_when_casefolded
Characters whose normalized forms are not stable under case folding
changes_when_casemapped
Characters which may change when they undergo case mapping
changes_when_lowercased
Characters whose normalized forms are not stable under a toLowercase mapping
changes_when_nfkc_casefolded
Characters which are not identical to their NFKC_Casefold mapping
changes_when_titlecased
Characters whose normalized forms are not stable under a toTitlecase mapping
changes_when_uppercased
Characters whose normalized forms are not stable under a toUppercase mapping
dash
Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents
default_ignorable_code_point
For programmatic determination of default ignorable code points. New characters that should be ignored in rendering (unless explicitly supported) will be assigned in these ranges, permitting programs to correctly handle the default rendering of such characters when not otherwise supported.
deprecated
Deprecated characters. No characters will ever be removed from the standard, but the usage of deprecated characters is strongly discouraged.
diacritic
Characters that linguistically modify the meaning of another character to which they apply
emoji
Characters that are emoji
emoji_component
Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps
emoji_modifier
Characters that are emoji modifiers
emoji_modifier_base
Characters that can serve as a base for emoji modifiers
emoji_presentation
Characters that have emoji presentation by default
extended_pictographic
Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters
extender
Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
for_general_category_group
Return a CodePointSetData for a value or a grouping of values of the General_Category property. See GeneralCategoryGroup.
full_composition_exclusion
Characters that are excluded from composition; see https://unicode.org/Public/UNIDATA/CompositionExclusions.txt
graph
Visible characters. This is defined for POSIX compatibility.
grapheme_base
Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”. See D58 in Chapter 3, Conformance in the Unicode Standard.
grapheme_extend
Property used to define “Grapheme extender”. See D59 in Chapter 3, Conformance in the Unicode Standard.
grapheme_link
Deprecated property. Formerly proposed for programmatic determination of grapheme cluster boundaries.
hex_digit
Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents
hyphen
Deprecated property. Dashes which are used to mark connections between pieces of words, plus the Katakana middle dot.
id_continue
Characters that can come after the first character in an identifier. If using NFKC to fold differences between characters, use xid_continue (or load_xid_continue) instead. See Unicode Standard Annex #31 for more details.
id_start
Characters that can begin an identifier. If using NFKC to fold differences between characters, use xid_start (or load_xid_start) instead. See Unicode Standard Annex #31 for more details.
ideographic
Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs
ids_binary_operator
Characters used in Ideographic Description Sequences as binary operators, each combining two components
ids_trinary_operator
Characters used in Ideographic Description Sequences as trinary operators, each combining three components
join_control
Format control characters which have specific functions for control of cursive joining and ligation
load_alnum
A version of alnum() that uses custom data provided by a DataProvider.
load_alphabetic
A version of alphabetic() that uses custom data provided by a DataProvider.
load_ascii_hex_digit
A version of ascii_hex_digit() that uses custom data provided by a DataProvider.
load_basic_emoji
A version of basic_emoji() that uses custom data provided by a DataProvider.
load_bidi_control
A version of bidi_control() that uses custom data provided by a DataProvider.
load_bidi_mirrored
A version of bidi_mirrored() that uses custom data provided by a DataProvider.
load_blank
A version of blank() that uses custom data provided by a DataProvider.
load_case_ignorable
A version of case_ignorable() that uses custom data provided by a DataProvider.
load_case_sensitive
A version of case_sensitive() that uses custom data provided by a DataProvider.
load_cased
A version of cased() that uses custom data provided by a DataProvider.
load_changes_when_casefolded
A version of changes_when_casefolded() that uses custom data provided by a DataProvider.
load_changes_when_casemapped
A version of changes_when_casemapped() that uses custom data provided by a DataProvider.
load_changes_when_lowercased
A version of changes_when_lowercased() that uses custom data provided by a DataProvider.
load_changes_when_nfkc_casefolded
A version of changes_when_nfkc_casefolded() that uses custom data provided by a DataProvider.
load_changes_when_titlecased
A version of changes_when_titlecased() that uses custom data provided by a DataProvider.
load_changes_when_uppercased
A version of changes_when_uppercased() that uses custom data provided by a DataProvider.
load_dash
A version of dash() that uses custom data provided by a DataProvider.
load_default_ignorable_code_point
A version of default_ignorable_code_point() that uses custom data provided by a DataProvider.
load_deprecated
A version of deprecated() that uses custom data provided by a DataProvider.
load_diacritic
A version of diacritic() that uses custom data provided by a DataProvider.
load_emoji
A version of emoji() that uses custom data provided by a DataProvider.
load_emoji_component
A version of emoji_component() that uses custom data provided by a DataProvider.
load_emoji_modifier
A version of emoji_modifier() that uses custom data provided by a DataProvider.
load_emoji_modifier_base
A version of emoji_modifier_base() that uses custom data provided by a DataProvider.
load_emoji_presentation
A version of emoji_presentation() that uses custom data provided by a DataProvider.
load_extended_pictographic
A version of extended_pictographic() that uses custom data provided by a DataProvider.
load_extender
A version of extender() that uses custom data provided by a DataProvider.
load_for_ecma262
Returns a type capable of looking up values for a property specified as a string, as long as it is a binary property listed in ECMA-262, using strict matching on the names in the spec.
load_for_ecma262_unstable
A version of load_for_ecma262 that uses custom data provided by a DataProvider.
load_for_ecma262_with_any_provider
A version of load_for_ecma262 that uses custom data provided by an AnyProvider.
load_for_general_category_group
A version of for_general_category_group() that uses custom data provided by a DataProvider.
load_full_composition_exclusion
A version of full_composition_exclusion() that uses custom data provided by a DataProvider.
load_graph
A version of graph() that uses custom data provided by a DataProvider.
load_grapheme_base
A version of grapheme_base() that uses custom data provided by a DataProvider.
load_grapheme_extend
A version of grapheme_extend() that uses custom data provided by a DataProvider.
load_grapheme_link
A version of grapheme_link() that uses custom data provided by a DataProvider.
load_hex_digit
A version of hex_digit() that uses custom data provided by a DataProvider.
load_hyphen
A version of hyphen() that uses custom data provided by a DataProvider.
load_id_continue
A version of id_continue() that uses custom data provided by a DataProvider.
load_id_start
A version of id_start() that uses custom data provided by a DataProvider.
load_ideographic
A version of ideographic() that uses custom data provided by a DataProvider.
load_ids_binary_operator
A version of ids_binary_operator() that uses custom data provided by a DataProvider.
load_ids_trinary_operator
A version of ids_trinary_operator() that uses custom data provided by a DataProvider.
load_join_control
A version of join_control() that uses custom data provided by a DataProvider.
load_logical_order_exception
A version of logical_order_exception() that uses custom data provided by a DataProvider.
load_lowercase
A version of lowercase() that uses custom data provided by a DataProvider.
load_math
A version of math() that uses custom data provided by a DataProvider.
load_nfc_inert
A version of nfc_inert() that uses custom data provided by a DataProvider.
load_nfd_inert
A version of nfd_inert() that uses custom data provided by a DataProvider.
load_nfkc_inert
A version of nfkc_inert() that uses custom data provided by a DataProvider.
load_nfkd_inert
A version of nfkd_inert() that uses custom data provided by a DataProvider.
load_noncharacter_code_point
A version of noncharacter_code_point() that uses custom data provided by a DataProvider.
load_pattern_syntax
A version of pattern_syntax() that uses custom data provided by a DataProvider.
load_pattern_white_space
A version of pattern_white_space() that uses custom data provided by a DataProvider.
load_prepended_concatenation_mark
A version of prepended_concatenation_mark() that uses custom data provided by a DataProvider.
load_print
A version of print() that uses custom data provided by a DataProvider.
load_quotation_mark
A version of quotation_mark() that uses custom data provided by a DataProvider.
load_radical
A version of radical() that uses custom data provided by a DataProvider.
load_regional_indicator
A version of regional_indicator() that uses custom data provided by a DataProvider.
load_segment_starter
A version of segment_starter() that uses custom data provided by a DataProvider.
load_sentence_terminal
A version of sentence_terminal() that uses custom data provided by a DataProvider.
load_soft_dotted
A version of soft_dotted() that uses custom data provided by a DataProvider.
load_terminal_punctuation
A version of terminal_punctuation() that uses custom data provided by a DataProvider.
load_unified_ideograph
A version of unified_ideograph() that uses custom data provided by a DataProvider.
load_uppercase
A version of uppercase() that uses custom data provided by a DataProvider.
load_variation_selector
A version of variation_selector() that uses custom data provided by a DataProvider.
load_white_space
A version of white_space() that uses custom data provided by a DataProvider.
load_xdigit
A version of xdigit() that uses custom data provided by a DataProvider.
load_xid_continue
A version of xid_continue() that uses custom data provided by a DataProvider.
load_xid_start
A version of xid_start() that uses custom data provided by a DataProvider.
logical_order_exception
A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao
lowercase
Lowercase characters
math
Characters used in mathematical notation
nfc_inert
Characters that are inert under NFC, i.e., they do not interact with adjacent characters
nfd_inert
Characters that are inert under NFD, i.e., they do not interact with adjacent characters
nfkc_inert
Characters that are inert under NFKC, i.e., they do not interact with adjacent characters
nfkd_inert
Characters that are inert under NFKD, i.e., they do not interact with adjacent characters
noncharacter_code_point
Code points permanently reserved for internal use
pattern_syntax
Characters used as syntax in patterns (such as regular expressions). See Unicode Standard Annex #31 for more details.
pattern_white_space
Characters used as whitespace in patterns (such as regular expressions). See Unicode Standard Annex #31 for more details.
prepended_concatenation_mark
A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
print
Printable characters (visible characters and whitespace). This is defined for POSIX compatibility.
quotation_mark
Punctuation characters that function as quotation marks.
radical
Characters used in the definition of Ideographic Description Sequences
regional_indicator
Regional indicator characters, U+1F1E6..U+1F1FF
segment_starter
Characters that are starters in terms of Unicode normalization and combining character sequences
sentence_terminal
Punctuation characters that generally mark the end of sentences
soft_dotted
Characters with a “soft dot”, like i or j. An accent placed on these characters causes the dot to disappear.
terminal_punctuation
Punctuation characters that generally mark the end of textual units
unified_ideograph
A property which specifies the exact set of Unified CJK Ideographs in the standard
uppercase
Uppercase characters
variation_selector
Characters that are Variation Selectors.
white_space
Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements
xdigit
Hexadecimal digits. This is defined for POSIX compatibility.
xid_continue
Characters that can come after the first character in an identifier. See Unicode Standard Annex #31 for more details.
xid_start
Characters that can begin an identifier. See Unicode Standard Annex #31 for more details.