This module defines all available properties.
Properties may be empty marker types that implement BinaryProperty, or enumerations (either Rust enums, or Rust structs with associated constants, also called open enums) that implement EnumeratedProperty.
Types implementing BinaryProperty are queried through a CodePointSetData,
while types implementing EnumeratedProperty are queried through a CodePointMapData.
In addition, some EnumeratedProperty types also implement ParseableEnumeratedProperty or
NamedEnumeratedProperty. For these properties, PropertyParser,
PropertyNamesLong, and PropertyNamesShort
can be constructed.
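The two shapes a property type can take (an empty marker type, or an open enum built from a struct with associated constants) can be sketched in plain Rust. This is a minimal illustration and not the crate's actual definitions: the traits SketchBinaryProperty and SketchEnumeratedProperty, the types HexDigitSketch and CaseSketch, and the helpers in_set and value_of are all hypothetical names invented here, and the lookups use standard-library char methods in place of real property data.

```rust
// Hypothetical stand-ins for illustration only; these are NOT the crate's traits.
trait SketchBinaryProperty {
    const NAME: &'static str;
    fn contains(c: char) -> bool;
}

trait SketchEnumeratedProperty: Copy + PartialEq + std::fmt::Debug {
    const NAME: &'static str;
    fn of(c: char) -> Self;
}

/// Empty marker type: it holds no data and only selects a property at the type level.
struct HexDigitSketch;

impl SketchBinaryProperty for HexDigitSketch {
    const NAME: &'static str = "ASCII_Hex_Digit";
    fn contains(c: char) -> bool {
        c.is_ascii_hexdigit()
    }
}

/// Open enum: a struct wrapping an integer, with associated constants as the
/// named values. Unlike a closed Rust enum, values that have no constant
/// (for example CaseSketch(200)) can still be represented.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct CaseSketch(u8);

impl CaseSketch {
    const OTHER: Self = Self(0);
    const UPPER: Self = Self(1);
    const LOWER: Self = Self(2);
}

impl SketchEnumeratedProperty for CaseSketch {
    const NAME: &'static str = "Case_Sketch";
    fn of(c: char) -> Self {
        if c.is_uppercase() {
            Self::UPPER
        } else if c.is_lowercase() {
            Self::LOWER
        } else {
            Self::OTHER
        }
    }
}

/// Set-style query: is the code point a member of the binary property?
fn in_set<P: SketchBinaryProperty>(c: char) -> bool {
    P::contains(c)
}

/// Map-style query: which value does the enumerated property assign?
fn value_of<P: SketchEnumeratedProperty>(c: char) -> P {
    P::of(c)
}

fn main() {
    println!("querying {}", <HexDigitSketch as SketchBinaryProperty>::NAME);
    assert!(in_set::<HexDigitSketch>('f'));
    assert!(!in_set::<HexDigitSketch>('g'));

    println!("querying {}", <CaseSketch as SketchEnumeratedProperty>::NAME);
    assert_eq!(value_of::<CaseSketch>('A'), CaseSketch::UPPER);
    assert_eq!(value_of::<CaseSketch>('a'), CaseSketch::LOWER);
    assert_eq!(value_of::<CaseSketch>('1'), CaseSketch::OTHER);
}
```

The split mirrors the set/map distinction in the text: a binary property answers a yes/no membership question, while an enumerated property returns one value per code point.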
Structs
- Alnum
- Characters with the Alphabetic or Decimal_Number property.
- Alphabetic
- Alphabetic characters.
- AsciiHexDigit
- ASCII characters commonly used for the representation of hexadecimal numbers.
- BasicEmoji
- Characters and character sequences intended for general-purpose, independent, direct input.
- BidiClass
- Enumerated property Bidi_Class.
- BidiControl
- Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
- BidiMirrored
- Characters that are mirrored in bidirectional text.
- BidiMirroringGlyph
- A bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
- Blank
- Horizontal whitespace characters.
- CanonicalCombiningClass
- Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
- CaseIgnorable
- Characters which are ignored for casing purposes.
- CaseSensitive
- Characters that are either the source of a case mapping or in the target of a case mapping.
- Cased
- Uppercase, lowercase, and titlecase characters.
- ChangesWhenCasefolded
- Characters whose normalized forms are not stable under case folding.
- ChangesWhenCasemapped
- Characters which may change when they undergo case mapping.
- ChangesWhenLowercased
- Characters whose normalized forms are not stable under a toLowercase mapping.
- ChangesWhenNfkcCasefolded
- Characters which are not identical to their NFKC_Casefold mapping.
- ChangesWhenTitlecased
- Characters whose normalized forms are not stable under a toTitlecase mapping.
- ChangesWhenUppercased
- Characters whose normalized forms are not stable under a toUppercase mapping.
- Dash
- Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
- DefaultIgnorableCodePoint
- For programmatic determination of default ignorable code points.
- Deprecated
- Deprecated characters.
- Diacritic
- Characters that linguistically modify the meaning of another character to which they apply.
- EastAsianWidth
- Enumerated property East_Asian_Width.
- Emoji
- Characters that are emoji.
- EmojiComponent
- Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
- EmojiModifier
- Characters that are emoji modifiers.
- EmojiModifierBase
- Characters that can serve as a base for emoji modifiers.
- EmojiPresentation
- Characters that have emoji presentation by default.
- ExtendedPictographic
- Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
- Extender
- Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
- FullCompositionExclusion
- Characters that are excluded from composition.
- GeneralCategoryGroup
- Groupings of multiple General_Category property values.
- GeneralCategoryOutOfBoundsError
- Error value for impl TryFrom<u8> for GeneralCategory.
- Graph
- Visible characters.
- GraphemeBase
- Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”.
- GraphemeClusterBreak
- Enumerated property Grapheme_Cluster_Break.
- GraphemeExtend
- Property used to define “Grapheme extender”.
- GraphemeLink
- Deprecated property.
- HangulSyllableType
- Enumerated property Hangul_Syllable_Type.
- HexDigit
- Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
- Hyphen
- Deprecated property.
- IdContinue
- Characters that can come after the first character in an identifier.
- IdStart
- Characters that can begin an identifier.
- Ideographic
- Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs.
- IdsBinaryOperator
- Characters used in Ideographic Description Sequences.
- IdsTrinaryOperator
- Characters used in Ideographic Description Sequences.
- IndicSyllabicCategory
- Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
- JoinControl
- Format control characters which have specific functions for control of cursive joining and ligation.
- JoiningType
- Enumerated property Joining_Type.
- LineBreak
- Enumerated property Line_Break.
- LogicalOrderException
- A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
- Lowercase
- Lowercase characters.
- Math
- Characters used in mathematical notation.
- NfcInert
- Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
- NfdInert
- Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
- NfkcInert
- Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
- NfkdInert
- Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
- NoncharacterCodePoint
- Code points permanently reserved for internal use.
- PatternSyntax
- Characters used as syntax in patterns (such as regular expressions).
- PatternWhiteSpace
- Characters used as whitespace in patterns (such as regular expressions).
- PrependedConcatenationMark
- A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
- Print
- Printable characters (visible characters and whitespace).
- QuotationMark
- Punctuation characters that function as quotation marks.
- Radical
- Characters used in the definition of Ideographic Description Sequences.
- RegionalIndicator
- Regional indicator characters, U+1F1E6..U+1F1FF.
- Script
- Enumerated property Script.
- SegmentStarter
- Characters that are starters in terms of Unicode normalization and combining character sequences.
- SentenceBreak
- Enumerated property Sentence_Break.
- SentenceTerminal
- Punctuation characters that generally mark the end of sentences.
- SoftDotted
- Characters with a “soft dot”, like i or j.
- TerminalPunctuation
- Punctuation characters that generally mark the end of textual units.
- UnifiedIdeograph
- A property which specifies the exact set of Unified CJK Ideographs in the standard.
- Uppercase
- Uppercase characters.
- VariationSelector
- Characters that are Variation Selectors.
- VerticalOrientation
- Property Vertical_Orientation.
- WhiteSpace
- Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements.
- WordBreak
- Enumerated property Word_Break.
- Xdigit
- Hexadecimal digits.
- XidContinue
- Characters that can come after the first character in an identifier.
- XidStart
- Characters that can begin an identifier.
Enums
- BidiPairedBracketType
- The enum represents Bidi_Paired_Bracket_Type.
- GeneralCategory
- Enumerated property General_Category.
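GeneralCategoryOutOfBoundsError is listed above as the error value for impl TryFrom<u8> for GeneralCategory. The general pattern, a fallible conversion from a raw byte into a closed enum with a dedicated error type, can be sketched as follows. The names TinyCategory and TinyCategoryOutOfBoundsError, the three variants, and their discriminants are hypothetical and chosen only for illustration; they do not reproduce the real GeneralCategory values.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical closed enum with explicit u8 discriminants.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(u8)]
enum TinyCategory {
    Unassigned = 0,
    Letter = 1,
    Number = 2,
}

/// Hypothetical error returned when a u8 does not correspond to any variant.
/// It keeps the rejected value so callers can report it.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct TinyCategoryOutOfBoundsError(u8);

impl fmt::Display for TinyCategoryOutOfBoundsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "value {} is out of bounds for TinyCategory", self.0)
    }
}

impl std::error::Error for TinyCategoryOutOfBoundsError {}

impl TryFrom<u8> for TinyCategory {
    type Error = TinyCategoryOutOfBoundsError;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Unassigned),
            1 => Ok(Self::Letter),
            2 => Ok(Self::Number),
            // Any other byte has no variant, so the conversion fails.
            other => Err(TinyCategoryOutOfBoundsError(other)),
        }
    }
}

fn main() {
    assert_eq!(TinyCategory::try_from(0), Ok(TinyCategory::Unassigned));
    assert_eq!(TinyCategory::try_from(1), Ok(TinyCategory::Letter));
    assert_eq!(
        TinyCategory::try_from(9),
        Err(TinyCategoryOutOfBoundsError(9))
    );
    // The reverse direction is infallible for a repr(u8) enum.
    assert_eq!(TinyCategory::Number as u8, 2);
}
```

A closed enum needs this kind of checked conversion because not every u8 is a valid variant; an open enum (a struct wrapping the integer) would accept any value without an error type.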
Traits
- BinaryProperty
- A binary Unicode character property.
- EmojiSet
- An Emoji set as defined by Unicode Technical Standard #51.
- EnumeratedProperty
- A Unicode character property that assigns a value to each code point.
- NamedEnumeratedProperty
- A property whose value names can be represented as strings.
- ParseableEnumeratedProperty
- A property whose value names can be parsed from strings.
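The difference between a property whose value names can be represented as strings and one whose value names can be parsed from strings is the direction of the lookup: value to name, or name to value. A minimal sketch of both directions follows. The traits SketchNamed and SketchParseable, the type CategorySketch, and the NAMES table are hypothetical names invented for this illustration; only the value names themselves (Lu / Uppercase_Letter, Ll / Lowercase_Letter) are real General_Category aliases. The sketch matches names exactly, which may differ from how the crate's own parser treats case or separators.

```rust
/// Hypothetical open enum standing in for a property value type.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct CategorySketch(u8);

impl CategorySketch {
    const UPPERCASE_LETTER: Self = Self(1);
    const LOWERCASE_LETTER: Self = Self(2);
}

/// Lookup table of (value, short name, long name).
const NAMES: &[(CategorySketch, &str, &str)] = &[
    (CategorySketch::UPPERCASE_LETTER, "Lu", "Uppercase_Letter"),
    (CategorySketch::LOWERCASE_LETTER, "Ll", "Lowercase_Letter"),
];

/// Hypothetical trait: value -> name.
trait SketchNamed: Sized {
    fn short_name(self) -> Option<&'static str>;
    fn long_name(self) -> Option<&'static str>;
}

/// Hypothetical trait: name -> value.
trait SketchParseable: Sized {
    fn parse_name(name: &str) -> Option<Self>;
}

impl SketchNamed for CategorySketch {
    fn short_name(self) -> Option<&'static str> {
        NAMES.iter().find(|(v, _, _)| *v == self).map(|(_, s, _)| *s)
    }

    fn long_name(self) -> Option<&'static str> {
        NAMES.iter().find(|(v, _, _)| *v == self).map(|(_, _, l)| *l)
    }
}

impl SketchParseable for CategorySketch {
    fn parse_name(name: &str) -> Option<Self> {
        // Accept either the short or the long name, exact match only.
        NAMES
            .iter()
            .find(|(_, s, l)| *s == name || *l == name)
            .map(|(v, _, _)| *v)
    }
}

fn main() {
    assert_eq!(
        CategorySketch::parse_name("Lu"),
        Some(CategorySketch::UPPERCASE_LETTER)
    );
    assert_eq!(
        CategorySketch::parse_name("Lowercase_Letter"),
        Some(CategorySketch::LOWERCASE_LETTER)
    );
    assert_eq!(
        CategorySketch::LOWERCASE_LETTER.long_name(),
        Some("Lowercase_Letter")
    );
    // An open-enum value with no table entry has no name.
    assert_eq!(CategorySketch(200).short_name(), None);
    assert_eq!(CategorySketch::parse_name("nope"), None);
}
```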