CmakeGC_URV
Make General Category URV-maps from UnicodeData.txt.
Syntax: ./CmakeGC_URV {-v} <UCD file> <DATfile>
The pattern structure applied to every row of the given UCD-file:
p_C:CFix:="^"c &
    _"([0123456789ABCDEF]{4:6});"c & --1 Code value in 4- to 6-digit hexadecimal format
    _"([^;]*);"c & --2 Character name
    _"([^;]*);"c & --3 General Category
    _"([^;]*);"c & --4 Canonical Combining Classes
    _"([^;]*);"c & --5 Bidirectional Category
    _"([^;]*);"c & --6 Character Decomposition Mapping
    _"([^;]*);"c & --7 Decimal digit value (normative). This is a numeric field
    _"([^;]*);"c & --8 Digit value. This is a numeric field
    _"([^;]*);"c & --9 Numeric value. This is a numeric field
    _"([^;]*);"c & --10 Mirrored. This field has the value "Y" if the character is mirrored, otherwise "N"
    _"([^;]*);"c & --11 Unicode 1.0 Name. This is the old name
    _"([^;]*);"c & --12 10646 comment field
    _"([^;]*);"c & --13 Uppercase Mapping
    _"([^;]*);"c & --14 Lowercase Mapping
    _"([^;]*)"c & --15 Titlecase Mapping
    _"$"c;
As an example, one row of the UCD-file:
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
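The way the pattern splits a row into its 15 fields can be sketched in Python. This is only an illustration using Python's `re` module, not the tool's own pattern language; the names `FIELD_NAMES`, `ROW_RE` and `parse_row` are hypothetical and were chosen for this example.

```python
import re

# Hypothetical field names, one per numbered comment in the pattern.
FIELD_NAMES = [
    "code", "name", "general_category", "combining_class", "bidi_category",
    "decomposition", "decimal_digit", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
]

# Assumed Python equivalent of the pattern: a 4- to 6-digit hex code value,
# then 13 fields each ending in ';', then the last field up to end of line.
ROW_RE = re.compile(
    r"^([0-9A-F]{4,6});"
    + r"([^;]*);" * 13
    + r"([^;]*)$"
)

def parse_row(line):
    """Return a dict of the 15 fields, or None if the row does not match."""
    m = ROW_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return dict(zip(FIELD_NAMES, m.groups()))

row = parse_row("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
print(row["code"], row["general_category"], row["lowercase"])
```

For the example row, field 3 (General Category) is `Lu` and field 14 (Lowercase Mapping) is `0061`; all other mapping fields are empty.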
Usually the input UCD-file is not the whole UnicodeData.txt but only a
part of it.
The result of the run is written to a DAT-file. A DAT-file is in binary
format and has a name ending with ".dat".
The program builds a URV from the codepoints belonging to a
specific General Category. If the whole UnicodeData.txt is the input,
the DAT-file contains the codepoints for all General Categories.