CmakeGC_URV

Makes General Category URV-maps from UnicodeData.txt.
Syntax: ./CmakeGC_URV {-v} <UCD-file> <DAT-file>
The following pattern is applied to every row of the given UCD-file:


p_C:CFix:="^"c &
_"([0123456789ABCDEF]{4:6});"c & --1 Code value in hexadecimal (the pattern accepts 4 to 6 digits)
_"([^;]*);"c & --2 Character name
_"([^;]*);"c & --3 General Category
_"([^;]*);"c & --4 Canonical Combining Classes
_"([^;]*);"c & --5 Bidirectional Category
_"([^;]*);"c & --6 Character Decomposition Mapping
_"([^;]*);"c & --7 Decimal digit value (normative); a numeric field
_"([^;]*);"c & --8 Digit value; a numeric field
_"([^;]*);"c & --9 Numeric value; a numeric field
_"([^;]*);"c & --10 Mirrored: "Y" if the character is mirrored, otherwise "N"
_"([^;]*);"c & --11 Unicode 1.0 name (the old name)
_"([^;]*);"c & --12 10646 comment field
_"([^;]*);"c & --13 Uppercase Mapping
_"([^;]*);"c & --14 Lowercase Mapping
_"([^;]*)"c & --15 Titlecase Mapping
_"$"c;

As an example, here is one row of the UCD-file:
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
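
To illustrate how the pattern splits such a row, here is a minimal Python sketch of a roughly equivalent regular expression. It is not the program's own code; the names ROW and parse_row are made up for this illustration, and {4:6} is assumed to mean "4 to 6 repetitions".

```python
import re

# Rough Python counterpart of the pattern above (an assumption, not the
# tool's source): a code value of 4 to 6 uppercase hexadecimal digits,
# 13 fields each terminated by ";", and a final field without a terminator.
ROW = re.compile(r"^([0-9A-F]{4,6});" + r"([^;]*);" * 13 + r"([^;]*)$")

def parse_row(line):
    """Return the 15 fields of one row as a tuple, or None if the row does not match."""
    m = ROW.match(line.rstrip("\n"))
    return m.groups() if m else None

fields = parse_row("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
# Index 0 is the code value, index 2 the General Category,
# index 13 the lowercase mapping.
print(fields[0], fields[2], fields[13])  # prints: 0041 Lu 0061
```

The General Category used by the program is the third field (index 2 in the tuple).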

The input UCD-file is usually not the whole UnicodeData.txt but only a part of it.
The result of the run is written to a DAT-file. A DAT-file is in binary format and has a name ending in ".dat".
The program builds a URV from the codepoints that belong to a specific General Category. If the whole UnicodeData.txt is given as input, the DAT-file contains the codepoints for all General Categories.
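
The grouping step can be pictured with the following Python sketch. It only collects codepoints per General Category in memory; the binary layout of the DAT-file and the internal structure of a URV are not described here, so neither is reproduced. The function name codepoints_by_category is hypothetical.

```python
from collections import defaultdict

def codepoints_by_category(lines):
    """Map each General Category to the sorted codepoints found in the given rows.

    Illustration only: a plain dict stands in for the URV-maps.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split(";")
        if len(parts) != 15:
            continue  # skip rows that do not have the 15 fields
        groups[parts[2]].append(int(parts[0], 16))
    return {gc: sorted(cps) for gc, cps in groups.items()}

rows = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
]
print(codepoints_by_category(rows))
```

With an input restricted to rows of one category, the result has a single key; with mixed rows, as here, it has one entry per category.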