
Classes, Sets and MapPools

The bracket following \p (or \P) contains an identifier for a character group. Note that this differs from the solutions in Java or C++. In a Y2018-Text regular expression, if an identifier is not one of the Unicode General Categories (GC), it must be found in the Jets.RangeVectorPack.Map_TY.Map given in the Pattern call (see the examples in ExPi.adb and ExRo.adb). The Map_TY.Map of RangeVectorPack acts as a library of groups, where the key of the map is the group identifier.
The reason for this solution is that constructing a map item is rather time-consuming on a small computer and may involve reading through one or more Unicode UCD data files.
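
The lookup order described above can be sketched in a few lines of Ada. This is only an illustration under stated assumptions: Code_Range, Range_Vectors, Group_Maps, Resolution, Is_General_Category and Resolve are hypothetical names built on the standard Ada.Containers packages, not the actual declarations of Jets.RangeVectorPack, and the General Category list is cut down to three values. How the library itself reports an identifier that is found nowhere is not stated here, so the sketch simply returns Not_Found.

```ada
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Group_Lookup_Sketch is

   --  Hypothetical stand-ins for illustration only; these are NOT the
   --  real Jets.RangeVectorPack declarations.
   type Code_Range is record
      First, Last : Natural;
   end record;

   package Range_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Code_Range);

   --  The "library of groups": group identifier -> ranges of code points.
   package Group_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type     => String,
      Element_Type => Range_Vectors.Vector,
      "="          => Range_Vectors."=");

   type Resolution is (General_Category, User_Group, Not_Found);

   --  Assumption: a tiny excerpt of the General Category identifiers.
   function Is_General_Category (Name : String) return Boolean is
   begin
      return Name = "Lu" or else Name = "Ll" or else Name = "Nd";
   end Is_General_Category;

   --  General Category identifiers are tried first; anything else has to
   --  be present in the map supplied by the caller.
   function Resolve
     (Name : String; Groups : Group_Maps.Map) return Resolution is
   begin
      if Is_General_Category (Name) then
         return General_Category;
      elsif Groups.Contains (Name) then
         return User_Group;
      else
         return Not_Found;
      end if;
   end Resolve;

   Vowel_Letters : constant String := "aeiou";
   Vowels        : Range_Vectors.Vector;
   Groups        : Group_Maps.Map;

begin
   --  Build one user-defined group once, then reuse it for every lookup.
   for C of Vowel_Letters loop
      Vowels.Append
        (Code_Range'(First => Character'Pos (C), Last => Character'Pos (C)));
   end loop;
   Groups.Insert ("Vowel", Vowels);

   Ada.Text_IO.Put_Line (Resolution'Image (Resolve ("Lu", Groups)));
   Ada.Text_IO.Put_Line (Resolution'Image (Resolve ("Vowel", Groups)));
   Ada.Text_IO.Put_Line (Resolution'Image (Resolve ("Nope", Groups)));
end Group_Lookup_Sketch;
```

Building the map once and passing it to every pattern that needs it is what keeps the expensive construction step out of the compilation of each regular expression.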

A class is an entity generated during the compilation of a regular expression. The code is as follows:


    when REQ_CLASS =>   -- Exists only in RegexPattern
       class_min    : Integer;   -- 0 .. Integer'Last
       class_max    : Integer;
       class_maxORG : Integer;
       class_greedy : Boolean;
       class_pos    : RangeVectorPack.RangeVector;
       class_neg    : RangeVectorPack.RangeVector;

In a class we have two RangeVectors: one for positive results and one for negative results (i.e. code points the match must not contain). In a regular expression these correspond to \p (positive) and \P (negative) values; if they appear inside square brackets ([]), their content is joined.
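
One plausible reading of the positive and negative vectors is sketched below. The semantics are an assumption, not something confirmed by the library: a code point is taken to match when it lies in some positive range (or no positive range is given) and in no negative range. Class_Sketch, Contains and Matches are hypothetical names, and the ranges are built on Ada.Containers.Vectors rather than on the real RangeVectorPack.RangeVector.

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Class_Match_Sketch is

   --  Hypothetical stand-ins for illustration only; these are NOT the
   --  real RangeVectorPack declarations.
   type Code_Range is record
      First, Last : Natural;
   end record;

   package Range_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Code_Range);

   --  A class reduced to its two sets of code point ranges.
   type Class_Sketch is record
      Pos : Range_Vectors.Vector;   --  from \p
      Neg : Range_Vectors.Vector;   --  from \P
   end record;

   --  True when Code falls inside at least one of the ranges.
   function Contains
     (Ranges : Range_Vectors.Vector; Code : Natural) return Boolean is
   begin
      for R of Ranges loop
         if Code in R.First .. R.Last then
            return True;
         end if;
      end loop;
      return False;
   end Contains;

   --  Assumed semantics: accepted by Pos (or Pos is empty) and not
   --  rejected by Neg.
   function Matches
     (Class : Class_Sketch; Code : Natural) return Boolean is
   begin
      return (Class.Pos.Is_Empty or else Contains (Class.Pos, Code))
        and then not Contains (Class.Neg, Code);
   end Matches;

   Letters_Not_Upper : Class_Sketch;

begin
   --  Positive: ASCII letters. Negative: ASCII upper-case letters.
   Letters_Not_Upper.Pos.Append
     (Code_Range'(First => Character'Pos ('A'), Last => Character'Pos ('Z')));
   Letters_Not_Upper.Pos.Append
     (Code_Range'(First => Character'Pos ('a'), Last => Character'Pos ('z')));
   Letters_Not_Upper.Neg.Append
     (Code_Range'(First => Character'Pos ('A'), Last => Character'Pos ('Z')));

   Ada.Text_IO.Put_Line
     (Boolean'Image (Matches (Letters_Not_Upper, Character'Pos ('a'))));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Matches (Letters_Not_Upper, Character'Pos ('A'))));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Matches (Letters_Not_Upper, Character'Pos ('1'))));
end Class_Match_Sketch;
```

Under these assumptions the lower-case letter is accepted, while the upper-case letter and the digit are rejected.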

Presentation of RangeVector, function by function.
The implementation structure of Y2018.Text.Jets.RangeVectorPack.

Presentation of URV functions, function by function.
The implementation structure of Y2018.Text.Util.UrvPack.

Tools for creating RangeVectorPack.UserRangeVectorMap_TY.Map objects are in CreateURV.

Finally, the General Category of Unicode.org.