
Regular Expression implementation

This section gives more detail on the implementation, in particular grouping and capturing.
The regular expression implementation uses the following packages:

with Y2018.Text.Jets; use Y2018.Text.Jets; -- contains I_A and I_A_Array
with Y2018.Text.Jets.MatchPack;
with Y2018.Text.Jets.PatternPack;

Variables that have to be defined:

p: PatternPack.Pattern_AC:=new PatternPack.Pattern;
startPos: Integer:=1;
nextPos: Integer;
m: MatchPack.Match_TY;

The pattern of use in a program is:
  1. To start the process, create the pattern by calling: PatternPack.compileM(p,"^.*?(P[\w]*).*"c).
  2. Match the source string against the pattern. The function returns TRUE if the pattern is found, else FALSE: PatternPack.matches(p,startPos,nextPos,"Please Y"c,m).
  3. If the match succeeds, retrieve the result into an I_A_Array with the call r:I_A_ARRAY:=MatchPack.getMatch(m). I_A_ARRAY is a two dimensional array of Integer pairs (I_A type). In an I_A variable the first index is the start point of the value we searched for and the second index is the end point of the same value.
  4. In our case we get two I_A values, one for the whole source string and one for the value "Please", without the " Y" part. In the I_A_Array, index 0 contains the pair for the whole string and index 1 the pair for "Please". At this point we have only index values, not the strings themselves.
  5. To access the strings we use the subIA function. subIA("Please Y"c,r(0)) returns the string "Please Y"c, and subIA("Please Y"c,r(1)) returns "Please"c.
Note that this implementation works with codepoint arrays (CFix), not with character arrays. This is the reason for the notation "Please Y"c.
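The index-pair idea in the steps above can be tried without the Jets packages. The following is a minimal sketch that swaps in GNAT.Regpat (the regular expression package shipped with GNAT), so it is not the Jets API. It works on an ordinary String rather than a codepoint array, the group is written as P\w* without the brackets, and the procedure name Group_Demo is chosen only for illustration. Match_Location plays the role of I_A here, and slicing the source string plays the role of subIA.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with GNAT.Regpat;  use GNAT.Regpat;

--  Illustration only: GNAT.Regpat stands in for the Jets packages.
procedure Group_Demo is
   Source  : constant String := "Please Y";

   --  Step 1: compile the pattern (one capturing group).
   Matcher : constant Pattern_Matcher := Compile ("^.*?(P\w*).*");

   --  Index 0 holds the whole match, index 1 the first group.
   Groups  : Match_Array (0 .. 1);
begin
   --  Step 2: match the source string against the pattern.
   Match (Matcher, Source, Groups);

   if Groups (0) = No_Match then
      Put_Line ("no match");
   else
      --  Steps 3 to 5: each entry is a (First, Last) index pair,
      --  and the text is recovered by slicing the source with it.
      Put_Line (Source (Groups (0).First .. Groups (0).Last));
      Put_Line (Source (Groups (1).First .. Groups (1).Last));
   end if;
end Group_Demo;
```

For the source "Please Y" the two slices correspond to the whole string and to the captured word "Please", which mirrors what r(0) and r(1) describe in the steps above.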
In the example above the starting point of the search, startPos, is set to 1, the index of the first element of the source string, in our case "Please Y"c. The variable nextPos returns the index at which the next search or matching operation should start (see example ExPsi.adb).
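How a start position and a next position work together in a repeated search can be sketched as follows. This is again an illustration with GNAT.Regpat and a plain String, not the Jets API. The names Next_Pos_Demo and Start_Pos and the sample source string are assumptions made for the example; Start_Pos is advanced by hand to one past the end of the last match, which is the role the text gives to nextPos.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with GNAT.Regpat;  use GNAT.Regpat;

--  Illustration only: GNAT.Regpat stands in for the Jets packages.
procedure Next_Pos_Demo is
   Source    : constant String := "Please Pass the Pepper";
   Matcher   : constant Pattern_Matcher := Compile ("P\w*");
   Groups    : Match_Array (0 .. 0);

   --  Start at the first index of the source string.
   Start_Pos : Positive := Source'First;
begin
   while Start_Pos <= Source'Last loop
      --  Search only from Start_Pos onwards.
      Match (Matcher, Source, Groups, Data_First => Start_Pos);
      exit when Groups (0) = No_Match;

      Put_Line (Source (Groups (0).First .. Groups (0).Last));

      --  Continue the next search just after this match.
      Start_Pos := Groups (0).Last + 1;
   end loop;
end Next_Pos_Demo;
```

Because the pattern always consumes at least the letter P, every iteration moves Start_Pos forward and the loop terminates.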
Regular expressions use the meta character sequences \d, \D, \p, \P, \s, \S, \w and \W for character classes. A meta character sequence always starts with a backslash ('\').
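The effect of some of these character classes can be explored with a small sketch. It uses GNAT.Regpat on a plain String as a stand-in, so it says nothing about how the Jets implementation treats codepoints. The sequences \p and \P are left out because the sketch does not assume they are available in GNAT.Regpat. The names Class_Demo and Show and the sample source string are hypothetical.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with GNAT.Regpat;  use GNAT.Regpat;

--  Illustration only: GNAT.Regpat stands in for the Jets packages.
procedure Class_Demo is
   Source : constant String := "Room 42 is free";

   --  Print the first substring of Source matched by Expression.
   procedure Show (Expression : String) is
      Matcher : constant Pattern_Matcher := Compile (Expression);
      Groups  : Match_Array (0 .. 0);
   begin
      Match (Matcher, Source, Groups);
      if Groups (0) = No_Match then
         Put_Line (Expression & " : no match");
      else
         Put_Line
           (Expression & " : """
            & Source (Groups (0).First .. Groups (0).Last) & """");
      end if;
   end Show;
begin
   Show ("\d+");  --  one or more digits
   Show ("\D+");  --  one or more non-digits
   Show ("\w+");  --  one or more word characters
   Show ("\W+");  --  one or more non-word characters
   Show ("\s+");  --  one or more whitespace characters
   Show ("\S+");  --  one or more non-whitespace characters
end Class_Demo;
```

Each upper-case sequence is the complement of its lower-case counterpart, so running the pairs side by side on the same source string makes the difference visible.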