Author: ph10

Date: 2017-03-09 17:11 UTC

To: pcre-dev

Subject: [pcre-dev] Re-factored pcre2_match() needs testing



Folks,



I have just committed seriously refactored code for pcre2_match() to the

SVN repository. I have not yet updated the build system or the

documentation, which I will do over the next few weeks. There won't be a

new release for several months, but in the meantime it would be nice if

anybody can run tests on the new code to try to shake it down as much as

possible. (It runs all the current tests, of course, at least on my

box.)



The JIT code is not yet updated to track the interpreter changes (see

below) but Zoltán will be doing that in due course. As well as a lot of

code tidies, the main changes are as follows:



1. Backtracking is no longer implemented by recursive function calls,

and therefore does not use the system stack. The

--disable-stack-for-recursion build option is obsolete (I will make it

give a warning). Once this is released, the regular reports of "stack

exceeded" bugs should go away. Yay! Backtracking is implemented by using

a vector of fixed-size "frames" (the size depends on the number of

captures in a pattern). An initial 10K vector (enough for ~50 frames) is

allocated on the stack, but if this is too small, heap memory is used.
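
As a rough illustration only (this is not the committed code), the idea of a frame vector that starts in a caller-supplied stack buffer and moves to the heap when it fills up might be sketched as below. The names frame_vector, fv_init, fv_push, fv_pop, fv_free and FRAME_HEADER_WORDS are all hypothetical, and the frame layout (a few bookkeeping words plus one offset pair per capture) is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_HEADER_WORDS 4  /* hypothetical bookkeeping words per frame */

typedef struct {
    size_t *base;        /* current vector: caller's stack buffer or heap block */
    size_t frame_words;  /* words per frame, fixed for a given pattern */
    size_t capacity;     /* number of frames the current vector can hold */
    size_t depth;        /* number of frames in use */
    int on_heap;         /* nonzero once the vector has moved to the heap */
} frame_vector;

/* Start with a buffer that the caller has typically placed on its own stack.
   The frame size is assumed to be a header plus two offsets per capture. */
void fv_init(frame_vector *fv, size_t *stack_buf, size_t stack_words,
             size_t capture_count)
{
    fv->base = stack_buf;
    fv->frame_words = FRAME_HEADER_WORDS + 2 * capture_count;
    fv->capacity = stack_words / fv->frame_words;
    fv->depth = 0;
    fv->on_heap = 0;
}

/* Returns the new top frame, or NULL if heap memory runs out. Growing may
   move the vector, so earlier frame pointers must not be kept across calls. */
size_t *fv_push(frame_vector *fv)
{
    if (fv->depth == fv->capacity) {
        size_t new_capacity = fv->capacity ? fv->capacity * 2 : 1;
        size_t bytes = new_capacity * fv->frame_words * sizeof(size_t);
        size_t *grown = fv->on_heap ? realloc(fv->base, bytes) : malloc(bytes);
        if (grown == NULL) return NULL;
        if (!fv->on_heap) {
            /* First overflow: copy the live frames out of the stack buffer. */
            memcpy(grown, fv->base,
                   fv->depth * fv->frame_words * sizeof(size_t));
            fv->on_heap = 1;
        }
        fv->base = grown;
        fv->capacity = new_capacity;
    }
    return fv->base + fv->frame_words * fv->depth++;
}

/* Discard the top frame (a backtrack past it). */
void fv_pop(frame_vector *fv)
{
    assert(fv->depth > 0);
    fv->depth--;
}

/* Release heap memory, if any was needed. */
void fv_free(frame_vector *fv)
{
    if (fv->on_heap) free(fv->base);
}
```

A caller would declare something like "size_t buf[10240 / sizeof(size_t)];" as a local variable, pass it to fv_init() together with the pattern's capture count, and call fv_free() when the match ends. Since no function calls itself, the depth of backtracking is bounded by available heap rather than by the system stack.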



2. The "match limit" and "match limit recursion" features still work.

The first limits the number of backtracking points that are ever

established, which is effectively a limit on computing resource. The

second limits the depth of nested backtracking, which is effectively a

limit on the amount of heap memory that is used. I may change the name

of "match limit recursion" to something more suitable - perhaps "match

limit depth", though of course the old name will remain as a synonym.
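
To make the difference between the two limits concrete, here is a toy, non-recursive matcher (not PCRE2 code, and far simpler than the real interpreter). It supports only literal characters, "." and a greedy "x*", and matches the whole subject. One counter tracks every backtracking point ever established, the other tracks how many are live at once. All names (toy_match, toy_point, the TOY_ constants) are hypothetical.

```c
#include <stddef.h>

enum {
    TOY_MATCH = 1,
    TOY_NOMATCH = 0,
    TOY_ERR_MATCHLIMIT = -1,
    TOY_ERR_DEPTHLIMIT = -2
};

#define TOY_MAX_DEPTH 64   /* hard cap on the toy's backtracking vector */

typedef struct {
    size_t resume_pi;  /* pattern index to continue from after the star */
    size_t min_si;     /* the star cannot give back characters before this */
    size_t cur_si;     /* current end of the star's match in the subject */
} toy_point;

static int toy_eq(char p, char s)
{
    return s != '\0' && (p == '.' || p == s);
}

/* match_limit bounds the total number of backtracking points established
   (a bound on work); depth_limit bounds how many are live at once (a bound
   on memory). points_used, if not NULL, receives the total count. */
int toy_match(const char *pat, const char *sub, unsigned long match_limit,
              size_t depth_limit, unsigned long *points_used)
{
    toy_point stack[TOY_MAX_DEPTH];
    size_t depth = 0, pi = 0, si = 0;
    unsigned long points = 0;
    int result;

    if (depth_limit > TOY_MAX_DEPTH) depth_limit = TOY_MAX_DEPTH;

    for (;;) {
        int ok = 1;
        if (pat[pi] == '\0') {
            if (sub[si] == '\0') { result = TOY_MATCH; break; }
            ok = 0;
        } else if (pat[pi + 1] == '*') {
            size_t start = si;
            while (toy_eq(pat[pi], sub[si])) si++;   /* greedy */
            if (++points > match_limit) { result = TOY_ERR_MATCHLIMIT; break; }
            if (depth >= depth_limit) { result = TOY_ERR_DEPTHLIMIT; break; }
            stack[depth].resume_pi = pi + 2;
            stack[depth].min_si = start;
            stack[depth].cur_si = si;
            depth++;
            pi += 2;
        } else if (toy_eq(pat[pi], sub[si])) {
            pi++;
            si++;
        } else {
            ok = 0;
        }

        if (!ok) {
            /* Backtrack: drop exhausted points, then shorten the newest
               star that can still give back a character. */
            while (depth > 0 && stack[depth - 1].cur_si == stack[depth - 1].min_si)
                depth--;
            if (depth == 0) { result = TOY_NOMATCH; break; }
            stack[depth - 1].cur_si--;
            pi = stack[depth - 1].resume_pi;
            si = stack[depth - 1].cur_si;
        }
    }
    if (points_used != NULL) *points_used = points;
    return result;
}
```

With a pattern such as "a*a*a*b" and a subject made up only of "a" characters, the number of points established grows quickly with the subject length while the depth never exceeds three, so a small match_limit stops the search early even though a depth_limit of three would not.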



3. The new implementation now allows backtracking into (possibly

recursive) subroutine calls within the pattern, which is how Perl acts.

It would be easy to add a new option to force these calls to be atomic,

but I would like to be sure that such an option is wanted/needed before

adding it. An individual pattern can always use, for example, (?>(?1))

instead of just (?1) if atomic behaviour is wanted.



4. When a callout is called, a pointer to the ovector is made available.

Formerly, this was the ovector supplied by the caller in a match_data

block. Now it is an internal private vector.



I ran some timing tests on the testdata/testinput1 file and on my Linux

box the new interpreter seems to run a bit faster than the old one.



All feedback is welcome!



Philip



--

Philip Hazel