new regexp library
Henry Cejtin
henry@sourcelight.com
Thu, 21 Jun 2001 02:35:11 -0500
I don't think I would bother with returning the list of matches yet. What I
usually do is have a regexp that just handles one initial element of the
list, and then, using that, I extract the element and the rest of the string
and loop. This really pushes you to have some kind of match-regexp-against-
substring kind of thing, but that should be no problem.
This division can often be more efficient than matching everything in one big
regular expression any way. If you think about it, it is very analogous to
Prolog's cut (`!'). The notion is that once you see the start of the list,
you are NOT willing to backtrack around this choice. If you write it as one
big regular expression, you can't express that, and so the matching can often
take longer because it is going to keep these other possibilities alive. In
order to avoid any order n^2 thing, you have to stop scanning when you come
to a state which can't possibly succeed. This shouldn't be hard even with an
NFA, let alone with a DFA.