CSRegEx Developer Documentation

Basics

Basic usage involves three functions.

  • Compile()
  • Match()
  • MatchRE()

Before a string can be searched, the regular expression must be converted (compiled) into an internal format. Once this has been done, matching with the same regular expression can be done as many times as wanted without recompiling.
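
The compile-once, match-many pattern described above can be sketched as follows. Python's standard re module is used purely as a stand-in. The signatures of Compile(), Match(), and MatchRE() are not given on this page, so none are assumed here.

```python
import re

# Compile once: the pattern is converted to an internal form a single time.
compiled = re.compile(r"c[aou]t")

# Match many: the same compiled object is reused for every subject string.
subjects = ["a cat sat", "cot and cut", "no match here"]
results = [compiled.search(s) is not None for s in subjects]
print(results)  # [True, True, False]
```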

Unicode is now supported. All functions use wchar_t instead of char if UNICODE support is enabled in your compiler.

The matching engine is non-recursive and has no call overhead. Backtracking is done mostly by reducing an index count by one.
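
To make the idea of backtracking by decrementing a count concrete, here is a minimal sketch for one narrow case: a single repeated character followed by a literal tail. It is not CSRegEx's engine; the function name and its parameters are hypothetical and chosen only for illustration.

```python
def match_star_then_literal(item, tail, text):
    """Match item* followed by tail at the start of text, without recursion.

    Returns the length of the match, or -1 if there is none.
    """
    # Greedy phase: count how many times `item` repeats at the start.
    count = 0
    while count < len(text) and text[count] == item:
        count += 1

    # Backtracking phase: give back one repetition at a time by
    # decrementing the count until the tail fits or nothing is left.
    while count >= 0:
        if text.startswith(tail, count):
            return count + len(tail)
        count -= 1
    return -1


print(match_star_then_literal("a", "ab", "aaab"))  # 4
```

In the example, the greedy phase first takes all three "a" characters, finds that "ab" no longer fits, and succeeds after giving one repetition back.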

Features

Elements of a regular expression.

  • letters Ordinary letters match themselves, one for one.
  • . Matches any character, including newlines.
  • \c Any character can be escaped, which removes its special meaning, except as noted below.
  • \# The exceptions to the above are the digits # = 0 to 9, which are reserved for backreferences.
  • [] Matches a single character from a list. - signifies a range. ^ as the first character negates the set. Whatever comes immediately after the opening [ or ^ is treated as a normal character with no special meaning, unless it is a backslash (i.e. it starts an escape sequence). This means you can put [, ], -, or any other character there and that character will be inserted into the set. [[] and []] are valid sets although they look strange: the first consists of only '[' and the second of only ']'.
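
The rule for the first character of a set can be sketched with a small parser. This is an illustration of the rule as described, not CSRegEx's parser: parse_set is a hypothetical name, special escapes such as \n are not translated, malformed input is not handled, and treating a - just before the closing ] as a normal character is an assumption.

```python
def parse_set(pattern):
    """Parse one bracketed set and return (negated, set_of_characters)."""
    if not pattern.startswith("["):
        raise ValueError("a set must start with [")
    i = 1
    negated = pattern[i] == "^"
    if negated:
        i += 1

    chars = set()
    first = True  # the item right after [ or ^ is always taken literally
    while first or pattern[i] != "]":
        c = pattern[i]
        if c == "\\":  # a backslash starts an escape, even in first position
            i += 1
            c = pattern[i]
        i += 1
        # Assumption: '-' forms a range only when another item follows it.
        if pattern[i] == "-" and pattern[i + 1] != "]":
            chars.update(chr(k) for k in range(ord(c), ord(pattern[i + 1]) + 1))
            i += 2
        else:
            chars.add(c)
        first = False
    return negated, chars


print(parse_set("[[]"))  # (False, {'['})
print(parse_set("[]]"))  # (False, {']'})
```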


Special escape characters:

  • \n 0x0A newline (Unix and DOS line-ending differences are not handled; you have to deal with them on your own).
  • \r 0x0D carriage return
  • \t 0x09 tab
  • \a 0x07 bell
  • \b 0x08 backspace
  • \f 0x0C formfeed
  • \v 0x0B vertical tab
  • \x## or \X## hex code for character.
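
The escape table above can be written down as a lookup, which also shows how the remaining escapes fall through to the plain character. This is a sketch only: decode_escape is a hypothetical helper, and reading ## as exactly two hex digits is an assumption.

```python
SPECIAL_ESCAPES = {
    "n": 0x0A,  # newline (line feed)
    "r": 0x0D,  # carriage return
    "t": 0x09,  # tab
    "a": 0x07,  # bell
    "b": 0x08,  # backspace
    "f": 0x0C,  # form feed
    "v": 0x0B,  # vertical tab
}


def decode_escape(seq):
    """Return the character code for one backslash escape sequence."""
    if len(seq) < 2 or seq[0] != "\\":
        raise ValueError("not an escape sequence")
    name = seq[1]
    if name.isdigit():
        raise ValueError("digits are backreferences, not character escapes")
    if name in "xX":
        return int(seq[2:4], 16)  # assumption: exactly two hex digits
    if name in SPECIAL_ESCAPES:
        return SPECIAL_ESCAPES[name]
    return ord(name)  # any other escaped character stands for itself


print(hex(decode_escape("\\r")))  # 0xd
```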


Grouping

( ) You can group any items together. Groups can be backreferenced with \1 to \9, numbered according to the opening bracket count. A backreference matches the exact string that was matched inside the ().

(?: ) If you don't need backreferences, place a ?: after the opening round bracket. This will save time and space during the matching process.

| Match an alternate set of items if the previous set did not match.
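
The three grouping constructs can be tried out in any engine that shares this syntax. The sketch below uses Python's re module as a stand-in, so it shows the general semantics rather than CSRegEx's own calls.

```python
import re

# Capturing group with a backreference: \1 must repeat the exact text
# that the first ( ) matched.
doubled = re.compile(r"(ab|cd)\1")
print(bool(doubled.fullmatch("abab")))  # True
print(bool(doubled.fullmatch("abcd")))  # False

# Non-capturing group: (?: ) groups without recording a backreference.
plain = re.compile(r"(?:ab|cd)+x")
m = plain.fullmatch("abcdabx")
print(m is not None, m.groups())  # True ()
```

"abcd" fails because \1 requires a second copy of whatever the group matched, not just any alternative from the group.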

Quantifiers

? match previous item 0 or 1 time.
* match previous item 0 to infinite times.
+ match previous item 1 to infinite times.
{n} match previous item exactly n times. n is a number from 1 to 253.
{n,} match previous item n to infinite times. n is a number from 0 to 253.
{n,m} match previous item n to m times. n and m are numbers from 1 to 253. n can also be 0.

The ranges for n and m in the last three quantifiers have a maximum of 253 because one byte is used for the range. 255 is used for infinite, so it is reserved. And if you want to be able to use the compiled string just like a normal string, 0 can't be used internally either. That makes two reserved values, so 255 - 2 = 253 is the largest value you can use.
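
The limits stated above can be expressed as a few checks. The function names are hypothetical, and the sketch does not check that n is no larger than m, since the text above does not state that rule.

```python
MAX_COUNT = 255 - 2  # one byte, minus the two reserved values (0 and 255)


def valid_exact(n):
    """{n}: n must be from 1 to 253."""
    return 1 <= n <= MAX_COUNT


def valid_at_least(n):
    """{n,}: n must be from 0 to 253."""
    return 0 <= n <= MAX_COUNT


def valid_between(n, m):
    """{n,m}: n from 0 to 253, m from 1 to 253."""
    return 0 <= n <= MAX_COUNT and 1 <= m <= MAX_COUNT


print(MAX_COUNT, valid_exact(253), valid_exact(254))  # 253 True False
```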

? can be used after the above quantifiers to make them lazy, i.e. they will match only as little as necessary. Example: c.*?t will return "cat" from "catabctch" instead of the usual "catabct".
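
The greedy and lazy results from this example can be reproduced in any engine with the same quantifier syntax. Python's re module is used here as a stand-in for illustration.

```python
import re

text = "catabctch"
greedy = re.search(r"c.*t", text).group()   # as much as possible
lazy = re.search(r"c.*?t", text).group()    # as little as necessary
print(greedy, lazy)  # catabct cat
```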

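^ anchors the match to the front of the string. It is special only as the first character of the expression; anywhere else it is a normal character.
$ anchors the match to the end of the string. It is special only as the last character of the expression; anywhere else it is a normal character.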
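
The positional rule for the two anchors can be sketched as a small pre-pass over the expression. split_anchors is a hypothetical helper, and treating a trailing \$ as an escaped normal character is an assumption based on the escaping rule described earlier.

```python
def split_anchors(pattern):
    """Return (at_start, at_end, body) following the positional rule above."""
    at_start = pattern.startswith("^")
    if at_start:
        pattern = pattern[1:]

    # Assumption: a trailing $ is an anchor only if it is not escaped,
    # i.e. it is preceded by an even number of backslashes.
    at_end = False
    if pattern.endswith("$"):
        backslashes = len(pattern) - 1 - len(pattern[:-1].rstrip("\\"))
        if backslashes % 2 == 0:
            at_end = True
            pattern = pattern[:-1]
    return at_start, at_end, pattern


print(split_anchors("^abc$"))  # (True, True, 'abc')
print(split_anchors("a^b$c"))  # (False, False, 'a^b$c')
```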


Docs for CSRegEx created on Tue Dec 11 14:36:52 2007 by Doxygen 1.4.3


Webmaster: Cléo Saulnier