Regular Expressions



The Text Cleanup function supports Regular Expressions, commonly called RegEx.

ClipMate uses a RegEx library written by Andrey V. Sorokin. His website describes RegEx in detail. This page is not meant to be a full RegEx reference; it shows some simple but useful examples.


Introduction

Regular Expressions are a widely used method of specifying patterns of text to search for. Special metacharacters allow you to specify, for instance, that the string you are looking for occurs at the beginning or end of a line, or contains n repetitions of a certain character.

Metacharacters

Metacharacters are special characters which are the essence of Regular Expressions. There are different types of metacharacters, described below.

Metacharacters - line separators

^ start of line

$ end of line

\A start of text

\Z end of text

. any character in line

Examples:

^foobar matches the string 'foobar' only if it is at the beginning of a line

foobar$ matches the string 'foobar' only if it is at the end of a line

^foobar$ matches the string 'foobar' only if it is the only text on the line

foob.r matches strings like 'foobar', 'foobbr', 'foob1r' and so on
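One way to experiment with these anchors outside ClipMate is Python's `re` module. This is only a sketch: Python's engine is not the library ClipMate uses, the sample text is made up, and the code turns on `re.MULTILINE` so that ^ and $ match at every line, as this page describes them.

```python
import re

# Made-up three-line sample text.
text = "foobar is here\nsee foobar\nfoobar"

# re.MULTILINE lets ^ and $ match at the start and end of every line.
starts = re.findall(r"^foobar", text, re.MULTILINE)    # 'foobar' at a line start
ends = re.findall(r"foobar$", text, re.MULTILINE)      # 'foobar' at a line end
whole = re.findall(r"^foobar$", text, re.MULTILINE)    # line that is only 'foobar'

# \A and \Z anchor to the whole text, even in MULTILINE mode.
text_start = re.findall(r"\Afoobar", text, re.MULTILINE)
text_end = re.findall(r"foobar\Z", text, re.MULTILINE)

# . matches exactly one character (other than a line break).
candidates = ["foobar", "foobbr", "foob1r", "foobr"]
dotted = [s for s in candidates if re.fullmatch(r"foob.r", s)]

print(len(starts), len(ends), len(whole), len(text_start), len(text_end), dotted)
```

'foobr' is rejected by foob.r because the dot must consume exactly one character.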

Metacharacters - predefined classes

\w an alphanumeric character (a letter or digit), including "_"

\W a non-alphanumeric character

\d a numeric character (digit)

\D a non-numeric character

\s any whitespace character (same as [ \t\n\r\f])

\S a non-whitespace character

You may use \w, \d and \s within custom character classes.

Examples:

foob\dr matches strings like 'foob1r', 'foob6r' and so on, but not 'foobar', 'foobbr' and so on

foob[\w\s]r matches strings like 'foobar', 'foob r', 'foobbr', 'foob1r' and so on, but not 'foob=r' and so on
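The same class examples can be checked with Python's `re` module, used here as a stand-in for ClipMate's engine with made-up sample strings. Because \w covers digits as well as letters, 'foob1r' passes the [\w\s] pattern.

```python
import re

samples = ["foobar", "foobbr", "foob1r", "foob6r", "foob r", "foob=r"]

# \d: exactly one digit between 'foob' and 'r'
digit = [s for s in samples if re.fullmatch(r"foob\dr", s)]

# [\w\s]: one word character (letter, digit or '_') or one whitespace character
word_or_space = [s for s in samples if re.fullmatch(r"foob[\w\s]r", s)]

print(digit)
print(word_or_space)
```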

Metacharacters - word boundaries

\b Match a word boundary

\B Match a position that is not a word boundary

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
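A small sketch of \b and \B using Python's `re` module (not ClipMate's engine), with a made-up sentence. It collects the start positions of each match so you can see which occurrences of 'cat' count as whole words.

```python
import re

text = "cat concat cat. bobcat"

# \bcat\b: 'cat' as a whole word only (a \W or the text edge on both sides)
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat: 'cat' preceded by a word character, i.e. inside a longer word
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole_words, inside)
```

The first 'cat' counts as a word because the start of the text behaves like a \W, and 'cat.' counts because '.' is a \W.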

Metacharacters - iterators

Any item of a regular expression may be followed by another type of metacharacter: an iterator. Using iterators, you can specify the number of occurrences of the preceding character, metacharacter, or subexpression.

* zero or more ("greedy"), similar to {0,}

+ one or more ("greedy"), similar to {1,}

? zero or one ("greedy"), similar to {0,1}

{n} exactly n times ("greedy")

{n,} at least n times ("greedy")

{n,m} at least n but not more than m times ("greedy")

*? zero or more ("non-greedy"), similar to {0,}?

+? one or more ("non-greedy"), similar to {1,}?

?? zero or one ("non-greedy"), similar to {0,1}?

{n}? exactly n times ("non-greedy")

{n,}? at least n times ("non-greedy")

{n,m}? at least n but not more than m times ("non-greedy")

Digits in curly brackets of the form {n,m} specify the minimum number of times to match the item (n) and the maximum (m). The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers use more memory and slow down regular expression execution.

If a curly bracket occurs in any other context, it is treated as a regular character.

Examples:

foob.*r matches strings like 'foobar', 'foobalkjdflkj9r' and 'foobr'

foob.+r matches strings like 'foobar' and 'foobalkjdflkj9r', but not 'foobr'

foob.?r matches strings like 'foobar', 'foobbr' and 'foobr', but not 'foobalkj9r'

fooba{2}r matches the string 'foobaar'

fooba{2,}r matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.

fooba{2,3}r matches strings like 'foobaar' or 'foobaaar', but not 'foobaaaar'
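The iterator examples above can be verified in one pass with Python's `re` module. This is a sketch under the assumption that Python's behavior matches ClipMate's for these simple patterns. The helper `matches` is hypothetical, written only for this illustration; it keeps the candidates that a pattern matches in full.

```python
import re

def matches(pattern, candidates):
    """Hypothetical helper: return the candidates the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

words = ["foobr", "foobar", "foobbr", "foobaar", "foobaaar", "foobaaaar", "foobalkjdflkj9r"]

star = matches(r"foob.*r", words)           # zero or more characters
plus = matches(r"foob.+r", words)           # one or more characters
optional = matches(r"foob.?r", words)       # zero or one character
exactly_two = matches(r"fooba{2}r", words)  # exactly two a's
two_or_more = matches(r"fooba{2,}r", words) # at least two a's
two_to_three = matches(r"fooba{2,3}r", words)  # two or three a's

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exactly_two), ("{2,}", two_or_more), ("{2,3}", two_to_three)]:
    print(name, result)
```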

A brief note on "greediness": a "greedy" iterator takes as many characters as possible, a "non-greedy" one as few as possible. For example, applied to the string 'abbbbc', 'b+' returns 'bbbb' and 'b+?' returns 'b'; 'b{2,3}' returns 'bbb' and 'b{2,3}?' returns 'bb'; 'b*?' returns an empty string.
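The greedy and non-greedy results can be reproduced with Python's `re` module, used here only as a stand-in for ClipMate's engine. The sketch also shows a subtlety of leftmost matching in Python: plain 'b*' is satisfied by zero b's at the very start, so it returns an empty match before the 'a'. The HTML-style tag is a made-up example of where non-greedy matching is useful in practice.

```python
import re

s = "abbbbc"

greedy_plus = re.search(r"b+", s).group()       # as many b's as possible
lazy_plus = re.search(r"b+?", s).group()        # as few as possible (one)
greedy_range = re.search(r"b{2,3}", s).group()  # up to three
lazy_range = re.search(r"b{2,3}?", s).group()   # stops at two
lazy_star = re.search(r"b*?", s).group()        # zero b's is the fewest

# Python returns the leftmost match, and b* can match zero b's at position 0,
# so plain b* yields an empty match before the 'a'.
greedy_star = re.search(r"b*", s).group()

# A practical case: grabbing one HTML-style tag.
html = "<b>bold</b>"
greedy_tag = re.search(r"<.+>", html).group()   # runs on to the last '>'
lazy_tag = re.search(r"<.+?>", html).group()    # stops at the first '>'

print(greedy_plus, lazy_plus, greedy_range, lazy_range, repr(lazy_star))
print(greedy_tag, lazy_tag)
```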

Hexadecimal

You can use hexadecimal character codes (\xNN) to find and replace any character, including non-printing characters such as tabs and line breaks.

For example, to replace all tabs (hex 09) with a carriage return and line feed (hex 0D 0A), use this:

Find: \x09

Replace: \x0D\x0A
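For comparison, here is the same tab-to-line-break replacement as a Python sketch using `re.sub`. This is not how ClipMate runs it internally, and the tab-separated input is made up. In the pattern, \x09 is a regex escape; the replacement is written as an ordinary (non-raw) Python string, so Python itself produces the actual carriage return and line feed characters.

```python
import re

text = "Name\tPhone\tEmail"  # made-up tab-separated sample

# In the pattern, \x09 is a regex escape for the tab character.
# The replacement is an ordinary (non-raw) Python string, so Python itself
# turns "\x0d\x0a" into the carriage return and line feed characters.
result = re.sub(r"\x09", "\x0d\x0a", text)

print(repr(result))
```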

More Examples:

hello matches the string 'hello'

\^FooBarPtr matches '^FooBarPtr' (the backslash makes ^ a literal character instead of the start-of-line anchor)


^HELLO matches the string 'HELLO' at the beginning of a line.

GOODBYE$ matches the string 'GOODBYE' at the end of a line.

^HELLO$ matches the string 'HELLO' if it is the only text on the line.

H.+O matches strings like 'HELLO' and 'HI HO'.

FOOB.R matches strings like 'FOOBAR', 'FOOBBR', 'FOOB1R', etc.

=$ matches any line ending with the '=' sign.

IMAGES\.NAME matches the literal text 'IMAGES.NAME' (for example, to exclude it). The period must be escaped with a backslash, because an unescaped . matches any character.
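The escaping examples can be tried in Python's `re` module, again only a stand-in for ClipMate's engine, with made-up sample strings. The sketch contrasts the escaped and unescaped period, shows Python's `re.escape()` adding the backslash automatically, and uses `^.*=$` (an extension of the `=$` example, assumed here so that whole lines are returned) in multiline mode.

```python
import re

samples = ["IMAGES.NAME", "IMAGES_NAME", "IMAGESXNAME"]

# Unescaped: . matches any character, so all three samples match.
unescaped = [s for s in samples if re.fullmatch(r"IMAGES.NAME", s)]
# Escaped: \. matches only a literal period.
escaped = [s for s in samples if re.fullmatch(r"IMAGES\.NAME", s)]
# re.escape() adds the backslashes when building a pattern from literal text.
auto = re.escape("IMAGES.NAME")

# \^ makes the caret a literal character rather than a start-of-line anchor.
caret = re.search(r"\^FooBarPtr", "p = ^FooBarPtr;").group()

# =$ with re.MULTILINE: whole lines that end with '='.
eq_lines = re.findall(r"^.*=$", "a = b\nx =\ny\n==", re.MULTILINE)

print(unescaped, escaped, auto, caret, eq_lines)
```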