Introduction
Meta characters are the building blocks of regular expressions. Characters in RegEx are understood to be either a meta character with a special meaning or a regular character with a literal meaning. The following are some common RegEx meta characters and examples of what they would match or not match in RegEx.
- \ : Marks the next character as either a special character or a literal. For example, n matches the character n, whereas \n matches a newline character. The sequence \\ matches \ and \( matches (. \d matches a single digit from 0 to 9.
- . : Matches any single character except a newline character.
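The backslash and the dot can be tried out in a short sketch. It assumes Python's re module (the text does not name a regex flavor), and it uses raw strings so that Python itself does not interpret the backslashes.

```python
import re

# "n" is a literal character, while "\n" is the newline escape.
literal_n = re.findall(r"n", "nn\n")
newline = re.findall(r"\n", "nn\n")

# "\\" matches a single backslash and "\(" matches an opening parenthesis.
backslash = re.findall(r"\\", "a\\b")   # the text a\b holds one backslash
paren = re.findall(r"\(", "f(x)")

# "." matches any single character except a newline.
dots = re.findall(r".", "a\nb")

print(literal_n, newline, backslash, paren, dots)
```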
Anchor Meta Characters
- ^ : Matches the beginning of the string; in multiline mode, it also matches at the beginning of each line.
- $ : Matches the end of the string, or the position before a \n at the end of the string; in multiline mode, it also matches at the end of each line, before the \n that ends it.
- \Z : Matches only at the end of a string, or before a newline character at the end.
- \A : Matches only at the beginning of a string.
- \z : Matches only the end of a string.
- \b : Matches a word boundary, that is, the position between a word character and a non-word character, such as the position between a word and a space.
- \B : Matches a non word boundary.
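The anchors above can be explored with a small sketch, assuming Python's re module. One caveat: Python's \Z matches only at the very end of the string, which corresponds to \z as described in this list, so the end-of-string escapes are left out of the example to avoid confusion between flavors.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ and $ anchor to the string as a whole.
starts = re.findall(r"^\w+", text)
ends = re.findall(r"\w+$", text)

# With MULTILINE, ^ also matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# \A stays pinned to the start of the string, even in MULTILINE mode.
only_start = re.findall(r"\A\w+", text, re.MULTILINE)

# \b requires a word boundary; \B requires that there is none.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
inside_word = re.search(r"\Bcat", "cat concat cats").start()

print(starts, ends, line_starts, only_start, whole_word, inside_word)
```

Here \Bcat skips the standalone "cat" and the one that begins "cats", and finds only the "cat" inside "concat".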
Grouping Meta Characters
- (abc) : Group; matches the characters abc in that exact order and treats them as a single unit.
Alternation Meta Characters
- x|y : Matches either x or y.
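Grouping and the | operator are often used together. The following is a minimal sketch assuming Python's re module; the pattern gr(a|e)y and the word list are made up for illustration.

```python
import re

# A group matches "abc" as one unit and remembers what it matched.
m = re.search(r"(abc)", "xxabcxx")
captured = m.group(1)

# x|y matches either alternative.
either = re.findall(r"x|y", "xyz")

# Parentheses limit how far the | reaches: only the middle letter varies here.
candidates = ["gray", "grey", "groy"]
colors = [w for w in candidates if re.fullmatch(r"gr(a|e)y", w)]

print(captured, either, colors)
```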
Quantified Meta Characters
Quantifiers specify the number of times a token should be matched by the regex engine.
- * : Matches the preceding character zero or more times.
- + : Matches the preceding character one or more times.
- ? : Matches the preceding character zero or one time.
- {n} : n is a non-negative integer. Matches the preceding character exactly n times.
- {n,} : n is a non-negative integer. Matches the preceding character at least n times.
- {n,m} : The m and n variables are non-negative integers. Matches the preceding character at least n and at most m times.
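To compare the quantifiers side by side, the sketch below applies each one to the same word list. It assumes Python's re module; the helper name matches is hypothetical, and it uses re.fullmatch so that a word counts only when the whole word fits the pattern.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["tr", "tre", "tree", "treee"]

star = matches(r"tre*", words)            # zero or more e
plus = matches(r"tre+", words)            # one or more e
optional = matches(r"tre?", words)        # zero or one e
exactly_two = matches(r"tre{2}", words)   # exactly two e
at_least_two = matches(r"tre{2,}", words) # two or more e
one_to_two = matches(r"tre{1,2}", words)  # between one and two e

print(star, plus, optional, exactly_two, at_least_two, one_to_two)
```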
Range Meta Characters
- [a-z] : Matches any character in the specified range. For example, [a-z] matches any lowercase alphabetic character in the English alphabet.
- [^m-z] : Matches any character that is not in the specified range. For example, [^m-z] matches any character that is not in the range m through z.
- [xyz] : A character set. Matches any one of the enclosed characters. For example, [abc] matches the a in plain.
- [^xyz] : Matches any character that is not enclosed. For example, [^abc] matches the p in plain.
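The range and set examples can be checked with a short sketch, assuming Python's re module. The input strings other than plain are chosen only for illustration.

```python
import re

# A range matches any one character between its endpoints.
lower = re.findall(r"[a-z]", "aB3z")

# A leading ^ inside the brackets negates the range.
not_m_to_z = re.findall(r"[^m-z]", "amz")

# A set matches any one of the listed characters; search returns the first hit.
first_abc = re.search(r"[abc]", "plain").group()
first_not_abc = re.search(r"[^abc]", "plain").group()

print(lower, not_m_to_z, first_abc, first_not_abc)
```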
Shorthand Meta Characters
- \d : Matches a digit character.
- \D : Matches a non-digit character.
- \f : Matches a form-feed character.
- \n : Matches a newline character.
- \r : Matches a carriage return character.
- \s : Matches any white space including spaces, tabs, form-feed characters, and so on.
- \S : Matches any non-white space character.
- \t : Matches a tab character.
- \v : Matches a vertical tab character.
- \w : Matches any word character including underscore. This expression is equivalent to [A-Za-z0-9_].
- \W : Matches any non-word character. This expression is equivalent to [^A-Za-z0-9_].
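A few of the shorthand classes are shown in the sketch below, which assumes Python's re module and a made-up sample string. One flavor-specific caveat: in Python 3, \w on text strings also matches non-ASCII letters by default, and the equivalence to [A-Za-z0-9_] holds when the re.ASCII flag is set.

```python
import re

sample = "Item_7\tcosts 42 USD\n"

digits = re.findall(r"\d+", sample)     # runs of digits
words = re.findall(r"\w+", sample)      # runs of word characters, including _
spaces = re.findall(r"\s", sample)      # tab, spaces, and the newline
non_word = re.findall(r"\W", sample)    # here, the same characters as \s

# With re.ASCII, \w is limited to [A-Za-z0-9_].
unicode_w = re.findall(r"\w", "\u00e9")
ascii_w = re.findall(r"\w", "\u00e9", re.ASCII)

print(digits, words, spaces, non_word, unicode_w, ascii_w)
```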
Escaping Meta Characters
Use a backslash (\) in a regular expression to remove the special meaning of a meta character. This is also called “escaping a character”. Let’s say we want to find a literal dot: not “any character”, but just a dot. To use a special character as a regular one, prepend it with a backslash.
\d\.\d = 5.1
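The difference between an escaped and an unescaped dot can be seen in a minimal sketch, assuming Python's re module. The input strings are made up; re.escape is a standard-library helper that adds the backslashes for you when a whole string should be treated literally.

```python
import re

text = "5.1 and 5x1"

# An unescaped dot matches any character, so "5x1" is found as well.
unescaped = re.findall(r"\d.\d", text)

# An escaped dot matches only a literal dot.
escaped = re.findall(r"\d\.\d", text)

# re.escape escapes every meta character in a literal string (here, the +).
literal = re.escape("1+1=2")
found = re.search(literal, "so 1+1=2") is not None

print(unescaped, escaped, found)
```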
Example
\d\d\d = 327
\d\d\d ≠ 24631
\w\w\w = dog
\w\w\w = 467
\w\w\w ≠ boat // Doesn't return boat because boat contains 4 characters.
pand[ora] = panda
pand[ora] = pando
pand[ora] ≠ pandora
pand(ora) = pandora
pand(123) = pand123
pand(oar) ≠ pandora
pand(abc|123) = pandabc OR pand123
colou?r = colour (u is found 1 time)
colou?r = color (u is found 0 times)
tre* = tree (e is found 2 times)
tre* = tre (e is found 1 time)
tre* = tr (e is found 0 times)
tre* ≠ trees
tre+ = tree (e is found 2 times)
tre+ = tre (e is found 1 time)
tre+ ≠ tr (e is found 0 times)
ton. = tone
ton. = ton#
ton. ≠ tones
tr.* = tr
tr.* = tre
tr.* = tree
tr.* = trees
tr.* = trough
tr.* = treadmill
\d{3} = 836
\d{3} = 139
pand[ora]{2} = pandar
pand(ora){2} = pandoraora
pand[ora]{2} ≠ pandora
\d{2,5} = 97430
\d{2,5} = 9743
\d{2,5} = 97
\d{2,5} ≠ 9
c\nh = c and h separated by a newline
[A-Za-z] = A
[A-Za-z] ≠ AB
[50-99] = [0-9] // [50-99] does not mean an integer from 50 to 99; it is a character set equivalent to [0-9].
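Several of these examples can be verified mechanically. The sketch below assumes Python's re module and reads = as "the pattern matches the whole string" and ≠ as "it does not", which is an interpretation of the notation rather than something the text states. The cases list is a hand-picked subset.

```python
import re

# (pattern, text, expected): expected is True for "=" and False for "≠".
cases = [
    (r"\d\d\d", "327", True),
    (r"\d\d\d", "24631", False),
    (r"pand[ora]", "panda", True),
    (r"pand[ora]", "pandora", False),
    (r"pand(ora){2}", "pandoraora", True),
    (r"colou?r", "color", True),
    (r"tre*", "trees", False),
    (r"tr.*", "treadmill", True),
    (r"\d{2,5}", "9", False),
    (r"c\nh", "c\nh", True),
    (r"[50-99]", "7", True),    # a single digit: the set behaves like [0-9]
    (r"[50-99]", "55", False),  # two characters never fit one character set
]

# re.fullmatch succeeds only if the pattern covers the entire string.
results = [bool(re.fullmatch(p, t)) == expected for p, t, expected in cases]

print(all(results))
```

With re.search instead of re.fullmatch, several of the ≠ cases would succeed, because the pattern would be allowed to match just part of the string (for example, tre* matches the "tree" inside "trees").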