Introduction

Meta characters are the building blocks of regular expressions. Every character in a RegEx pattern is either a meta character with a special meaning or a regular character with a literal meaning. The following are some common RegEx meta characters, with examples of what they match and do not match.

  • \ : Marks the next character as either a special character or a literal. For example, n matches the character n, whereas \n matches a newline character. The sequence \\ matches \ and \( matches (. \d matches a single digit from 0 through 9.
  • . : Matches any single character except a newline character.
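
The two bullets above can be tried out with a short sketch. It assumes Python's built-in re module, and the sample strings are made up for illustration.

```python
import re

# A plain "n" matches the letter n; "\n" matches a newline character.
assert re.search(r"n", "pan") is not None
assert re.search(r"\n", "line one\nline two") is not None

# "\\" matches a literal backslash and "\(" matches a literal parenthesis.
backslash_hits = re.findall(r"\\", r"C:\temp\logs")
paren_hits = re.findall(r"\(", "f(x) + g(y)")

# "." matches any single character except a newline (without re.DOTALL).
dot_hits = re.findall(r"a.c", "abc a-c a\nc")

print(backslash_hits, paren_hits, dot_hits)
```

Raw strings (the r"..." prefix) keep Python from interpreting the backslashes before the regex engine sees them.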

Anchor Meta Characters

  • ^ : Matches the beginning of the string; in multiline mode, it also matches at the beginning of each line.
  • $ : Matches the end of the string, or the position before \n at the end of the string; in multiline mode, it also matches at the end of each line.
  • \Z : Matches only the end of a string, or before a newline character at the end.
  • \A : Matches only at the beginning of a string.
  • \z : Matches only the end of a string.
  • \b : Matches a word boundary, that is, the position between a word character and a non-word character (such as a space), or the start or end of the string.
  • \B : Matches a non-word boundary.
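
A minimal sketch of the anchors, assuming Python's re module. Anchor support varies between regex flavors: in Python, \Z matches only at the very end of the string (the behaviour listed above for \z), and $ is the anchor that also matches before a final newline. The sample text is invented for illustration.

```python
import re

text = "first line\nsecond line\n"

# Without flags, ^ matches only at the start of the whole string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches at the start of each line.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ matches at the end of the string or just before a trailing newline.
dollar_match = re.search(r"line$", text)
# In Python, \Z matches only at the very end, so the trailing "\n" blocks it.
strict_end_match = re.search(r"line\Z", text)

# \b marks a word boundary; \B marks a position that is not one.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
inside_word = re.findall(r"\Bcat\B", "cat concat scatter")

print(starts_default, starts_multiline, whole_word, inside_word)
```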

Grouping Meta Characters

  • (abc) : Capturing group. Matches the characters abc in that exact order and treats them as a single unit.
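
As a rough illustration of grouping, assuming Python's re module: the group both remembers what it matched and acts as one unit for a following quantifier. The variable names are illustrative only.

```python
import re

# The parentheses capture whatever "ora" matched as group 1.
match = re.fullmatch(r"pand(ora)", "pandora")
captured = match.group(1)

# A group is treated as one unit, so a quantifier applies to all of it.
repeated = re.fullmatch(r"pand(ora){2}", "pandoraora") is not None

# The order inside the group matters: "oar" is not "ora".
wrong_order = re.fullmatch(r"pand(oar)", "pandora") is not None

print(captured, repeated, wrong_order)
```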

Alternation Meta Characters

  • x|y : Matches either x or y.
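
A small sketch of alternation, assuming Python's re module. It also shows why alternation is usually combined with a group: without one, | splits the entire pattern. The sample strings are made up.

```python
import re

# Each side of | is a complete alternative.
either = re.findall(r"gray|grey", "gray grey groy")

# Without a group, the alternatives are "pandora" and "123",
# so "pand123" is not matched as a whole.
ungrouped = re.fullmatch(r"pandora|123", "pand123") is not None

# With a group, "pand" is shared and only the ending alternates.
grouped = re.fullmatch(r"pand(ora|123)", "pand123") is not None

print(either, ungrouped, grouped)
```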

Quantifier Meta Characters

Quantifiers specify the number of times a token should be matched by the regex engine.

  • * : Matches the preceding token (a character, character set, or group) zero or more times.
  • + : Matches the preceding token one or more times.
  • ? : Matches the preceding token zero or one time.
  • {n} : n is a non-negative integer. Matches the preceding token exactly n times.
  • {n,} : n is a non-negative integer. Matches the preceding token at least n times.
  • {n,m} : n and m are non-negative integers with n ≤ m. Matches the preceding token at least n and at most m times.
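
The quantifiers above can be checked side by side with a short sketch, assuming Python's re module. fully_matches is a hypothetical helper introduced here for illustration; it reports whether the pattern matches the entire string.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None

# One result per candidate string, in the order listed.
star = [fully_matches(r"tre*", s) for s in ("tr", "tre", "tree")]
plus = [fully_matches(r"tre+", s) for s in ("tr", "tre", "tree")]
optional = [fully_matches(r"colou?r", s) for s in ("color", "colour", "colouur")]
exact = [fully_matches(r"\d{3}", s) for s in ("83", "836", "8361")]
at_least = [fully_matches(r"\d{2,}", s) for s in ("9", "97", "97430")]
bounded = [fully_matches(r"\d{2,5}", s) for s in ("9", "97", "97430", "974301")]

print(star, plus, optional, exact, at_least, bounded)
```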

Range Meta Characters

  • [a-z] : Matches any character in the specified range. For example, [a-z] matches any lowercase alphabetic character in the English alphabet.
  • [^m-z] : Matches any character that is not in the specified range. For example, [^m-z] matches any character that is not in the range m through z.
  • [xyz] : A character set. Matches any one of the enclosed characters. For example, [abc] matches the a in plain.
  • [^xyz] : Matches any character that is not enclosed. For example, [^abc] matches the p in plain.
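
A brief sketch of ranges and character sets, assuming Python's re module. Each set matches exactly one character; the sample words follow the examples in the list above.

```python
import re

# [a-z] matches one lowercase letter at a time; "P", the space and digits are skipped.
lower = re.findall(r"[a-z]", "Plain 42")

# [^m-z] matches any single character outside the range m through z.
not_m_to_z = re.findall(r"[^m-z]", "maze")

# [abc] finds the first a, b or c; [^abc] finds the first character that is none of them.
first_abc = re.search(r"[abc]", "plain").group()
first_not_abc = re.search(r"[^abc]", "plain").group()

print(lower, not_m_to_z, first_abc, first_not_abc)
```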

Shorthand Meta Characters

  • \d : Matches a digit character.
  • \D : Matches a non-digit character.
  • \f : Matches a form-feed character.
  • \n : Matches a newline character.
  • \r : Matches a carriage return character.
  • \s : Matches any white space including spaces, tabs, form-feed characters, and so on.
  • \S : Matches any non-white space character.
  • \t : Matches a tab character.
  • \v : Matches a vertical tab character.
  • \w : Matches any word character including underscore. This expression is equivalent to [A-Za-z0-9_].
  • \W : Matches any non-word character. This expression is equivalent to [^A-Za-z0-9_].
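
The shorthand classes can be sketched as follows, assuming Python's re module. One flavor difference to keep in mind: for text strings Python treats \d and \w as Unicode-aware by default, so the equivalence with [A-Za-z0-9_] holds exactly only when the re.ASCII flag is set. The sample string is invented.

```python
import re

sample = "Order 66:\tship_now\n"

digits = re.findall(r"\d", sample)         # each digit separately
non_digits = re.findall(r"\D+", "a1b22c")  # runs of non-digit characters
whitespace = re.findall(r"\s", sample)     # space, tab and newline
words = re.findall(r"\w+", sample)         # letters, digits and underscore
non_words = re.findall(r"\W+", sample)     # everything between the words

# With re.ASCII, \w is limited to [A-Za-z0-9_].
ascii_only = re.findall(r"\w+", "caf\u00e9", flags=re.ASCII)

print(digits, non_digits, whitespace, words, non_words, ascii_only)
```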

Escaping Meta Characters

A backslash (\) is used in a regular expression to remove the special meaning of a meta character. This is also called “escaping a character”. Let’s say we want to find a literal dot: not “any character”, but just a dot. To use a special character as a regular one, prepend it with a backslash.

\d\.\d = 5.1
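
A minimal sketch of the difference escaping makes, assuming Python's re module. re.escape is a standard-library function that adds the backslashes automatically, which can help when the text to match comes from a variable.

```python
import re

# Unescaped, the dot matches any character, so "5x1" is also found.
unescaped = re.findall(r"\d.\d", "5.1 5x1")

# Escaped, the dot matches only a literal dot.
escaped = re.findall(r"\d\.\d", "5.1 5x1")

# re.escape builds a pattern that matches the given text literally.
literal_pattern = re.escape("5.1")
literal_ok = re.fullmatch(literal_pattern, "5.1") is not None
literal_rejects = re.fullmatch(literal_pattern, "5x1") is None

print(unescaped, escaped, literal_ok, literal_rejects)
```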

Example

In the examples below, = means the pattern matches the entire string and ≠ means it does not.

\d\d\d = 327
\d\d\d ≠  24631

\w\w\w = dog
\w\w\w = 467
\w\w\w ≠  boat // Doesn't return boat because boat contains 4 characters.


pand[ora] = panda
pand[ora] = pando
pand[ora] ≠  pandora


pand(ora) = pandora
pand(123) = pand123
pand(oar) ≠  pandora

pand(ora|123) = pandora OR pand123

colou?r = colour (u is found 1 time)
colou?r = color (u is found 0 times)


tre* = tree (e is found 2 times)
tre* = tre (e is found 1 time)
tre* = tr (e is found 0 times)
tre* ≠  trees

tre+ = tree (e is found 2 times)
tre+ = tre (e is found 1 time)
tre+ ≠  tr (e is found 0 times)

ton. = tone
ton. = ton#
ton. ≠  tones

tr.* = tr
tr.* = tre
tr.* = tree
tr.* = trees
tr.* = trough
tr.* = treadmill

\d{3} = 836
\d{3} = 139
pand[ora]{2} = pandar
pand(ora){2} = pandoraora
pand[ora]{2} ≠  pandora

\d{2,5} = 97430
\d{2,5} = 9743
\d{2,5} = 97
\d{2,5} ≠  9

c\nh  = c
        h

[A-Za-z] = A
[A-Za-z] ≠ AB
[50-99] = [0-9] // [50-99] does not mean an integer from 50 to 99; it is a set containing 5, the range 0-9, and 9, which matches a single digit.
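
The examples in this section treat a pattern as matching only when it covers the whole string. A rough way to check a few of them, assuming Python's re module, is re.fullmatch; the cases list below is an illustrative selection, not part of any library.

```python
import re

# (pattern, text, expected whole-string match)
cases = [
    (r"\d\d\d", "327", True),
    (r"\d\d\d", "24631", False),
    (r"pand[ora]", "panda", True),
    (r"pand[ora]", "pandora", False),
    (r"ton.", "tones", False),
    (r"tr.*", "treadmill", True),
    (r"c\nh", "c\nh", True),
    (r"[50-99]", "7", True),    # the set is just the digits 0-9
    (r"[50-99]", "75", False),  # one character only, not the number 75
]

# True for every case whose actual result agrees with the expectation.
results = [(re.fullmatch(p, t) is not None) == expected for p, t, expected in cases]

# re.search, by contrast, accepts a match anywhere inside the string.
partial = re.search(r"tre*", "trees").group()

print(all(results), partial)
```

With re.search, tre* does find a match inside trees (the substring tree); it is the whole-string requirement that makes tre* ≠ trees.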
