Capturing Groups

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters “d”, “o”, and “g”. The portion of the input string that matches a capturing group is saved in memory for later recall via backreferences.
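A minimal sketch using Python's standard `re` module (Python is chosen here only for illustration, and the sample strings are invented): the group (dog) saves the text it matched, which can be read back with `group(1)` or reused inside the same pattern through the backreference \1.

```python
import re

# (dog) is capturing group 1; \1 must match exactly the text group 1 captured.
pattern = re.compile(r"(dog) and \1")

match = pattern.search("hot dog and dog")
print(match.group(1))  # dog
print(match.group(0))  # dog and dog

# The backreference fails when the repeated text differs.
print(pattern.search("hot dog and cat"))  # None
```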

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)
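The left-to-right numbering can be checked with Python's standard `re` module (used here only as an illustration), matching the letters A, B, and C literally against the invented input "ABC". `Pattern.groups` reports how many capturing groups the compiled pattern contains.

```python
import re

# Groups are numbered by their opening parentheses, counted left to right.
match = re.fullmatch(r"((A)(B(C)))", "ABC")

print(match.re.groups)  # 4
for number in range(1, 5):
    print(number, match.group(number))
# 1 ABC
# 2 A
# 3 BC
# 4 C
```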

Non-Capturing Groups

If you want a group not to be numbered by the engine, you can declare it non-capturing by putting ?: right after the opening parenthesis. A non-capturing group looks like this:

(?:...)

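There are other kinds of groups that use the (? syntax combined with characters other than the colon, such as lookahead (?=...), lookbehind (?<=...), and, in flavors like Java, .NET, and PCRE, named groups (?<name>...). Regex flavors that support named capture often have an option that turns all unnamed groups into non-capturing groups.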
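A small Python sketch of the numbering difference, using the standard `re` module; the date-like pattern and the sample string "2024-05" are invented for illustration. Wrapping the year in (?:...) still groups it, but it no longer takes a group number, so the month becomes group 1.

```python
import re

capturing = re.compile(r"(\d{4})-(\d{2})")
non_capturing = re.compile(r"(?:\d{4})-(\d{2})")

text = "2024-05"
# Pattern.groups counts only the capturing groups.
print(capturing.groups, capturing.match(text).groups())
# 2 ('2024', '05')
print(non_capturing.groups, non_capturing.match(text).groups())
# 1 ('05',)
```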

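Non-capturing groups are particularly useful for repeating a pattern any number of times, since a group can also be used as an “atom” that a quantifier applies to as a whole, without taking up a group number. Consider the following expression, which finds a timestamped line followed by a line that repeats the same timestamp and text. Because (?:-\d{2}){2} does not capture, the timestamp stays group 1 and the rest of the line stays group 2, which is what the backreferences \1 and \2 refer to: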

(\d{4}(?:-\d{2}){2} \d{2}:\d{2}.\d{3}) (.*)[\r\n]+\1 \2
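A sketch of this expression in use with Python's standard `re` module. The pattern is copied unchanged; the log lines are invented so that their time field fits the pattern's \d{2}:\d{2}.\d{3} shape (the unescaped . there matches any single character). Only the first pair of lines repeats, so one match is expected.

```python
import re

# Pattern from the text: group 1 is the timestamp, group 2 the rest of the line.
# (?:-\d{2}){2} matches "-MM-DD" as one repeated atom without taking a group number.
DUPLICATE = re.compile(r"(\d{4}(?:-\d{2}){2} \d{2}:\d{2}.\d{3}) (.*)[\r\n]+\1 \2")

# Invented sample log; the first two lines are identical.
log = (
    "2024-05-01 12:30.250 service started\n"
    "2024-05-01 12:30.250 service started\n"
    "2024-05-01 12:31.004 request handled\n"
)

for match in DUPLICATE.finditer(log):
    print("timestamp:", match.group(1))  # timestamp: 2024-05-01 12:30.250
    print("message:", match.group(2))    # message: service started
```

Note that \2 is not anchored to the end of the line, so a second line that merely starts with the same text would also match; adding $ with the re.MULTILINE flag would tighten this.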