Mark Whitaker edited this page Aug 5, 2022 · 2 revisions

Overview

Groups are used in a regex for one (or both) of two purposes:

  1. To group a number of elements together so a quantifier can be applied to the whole group.
  2. To "remember" part of the text matched by the regex so we can extract it later by indexing into the object returned by RegExp.exec().

Group quantifiers

A quantifier passed to group() after the group's contents applies to the whole group rather than to a single element. For example, this simple regex matches a sentence of two or more words ending in a full stop:

const regex = new RegexBuilder()
    .group(r => r
        .wordCharacter(RegexQuantifier.oneOrMore)
        .whitespace(RegexQuantifier.oneOrMore),
        RegexQuantifier.oneOrMore
    )
    .wordCharacter(RegexQuantifier.oneOrMore)
    .text(".")
    .buildRegex();
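
For comparison, here is a sketch of the raw regex that the builder chain above is assumed to correspond to. The exact pattern string emitted by buildRegex() is not confirmed here, and the name sentencePattern is chosen purely for illustration.

```javascript
// Assumed raw-regex equivalent of the builder chain above
// (not the confirmed output of buildRegex()).
//   (\w+\s+)+  one or more "word followed by whitespace" units;
//              the trailing + applies to the whole group
//   \w+\.      a final word followed by a literal full stop
const sentencePattern = /(\w+\s+)+\w+\./;

console.log(sentencePattern.test("The quick brown fox.")); // true
console.log(sentencePattern.test("Hello."));               // false: the group must match at least once
```

Because the quantifier is attached to the group, a single word followed by a full stop does not match: the group of "word plus whitespace" has to occur at least once before the final word.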

Remembering parts of the match

Say we want to match a person's name (two consecutive words, each beginning with a capital letter) and then greet them by their first name. We could build a regex like this:

const regex = new RegexBuilder()
    .wordBoundary()
    .group(r => r
        .uppercaseLetter()
        .lowercaseLetter(RegexQuantifier.oneOrMore)
    )
    .whitespace()
    .uppercaseLetter()
    .lowercaseLetter(RegexQuantifier.oneOrMore)
    .wordBoundary()
    .buildRegex();

We can then extract the first name from a successful match like this:

const match = regex.exec(inputString);
const firstName = match[1];

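Note that capture groups are indexed from 1, not 0: match[0] always holds the whole matched string, so the first group is match[1]. Also note that RegExp.exec() returns null when the input does not match, so check the result before indexing into it.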
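
The extraction step can be sketched with a null check, since RegExp.exec() returns null when nothing matches. The raw pattern below is an assumed equivalent of the builder chain above, and greet() is a hypothetical helper, not part of RegexBuilder.

```javascript
// Assumed raw-regex equivalent of the name-matching builder chain above.
const nameRegex = /\b([A-Z][a-z]+)\s[A-Z][a-z]+\b/;

// Hypothetical helper: returns a greeting, or null when no name is found.
function greet(inputString) {
    const match = nameRegex.exec(inputString);
    if (match === null) {
        return null;               // no match: there is nothing to index into
    }
    // match[0] is the full name; match[1] is the first capture group (the first name)
    return `Hello, ${match[1]}!`;
}

console.log(greet("Please welcome Ada Lovelace to the stage")); // "Hello, Ada!"
console.log(greet("no names here"));                            // null
```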

Nesting groups

As with raw regexes, RegexBuilder allows you to nest groups to arbitrary depth. Capturing groups are numbered in the order in which they start, so match[1] refers to the group that opens first, match[2] to the group that opens second, and so on. For example:

const regex = new RegexBuilder()
    .wordBoundary()
    .group(r1 => r1                        // start of group 1
        .group(r2 => r2                    // start of group 2
            .uppercaseLetter()
        )                                  // end of group 2
        .lowercaseLetter(RegexQuantifier.oneOrMore),
        RegexQuantifier.oneOrMore
    )                                      // end of group 1
    .wordBoundary()
    .buildRegex();

const match = regex.exec("sorry Dave, I can't let you do that");
const name = match[1];     // "Dave"
const initial = match[2];  // "D"
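
The numbering can be checked against a raw regex. The pattern below is an assumed equivalent of the nested builder chain above. It also illustrates standard JavaScript behaviour worth knowing when a capturing group carries a quantifier: if the group matches several times, only the last repetition is kept.

```javascript
// Assumed raw-regex equivalent of the nested builder chain above.
// Group 1 opens first (outer parentheses); group 2 opens second (inner parentheses).
const nestedRegex = /\b(([A-Z])[a-z]+)+\b/;

const simple = nestedRegex.exec("sorry Dave, I can't let you do that");
console.log(simple[0]); // "Dave" (whole match)
console.log(simple[1]); // "Dave" (group 1)
console.log(simple[2]); // "D"    (group 2)

// When the quantified outer group repeats, each group keeps its LAST repetition.
const repeated = nestedRegex.exec("McDonald");
console.log(repeated[0]); // "McDonald"
console.log(repeated[1]); // "Donald"
console.log(repeated[2]); // "D"
```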

Non-capturing groups

Non-capturing groups can be used to apply a quantifier to a section of the regex, but their contents cannot be extracted later from the object returned by RegExp.exec(). This is useful if you have more than one group in a regex and you don't want a group that exists purely for its quantifier to disrupt the indices of your capturing groups.

Example:

const regex = new RegexBuilder()
    .nonCapturingGroup(r => r
        .letter()
        .digit()
    )
    .buildRegex();
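
The effect on indices is easiest to see by comparing two raw regexes side by side. This is an illustrative sketch in plain JavaScript rather than RegexBuilder output; the variable names are made up for the example.

```javascript
// Both patterns match a sentence and capture its last word.
// In the first, the quantified group is capturing and takes index 1.
const withCapturing = /(\w+\s+)+(\w+)\./;
// In the second, (?: ... ) groups without capturing, so the last word is index 1.
const withNonCapturing = /(?:\w+\s+)+(\w+)\./;

const text = "The quick brown fox.";

const a = withCapturing.exec(text);
console.log(a[1]); // "brown " (last repetition of the quantified group)
console.log(a[2]); // "fox"

const b = withNonCapturing.exec(text);
console.log(b[1]);     // "fox"
console.log(b.length); // 2: the whole match plus one capture group
```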

Named groups

Named groups enable you to use meaningful names rather than array indices to retrieve captured group values.

Example:

const regex = new RegexBuilder()
    .namedGroup("firstName", r => r
        .uppercaseLetter()
        .lowercaseLetter(RegexQuantifier.oneOrMore)
    )
    .buildRegex();

const match = regex.exec("say hello to Mark");
const firstName = match.groups.firstName;
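
A sketch of the same lookup against a raw regex, with a guard for the no-match case. The pattern is an assumed equivalent of the builder chain above; named capture groups require an ES2018-capable JavaScript engine, and the optional chaining used here requires ES2020.

```javascript
// Assumed raw-regex equivalent of the namedGroup() chain above.
const namedRegex = /(?<firstName>[A-Z][a-z]+)/;

const found = namedRegex.exec("say hello to Mark");
// exec() returns null on failure, so guard before reading .groups
const foundName = found?.groups?.firstName ?? null;
console.log(foundName); // "Mark"
console.log(found[1]);  // "Mark": named groups are still numbered as well

const missing = namedRegex.exec("no capitals here");
const missingName = missing?.groups?.firstName ?? null;
console.log(missingName); // null
```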