
Elements

Mark Whitaker edited this page Aug 5, 2022 · 3 revisions


Overview

Elements are the building blocks that make up a regex. Building a regex with RegexBuilder broadly consists of the following:

    var regex = new RegexBuilder()
        // add elements here
        .buildRegex();

All the element methods return a reference to the RegexBuilder object, so they can be called in a fluent chained style.
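As a rough illustration of why returning the builder enables chaining, here is a minimal sketch. MiniBuilder is a hypothetical stand-in written for this example; it is not RegexBuilder's source, and its three element methods are simplified assumptions that only mirror names used on this page.

```java
import java.util.regex.Pattern;

// Hypothetical mini-builder for illustration only; not the library's implementation.
final class MiniBuilder {
    private final StringBuilder pattern = new StringBuilder();

    // Each element method appends a fragment and returns this,
    // so the next call can be chained directly onto the result.
    MiniBuilder letter() {
        pattern.append("\\p{L}");
        return this;
    }

    MiniBuilder digit() {
        pattern.append("[0-9]");
        return this;
    }

    MiniBuilder text(String literal) {
        // Pattern.quote makes the literal safe to embed in a regex.
        pattern.append(Pattern.quote(literal));
        return this;
    }

    Pattern buildRegex() {
        return Pattern.compile(pattern.toString());
    }

    // Builds "one letter, a hyphen, one digit" and tests the whole input against it.
    static boolean demo(String input) {
        Pattern regex = new MiniBuilder()
            .letter()
            .text("-")
            .digit()
            .buildRegex();
        return regex.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(demo("A-7"));
        System.out.println(demo("7-A"));
    }
}
```

Because every element method returns the same object, the order of the chained calls is the order of the fragments in the finished pattern.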

With the exception of the anchor methods, all the methods below take an optional RegexQuantifier parameter which is used to define how many instances of the element should be matched. Without a quantifier parameter, each method matches the element exactly once. Read more about quantifiers in Quantifiers.
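To make "exactly once" concrete, the sketch below uses plain java.util.regex rather than RegexBuilder. The quantifier suffixes shown (+ and {3}) are standard raw regex syntax; how they map onto RegexQuantifier is not shown here, and the class name QuantifierSketch is invented for this example.

```java
import java.util.regex.Pattern;

// Illustration with raw regex only; RegexQuantifier itself is not used here.
final class QuantifierSketch {
    // True when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // No quantifier: the element must occur exactly once.
        System.out.println(matchesWhole("[0-9]", "7"));
        System.out.println(matchesWhole("[0-9]", "42"));
        // With a quantifier: one or more digits, then exactly three digits.
        System.out.println(matchesWhole("[0-9]+", "42"));
        System.out.println(matchesWhole("[0-9]{3}", "42"));
    }
}
```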

All elements may be added to a group: see Groups for more details on those.

Simple text matches

| Method | Matches | Raw regex equivalent |
| --- | --- | --- |
| anyCharacter() | Any character at all, including white space and control characters | . |
| carriageReturn() | A carriage return character | \r |
| digit() | Any decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) | [0-9] |
| hexDigit() | Any hexadecimal digit (uppercase or lowercase letters) | [a-fA-F0-9] |
| letter() | Any Unicode letter, uppercase or lowercase | \p{L} |
| letterOrDigit() | Any letter (uppercase or lowercase) or digit | [\p{L}0-9] |
| lineFeed() | A line feed character | \n |
| lowercaseHexDigit() | Any hexadecimal digit (lowercase letters only) | [a-f0-9] |
| lowercaseLetter() | Any lowercase Unicode letter | \p{Ll} |
| nonDigit() | Any character that is not a decimal digit (including white space and control characters) | [^0-9] |
| nonHexDigit() | Any character that is not a hexadecimal digit | [^a-fA-F0-9] |
| nonLetter() | Any character that is not a Unicode letter (including white space and control characters) | \P{L} |
| nonLetterOrDigit() | Any character that is not a letter or digit (including white space and control characters) | [^\p{L}0-9] |
| nonWhitespace() | Any non-white space character (including control characters) | \S |
| nonWordCharacter() | Any character that is not a letter, decimal digit or underscore (including white space and control characters) | [^\p{L}0-9_] |
| possibleWhitespace() | Zero or more white space characters | \s* |
| space() | A space character | (a single literal space) |
| tab() | A tab character | \t |
| uppercaseHexDigit() | Any hexadecimal digit (uppercase letters only) | [A-F0-9] |
| uppercaseLetter() | Any uppercase Unicode letter | \p{Lu} |
| whitespace() | Any white space character (space, tab, newline or carriage return) | \s |
| wordCharacter() | Any letter, decimal digit or underscore | [\p{L}0-9_] |
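The raw regex equivalents can be checked directly with java.util.regex, as in the sketch below. It does not call RegexBuilder at all; SimpleMatchSketch and matchesWhole are names invented for this example, and the patterns are copied from the "Raw regex equivalent" column.

```java
import java.util.regex.Pattern;

// Exercises a few raw regex equivalents from the table using the JDK regex engine.
final class SimpleMatchSketch {
    // True when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // letter(): \p{L} covers non-ASCII letters such as U+00E9 (e with acute accent).
        System.out.println(matchesWhole("\\p{L}", "\u00e9"));
        // hexDigit(): accepts both cases, rejects letters past F.
        System.out.println(matchesWhole("[a-fA-F0-9]", "F"));
        System.out.println(matchesWhole("[a-fA-F0-9]", "g"));
        // nonLetter(): a digit is not a letter.
        System.out.println(matchesWhole("\\P{L}", "5"));
        // possibleWhitespace(): matches the empty string as well as runs of white space.
        System.out.println(matchesWhole("\\s*", ""));
        System.out.println(matchesWhole("\\s*", " \t"));
        // wordCharacter(): underscore counts, hyphen does not.
        System.out.println(matchesWhole("[\\p{L}0-9_]", "_"));
        System.out.println(matchesWhole("[\\p{L}0-9_]", "-"));
    }
}
```

Note that each backslash in the table has to be doubled inside a Java string literal.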

User-defined text matches

| Method | Matches |
| --- | --- |
| text(text) | Any arbitrary text. If the string passed in contains reserved regex characters, they are escaped so that they match literally. For example, if you pass the string ":)", it will be escaped to ":\)". |
| regexText(text) | Raw regex text. Reserved regex characters are not escaped, so this is only for tinkerers who know what they're doing. |
| anyCharacterFrom(string characters) | Any one of the characters in the supplied string. For example, anyCharacterFrom("abc") will match "a", "b" or "c". |
| anyCharacterExcept(string characters) | Any character not in the supplied string (including white space and control characters). For example, anyCharacterExcept("abc") will match "1", "d" or "&" but not "a". |
| anyOf(strings) | Any of the strings supplied, in their entirety. For example, anyOf(new []{"Mr", "Mrs", "Ms"}) will match "Mr", "Mrs" or "Ms" but not "M". |
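The difference between escaped and raw text can be seen with java.util.regex alone, as sketched below. The escaped form ":\)" comes from the table above; the character-class and alternation patterns are assumed raw equivalents of anyCharacterFrom, anyCharacterExcept and anyOf, not output taken from the library. UserTextSketch and its helpers are names invented for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustration with raw regex only; the patterns for the last three methods are assumptions.
final class UserTextSketch {
    // True when the regex compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // True when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Unescaped, the ")" is an unmatched group close and the pattern is rejected.
        System.out.println(compiles(":)"));
        // Escaped as in the text(text) example, it matches the literal smiley.
        System.out.println(matchesWhole(":\\)", ":)"));
        // Assumed equivalents: a character class, a negated class, and an alternation.
        System.out.println(matchesWhole("[abc]", "b"));
        System.out.println(matchesWhole("[^abc]", "&"));
        System.out.println(matchesWhole("(?:Mr|Mrs|Ms)", "Mrs"));
        System.out.println(matchesWhole("(?:Mr|Mrs|Ms)", "M"));
    }
}
```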

Anchors

Anchors (known in the regex world as "zero-width assertions") match a position in a string rather than a character (hence "zero-width"). They're useful for crafting regexes that match text only at a particular position within a string, rather than anywhere in it.

| Method | Matches | Raw regex equivalent |
| --- | --- | --- |
| startOfString() | The start of the string. | ^ |
| endOfString() | The end of the string. | $ |
| wordBoundary() | The boundary between a word character (letter, digit or underscore) and a non-word character. | \b |
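The effect of each anchor is easiest to see by searching for the same text with and without a position constraint. The sketch below uses the raw regex equivalents with java.util.regex; AnchorSketch and found are names invented for this example, and RegexBuilder itself is not called.

```java
import java.util.regex.Pattern;

// Illustration of the three anchors using their raw regex equivalents.
final class AnchorSketch {
    // True when the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ^ pins the match to the start of the string.
        System.out.println(found("^cat", "catalog"));
        System.out.println(found("^cat", "bobcat"));
        // $ pins the match to the end of the string.
        System.out.println(found("cat$", "bobcat"));
        System.out.println(found("cat$", "catalog"));
        // \b requires a word boundary on each side, so "cat" must stand alone.
        System.out.println(found("\\bcat\\b", "a cat sat"));
        System.out.println(found("\\bcat\\b", "concatenate"));
    }
}
```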