Elements
Elements are the building blocks that make up a regex. Building a regex with RegexBuilder
broadly consists of the following:
```
var regex = new RegexBuilder()
    // add elements here
    .buildRegex();
```
All the element methods return a reference to the RegexBuilder
object, so they can be called in a fluent chained style.
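The fluent style works because each element method hands back the builder it was called on. The sketch below is not RegexToolbox's implementation: `MiniBuilder` and its internals are hypothetical, written in JavaScript only to illustrate the pattern using two method names and raw equivalents from this page.

```javascript
// Hypothetical stand-in for illustration only; not the real RegexBuilder.
class MiniBuilder {
  constructor() {
    this.parts = []; // raw regex fragments, in the order they were added
  }

  // Appends the raw equivalent of digit() and returns the builder.
  digit() {
    this.parts.push("[0-9]");
    return this;
  }

  // Appends the raw equivalent of whitespace() and returns the builder.
  whitespace() {
    this.parts.push("\\s");
    return this;
  }

  // Joins the collected fragments into a single RegExp.
  buildRegex() {
    return new RegExp(this.parts.join(""));
  }
}

const regex = new MiniBuilder()
  .digit()
  .whitespace()
  .digit()
  .buildRegex();

console.log(regex.source);      // [0-9]\s[0-9]
console.log(regex.test("1 2")); // true
console.log(regex.test("12"));  // false
```

Because every element method returns `this`, the calls can be stacked in any order before the final `buildRegex()`.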
With the exception of the anchor methods, all the methods below take an optional RegexQuantifier parameter that specifies how many instances of the element should be matched. Without a quantifier, each method matches the element exactly once. See Quantifiers for details.
All elements may be added to a group: see Groups for more details on those.
Method | Matches | Raw regex equivalent |
---|---|---|
anyCharacter() | Any character at all, including white space and control characters | . |
carriageReturn() | A carriage return character | \r |
digit() | Any decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) | [0-9] |
hexDigit() | Any hexadecimal digit (uppercase or lowercase letters) | [a-fA-F0-9] |
letter() | Any Unicode letter, uppercase or lowercase | \p{L} |
letterOrDigit() | Any letter (uppercase or lowercase) or digit | [\p{L}0-9] |
lineFeed() | A line feed character | \n |
lowercaseHexDigit() | Any hexadecimal digit (lowercase letters only) | [a-f0-9] |
lowercaseLetter() | Any lowercase Unicode letter | \p{Ll} |
nonDigit() | Any character that is not a decimal digit (including white space and control characters) | [^0-9] |
nonHexDigit() | Any character that is not a hexadecimal digit | [^a-fA-F0-9] |
nonLetter() | Any character that is not a Unicode letter (including white space and control characters) | \P{L} |
nonLetterOrDigit() | Any character that is not a letter or digit (including white space and control characters) | [^\p{L}0-9] |
nonWhitespace() | Any non-white space character (including control characters) | \S |
nonWordCharacter() | Any character that is not a letter, decimal digit or underscore (including white space and control characters) | [^\p{L}0-9_] |
possibleWhitespace() | Zero or more white space characters | \s* |
space() | A space character | (a single literal space) |
tab() | A tab character | \t |
uppercaseHexDigit() | Any hexadecimal digit (uppercase letters only) | [A-F0-9] |
uppercaseLetter() | Any uppercase Unicode letter | \p{Lu} |
whitespace() | Any white space character (such as space, tab, newline or carriage return) | \s |
wordCharacter() | Any letter, decimal digit or underscore | [\p{L}0-9_] |
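To see how a few of the raw equivalents in the table behave, the sketch below runs them through JavaScript's built-in `RegExp` rather than through RegexBuilder. It assumes an engine that supports Unicode property escapes, which in JavaScript means passing the `u` flag. The helper `matchesWhole` is made up for this example.

```javascript
// Hypothetical helper: true if `pattern` matches the whole of `input`.
// The "u" flag enables \p{...} Unicode property escapes in JavaScript.
function matchesWhole(pattern, input) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(input);
}

// Raw equivalents copied from the table above.
const letter = "\\p{L}";              // letter()
const lowercaseLetter = "\\p{Ll}";    // lowercaseLetter()
const hexDigit = "[a-fA-F0-9]";       // hexDigit()
const nonDigit = "[^0-9]";            // nonDigit()
const wordCharacter = "[\\p{L}0-9_]"; // wordCharacter()

console.log(matchesWhole(letter, "\u00e9"));     // true: é is a Unicode letter
console.log(matchesWhole(letter, "7"));          // false
console.log(matchesWhole(lowercaseLetter, "a")); // true
console.log(matchesWhole(lowercaseLetter, "A")); // false
console.log(matchesWhole(hexDigit, "F"));        // true
console.log(matchesWhole(hexDigit, "g"));        // false
console.log(matchesWhole(nonDigit, " "));        // true: white space is not a digit
console.log(matchesWhole(wordCharacter, "_"));   // true
console.log(matchesWhole(wordCharacter, "-"));   // false
```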
Method | Matches |
---|---|
text(text) | Any arbitrary text. If the string passed in contains reserved regex characters, they will be escaped to avoid the regex doing unexpected things. For example, if you pass the string ":)", it will be escaped to ":\)". |
regexText(text) | Raw regex text. Reserved regex characters are not escaped, so this is only for tinkerers who know what they're doing. |
anyCharacterFrom(string characters) | Any of the characters in the supplied string. For example, anyCharacterFrom("abc") will match "a", "b" or "c". |
anyCharacterExcept(string characters) | Any character not in the supplied string (including white space and control characters). For example, anyCharacterExcept("abc") will match "1", "d" or "&" but not "a". |
anyOf(strings) | Any of the strings supplied, in their entirety. For example, anyOf(new []{"Mr", "Mrs", "Ms"}) will match "Mr", "Mrs" or "Ms" but not "M". |
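This table does not list raw equivalents, so the following is only an approximation of what these methods could produce, written against JavaScript's built-in `RegExp`. `escapeText` and `anyOfPattern` are hypothetical helpers, not part of RegexToolbox, and the library's actual output may differ.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a
// regex, roughly what text() is described as doing.
function escapeText(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build an alternation of escaped strings,
// approximating anyOf(strings).
function anyOfPattern(strings) {
  return "(?:" + strings.map(escapeText).join("|") + ")";
}

console.log(escapeText(":)")); // :\)

const title = new RegExp("^" + anyOfPattern(["Mr", "Mrs", "Ms"]) + "$");
console.log(title.test("Mrs")); // true
console.log(title.test("M"));   // false

// anyCharacterFrom("abc") and anyCharacterExcept("abc") correspond roughly
// to the character classes [abc] and [^abc].
console.log(/^[abc]$/.test("b"));  // true
console.log(/^[^abc]$/.test("&")); // true
console.log(/^[^abc]$/.test("a")); // false
```

Escaping matters because an unescaped ")" in ":)" would be read as the end of a group, which is why `regexText` is best reserved for input that is already a valid regex fragment.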
Anchors (known in the regex world as "zero-width assertions") match a point in a string that isn't represented by a character (hence "zero-width"). They're useful for crafting regexes that match text at a particular position within a string, rather than just anywhere.
Method | Matches | Raw regex equivalent |
---|---|---|
startOfString() | The start of the string. | ^ |
endOfString() | The end of the string. | $ |
wordBoundary() | The boundary between a word character (letter, digit or underscore) and a non-word character. | \b |
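A short sketch of the three anchors, using their raw equivalents in JavaScript's built-in `RegExp` rather than RegexBuilder itself. The sample strings are invented for illustration.

```javascript
const startsWithCat = /^cat/;   // startOfString() followed by "cat"
const endsWithCat = /cat$/;     // "cat" followed by endOfString()
const wholeWordCat = /\bcat\b/; // "cat" between two wordBoundary() anchors

console.log(startsWithCat.test("catalog"));    // true
console.log(startsWithCat.test("bobcat"));     // false
console.log(endsWithCat.test("bobcat"));       // true
console.log(endsWithCat.test("catalog"));      // false
console.log(wholeWordCat.test("the cat sat")); // true
console.log(wholeWordCat.test("concatenate")); // false
```

None of the anchors consume a character: each pattern still matches only the three letters "cat", and the anchor merely restricts where that match may sit.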