?
u
/
See the proposal repository for background material and discussion.
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
The RegExp
Each \u
u
u
\u
A number of productions in this section are given alternative definitions in section
The descriptions below use the following aliases:
(
pattern character that is matched by the (
terminal of the Furthermore, the descriptions below use the following internal data structures:
i
, m
, or s
, or if it contains the same code point more than once.
i
, m
, or s
, or contains the same code point more than once.
i
, m
, or s
, or contains the same code point more than once.
A Modifiers Record is a
Modifiers Records have the fields listed in
Field Name | Value | Meaning |
---|---|---|
[[DotAll]] | a Boolean | Indicates whether the |
[[IgnoreCase]] | a Boolean | Indicates whether the |
[[Multiline]] | a Boolean | Indicates whether the |
The
A Pattern compiles to an
The
This section is amended in B.1.2.4.
It is defined piecewise over the following productions:
The |
regular expression operator separates two alternatives. The pattern first tries to match the left |
produce
/a|ab/.exec("abc")
returns the result
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
The order in which the two alternatives are tried is independent of the value of direction.
Consecutive
The resulting Matcher is independent of direction.
The
This section is amended in B.1.2.5.
It is defined piecewise over the following productions:
Even when the y
flag is used with a pattern, ^
always matches only at the beginning of Input, or (if Multilinemodifiers.[[Multiline]] is
The abstract operation IsWordChar takes arguments e (an
The
This section is amended in B.1.2.6.
It is defined piecewise over the following productions:
An escape sequence of the form \
followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (
The abstract operation CharacterSetMatcher takes arguments A (a CharSet), invert (a Boolean), direction (
The abstract operation BackreferenceMatcher takes arguments n (a positive
The abstract operation Canonicalize takes arguments ch (a character) and modifiers (a
Parentheses of the form (
)
serve both to group the components of the \
followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching (?:
)
instead.
The form (?=
)
specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside (?=
form (this unusual behaviour is inherited from Perl). This only matters when the
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b
and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
The form (?!
)
specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a
not immediately followed by some positive number n of a
's, a b
, another n a
's (specified by the first \2
) and a c
. The second \2
is outside the negative lookahead, so it matches against
["baaabaac", "ba", undefined, "abaac"]
In case-insignificant matches when Unicode is ß
(U+00DF) to SS
. It may however map a code point outside the Basic Latin range to a character within, for example, ſ
(U+017F) to s
. Such characters are not mapped if Unicode is /[a-z]/i
, but they will match /[a-z]/ui
.
The
This section is amended in
It is defined piecewise over the following productions:
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i
matches only the letters E
, F
, e
, and f
, while the pattern /[E-f]/i
matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols [
, \
, ]
, ^
, _
, and `
.
A -
character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of
-
U+002D (HYPHEN-MINUS).A \b
, \B
, and backreferences. Inside a \b
means the backspace character, while \B
and backreferences raise errors. Using a backreference inside a
0
through 9
inclusive.General_Category
, s) is identical to a The abstract operation GetWordCharacters takes argument modifiers (a
The abstract operation UpdateModifiers takes arguments modifiers (a
The syntax of
This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the
When the same left-hand sides occurs with both [+UnicodeMode] and [~UnicodeMode] guards it is to control the disambiguation priority.
© 2024 Ron Buckton, Ecma International
All Software contained in this document ("Software") is protected by copyright and is being made available under the "BSD License", included below. This Software may be subject to third party rights (rights from parties other than Ecma International), including patent rights, and no licenses under such third party rights are granted under this license even if the third party concerned is a member of Ecma International. SEE THE ECMA CODE OF CONDUCT IN PATENT MATTERS AVAILABLE AT https://ecma-international.org/memento/codeofconduct.htm FOR INFORMATION REGARDING THE LICENSING OF PATENT CLAIMS THAT ARE REQUIRED TO IMPLEMENT ECMA INTERNATIONAL STANDARDS.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE ECMA INTERNATIONAL "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ECMA INTERNATIONAL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.