Ecma 262


the pattern inside Disjunction must match at the current position, but the current position is not
advanced before matching the sequel. If Disjunction can match at the current position in several ways,
only the first one is tried. Unlike other regular expression operators, there is no backtracking into a
(?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction
contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.
For example,
/(?=(a+))/.exec("baaabac")
matches the empty string immediately after the first b and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
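The two results above can be reproduced directly. This is only an illustrative check; the variable names are not part of the specification.

```javascript
// The lookahead captures "aaa" greedily and is never re-entered once it has succeeded.
const first = /(?=(a+))/.exec("baaabac");
const second = /(?=(a+))a*b\1/.exec("baaabac");

console.log(Array.from(first), first.index);   // [ '', 'aaa' ] 1
console.log(Array.from(second), second.index); // [ 'aba', 'a' ] 3
```

In the second expression the match cannot start at index 1, because \1 is fixed to "aaa" there and no backtracking shortens it; the match only succeeds once the start position has advanced to index 3.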
The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed,
the pattern inside Disjunction must fail to match at the current position. The current position is not
advanced before matching the sequel. Disjunction can contain capturing parentheses, but
backreferences to them only make sense from within Disjunction itself. Backreferences to these
capturing parentheses from elsewhere in the pattern always return undefined because the negative
lookahead must fail for the pattern to succeed. For example,
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a not immediately followed by some positive number n of a's, a b, another n a's
(specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches
against undefined and therefore always succeeds. The whole expression returns the array:
["baaabaac", "ba", undefined, "abaac"]
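The negative lookahead example can be checked as follows. This is a small sketch; the variable name m is only illustrative.

```javascript
// Capture 2 lives inside the negative lookahead, so it is undefined in the result.
const m = /(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac");

console.log(m[0]); // "baaabaac"
console.log(m[1]); // "ba"
console.log(m[2]); // undefined
console.log(m[3]); // "abaac"
```

The a at index 1 is rejected because it is followed by "aabaac", which the lookahead pattern matches with n = 2. The a at index 2 is accepted because what follows it cannot satisfy (a+)b\2c.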
In case-insignificant matches all characters are implicitly converted to upper case immediately before
they are compared. However, if converting a character to upper case would expand that character into
more than one character (such as converting "ß" (\u00DF) into "SS"), then the character is left
as-is instead. The character is also left as-is if it is not an ASCII character but converting it to upper case
would make it into an ASCII character. This prevents Unicode characters such as \u0131 and
\u017F from matching regular expressions such as /[a-z]/i, which are only intended to match
ASCII letters. Furthermore, if these conversions were allowed, then /[^\W]/i would match each of
a, b, ..., h, but not i or s.
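The conversion rule described above might be sketched as follows. The function name canonicalize is hypothetical, and the sketch assumes that String.prototype.toUpperCase (which follows the host's current Unicode data) is an acceptable stand-in for the upper-case conversion the text refers to.

```javascript
// Sketch of the case-insensitive comparison rule for a single character.
function canonicalize(ch) {
  const upper = ch.toUpperCase();
  // Rule 1: leave the character alone if upper-casing expands it (e.g. "ß" to "SS").
  if (upper.length !== 1) return ch;
  // Rule 2: leave a non-ASCII character alone if upper-casing would make it ASCII.
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

console.log(canonicalize("a") === "A");           // true
console.log(canonicalize("\u00DF") === "\u00DF"); // true: "ß" is left as-is
console.log(canonicalize("\u0131") === "\u0131"); // true: dotless i does not become "I"
console.log(/[a-z]/i.test("\u017F"));             // false: long s does not match
```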
15.10.2.9 AtomEscape
The production AtomEscape :: DecimalEscape evaluates as follows:
1. Evaluate DecimalEscape to obtain an EscapeValue E.
2. If E is not a character then go to step 6.
3. Let ch be E's character.
4. Let A be a one-element CharSet containing the character ch.
5. Call CharacterSetMatcher(A, false) and return its Matcher result.
6. E must be an integer. Let n be that integer.
7. If n=0 or n>NCapturingParens then throw a SyntaxError exception.
8. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and
performs the following:
1. Let cap be x's captures internal array.
2. Let s be cap[n].
3. If s is undefined, then call c(x) and return its result.
4. Let e be x's endIndex.
5. Let len be s's length.
6. Let f be e+len.
7. If f>InputLength, return failure.
8. If there exists an integer i between 0 (inclusive) and len (exclusive) such that
Canonicalize(s[i]) is not the same character as Canonicalize(Input [e+i]), then return failure.
9. Let y be the State (f, cap).
10. Call c(y) and return its result.
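The backreference steps above can be sketched as a closure. This is a minimal illustration, not the specification's own notation: the names makeBackreferenceMatcher, input and canonicalize are assumptions, a State is modelled as a plain object with endIndex and captures, and failure is modelled as null.

```javascript
// Sketch of the internal Matcher closure for a backreference \n.
// captures[0] is unused here so that captures[n] lines up with the nth group.
function makeBackreferenceMatcher(n, input, canonicalize) {
  return function matcher(x, c) {
    const cap = x.captures;
    const s = cap[n];
    if (s === undefined) return c(x);      // undefined capture: succeed without consuming
    const e = x.endIndex;
    const len = s.length;
    const f = e + len;
    if (f > input.length) return null;     // not enough input left: failure
    for (let i = 0; i < len; i++) {
      if (canonicalize(s[i]) !== canonicalize(input[e + i])) return null;
    }
    const y = { endIndex: f, captures: cap };
    return c(y);                           // continue matching after the repeated text
  };
}

// Usage: group 1 captured "ab"; try to match it again at position 2 of "abab".
const identity = (ch) => ch;
const backref = makeBackreferenceMatcher(1, "abab", identity);
const result = backref({ endIndex: 2, captures: [undefined, "ab"] }, (y) => y);
console.log(result.endIndex); // 4
```

Passing the canonicalization function as a parameter keeps the sketch independent of whether the match is case-sensitive.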
The production AtomEscape :: CharacterEscape evaluates as follows:
1. Evaluate CharacterEscape to obtain a character ch.
2. Let A be a one-element CharSet containing the character ch.
3. Call CharacterSetMatcher(A, false) and return its Matcher result.
The production AtomEscape :: CharacterClassEscape evaluates as follows:
1. Evaluate CharacterClassEscape to obtain a CharSet A.
2. Call CharacterSetMatcher(A, false) and return its Matcher result.
Informative comments: An escape sequence of the form \ followed by a nonzero decimal number n
matches the result of the nth set of capturing parentheses (see 15.10.2.11). It is an error if the regular
expression has fewer than n capturing parentheses. If the regular expression has n or more capturing
parentheses but the nth one is undefined because it hasn't captured anything, then the backreference
always succeeds.
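Two small cases illustrate a backreference to a capture that is still undefined. The variable names are only illustrative.

```javascript
// Group 1 belongs to the alternative that did not match, so \1 succeeds trivially.
const viaAlternative = /(a)|b\1/.exec("b");
console.log(viaAlternative[0], viaAlternative[1]); // "b" undefined

// \1 is evaluated before group 1 has captured anything, so it matches without consuming input.
const forwardReference = /\1(a)/.exec("aa");
console.log(forwardReference[0], forwardReference[1]); // "a" "a"
```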
15.10.2.10 CharacterEscape
The production CharacterEscape :: ControlEscape evaluates by returning the character according to
the table below:
ControlEscape   Unicode Value   Name                   Symbol
t               \u0009          horizontal tab         <HT>
n               \u000A          line feed (new line)   <LF>
v               \u000B          vertical tab           <VT>
f               \u000C          form feed              <FF>
r               \u000D          carriage return        <CR>
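The table can be checked against an implementation with a short loop. The array name controlEscapes is an assumption made for this sketch.

```javascript
// Each ControlEscape letter paired with the Unicode value from the table.
const controlEscapes = [
  ["t", 0x0009],
  ["n", 0x000A],
  ["v", 0x000B],
  ["f", 0x000C],
  ["r", 0x000D],
];

// Build the pattern \t, \n, ... and test it against the expected character.
const controlResults = controlEscapes.map(([letter, code]) =>
  new RegExp("\\" + letter).test(String.fromCharCode(code))
);
console.log(controlResults); // [ true, true, true, true, true ]
```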
The production CharacterEscape :: c ControlLetter evaluates as follows:
1. Let ch be the character represented by ControlLetter.
2. Let i be ch's code point value.
3. Let j be the remainder of dividing i by 32.
4. Return the Unicode character numbered j.
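The four steps above amount to taking the letter's code point modulo 32. A minimal sketch, with controlLetterValue as a hypothetical helper name:

```javascript
// Returns the character that \c followed by the given ControlLetter denotes.
function controlLetterValue(letter) {
  const i = letter.charCodeAt(0); // code point value of the letter
  const j = i % 32;               // remainder of dividing by 32
  return String.fromCharCode(j);
}

console.log(controlLetterValue("J") === "\n"); // true: 74 % 32 = 10
console.log(controlLetterValue("j") === "\n"); // true: 106 % 32 = 10
console.log(controlLetterValue("I") === "\t"); // true: 73 % 32 = 9
console.log(/\cM/.test("\r"));                 // true: 77 % 32 = 13
```

Upper-case and lower-case letters give the same result because their code points differ by exactly 32.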
The production CharacterEscape :: HexEscapeSequence evaluates by evaluating the CV of the
HexEscapeSequence (see 7.8.4) and returning its character result.
The production CharacterEscape :: UnicodeEscapeSequence evaluates by evaluating the CV of the
UnicodeEscapeSequence (see 7.8.4) and returning its character result.
The production CharacterEscape :: IdentityEscape evaluates by returning the character represented
by IdentityEscape.
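The remaining CharacterEscape forms can be illustrated with a few one-line checks. The variable names are only illustrative.

```javascript
// HexEscapeSequence and UnicodeEscapeSequence both denote the character "A" here.
const hexMatches = /\x41/.test("A");
const unicodeMatches = /\u0041/.test("A");

// IdentityEscape: the escaped character stands for itself, losing any special meaning.
const literalStar = /\*/.test("*");
const literalDotOnly = /\./.test("a");

console.log(hexMatches, unicodeMatches, literalStar, literalDotOnly); // true true true false
```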
15.10.2.11 DecimalEscape
The production DecimalEscape :: DecimalIntegerLiteral [lookahead ∉ DecimalDigit] evaluates as follows.
1. Let i be the MV of DecimalIntegerLiteral.
2. If i is zero, return the EscapeValue consisting of a character (Unicode value 0000).
3. Return the EscapeValue consisting of the integer i.
The definition of the MV of DecimalIntegerLiteral is in 7.8.3.
Informative comments: If \ is followed by a decimal number n whose first digit is not 0, then the
escape sequence is considered to be a backreference. It is an error if n is greater than the total number
of left capturing parentheses in the entire regular expression. \0 represents the NUL character and
cannot be followed by a decimal digit.
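The three evaluation steps for DecimalEscape can be sketched as below. The function name decimalEscape and the object shape used for an EscapeValue are assumptions made for illustration; the specification describes EscapeValue only abstractly.

```javascript
// Takes the digits of a DecimalIntegerLiteral and returns a sketch of an EscapeValue:
// either a character (for \0) or an integer (a backreference number).
function decimalEscape(digits) {
  const i = parseInt(digits, 10); // the MV of the literal
  if (i === 0) return { kind: "character", value: "\u0000" };
  return { kind: "integer", value: i };
}

console.log(decimalEscape("0"));   // { kind: 'character', value: '\x00' }
console.log(decimalEscape("12"));  // { kind: 'integer', value: 12 }
console.log(/\0/.test("\u0000"));  // true: \0 matches the NUL character
console.log(/(a)\1/.test("aa"));   // true: \1 is a backreference to group 1
```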
15.10.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters
containing the characters 0 through 9 inclusive.
The production CharacterClassEscape :: D evaluates by returning the set of all characters not
included in the set returned by CharacterClassEscape :: d.
The production CharacterClassEscape :: s evaluates by returning the set of characters containing the
characters that are on the right-hand side of the WhiteSpace (7.2) or LineTerminator (7.3)
productions.
The production CharacterClassEscape :: S evaluates by returning the set of all characters not
included in the set returned by CharacterClassEscape :: s.
The production CharacterClassEscape :: w evaluates by returning the set of characters containing the
sixty-three characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 _
The production CharacterClassEscape :: W evaluates by returning the set of all characters not
included in the set returned by CharacterClassEscape :: w.
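The set sizes stated above can be confirmed by counting matches over every 16-bit code unit. This is a sketch; countMatches is a hypothetical helper, and the \s set is left out because the WhiteSpace and LineTerminator productions differ between editions of the standard.

```javascript
// Counts how many single code units (0x0000 to 0xFFFF) the given pattern matches.
function countMatches(re) {
  let count = 0;
  for (let code = 0; code <= 0xFFFF; code++) {
    if (re.test(String.fromCharCode(code))) count++;
  }
  return count;
}

const digitCount = countMatches(/\d/);
const wordCount = countMatches(/\w/);
const nonDigitCount = countMatches(/\D/);
const nonWordCount = countMatches(/\W/);

console.log(digitCount);                 // 10
console.log(wordCount);                  // 63
console.log(digitCount + nonDigitCount); // 65536: \D is the complement of \d
console.log(wordCount + nonWordCount);   // 65536: \W is the complement of \w
```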
15.10.2.13 CharacterClass
The production CharacterClass :: [ [lookahead ∉ {^}] ClassRanges ] evaluates by evaluating
ClassRanges to obtain a CharSet and returning that CharSet and the boolean false.
The production CharacterClass :: [ ^ ClassRanges ] evaluates by evaluating ClassRanges to
