Namespace: go.std.regexp.syntax
v1.0
Summary
Provides a low-level interface to the regexp/syntax package.
Package syntax parses regular expressions into parse trees and compiles
parse trees into programs. Most clients of regular expressions will use the
facilities of package regexp (such as Compile and Match) instead of this package.
# Syntax
The regular expression syntax understood by this package when parsing with the Perl flag is as follows.
Parts of the syntax can be disabled by passing alternate flags to Parse.
Single characters:
  .              any character, possibly including newline (flag s=true)
  [xyz]          character class
  [^xyz]         negated character class
  \d             Perl character class
  \D             negated Perl character class
  [[:alpha:]]    ASCII character class
  [[:^alpha:]]   negated ASCII character class
  \pN            Unicode character class (one-letter name)
  \p{Greek}      Unicode character class
  \PN            negated Unicode character class (one-letter name)
  \P{Greek}      negated Unicode character class
Composites:
  xy             x followed by y
  x|y            x or y (prefer x)
Repetitions:
  x*             zero or more x, prefer more
  x+             one or more x, prefer more
  x?             zero or one x, prefer one
  x{n,m}         n or n+1 or ... or m x, prefer more
  x{n,}          n or more x, prefer more
  x{n}           exactly n x
  x*?            zero or more x, prefer fewer
  x+?            one or more x, prefer fewer
  x??            zero or one x, prefer zero
  x{n,m}?        n or n+1 or ... or m x, prefer fewer
  x{n,}?         n or more x, prefer fewer
  x{n}?          exactly n x
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n}
reject forms that create a minimum or maximum repetition count above 1000.
Unlimited repetitions are not subject to this restriction.
Grouping:
  (re)           numbered capturing group (submatch)
  (?P<name>re)   named & numbered capturing group (submatch)
  (?:re)         non-capturing group
  (?flags)       set flags within current group; non-capturing
  (?flags:re)    set flags during re; non-capturing
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:
  i              case-insensitive (default false)
  m              multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
  s              let . match \n (default false)
  U              ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)
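The effect of each flag is easiest to see on small inputs. The sketch below uses Go's higher-level regexp package (which parses with the Perl flags) rather than this package directly; firstMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text, or "" if there
// is no match.
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	fmt.Printf("%q\n", firstMatch(`(?i)go`, "GO lang"))  // i: case-insensitive
	fmt.Printf("%q\n", firstMatch(`(?m)^b$`, "a\nb\nc")) // m: ^ and $ also match at line breaks
	fmt.Printf("%q\n", firstMatch(`(?s)a.b`, "a\nb"))    // s: . matches \n
	fmt.Printf("%q\n", firstMatch(`(?U)a+`, "aaa"))      // U: a+ prefers fewer
	fmt.Printf("%q\n", firstMatch(`(?i:a)b`, "AB Ab"))   // flag limited to the group
}
```

In the last pattern only the a is case-insensitive, so the first candidate "AB" is skipped and "Ab" is the match.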
Empty strings:
  ^              at beginning of text or line (flag m=true)
  $              at end of text (like \z not \Z) or line (flag m=true)
  \A             at beginning of text
  \b             at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
  \B             not at ASCII word boundary
  \z             at end of text
Escape sequences:
  \a             bell (== \007)
  \f             form feed (== \014)
  \t             horizontal tab (== \011)
  \n             newline (== \012)
  \r             carriage return (== \015)
  \v             vertical tab character (== \013)
  \*             literal *, for any punctuation character *
  \123           octal character code (up to three digits)
  \x7F           hex character code (exactly two digits)
  \x{10FFFF}     hex character code
  \Q...\E        literal text ... even if ... has punctuation
Character class elements:
  x              single character
  A-Z            character range (inclusive)
  \d             Perl character class
  [:foo:]        ASCII character class foo
  \p{Foo}        Unicode character class Foo
  \pF            Unicode character class F (one-letter name)
Named character classes as character class elements:
  [\d]           digits (== \d)
  [^\d]          not digits (== \D)
  [\D]           not digits (== \D)
  [^\D]          not not digits (== \d)
  [[:name:]]     named ASCII class inside character class (== [:name:])
  [^[:name:]]    named ASCII class inside negated character class (== [:^name:])
  [\p{Name}]     named Unicode property inside character class (== \p{Name})
  [^\p{Name}]    named Unicode property inside negated character class (== \P{Name})
Perl character classes (all ASCII-only):
  \d             digits (== [0-9])
  \D             not digits (== [^0-9])
  \s             whitespace (== [\t\n\f\r ])
  \S             not whitespace (== [^\t\n\f\r ])
  \w             word characters (== [0-9A-Za-z_])
  \W             not word characters (== [^0-9A-Za-z_])
ASCII character classes:
  [[:alnum:]]    alphanumeric (== [0-9A-Za-z])
  [[:alpha:]]    alphabetic (== [A-Za-z])
  [[:ascii:]]    ASCII (== [\x00-\x7F])
  [[:blank:]]    blank (== [\t ])
  [[:cntrl:]]    control (== [\x00-\x1F\x7F])
  [[:digit:]]    digits (== [0-9])
  [[:graph:]]    graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
  [[:lower:]]    lower case (== [a-z])
  [[:print:]]    printable (== [ -~] == [ [:graph:]])
  [[:punct:]]    punctuation (== [!-/:-@[-`{-~])
  [[:space:]]    whitespace (== [\t\n\v\f\r ])
  [[:upper:]]    upper case (== [A-Z])
  [[:word:]]     word characters (== [0-9A-Za-z_])
  [[:xdigit:]]   hex digit (== [0-9A-Fa-f])
Unicode character classes are those in unicode.Categories and unicode.Scripts.
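Since the Perl and ASCII classes are ASCII-only, non-ASCII digits and letters need the Unicode classes. The sketch below illustrates the difference using Go's regexp package; fullMatch is a hypothetical helper that anchors the pattern at both ends.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether pattern matches all of text.
func fullMatch(pattern, text string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(text)
}

func main() {
	arabicThree := "\u0663" // ARABIC-INDIC DIGIT THREE, Unicode category Nd
	naive := "na\u00efve"   // contains the non-ASCII letter ï

	fmt.Println(fullMatch(`\d`, "7"))            // ASCII digit
	fmt.Println(fullMatch(`\d`, arabicThree))    // \d is [0-9] only
	fmt.Println(fullMatch(`\p{Nd}`, arabicThree)) // Unicode decimal digit
	fmt.Println(fullMatch(`[[:alpha:]]+`, naive)) // [A-Za-z] only
	fmt.Println(fullMatch(`\p{L}+`, naive))       // any Unicode letter
	fmt.Println(fullMatch(`\p{Greek}+`, "\u03b1\u03b2\u03b3")) // script class
}
```

The class names accepted inside \p{...} are the keys of unicode.Categories (such as L and Nd) and unicode.Scripts (such as Greek).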
Index
- *EmptyOp
- *Error
- *ErrorCode
- *Flags
- *Inst
- *InstOp
- *Op
- *Prog
- *Regexp
- ClassNL
- Compile
- DotNL
- EmptyBeginLine
- EmptyBeginText
- EmptyEndLine
- EmptyEndText
- EmptyNoWordBoundary
- EmptyOp
- EmptyOpContext
- EmptyWordBoundary
- ErrInternalError
- ErrInvalidCharClass
- ErrInvalidCharRange
- ErrInvalidEscape
- ErrInvalidNamedCapture
- ErrInvalidPerlOp
- ErrInvalidRepeatOp
- ErrInvalidRepeatSize
- ErrInvalidUTF8
- ErrMissingBracket
- ErrMissingParen
- ErrMissingRepeatArgument
- ErrNestingDepth
- ErrTrailingBackslash
- ErrUnexpectedParen
- Error
- ErrorCode
- Flags
- FoldCase
- Inst
- InstAlt
- InstAltMatch
- InstCapture
- InstEmptyWidth
- InstFail
- InstMatch
- InstNop
- InstOp
- InstRune
- InstRune1
- InstRuneAny
- InstRuneAnyNotNL
- IsWordChar
- Literal
- MatchNL
- NonGreedy
- OneLine
- Op
- OpAlternate
- OpAnyChar
- OpAnyCharNotNL
- OpBeginLine
- OpBeginText
- OpCapture
- OpCharClass
- OpConcat
- OpEmptyMatch
- OpEndLine
- OpEndText
- OpLiteral
- OpNoMatch
- OpNoWordBoundary
- OpPlus
- OpQuest
- OpRepeat
- OpStar
- OpWordBoundary
- POSIX
- Parse
- Perl
- PerlX
- Prog
- Regexp
- Simple
- UnicodeGroups
- WasDollar
- arrayOfEmptyOp
- arrayOfError
- arrayOfErrorCode
- arrayOfFlags
- arrayOfInst
- arrayOfInstOp
- arrayOfOp
- arrayOfProg
- arrayOfRegexp
Constants
Constants are variables with :const true in their metadata. Joker currently does not recognize them as special; as such, it allows redefining them or their values.
-
(None.)
Variables
-
ClassNL
GoObject v1.0
allow character classes like [^a-z] and [[:space:]] to match newline
-
DotNL
GoObject v1.0
allow . to match newline
-
EmptyBeginLine
GoObject v1.0
-
EmptyBeginText
GoObject v1.0
-
EmptyEndLine
GoObject v1.0
-
EmptyEndText
GoObject v1.0
-
EmptyNoWordBoundary
GoObject v1.0
-
EmptyWordBoundary
GoObject v1.0
-
ErrInternalError
GoObject v1.0
Unexpected error
-
ErrInvalidCharClass
GoObject v1.0
Parse errors
-
ErrInvalidCharRange
GoObject v1.0
-
ErrInvalidEscape
GoObject v1.0
-
ErrInvalidNamedCapture
GoObject v1.0
-
ErrInvalidPerlOp
GoObject v1.0
-
ErrInvalidRepeatOp
GoObject v1.0
-
ErrInvalidRepeatSize
GoObject v1.0
-
ErrInvalidUTF8
GoObject v1.0
-
ErrMissingBracket
GoObject v1.0
-
ErrMissingParen
GoObject v1.0
-
ErrMissingRepeatArgument
GoObject v1.0
-
ErrNestingDepth
GoObject v1.0
-
ErrTrailingBackslash
GoObject v1.0
-
ErrUnexpectedParen
GoObject v1.0
-
FoldCase
GoObject v1.0
case-insensitive match
-
InstAlt
GoObject v1.0
-
InstAltMatch
GoObject v1.0
-
InstCapture
GoObject v1.0
-
InstEmptyWidth
GoObject v1.0
-
InstFail
GoObject v1.0
-
InstMatch
GoObject v1.0
-
InstNop
GoObject v1.0
-
InstRune
GoObject v1.0
-
InstRune1
GoObject v1.0
-
InstRuneAny
GoObject v1.0
-
InstRuneAnyNotNL
GoObject v1.0
-
Literal
GoObject v1.0
treat pattern as literal string
-
MatchNL
GoObject v1.0
-
NonGreedy
GoObject v1.0
make repetition operators default to non-greedy
-
OneLine
GoObject v1.0
treat ^ and $ as only matching at beginning and end of text
-
OpAlternate
GoObject v1.0
matches alternation of Subs
-
OpAnyChar
GoObject v1.0
matches any character
-
OpAnyCharNotNL
GoObject v1.0
matches any character except newline
-
OpBeginLine
GoObject v1.0
matches empty string at beginning of line
-
OpBeginText
GoObject v1.0
matches empty string at beginning of text
-
OpCapture
GoObject v1.0
capturing subexpression with index Cap, optional name Name
-
OpCharClass
GoObject v1.0
matches Runes interpreted as range pair list
-
OpConcat
GoObject v1.0
matches concatenation of Subs
-
OpEmptyMatch
GoObject v1.0
matches empty string
-
OpEndLine
GoObject v1.0
matches empty string at end of line
-
OpEndText
GoObject v1.0
matches empty string at end of text
-
OpLiteral
GoObject v1.0
matches Runes sequence
-
OpNoMatch
GoObject v1.0
matches no strings
-
OpNoWordBoundary
GoObject v1.0
matches word non-boundary `\B`
-
OpPlus
GoObject v1.0
matches Sub[0] one or more times
-
OpQuest
GoObject v1.0
matches Sub[0] zero or one times
-
OpRepeat
GoObject v1.0
matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)
-
OpStar
GoObject v1.0
matches Sub[0] zero or more times
-
OpWordBoundary
GoObject v1.0
matches word boundary `\b`
-
POSIX
GoObject v1.0
POSIX syntax
-
Perl
GoObject v1.0
as close to Perl as possible
-
PerlX
GoObject v1.0
allow Perl extensions
-
Simple
GoObject v1.0
regexp contains no counted repetition
-
UnicodeGroups
GoObject v1.0
allow \p{Han}, \P{Han} for Unicode group and negation
-
WasDollar
GoObject v1.0
regexp OpEndText was $, not \z
Functions, Macros, and Special Forms
-
Compile
Function v1.0
(Compile re)
Compile compiles the regexp into a program to be executed.
The regexp should have been simplified already (returned from re.Simplify).
Go input arguments: (re *Regexp)
Go returns: (*Prog, error)
Joker input arguments: [^*Regexp re]
Joker returns: [^*Prog, ^Error]
-
EmptyOpContext
Function v1.0
(EmptyOpContext r1 r2)
EmptyOpContext returns the zero-width assertions
satisfied at the position between the runes r1 and r2.
Passing r1 == -1 indicates that the position is
at the beginning of the text.
Passing r2 == -1 indicates that the position is
at the end of the text.
Go input arguments: (r1 rune, r2 rune)
Go returns: EmptyOp
Joker input arguments: [^Char r1, ^Char r2]
Joker returns: ^EmptyOp
-
IsWordChar
Function v1.0
(IsWordChar r)
IsWordChar reports whether r is considered a “word character”
during the evaluation of the \b and \B zero-width assertions.
These assertions are ASCII-only: the word characters are [A-Za-z0-9_].
Go input arguments: (r rune)
Go returns: bool
Joker input arguments: [^Char r]
Joker returns: ^Boolean
-
Parse
Function v1.0
(Parse s flags)
Parse parses a regular expression string s, controlled by the specified
Flags, and returns a regular expression parse tree. The syntax is
described in the top-level comment.
Go input arguments: (s string, flags Flags)
Go returns: (*Regexp, error)
Joker input arguments: [^String s, ^Flags flags]
Joker returns: [^*Regexp, ^Error]
Types
-
*EmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
-
*Error
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
-
Error
Receiver for *Error v1.0
([])
-
*ErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
-
*Flags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
-
*Inst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
-
MatchEmptyWidth
Receiver for *Inst v1.0
([before after])
MatchEmptyWidth reports whether the instruction matches
an empty string between the runes before and after.
It should only be called when i.Op == InstEmptyWidth.
-
MatchRune
Receiver for *Inst v1.0
([r])
MatchRune reports whether the instruction matches (and consumes) r.
It should only be called when i.Op == InstRune.
-
MatchRunePos
Receiver for *Inst v1.0
([r])
MatchRunePos checks whether the instruction matches (and consumes) r.
If so, MatchRunePos returns the index of the matching rune pair
(or, when len(i.Rune) == 1, rune singleton).
If not, MatchRunePos returns -1.
MatchRunePos should only be called when i.Op == InstRune.
-
String
Receiver for *Inst v1.0
([])
-
*InstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
-
*Op
Concrete Type v1.0
An Op is a single regular expression operator.
-
*Prog
Concrete Type v1.0
A Prog is a compiled regular expression program.
-
Prefix
Receiver for *Prog v1.0
([])
Prefix returns a literal string that all matches for the
regexp must start with. Complete is true if the prefix
is the entire match.
-
StartCond
Receiver for *Prog v1.0
([])
StartCond returns the leading empty-width conditions that must
be true in any match. It returns the Go value ^EmptyOp(0) (every bit set) if no matches are possible.
-
String
Receiver for *Prog v1.0
([])
-
*Regexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.
-
CapNames
Receiver for *Regexp v1.0
([])
CapNames walks the regexp to find the names of capturing groups.
-
Equal
Receiver for *Regexp v1.0
([y])
Equal reports whether x and y have identical structure.
-
MaxCap
Receiver for *Regexp v1.0
([])
MaxCap walks the regexp to find the maximum capture index.
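CapNames, Equal, and MaxCap all inspect a parse tree without compiling it. The Go sketch below calls them on a small pattern; mustParse is a hypothetical helper that panics on a parse error to keep the example short.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// mustParse parses pattern with the Perl flags and panics on error.
func mustParse(pattern string) *syntax.Regexp {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re
}

func main() {
	re := mustParse(`(?P<year>\d{4})-(\d{2})`)

	// Index 0 stands for the whole match; unnamed groups have an empty name.
	fmt.Println("max capture index:", re.MaxCap())
	fmt.Printf("capture names: %q\n", re.CapNames())

	// Equal compares structure, including capture names.
	fmt.Println(re.Equal(mustParse(`(?P<year>\d{4})-(\d{2})`)))
	fmt.Println(re.Equal(mustParse(`(\d{4})-(\d{2})`)))
}
```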
-
Simplify
Receiver for *Regexp v1.0
([])
Simplify returns a regexp equivalent to re but without counted repetitions
and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/.
The resulting regexp will execute correctly but its string representation
will not produce the same parse tree, because capturing parentheses
may have been duplicated or removed. For example, the simplified form
for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1.
The returned regexp may share structure with or be the original.
-
String
Receiver for *Regexp v1.0
([])
-
EmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
-
Error
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
-
ErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
-
String
Receiver for ErrorCode v1.0
([])
-
Flags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
-
Inst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
-
InstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
-
String
Receiver for InstOp v1.0
([])
-
Op
Concrete Type v1.0
An Op is a single regular expression operator.
-
String
Receiver for Op v1.0
([])
-
Prog
Concrete Type v1.0
A Prog is a compiled regular expression program.
-
Regexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.
-
arrayOfEmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
-
arrayOfError
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
-
arrayOfErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
-
arrayOfFlags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
-
arrayOfInst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
-
arrayOfInstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
-
arrayOfOp
Concrete Type v1.0
An Op is a single regular expression operator.
-
arrayOfProg
Concrete Type v1.0
A Prog is a compiled regular expression program.
-
arrayOfRegexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.