Namespace: go.std.regexp.syntax

v1.0

Summary
Index
Constants
Variables
Functions, Macros, and Special Forms
Types

Summary

Provides a low-level interface to the regexp/syntax package.

Package syntax parses regular expressions into parse trees and compiles
parse trees into programs. Most clients of regular expressions will use the
facilities of package regexp (such as Compile and Match) instead of this package.

Syntax

The regular expression syntax understood by this package when parsing with the Perl flag is as follows.
Parts of the syntax can be disabled by passing alternate flags to Parse.

Single characters:

.              any character, possibly including newline (flag s=true)
[xyz]          character class
[^xyz]         negated character class
\d             Perl character class
\D             negated Perl character class
[[:alpha:]]    ASCII character class
[[:^alpha:]]   negated ASCII character class
\pN            Unicode character class (one-letter name)
\p{Greek}      Unicode character class
\PN            negated Unicode character class (one-letter name)
\P{Greek}      negated Unicode character class

Composites:

xy             x followed by y
x|y            x or y (prefer x)

Repetitions:

x*             zero or more x, prefer more
x+             one or more x, prefer more
x?             zero or one x, prefer one
x{n,m}         n or n+1 or ... or m x, prefer more
x{n,}          n or more x, prefer more
x{n}           exactly n x
x*?            zero or more x, prefer fewer
x+?            one or more x, prefer fewer
x??            zero or one x, prefer zero
x{n,m}?        n or n+1 or ... or m x, prefer fewer
x{n,}?         n or more x, prefer fewer
x{n}?          exactly n x

Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n}
reject forms that create a minimum or maximum repetition count above 1000.
Unlimited repetitions are not subject to this restriction.

Grouping:

(re)           numbered capturing group (submatch)
(?P<name>re)   named & numbered capturing group (submatch)
(?:re)         non-capturing group
(?flags)       set flags within current group; non-capturing
(?flags:re)    set flags during re; non-capturing

Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:

i              case-insensitive (default false)
m              multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s              let . match \n (default false)
U              ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)

Empty strings:

^              at beginning of text or line (flag m=true)
$              at end of text (like \z not \Z) or line (flag m=true)
\A             at beginning of text
\b             at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B             not at ASCII word boundary
\z             at end of text

Escape sequences:

\a             bell (== \007)
\f             form feed (== \014)
\t             horizontal tab (== \011)
\n             newline (== \012)
\r             carriage return (== \015)
\v             vertical tab character (== \013)
\*             literal *, for any punctuation character *
\123           octal character code (up to three digits)
\x7F           hex character code (exactly two digits)
\x{10FFFF}     hex character code
\Q...\E        literal text ... even if ... has punctuation

Character class elements:

x              single character
A-Z            character range (inclusive)
\d             Perl character class
[:foo:]        ASCII character class foo
\p{Foo}        Unicode character class Foo
\pF            Unicode character class F (one-letter name)

Named character classes as character class elements:

[\d]           digits (== \d)
[^\d]          not digits (== \D)
[\D]           not digits (== \D)
[^\D]          not not digits (== \d)
[[:name:]]     named ASCII class inside character class (== [:name:])
[^[:name:]]    named ASCII class inside negated character class (== [:^name:])
[\p{Name}]     named Unicode property inside character class (== \p{Name})
[^\p{Name}]    named Unicode property inside negated character class (== \P{Name})

Perl character classes (all ASCII-only):

\d             digits (== [0-9])
\D             not digits (== [^0-9])
\s             whitespace (== [\t\n\f\r ])
\S             not whitespace (== [^\t\n\f\r ])
\w             word characters (== [0-9A-Za-z_])
\W             not word characters (== [^0-9A-Za-z_])

ASCII character classes:

[[:alnum:]]    alphanumeric (== [0-9A-Za-z])
[[:alpha:]]    alphabetic (== [A-Za-z])
[[:ascii:]]    ASCII (== [\x00-\x7F])
[[:blank:]]    blank (== [\t ])
[[:cntrl:]]    control (== [\x00-\x1F\x7F])
[[:digit:]]    digits (== [0-9])
[[:graph:]]    graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]    lower case (== [a-z])
[[:print:]]    printable (== [ -~] == [ [:graph:]])
[[:punct:]]    punctuation (== [!-/:-@[-`{-~])
[[:space:]]    whitespace (== [\t\n\v\f\r ])
[[:upper:]]    upper case (== [A-Z])
[[:word:]]     word characters (== [0-9A-Za-z_])
[[:xdigit:]]   hex digit (== [0-9A-Fa-f])

Unicode character classes are those in unicode.Categories and unicode.Scripts.

Index

*EmptyOp
*Error
*ErrorCode
*Flags
*Inst
*InstOp
*Op
*Prog
*Regexp
ClassNL
Compile
DotNL
EmptyBeginLine
EmptyBeginText
EmptyEndLine
EmptyEndText
EmptyNoWordBoundary
EmptyOp
EmptyOpContext
EmptyWordBoundary
ErrInternalError
ErrInvalidCharClass
ErrInvalidCharRange
ErrInvalidEscape
ErrInvalidNamedCapture
ErrInvalidPerlOp
ErrInvalidRepeatOp
ErrInvalidRepeatSize
ErrInvalidUTF8
ErrMissingBracket
ErrMissingParen
ErrMissingRepeatArgument
ErrNestingDepth
ErrTrailingBackslash
ErrUnexpectedParen
Error
ErrorCode
Flags
FoldCase
Inst
InstAlt
InstAltMatch
InstCapture
InstEmptyWidth
InstFail
InstMatch
InstNop
InstOp
InstRune
InstRune1
InstRuneAny
InstRuneAnyNotNL
IsWordChar
Literal
MatchNL
NonGreedy
OneLine
Op
OpAlternate
OpAnyChar
OpAnyCharNotNL
OpBeginLine
OpBeginText
OpCapture
OpCharClass
OpConcat
OpEmptyMatch
OpEndLine
OpEndText
OpLiteral
OpNoMatch
OpNoWordBoundary
OpPlus
OpQuest
OpRepeat
OpStar
OpWordBoundary
POSIX
Parse
Perl
PerlX
Prog
Regexp
Simple
UnicodeGroups
WasDollar
arrayOfEmptyOp
arrayOfError
arrayOfErrorCode
arrayOfFlags
arrayOfInst
arrayOfInstOp
arrayOfOp
arrayOfProg
arrayOfRegexp

Constants

Constants are variables with :const true in their metadata. Joker currently does not recognize them as special; as such, it allows redefining them or their values.

(None.)

Variables

ClassNL
GoObject v1.0
allow character classes like [^a-z] and [[:space:]] to match newline
DotNL
GoObject v1.0
allow . to match newline
EmptyBeginLine
GoObject v1.0
EmptyBeginText
GoObject v1.0
EmptyEndLine
GoObject v1.0
EmptyEndText
GoObject v1.0
EmptyNoWordBoundary
GoObject v1.0
EmptyWordBoundary
GoObject v1.0
ErrInternalError
GoObject v1.0
Unexpected error
ErrInvalidCharClass
GoObject v1.0
Parse errors (this code and the Err* codes that follow, through ErrUnexpectedParen)
ErrInvalidCharRange
GoObject v1.0
ErrInvalidEscape
GoObject v1.0
ErrInvalidNamedCapture
GoObject v1.0
ErrInvalidPerlOp
GoObject v1.0
ErrInvalidRepeatOp
GoObject v1.0
ErrInvalidRepeatSize
GoObject v1.0
ErrInvalidUTF8
GoObject v1.0
ErrMissingBracket
GoObject v1.0
ErrMissingParen
GoObject v1.0
ErrMissingRepeatArgument
GoObject v1.0
ErrNestingDepth
GoObject v1.0
ErrTrailingBackslash
GoObject v1.0
ErrUnexpectedParen
GoObject v1.0
FoldCase
GoObject v1.0
case-insensitive match
InstAlt
GoObject v1.0
InstAltMatch
GoObject v1.0
InstCapture
GoObject v1.0
InstEmptyWidth
GoObject v1.0
InstFail
GoObject v1.0
InstMatch
GoObject v1.0
InstNop
GoObject v1.0
InstRune
GoObject v1.0
InstRune1
GoObject v1.0
InstRuneAny
GoObject v1.0
InstRuneAnyNotNL
GoObject v1.0
Literal
GoObject v1.0
treat pattern as literal string
MatchNL
GoObject v1.0
NonGreedy
GoObject v1.0
make repetition operators default to non-greedy
OneLine
GoObject v1.0
treat ^ and $ as only matching at beginning and end of text
OpAlternate
GoObject v1.0
matches alternation of Subs
OpAnyChar
GoObject v1.0
matches any character
OpAnyCharNotNL
GoObject v1.0
matches any character except newline
OpBeginLine
GoObject v1.0
matches empty string at beginning of line
OpBeginText
GoObject v1.0
matches empty string at beginning of text
OpCapture
GoObject v1.0
capturing subexpression with index Cap, optional name Name
OpCharClass
GoObject v1.0
matches Runes interpreted as range pair list
OpConcat
GoObject v1.0
matches concatenation of Subs
OpEmptyMatch
GoObject v1.0
matches empty string
OpEndLine
GoObject v1.0
matches empty string at end of line
OpEndText
GoObject v1.0
matches empty string at end of text
OpLiteral
GoObject v1.0
matches Runes sequence
OpNoMatch
GoObject v1.0
matches no strings
OpNoWordBoundary
GoObject v1.0
matches word non-boundary `\B`
OpPlus
GoObject v1.0
matches Sub[0] one or more times
OpQuest
GoObject v1.0
matches Sub[0] zero or one times
OpRepeat
GoObject v1.0
matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)
OpStar
GoObject v1.0
matches Sub[0] zero or more times
OpWordBoundary
GoObject v1.0
matches word boundary `\b`
POSIX
GoObject v1.0
POSIX syntax
Perl
GoObject v1.0
as close to Perl as possible
PerlX
GoObject v1.0
allow Perl extensions
Simple
GoObject v1.0
regexp contains no counted repetition
UnicodeGroups
GoObject v1.0
allow \p{Han}, \P{Han} for Unicode group and negation
WasDollar
GoObject v1.0
regexp OpEndText was $, not \z

Functions, Macros, and Special Forms

Compile
Function v1.0
```
(Compile re)
```
Compile compiles the regexp into a program to be executed.
The regexp should have been simplified already (returned from re.Simplify).

Go input arguments: (re *Regexp)

Go returns: (*Prog, error)

Joker input arguments: [^*Regexp re]

Joker returns: [^*Prog, ^Error]
EmptyOpContext
Function v1.0
```
(EmptyOpContext r1 r2)
```
EmptyOpContext returns the zero-width assertions
satisfied at the position between the runes r1 and r2.
Passing r1 == -1 indicates that the position is
at the beginning of the text.
Passing r2 == -1 indicates that the position is
at the end of the text.

Go input arguments: (r1 rune, r2 rune)

Go returns: EmptyOp

Joker input arguments: [^Char r1, ^Char r2]

Joker returns: ^EmptyOp
IsWordChar
Function v1.0
```
(IsWordChar r)
```
IsWordChar reports whether r is considered a “word character”
during the evaluation of the \b and \B zero-width assertions.
These assertions are ASCII-only: the word characters are [A-Za-z0-9_].

Go input arguments: (r rune)

Go returns: bool

Joker input arguments: [^Char r]

Joker returns: ^Boolean
Parse
Function v1.0
```
(Parse s flags)
```
Parse parses a regular expression string s, controlled by the specified
Flags, and returns a regular expression parse tree. The syntax is
described in the top-level comment.

Go input arguments: (s string, flags Flags)

Go returns: (*Regexp, error)

Joker input arguments: [^String s, ^Flags flags]

Joker returns: [^*Regexp, ^Error]

Types

*EmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
*Error
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
Error
Receiver for *Error v1.0
```
([])
```
*ErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
*Flags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
*Inst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
MatchEmptyWidth
Receiver for *Inst v1.0
```
([before after])
```
MatchEmptyWidth reports whether the instruction matches
an empty string between the runes before and after.
It should only be called when i.Op == InstEmptyWidth.
MatchRune
Receiver for *Inst v1.0
```
([r])
```
MatchRune reports whether the instruction matches (and consumes) r.
It should only be called when i.Op == InstRune.
MatchRunePos
Receiver for *Inst v1.0
```
([r])
```
MatchRunePos checks whether the instruction matches (and consumes) r.
If so, MatchRunePos returns the index of the matching rune pair
(or, when len(i.Rune) == 1, rune singleton).
If not, MatchRunePos returns -1.
MatchRunePos should only be called when i.Op == InstRune.
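
As a hedged illustration of MatchRune and MatchRunePos, the Go sketch below compiles a two-range character class and queries the resulting InstRune instruction. `compilePattern` and `firstRuneInst` are hypothetical helpers; locating the instruction by scanning prog.Inst is a choice made for this example.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// compilePattern parses, simplifies, and compiles pattern. (Hypothetical helper.)
func compilePattern(pattern string) (*syntax.Prog, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	return syntax.Compile(re.Simplify())
}

// firstRuneInst returns the first instruction whose Op is InstRune, or nil.
// (Hypothetical helper.)
func firstRuneInst(prog *syntax.Prog) *syntax.Inst {
	for i := range prog.Inst {
		if prog.Inst[i].Op == syntax.InstRune {
			return &prog.Inst[i]
		}
	}
	return nil
}

func main() {
	prog, err := compilePattern(`[a-cx-z]`)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	in := firstRuneInst(prog)
	if in == nil {
		fmt.Println("no InstRune instruction found")
		return
	}
	// The class is stored as range pairs; MatchRunePos reports which pair
	// matched, or -1 when none did.
	for _, r := range []rune{'b', 'y', 'm'} {
		fmt.Printf("%q match=%v pos=%d\n", r, in.MatchRune(r), in.MatchRunePos(r))
	}
}
```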
String
Receiver for *Inst v1.0
```
([])
```
*InstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
*Op
Concrete Type v1.0
An Op is a single regular expression operator.
*Prog
Concrete Type v1.0
A Prog is a compiled regular expression program.
Prefix
Receiver for *Prog v1.0
```
([])
```
Prefix returns a literal string that all matches for the
regexp must start with. The second return value, complete, is true if the prefix
is the entire match.
StartCond
Receiver for *Prog v1.0
```
([])
```
StartCond returns the leading empty-width conditions that must
be true in any match. It returns ^EmptyOp(0) (Go's bitwise complement of zero, that is, all bits set) if no matches are possible.
String
Receiver for *Prog v1.0
```
([])
```
*Regexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.
CapNames
Receiver for *Regexp v1.0
```
([])
```
CapNames walks the regexp to find the names of capturing groups.
Equal
Receiver for *Regexp v1.0
```
([y])
```
Equal reports whether x (the receiver) and y have identical structure.
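
A brief Go sketch of Equal comparing freshly parsed trees. `sameTree` is a hypothetical helper; note that Equal compares structure, so two patterns matching the same strings can still compare unequal.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameTree parses both patterns with Perl flags and reports whether the
// resulting parse trees have identical structure. (Hypothetical helper.)
func sameTree(p, q string) (bool, error) {
	x, err := syntax.Parse(p, syntax.Perl)
	if err != nil {
		return false, err
	}
	y, err := syntax.Parse(q, syntax.Perl)
	if err != nil {
		return false, err
	}
	return x.Equal(y), nil
}

func main() {
	fmt.Println(sameTree(`ab+`, `ab+`)) // same pattern parsed twice
	fmt.Println(sameTree(`ab+`, `ab*`)) // plus versus star
}
```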
MaxCap
Receiver for *Regexp v1.0
```
([])
```
MaxCap walks the regexp to find the maximum capture index.
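
CapNames and MaxCap can be illustrated on a pattern with one named and one unnamed group. The Go sketch below uses a hypothetical helper `captureInfo`; in the returned slice, index 0 stands for the whole match and unnamed groups have an empty name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// captureInfo parses pattern and returns its maximum capture index and the
// capture names indexed by group number. (Hypothetical helper.)
func captureInfo(pattern string) (int, []string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return 0, nil, err
	}
	return re.MaxCap(), re.CapNames(), nil
}

func main() {
	maxCap, names, err := captureInfo(`(?P<year>\d{4})-(\d{2})`)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("max capture index:", maxCap)
	for i, name := range names {
		fmt.Printf("group %d name=%q\n", i, name)
	}
}
```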
Simplify
Receiver for *Regexp v1.0
```
([])
```
Simplify returns a regexp equivalent to re but without counted repetitions
and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/.
The resulting regexp will execute correctly but its string representation
will not produce the same parse tree, because capturing parentheses
may have been duplicated or removed. For example, the simplified form
for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1.
The returned regexp may share structure with or be the original.
String
Receiver for *Regexp v1.0
```
([])
```
EmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
Error
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
ErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
String
Receiver for ErrorCode v1.0
```
([])
```
Flags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
Inst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
InstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
String
Receiver for InstOp v1.0
```
([])
```
Op
Concrete Type v1.0
An Op is a single regular expression operator.
String
Receiver for Op v1.0
```
([])
```
Prog
Concrete Type v1.0
A Prog is a compiled regular expression program.
Regexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.
arrayOfEmptyOp
Concrete Type v1.0
An EmptyOp specifies a kind or mixture of zero-width assertions.
arrayOfError
Concrete Type v1.0
An Error describes a failure to parse a regular expression
and gives the offending expression.
arrayOfErrorCode
Concrete Type v1.0
An ErrorCode describes a failure to parse a regular expression.
arrayOfFlags
Concrete Type v1.0
Flags control the behavior of the parser and record information about regexp context.
arrayOfInst
Concrete Type v1.0
An Inst is a single instruction in a regular expression program.
arrayOfInstOp
Concrete Type v1.0
An InstOp is an instruction opcode.
arrayOfOp
Concrete Type v1.0
An Op is a single regular expression operator.
arrayOfProg
Concrete Type v1.0
A Prog is a compiled regular expression program.
arrayOfRegexp
Concrete Type v1.0
A Regexp is a node in a regular expression syntax tree.
