view doc/regexprops-generic.texi @ 29191:7af7b64f913c

* doc/regexprops-generic.texi: change "an close-group" to "a close-group" and "illegal" to "not allowed".
author James Youngman <jay@gnu.org>
date Mon, 03 Dec 2007 09:53:02 -0800

@c Copyright (C) 1994, 1996, 1998, 2000, 2001, 2003, 2004, 2005, 2006, 2007
@c Free Software Foundation, Inc.
@c 
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.2 or
@c any later version published by the Free Software Foundation; with no
@c Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
@c Texts.  A copy of the license is included in the ``GNU Free
@c Documentation License'' file as part of this distribution.

@c this regular expression description is for: generic

@menu
* awk regular expression syntax::
* egrep regular expression syntax::
* ed regular expression syntax::
* emacs regular expression syntax::
* gnu-awk regular expression syntax::
* grep regular expression syntax::
* posix-awk regular expression syntax::
* posix-basic regular expression syntax::
* posix-egrep regular expression syntax::
* posix-extended regular expression syntax::
* posix-minimal-basic regular expression syntax::
* sed regular expression syntax::
@end menu

@node awk regular expression syntax
@subsection @samp{awk} regular expression syntax


The character @samp{.} matches any single character except the null character.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.  
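As a rough illustration of the substitution described above, the following Python sketch rewrites a few POSIX character class names as explicit ranges, so that a pattern can be used with a syntax that lacks character classes.  The function name @code{expand_posix_classes} and its table are hypothetical, the expansions assume ASCII in the C locale, and the replacement is purely textual (it does not check that the class name appears inside a bracket expression).

```python
# Hypothetical helper: not part of any tool described in this manual.
# The expansions below are only valid under the assumption of ASCII / C locale.
POSIX_CLASSES = {
    '[:digit:]': '0-9',
    '[:lower:]': 'a-z',
    '[:upper:]': 'A-Z',
    '[:alpha:]': 'A-Za-z',
    '[:alnum:]': 'A-Za-z0-9',
    '[:xdigit:]': '0-9A-Fa-f',
}


def expand_posix_classes(pattern):
    """Replace each known class name with an explicit range list."""
    for name, members in POSIX_CLASSES.items():
        pattern = pattern.replace(name, members)
    return pattern


print(expand_posix_classes('[[:digit:]]'))
print(expand_posix_classes('[[:alpha:]_][[:alnum:]_]'))
```

Class names that are not in the table are left untouched, so a caller can tell that the pattern still needs manual attention.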

GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.  

Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit matches that digit.  

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{(}
@item After the alternation operator @samp{|}

@end enumerate




The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  
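The effect of the longest-match rule on alternation can be sketched in Python.  Python's @code{re} module is not the engine described here: it returns the first alternative that matches rather than the longest one.  The hypothetical helper @code{longest_match} below imitates the rule for a list of simple alternatives by trying each in turn and keeping the longest result; it is an illustration only and says nothing about how the matcher is implemented.

```python
import re


def longest_match(alternatives, text):
    """Return the longest prefix of text matched by any alternative, or None.

    Assumption: each alternative is simple enough that Python's own match
    for it is already the longest one (true for plain literals).
    """
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


# Python picks the first alternative that succeeds.
first = re.match('a|ab', 'abc').group(0)
# The longest-match rule prefers the longer alternative.
longest = longest_match(['a', 'ab'], 'abc')
print(first, longest)
```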


@node egrep regular expression syntax
@subsection @samp{egrep} regular expression syntax


The character @samp{.} matches any single character except newline.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  Non-matching lists @samp{[^@dots{}]} never match newline.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with parentheses @samp{()}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.  
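The numbering rule can be tried out with a short Python sketch.  Python's @code{re} module is used here only as a stand-in (an assumption; it is not the matcher this manual describes), but for this pattern it follows the same conventions: groups are numbered by the position of their opening parenthesis, and @samp{\2} must repeat whatever the second group matched.

```python
import re

# Group 1 opens first (the outer parentheses); group 2 is the inner one.
# Python's re module is a stand-in for the syntax described in the text.
pattern = re.compile(r'((a+)b)\2')

m = pattern.fullmatch('aabaa')
groups = (m.group(1), m.group(2))
print(groups)

# The back-reference must repeat group 2 exactly, so this input fails.
mismatch = pattern.fullmatch('aaba')
print(mismatch)
```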

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.  



The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node ed regular expression syntax
@subsection @samp{ed} regular expression syntax


The character @samp{.} matches any single character except the null character.  


@table @samp

@item \+
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item \?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item + and ? 
match themselves.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.  

The alternation operator is @samp{\|}. 

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item 
At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate
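The three positions listed above can be expressed as a small scanner.  The Python sketch below is hypothetical (the name @code{anchor_carets} is not part of any tool) and deliberately simplified: it assumes the pattern contains no bracket expressions, where @samp{^} has a different meaning.

```python
def anchor_carets(pattern):
    """Return the indexes of '^' characters that act as anchors.

    A '^' is an anchor only at the start of the pattern, directly after
    an open-group, or directly after the alternation operator.
    Assumption: the pattern contains no bracket expressions.
    """
    anchors = []
    prev = None  # previous token; None means start of the pattern
    i = 0
    while i < len(pattern):
        # A backslash and the character after it form a single token.
        if pattern[i] == '\\' and i + 1 < len(pattern):
            token = pattern[i:i + 2]
        else:
            token = pattern[i]
        if token == '^' and prev in (None, '\\(', '\\|'):
            anchors.append(i)
        prev = token
        i += len(token)
    return anchors


print(anchor_carets(r'^a^b\(^c\)\|^d'))
```

In the sample pattern the second @samp{^} follows an ordinary character, so it is reported as a literal rather than an anchor.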


The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by 
@samp{\)}
@item Before the alternation operator @samp{\|}

@end enumerate


@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}
@item After the alternation operator @samp{\|}

@end enumerate


Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  
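The backslash conventions of this syntax are the mirror image of those in the @samp{posix-extended} syntax: here @samp{\(}, @samp{\)}, @samp{\|}, @samp{\+}, @samp{\?}, @samp{\@{} and @samp{\@}} are operators and the bare characters are literal, while there the roles are swapped.  The Python sketch below converts a pattern from one convention to the other by toggling the backslash.  The names @code{skip_bracket} and @code{ed_to_posix_extended} are hypothetical, and the sketch ignores the context-dependent treatment of @samp{^}, @samp{$} and @samp{*}.

```python
TOGGLED = set('()|+?{}')


def skip_bracket(pattern, i):
    """Return the index just past the bracket expression starting at i."""
    j = i + 1
    if pattern.startswith('^', j):
        j += 1
    if pattern.startswith(']', j):
        j += 1  # a leading ']' is an ordinary member of the list
    while j < len(pattern) and pattern[j] != ']':
        if pattern.startswith('[:', j):
            end = pattern.find(':]', j + 2)
            j = end + 2 if end != -1 else j + 1
        else:
            j += 1
    return min(j + 1, len(pattern))


def ed_to_posix_extended(pattern):
    """Toggle the backslash on ( ) | + ? { } outside bracket expressions."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '[':
            # Bracket expressions are copied verbatim.
            end = skip_bracket(pattern, i)
            out.append(pattern[i:end])
            i = end
        elif ch == '\\' and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Backslashed operators lose the backslash; other escapes
            # (for example back-references) are kept as they are.
            out.append(nxt if nxt in TOGGLED else ch + nxt)
            i += 2
        else:
            # Bare operator characters are literal here, so quote them.
            out.append('\\' + ch if ch in TOGGLED else ch)
            i += 1
    return ''.join(out)


print(ed_to_posix_extended(r'\(ab\)\+c\|d\{2\}'))
print(ed_to_posix_extended('a+b?'))
```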


@node emacs regular expression syntax
@subsection @samp{emacs} regular expression syntax


The character @samp{.} matches any single character except newline.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.  

The alternation operator is @samp{\|}. 

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item 
At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}

@item After the alternation operator @samp{\|}

@end enumerate


The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by 
@samp{\)}
@item Before the alternation operator @samp{\|}

@end enumerate


@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}
@item After the alternation operator @samp{\|}

@end enumerate




The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node gnu-awk regular expression syntax
@subsection @samp{gnu-awk} regular expression syntax


The character @samp{.} matches any single character.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.  

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{(}
@item After the alternation operator @samp{|}

@end enumerate




The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node grep regular expression syntax
@subsection @samp{grep} regular expression syntax


The character @samp{.} matches any single character except newline.  


@table @samp

@item \+
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item \?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item + and ? 
match themselves.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  Non-matching lists @samp{[^@dots{}]} never match newline.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.  

The alternation operator is @samp{\|}. 

The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item 
At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}

@item After a newline

@item After the alternation operator @samp{\|}

@end enumerate


The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by 
@samp{\)}
@item Before a newline

@item Before the alternation operator @samp{\|}

@end enumerate


@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}
@item After a newline

@item After the alternation operator @samp{\|}

@end enumerate


Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node posix-awk regular expression syntax
@subsection @samp{posix-awk} regular expression syntax


The character @samp{.} matches any single character except the null character.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} can be used to quote the following character.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  

GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.  

Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.  

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{(}
@item After the alternation operator @samp{|}

@end enumerate


Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals such as @samp{a@{1z} are not accepted.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node posix-basic regular expression syntax
@subsection @samp{posix-basic} regular expression syntax
This is a synonym for the @samp{ed} syntax.

@node posix-egrep regular expression syntax
@subsection @samp{posix-egrep} regular expression syntax


The character @samp{.} matches any single character except newline.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  Non-matching lists @samp{[^@dots{}]} never match newline.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with parentheses @samp{()}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.  

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.  

Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node posix-extended regular expression syntax
@subsection @samp{posix-extended} regular expression syntax


The character @samp{.} matches any single character except the null character.  


@table @samp

@item +
indicates that the regular expression should match one or more occurrences of the previous atom or regexp.  
@item ?
indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.  
@item \+
matches a @samp{+}.  
@item \?
matches a @samp{?}.  
@end table


Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with parentheses @samp{()}.  An unmatched @samp{)} matches just itself.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{(}.  

The alternation operator is @samp{|}.  

The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets.  Within brackets, @samp{^} can be used to invert the membership of the character class being specified.  

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
@enumerate

@item At the beginning of a regular expression

@item After an open-group, signified by 
@samp{(}
@item After the alternation operator @samp{|}

@end enumerate


Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals such as @samp{a@{1z} are not accepted.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node posix-minimal-basic regular expression syntax
@subsection @samp{posix-minimal-basic} regular expression syntax


The character @samp{.} matches any single character except the null character.  



Bracket expressions are used to match ranges of characters.  Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid.  Within square brackets, @samp{\} is taken literally.  Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.  

GNU extensions are supported:
@enumerate

@item @samp{\w} matches a character within a word

@item @samp{\W} matches a character which is not within a word

@item @samp{\<} matches the beginning of a word

@item @samp{\>} matches the end of a word

@item @samp{\b} matches a word boundary

@item @samp{\B} matches at a position which is not a word boundary

@item @samp{\`} matches the beginning of the whole input

@item @samp{\'} matches the end of the whole input

@end enumerate


Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number.  For example @samp{\2} matches the second group expression.  The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.  



The character @samp{^} only represents the beginning of a string when it appears:
@enumerate

@item 
At the beginning of a regular expression

@item After an open-group, signified by 
@samp{\(}

@end enumerate


The character @samp{$} only represents the end of a string when it appears:
@enumerate

@item At the end of a regular expression

@item Before a close-group, signified by 
@samp{\)}
@end enumerate




Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid intervals such as @samp{a\@{1z} are not accepted.  

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.  


@node sed regular expression syntax
@subsection @samp{sed} regular expression syntax
This is a synonym for the @samp{ed} syntax.