Ordinary Characters
An ordinary character is an RE that matches itself. An ordinary character
is any character in the supported character set except <newline> and the
regular expression special characters listed in Special Characters below.
An ordinary character preceded by a backslash ( \ ) is treated as the ordinary
character itself, except when the character is (, ), {, or }, or the digits
1 through 9 (see REs Matching Multiple Characters). Matching is based on
the bit pattern used for encoding the character, not on the graphic representation
of the character.
Special Characters
A regular expression special character
preceded by a backslash is a regular expression that matches the special
character itself. When not preceded by a backslash, such characters have
special meaning in the specification of REs. Regular expression special
characters and the contexts in which they have special meaning are:
- . [ \
- The period, left square bracket, and backslash are special except when used in a bracket expression (see RE Bracket Expression).
- *
- The asterisk is special except when used in a bracket expression, as the first character of a regular expression, or as the first character following the character pair \( (see REs Matching Multiple Characters).
- ^
- The circumflex is special when used as the first character of an entire RE (see Expression Anchoring) or as the first character of a bracket expression.
- $
- The dollar sign is special when used as the last character of an entire RE (see Expression Anchoring).
- delimiter
- Any character used to bound (i.e., delimit) an entire RE is special for that RE.
Period
A period ( . ), when used outside of a bracket expression, is an RE that matches any printable or nonprintable character except <newline>.
The following rules apply to bracket expressions:
- bracket expression
- A bracket expression is either a matching list expression or a non-matching list expression, and consists of one or more expressions in any order. Expressions can be: collating elements, collating symbols, noncollating characters, equivalence classes, range expressions, or character classes. The right bracket ( ] ) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial ^, if any). Otherwise, it terminates the bracket expression (unless it is the ending right bracket for a valid collating symbol, equivalence class, or character class, or it is the collating element within a collating symbol or equivalence class expression). The special characters . * [ \ lose their special meaning within a bracket expression.
- RE RE
- The concatenation of REs is an RE that matches the first encountered concatenation of the strings matched by each component of the RE. For example, the RE bc matches the second and third characters of the string abcdefabcdef.
- An RE matching a single character followed by an asterisk ( * ) is an RE that matches zero or more occurrences of the RE preceding the asterisk. The first encountered string that permits a match is chosen, and the matched string will encompass the maximum number of characters permitted by the RE. For example, in the string abbbcdeabbbbbbcde, both the RE b*c and the RE b*bbc are matched by the substring bbbc in the second through fifth positions. An asterisk as the first character of an RE loses this special meaning and is treated as itself.
- \(RE\)
- A subexpression can be defined within an RE by enclosing it between the character pairs \( and \). Such a subexpression matches whatever it would have matched without the \( and \). Subexpressions can be arbitrarily nested. An asterisk immediately following the \( loses its special meaning and is treated as itself. An asterisk immediately following the \) is treated as an invalid character.
- \n
- The expression \n matches the same string of characters as was matched by a subexpression enclosed between \( and \) preceding the \n. The character n must be a digit from 1 through 9, specifying the n-th subexpression (the one that begins with the n-th \( and ends with the corresponding paired \)). For example, the expression ^\(.*\)\1$ matches a line consisting of two adjacent appearances of the same string.
- If the \n is followed by an asterisk, it matches zero or more occurrences of the subexpression referred to. For example, the expression \(ab\(cd\)ef\)Z\2*Z\1 matches the string abcdefZcdcdZabcdef.
- An RE matching a single character followed by \{m\}, \{m,\}, or \{m,n\} is an RE that matches repeated occurrences of the RE. The values of m and n must be decimal integers in the range 0 through 255, with m specifying the exact or minimum number of occurrences and n specifying the maximum number of occurrences. \{m\} matches exactly m occurrences of the preceding RE, \{m,\} matches at least m occurrences, and \{m,n\} matches any number of occurrences between m and n, inclusive.
- The first encountered string that matches the expression is chosen; it will contain as many occurrences of the RE as possible. For example, in the string abbbbbbbc the RE b\{3\} is matched by characters two through four, the RE b\{3,\} is matched by characters two through eight, and the RE b\{3,5\}c is matched by characters four through nine.
- A circumflex ( ^ ) as the first character of an RE anchors the expression to the beginning of a line; only strings starting at the first character of a line are matched by the RE. For example, the RE ^ab matches the string ab in the line abcdef, but not the same string in the line cdefab.
- A dollar sign ( $ ) as the last character of an RE anchors the expression to the end of a line; only strings ending at the last character of a line are matched by the RE. For example, the RE ab$ matches the string ab in the line cdefab, but not the same string in the line abcdef.
- An RE anchored by both ^ and $ matches only strings that are lines. For example, the RE ^abcdef$ matches only lines consisting of the string abcdef.
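The basic RE examples above can be checked with a short sketch in Python. This is only an approximation: Python's re module is not a POSIX basic RE engine, and it writes groups and intervals without backslashes, so each pattern below is a hand translation noted in a comment. For these particular patterns the reported matches agree with the positions given in the text. The helper name first_match_positions is hypothetical.

```python
import re

def first_match_positions(pattern, text):
    """Return 1-based (first, last) character positions of the first match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())

# Asterisk: the basic REs b*c and b*bbc are spelled the same way in Python.
text = "abbbcdeabbbbbbcde"
star_c = first_match_positions(r"b*c", text)
star_bbc = first_match_positions(r"b*bbc", text)

# Back-references: the basic RE ^\(.*\)\1$ becomes ^(.*)\1$ in Python.
doubled = re.compile(r"^(.*)\1$")
doubled_ok = doubled.search("abcabc") is not None
doubled_bad = doubled.search("abcabd") is not None

# The basic RE \(ab\(cd\)ef\)Z\2*Z\1 becomes (ab(cd)ef)Z\2*Z\1 in Python.
nested = re.fullmatch(r"(ab(cd)ef)Z\2*Z\1", "abcdefZcdcdZabcdef")

# Intervals: b\{3\}, b\{3,\}, and b\{3,5\}c become b{3}, b{3,}, and b{3,5}c.
run = "abbbbbbbc"
exact = first_match_positions(r"b{3}", run)
at_least = first_match_positions(r"b{3,}", run)
bounded = first_match_positions(r"b{3,5}c", run)

print(star_c, star_bbc, doubled_ok, doubled_bad, nested.groups())
print(exact, at_least, bounded)
```

Positions are reported as 1-based and inclusive so that they can be compared directly with the wording of the examples.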
Ordinary Characters
An ordinary character is an ERE that matches itself. An ordinary character
is any character in the supported character set except <newline> and the
regular expression special characters listed in Special Characters below.
An ordinary character preceded by a backslash (\) is treated as the ordinary
character itself. Matching is based on the bit pattern used for encoding
the character, not on the graphic representation of the character.
Special Characters
A regular expression special character preceded by a backslash
is a regular expression that matches the special character itself. When
not preceded by a backslash, such characters have special meaning in the
specification of EREs. The extended regular expression special characters
and the contexts in which they have their special meaning are:
- . [ \ ( ) * + ? $ |
- The period, left square bracket, backslash, left parenthesis, right parenthesis, asterisk, plus sign, question mark, dollar sign, and vertical bar are special except when used in a bracket expression (see ERE Bracket Expression).
- ^
- The circumflex is special except when used in a bracket expression in a non-leading position.
- delimiter
- Any character used to bound (i.e., delimit) an entire ERE is special for that ERE.
Period
A period ( . ), when used outside of a bracket expression, is an ERE that matches any printable or nonprintable character except <newline>.
- RE RE
- A concatenation of EREs matches the first encountered concatenation of the strings matched by each component of the ERE. Such a concatenation of EREs enclosed in parentheses matches whatever the concatenation without the parentheses matches. For example, both the ERE bc and the ERE (bc) match the second and third characters of the string abcdefabcdef. The longest overall string is matched.
- The special character plus ( + ), when following an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is an ERE that matches one or more occurrences of the ERE preceding the plus sign. The string matched will contain as many occurrences as possible. For example, the ERE b+c matches the fourth through seventh characters in the string acabbbcde.
- The special character asterisk ( * ), when following an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is an ERE that matches zero or more occurrences of the ERE preceding the asterisk. For example, the ERE b*c matches the first character in the string cabbbcde. If there is any choice, the longest left-most string that permits a match is chosen. For example, the ERE b*cd matches the third through seventh characters in the string cabbbcdebbbbbbcdbc.
- The special character question mark ( ? ), when following an ERE matching a single character, or a concatenation of EREs enclosed in parentheses, is an ERE that matches zero or one occurrence of the ERE preceding the question mark. The string matched will contain as many occurrences as possible. For example, the ERE b?c matches the second character in the string acabbbcde.
- An interval expression functions the same way as in basic regular expression syntax.
The order of precedence, from high to low, is:
- [ ]
- square brackets
- * + ?
- asterisk, plus sign, question mark
- ^ $
- anchoring
- concatenation
- |
- alternation
For example, the ERE abba|cde is interpreted as "match either abba or cde". It does not mean "match abb followed by a or c followed in turn by de" (because concatenation has a higher order of precedence than alternation).
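The extended RE repetition and alternation examples can be tried with the following sketch. It assumes Python's re module as a stand-in, which is not a POSIX ERE engine (it is a backtracking matcher, and for alternation it takes the first alternative that succeeds rather than the longest), but it gives the same results for these specific patterns. The helper name first_match_positions is hypothetical.

```python
import re

def first_match_positions(pattern, text):
    """Return 1-based (first, last) character positions of the first match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())

# One or more, zero or more, and zero or one occurrences.
plus = first_match_positions(r"b+c", "acabbbcde")
star = first_match_positions(r"b*c", "cabbbcde")
star_cd = first_match_positions(r"b*cd", "cabbbcdebbbbbbcdbc")
optional = first_match_positions(r"b?c", "acabbbcde")

# Alternation binds more loosely than concatenation:
# abba|cde means (abba)|(cde), not abb(a|c)de.
candidates = ["abba", "cde", "abbade", "abbcde"]
either = [s for s in candidates if re.fullmatch(r"abba|cde", s)]
grouped = [s for s in candidates if re.fullmatch(r"abb(a|c)de", s)]

print(plus, star, star_cd, optional)
print(either, grouped)
```

The second list shows what the pattern would accept if alternation bound more tightly than concatenation, which is the reading the text rules out.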
- A circumflex ( ^ ) matches the beginning of a line (anchors the expression to the beginning of a line). For example, the ERE ^ab matches the string ab in the line abcdef, but not the same string in the line cdefab.
- A dollar sign ( $ ) matches the end of a line (anchors the expression to the end of a line). For example, the ERE ab$ matches the string ab in the line cdefab, but not the same string in the line abcdef.
- An ERE anchored by both ^ and $ matches only strings that are lines. For example, the ERE ^abcdef$ matches only lines consisting of the string abcdef. Only empty lines match the ERE ^$.
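The anchoring rules can be illustrated with a small sketch, again assuming Python's re module as an approximation of ERE behavior. Each entry in the hypothetical list lines stands for one line with its trailing newline already removed.

```python
import re

# Sample lines (illustrative data, newline already stripped).
lines = ["abcdef", "cdefab", "", "ab"]

starts_with_ab = [ln for ln in lines if re.search(r"^ab", ln)]
ends_with_ab = [ln for ln in lines if re.search(r"ab$", ln)]
exactly_abcdef = [ln for ln in lines if re.search(r"^abcdef$", ln)]
empty_lines = [ln for ln in lines if re.search(r"^$", ln)]

print(starts_with_ab, ends_with_ab, exactly_abcdef, empty_lines)
```

The line ab appears in both of the first two results because it both begins and ends with ab.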
Ordinary Characters
An ordinary character
is a pattern that matches itself. An ordinary character is any character
in the supported character set except <newline> and the pattern matching
special characters listed in Special Characters below. Matching is based
on the bit pattern used for encoding the character, not on the graphic
representation of the character.
Special Characters
A pattern matching
special character preceded by a backslash ( \ ) is a pattern that matches
the special character itself. When not preceded by a backslash, such characters
have special meaning in the specification of patterns. The pattern matching
special characters and the contexts in which they have their special meaning
are:
- ? * [
- The question mark, asterisk, and left square bracket are special except when used in a bracket expression (see Pattern Bracket Expression).
Question Mark
A question mark ( ? ), when used outside of a bracket expression, is a pattern that matches any printable or nonprintable character except <newline>.
The exclamation point character ( ! ) replaces the circumflex character ( ^ ) in its role in a non-matching list in the regular expression notation.
The backslash is used as an escape character within bracket expressions.
- The asterisk ( * ) is a pattern that matches any string, including the null string.
- RE RE
- The concatenation of patterns matching a single character is a valid pattern that matches the concatenation of the single characters or collating elements matched by each of the concatenated patterns. For example, the pattern a[bc] matches the strings ab and ac.
- The concatenation of one or more patterns matching a single character with one or more asterisks is a valid pattern. In such patterns, each asterisk matches a string of zero or more characters, up to the first character that matches the character following the asterisk in the pattern.
- For example, the pattern a*d matches the strings ad, abd, and abcd, but not the string abc. When an asterisk is the first or last character in a pattern, it matches zero or more characters that precede or follow the characters matched by the remainder of the pattern. For example, the pattern a*d* matches the strings ad, abcd, abcdef, aaaad, and adddd; the pattern *a*d matches the strings ad, abcd, efabcd, aaaad, and adddd.
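The pattern examples above can be reproduced with Python's fnmatch module, used here as an assumed stand-in for the pattern matching notation. fnmatchcase compares a whole string against a pattern that uses ?, *, and bracket expressions. It does not apply the filename-specific rules for leading periods or slashes. The helper name matching is hypothetical.

```python
from fnmatch import fnmatchcase

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if fnmatchcase(s, pattern)]

bracket = matching("a[bc]", ["ab", "ac", "ad", "abc"])
middle = matching("a*d", ["ad", "abd", "abcd", "abc"])
trailing = matching("a*d*", ["ad", "abcd", "abcdef", "aaaad", "adddd", "efabcd"])
leading = matching("*a*d", ["ad", "abcd", "efabcd", "aaaad", "adddd", "abcdef"])

print(bracket, middle, trailing, leading)
```

Each candidate list contains one or two strings that the pattern should reject, so the results show both sides of each rule.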
If a filename (including the component of a pathname that follows the slash ( / ) character) begins with a period ( . ), the period must be explicitly matched by using a period as the first character of the pattern; it cannot be matched by either the asterisk special character, the question mark special character, or a bracket expression. This rule does not apply to make(1).
The slash character in a pathname must be explicitly matched by using a slash in the pattern; it cannot be matched by either the asterisk special character, the question mark special character, or a bracket expression. For make(1) only the part of the pathname following the last slash character can be matched by a special character. That is, all special characters preceding the last slash character lose their special meaning.
Specified patterns are matched against existing filenames and pathnames, as appropriate. If the pattern matches any existing filenames or pathnames, the pattern is replaced with those filenames and pathnames, sorted according to the collating sequence in effect. If the pattern does not match any existing filenames or pathnames, the pattern string is left unchanged.
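A rough sketch of these filename expansion rules, assuming Python's glob module as an approximation. The file names and the helper expand are hypothetical. Python sorts by code point here rather than by the collating sequence in effect, so the ordering only illustrates the idea.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical files, created only for this illustration.
    for name in (".hidden", "visible", "another"):
        open(os.path.join(root, name), "w").close()
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "sub", "inner"), "w").close()

    def expand(pattern):
        """Expand pattern relative to root; return it unchanged if nothing matches."""
        hits = glob.glob(os.path.join(glob.escape(root), pattern))
        return sorted(os.path.relpath(h, root) for h in hits) or [pattern]

    star = expand("*")            # a leading period is not matched by *
    dot_star = expand(".*")       # the period must be matched explicitly
    nested = expand("*/*")        # the slash must be matched explicitly
    unmatched = expand("no-such-*")

print(star, dot_star, nested, unmatched)
```

The last call shows the fallback described above: a pattern with no matching filenames is left as the original pattern string.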
If the pattern begins with a tilde ( ~ ) character, all of the ordinary characters preceding the first slash (or all characters if there is no slash) are treated as a possible login name. If the login name is null (i.e., the pattern contains only the tilde or the tilde is immediately followed by a slash), the tilde is replaced by a pathname of the process's home directory, followed by a slash. Otherwise, the combination of tilde and login name is replaced by a pathname of the home directory associated with the login name, followed by a slash. If the system cannot identify the login name, the result is implementation-defined. This rule does not apply to sh(1) or make(1).
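Tilde expansion can be approximated with Python's os.path.expanduser, shown here as a sketch on a POSIX system. The home directory /home/demo is a hypothetical value, set explicitly so that the result is repeatable. Python's behavior differs in small ways from the rule above: for a bare tilde it returns the home directory without a trailing slash.

```python
import os

# Hypothetical home directory, set only to make the example repeatable.
os.environ["HOME"] = "/home/demo"

home_only = os.path.expanduser("~")
with_path = os.path.expanduser("~/notes.txt")
# A tilde that is not the first character of the pattern is left alone.
not_leading = os.path.expanduser("docs/~/notes.txt")

print(home_only, with_path, not_leading)
```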
If the pattern contains a $ character, variable substitution can take place. Environment variables can be embedded within patterns as:
- $name
or:
- ${ name }
Braces are used to guarantee that characters following name are not interpreted as belonging to name. Substitution occurs only once, in the order specified; that is, the resulting string is not examined again for new names that occurred because of the substitution.
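The role of the braces and the single-pass rule can be sketched with Python's os.path.expandvars, used here as an assumed analogue of the substitution described above. The variable names DEMO_DIR and DEMO_REF are hypothetical. Python leaves a reference to an unset variable unchanged, which may differ from a particular shell or utility.

```python
import os

# Hypothetical variables, set only for this illustration.
os.environ["DEMO_DIR"] = "/tmp/demo"
os.environ["DEMO_REF"] = "$DEMO_DIR"      # a value that itself looks like a reference
os.environ.pop("DEMO_DIRs", None)         # make sure the longer name is unset

plain = os.path.expandvars("$DEMO_DIR/file")
# Braces stop the trailing "s" from being read as part of the name.
braced = os.path.expandvars("${DEMO_DIR}s/file")
unbraced = os.path.expandvars("$DEMO_DIRs/file")
# The substituted text is not scanned again for new names.
once = os.path.expandvars("$DEMO_REF/file")

print(plain, braced, unbraced, once)
```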
Multiple alternative patterns in a single clause can be specified by separating individual patterns with the vertical bar character ( | ); strings matching any of the patterns separated this way will cause the corresponding command list to be selected.
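As a minimal sketch of this selection rule, the following Python function emulates a clause list in which each clause has one or more patterns separated by a vertical bar. The function select_clause, the clause data, and the command list names are all hypothetical. Splitting on every vertical bar is a simplification that does not handle an escaped bar.

```python
from fnmatch import fnmatchcase

def select_clause(word, clauses):
    """Return the command list of the first clause with a pattern that matches word."""
    for patterns, command_list in clauses:
        if any(fnmatchcase(word, p) for p in patterns.split("|")):
            return command_list
    return None

# Hypothetical clauses: (alternative patterns, command list).
clauses = [
    ("*.c|*.h", "compile"),
    ("*.txt|README*", "format"),
    ("*", "ignore"),
]

for word in ("main.c", "README.md", "Makefile"):
    print(word, "->", select_clause(word, clauses))
```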