About Regular Expressions
A regular expression is a group of letters, numbers, and special characters used to match data. You can use Perl-compatible regular expressions (PCRE) in your Firebox configuration to match certain types of traffic in proxy actions. For example, you can use one regular expression to block connections to some websites and allow connections to other websites, or to deny SMTP connections when the recipient is not a valid email address for your company. To block parts of a website that violate your company’s Internet use policy, you can use a regular expression in the URL Paths category of the HTTP proxy configuration.
General Guidelines
- Regular expressions in Fireware OS are case-sensitive — When you create a regular expression, you must be careful to match the case of the letters in your regular expression to the letters of the text you want to match. You can change the regular expression to not be case-sensitive when you put the (?i) modifier at the start of a group.
- Regular expressions in Fireware OS are different from MS-DOS and Unix wildcard characters — When you change files with MS-DOS or the Windows Command Prompt, you can use ? or * to match one or more characters in a file name. These simple wildcard characters do not operate the same way in Fireware.
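The two guidelines above can be tried out with a minimal sketch. It assumes Python's built-in re module as a stand-in for the PCRE engine on the Firebox; the sample host name and words are illustrative only.

```python
import re

# Matching is case-sensitive unless a modifier says otherwise.
sensitive = re.search(r"example", "WWW.EXAMPLE.COM")
insensitive = re.search(r"(?i)example", "WWW.EXAMPLE.COM")

# A shell-style wildcard does not carry over: in a regular expression,
# "?" makes the preceding character optional instead of matching any character.
wildcard_style = re.fullmatch(r"f?t", "fat")   # "f?t" means "t" or "ft"
regex_style = re.fullmatch(r"f.t", "fat")      # "." matches any one character

print(sensitive)              # None
print(insensitive.group())    # EXAMPLE
print(wildcard_style)         # None
print(regex_style.group())    # fat
```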
How to Build a Regular Expression
The simplest regular expression is made from the text you want to match. Letters, numbers, and other printable characters all match the same letter, number, or character that you type. A regular expression made from letters and numbers can match only a character sequence that includes all of those letters and numbers in order.
Example: fat matches fat, fatuous, and infatuated, as well as many other sequences.
Fireware OS accepts any character sequence that includes the regular expression. A regular expression frequently matches more than one sequence. If you use a regular expression as the source for a Deny rule, you can block some network traffic by accident. We recommend that you fully test your regular expressions before you save the configuration to your Firebox.
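One way to test an expression before you save it is to run it against sample text on a workstation. The sketch below assumes Python's re module behaves like the Firebox for this simple case; re.search is used because it accepts any sequence that contains the expression. The candidate list is made up for illustration.

```python
import re

pattern = re.compile(r"fat")
candidates = ["fat", "fatuous", "infatuated", "feat"]

# search() succeeds when the expression appears anywhere in the text.
matched = [text for text in candidates if pattern.search(text)]

print(matched)  # ['fat', 'fatuous', 'infatuated']
```

A list like this makes unintended matches visible before the expression is used in a Deny rule.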
To match different sequences of characters at the same time, you must use a special character. The most common special character is the period (.), which is similar to a wildcard. When you put a period in a regular expression, it matches any character, space, or tab. The period does not match line breaks (\r\n or \n).
Example: f..t matches foot, feet, f&#t, f -t, and f\t3t.
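The behavior of the period can be checked with a short sketch, again assuming Python's re module as an approximation of PCRE. The last two samples are added here to show cases that do not match.

```python
import re

pattern = re.compile(r"f..t")

# "f\t3t" is f, a tab, 3, t. "f\n\nt" has two line breaks between f and t.
samples = ["foot", "feet", "f&#t", "f -t", "f\t3t", "f\n\nt", "fit"]
matched = [s for s in samples if pattern.search(s)]

# The period does not match a line break, and "fit" is one character short.
print(len(matched))  # 5
```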
To match a special character, such as the period, you must add a backslash (\) before the character. If you do not add a backslash to the special character, the rule may not operate correctly. It is not necessary to add a second backslash if the character usually has a backslash, such as \t (tab stop).
You must add a backslash to each of these special characters to match the real character: ? . * | + $ \ ^ ( ) [
Example: \$9\.99 matches $9.99
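The effect of the backslash can be seen by comparing escaped and unescaped versions of the same expression. This is a sketch that assumes Python's re module; the "$9x99" sample is hypothetical.

```python
import re

price = "$9.99"

# Unescaped, "$" is an end-of-line anchor, so nothing can follow it.
unescaped = re.search(r"$9.99", price)

# Escaped, "$" and "." match the real characters.
escaped = re.search(r"\$9\.99", price)

# Escaping only the "$" leaves the period free to match any character.
loose = re.search(r"\$9.99", "$9x99")
strict = re.search(r"\$9\.99", "$9x99")

print(unescaped)         # None
print(escaped.group())   # $9.99
print(loose.group())     # $9x99
print(strict)            # None
```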
Hexadecimal Characters
To match hexadecimal characters, use \x or %0x%. Hexadecimal characters are not affected by the case-insensitive modifier.
Example: \x66 or %0x66% matches f, but cannot match F.
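The \x form can be sketched in Python, whose re module accepts the same \xhh escape. The %0x66% notation is specific to Fireware and is not understood by Python, and the interaction with the case-insensitive modifier described above is not reproduced here because Python may treat it differently.

```python
import re

# \x66 is the hexadecimal code for a lowercase "f".
lower = re.search(r"\x66", "f")
upper = re.search(r"\x66", "F")

print(lower.group())  # f
print(upper)          # None
```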
Repetition
To match a variable number of characters, you must use a repetition modifier. You can apply the modifier to a single character, or a group of characters. There are four types of repetition modifiers:
- Numbers inside curly braces (such as {2,4}) match as few as the first number, or as many as the second number.
Example: 3{2,4} matches 33, 333, or 3333. It does not match 3 or 33333.
- The question mark (?) matches zero or one occurrence of the preceding character, class, or group.
Example: me?et matches met and meet.
- The plus sign (+) matches one or more occurrences of the preceding character, class, or group.
Example: me+t matches met, meet, and meeeeeeeeet.
- The asterisk (*) matches zero or more occurrences of the preceding character, class, or group.
Example: me*t matches mt, met, meet, and meeeeeeeeet.
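The four modifiers can be compared side by side in a short sketch. It assumes Python's re module, and the helper full_matches is a hypothetical name introduced here. It uses re.fullmatch so that each candidate must match from start to finish; with a search that accepts any sequence containing the expression, 33333 would also be accepted because it contains 33.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [text for text in candidates if re.fullmatch(pattern, text)]

braces = full_matches(r"3{2,4}", ["3", "33", "333", "3333", "33333"])
question = full_matches(r"me?et", ["met", "meet", "meeet"])
plus = full_matches(r"me+t", ["mt", "met", "meet", "meeeeeeeeet"])
star = full_matches(r"me*t", ["mt", "met", "meet", "meeeeeeeeet"])

print(braces)    # ['33', '333', '3333']
print(question)  # ['met', 'meet']
print(plus)      # ['met', 'meet', 'meeeeeeeeet']
print(star)      # ['mt', 'met', 'meet', 'meeeeeeeeet']
```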
To apply modifiers to many characters at once, you must make a group. To group a sequence of characters, put parentheses around the sequence.
Example: ba(na)* matches ba, bana, banana, and banananananana.
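A repetition modifier applied to a group can be sketched as follows, assuming Python's re module. The candidates "ban" and "bananan" are added here to show incomplete repetitions of the group.

```python
import re

pattern = re.compile(r"ba(na)*")
candidates = ["ba", "bana", "banana", "ba" + "na" * 6, "ban", "bananan"]

# fullmatch() requires the whole candidate to be "ba" followed by
# zero or more complete copies of "na".
matched = [text for text in candidates if pattern.fullmatch(text)]

print(matched)  # ['ba', 'bana', 'banana', 'banananananana']
```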
Character Classes
To match one character from a group, use square brackets instead of parentheses to create a character class. You can apply repetition modifiers to the character class. The order of the characters inside the class does not matter.
The only special characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-).
Example: gr[ae]y matches gray and grey.
To use a caret in the character class, do not make it the first character.
To use a hyphen in the character class, make it the first character.
A negated character class matches everything but the specified characters. Type a caret (^) at the beginning of any character class to make it a negated character class.
Example: [Qq][^u] matches Qatar, but not question or Iraq.
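The character class rules above can be sketched in a few lines, assuming Python's re module. The extra words and the arithmetic string are illustrative only.

```python
import re

# One character from the class: a or e.
grey = [w for w in ["gray", "grey", "groy", "graey"] if re.search(r"gr[ae]y", w)]

# Negated class: Q or q followed by any character except u.
# "Iraq" fails because no character follows the q.
q_not_u = [w for w in ["Qatar", "question", "Iraq"] if re.search(r"[Qq][^u]", w)]

# Hyphen first and caret not first, so both are literal characters.
literals = re.findall(r"[-+^]", "2^8 - 1 + 3")

print(grey)      # ['gray', 'grey']
print(q_not_u)   # ['Qatar']
print(literals)  # ['^', '-', '+']
```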
Ranges
Character classes are often used with character ranges to select any letter or number. A range is two letters or numbers, separated by a hyphen (-), that mark the start and finish of a character group. Any character in the range can match. If you add a repetition modifier to a character class, the preceding class is repeated.
Example: [1-3][0-9]{2} matches 100 and 399, as well as any number in between.
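The range example can be verified by brute force. This sketch assumes Python's re module and uses fullmatch so that each number must match in its entirety.

```python
import re

pattern = re.compile(r"[1-3][0-9]{2}")

# Try every number from 0 to 999 and keep the ones that match completely.
matched = [n for n in range(0, 1000) if pattern.fullmatch(str(n))]

print(matched[0], matched[-1], len(matched))  # 100 399 300
```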
Some ranges that are used frequently have a shorthand notation. You can use shorthand character classes inside or outside other character classes. A negated shorthand character class matches the opposite of what the shorthand character class matches. The table below includes several common shorthand character classes and their negated values.
Class | Equivalent to | Negated | Equivalent to |
---|---|---|---|
\w | Any letter, number, or underscore [A-Za-z0-9_] | \W | Not a letter, number, or underscore |
\s | Any whitespace character [ \t\r\n] | \S | Not whitespace |
\d | Any number [0-9] | \D | Not a number |
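The shorthand classes in the table can be sketched with Python's re module. This is an approximation: the re.ASCII flag is an assumption made here so that Python restricts the classes to ASCII characters, and the sample text is made up.

```python
import re

text = "Port 8080\topen!"

digits = re.findall(r"\d+", text, re.ASCII)   # runs of numbers
words = re.findall(r"\w+", text, re.ASCII)    # runs of word characters
spaces = re.findall(r"\s", text, re.ASCII)    # single whitespace characters

# \D is the negation of \d: remove everything that is not a number.
only_digits = re.sub(r"\D", "", text, flags=re.ASCII)

print(digits)       # ['8080']
print(words)        # ['Port', '8080', 'open']
print(spaces)       # [' ', '\t']
print(only_digits)  # 8080
```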
Anchors
To match the beginning or end of a line, you must use an anchor. The caret (^) matches the beginning of a line, and the dollar sign ($) matches the end of a line.
Example: ^am.*$ matches ampere if ampere is the only word on the line. It does not match dame.
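The anchors can be sketched as follows, assuming Python's re module. One difference to keep in mind: Python anchors ^ and $ to the whole text by default, so the re.MULTILINE flag is used here to get line-by-line behavior. The two-line sample is illustrative.

```python
import re

pattern = re.compile(r"^am.*$")

alone = pattern.search("ampere")   # the line starts with "am"
inside = pattern.search("dame")    # "am" is not at the start of the line

# With MULTILINE, the anchors apply to each line of the text.
second_line = re.search(r"^am.*$", "dame\nampere", re.MULTILINE)

print(alone.group())        # ampere
print(inside)               # None
print(second_line.group())  # ampere
```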
You can use \b to match a word boundary, or \B to match any position that is not a word boundary.
There are three kinds of word boundaries:
- Before the first character in the character sequence, if the first character is a word character (\w)
- After the last character in the character sequence, if the last character is a word character (\w)
- Between a word character (\w) and a non-word character (\W)
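Word boundaries can be sketched with a sample sentence, assuming Python's re module. The text and the search word are hypothetical.

```python
import re

text = "cat concat cat's category"

# \bcat\b: "cat" with a word boundary on both sides.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\b: "cat" at the end of a longer word (no boundary before it).
word_endings = [m.start() for m in re.finditer(r"\Bcat\b", text)]

print(whole_words)   # [0, 11]  -> "cat" and the "cat" in "cat's"
print(word_endings)  # [7]      -> the "cat" in "concat"
```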
Alternation
You can use alternation to match a single regular expression out of several possible regular expressions. The alternation operator in a regular expression is the pipe character (|). It is similar to the boolean operator OR.
Example: m(oo|a|e)n matches the first occurrence of moon, man, or men.
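Alternation inside a group can be sketched as follows, assuming Python's re module. The sample sentence is made up; finditer is used to list every match, not only the first.

```python
import re

pattern = re.compile(r"m(oo|a|e)n")
text = "The men saw the man on the moon."

# search() stops at the first occurrence of any alternative.
first = pattern.search(text).group()

# finditer() walks through every occurrence in order.
all_matches = [m.group() for m in pattern.finditer(text)]

print(first)        # men
print(all_matches)  # ['men', 'man', 'moon']
```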
Common Regular Expressions
Match the PDF content type (MIME type)
^%PDF-
Match any valid IP address
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
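The IP address expression can be tested with a sketch like the one below, which assumes Python's re module and writes the expression without any spaces. The names octet and ip_pattern are introduced here for readability. Because the expression has no anchors, a search can still find a valid address inside a longer, invalid string.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (with optional leading digits).
octet = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
ip_pattern = re.compile(r"\.".join([octet] * 4))

valid = ip_pattern.fullmatch("192.168.1.1")
invalid = ip_pattern.fullmatch("256.1.1.1")

# Without anchors, search() finds "56.1.1.1" inside "256.1.1.1".
partial = ip_pattern.search("256.1.1.1")

print(valid.group())    # 192.168.1.1
print(invalid)          # None
print(partial.group())  # 56.1.1.1
```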
Match most email addresses (the expression uses uppercase ranges, so add the (?i) modifier to also match lowercase letters)
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
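The email expression can be tried against sample text with a sketch like this one, which assumes Python's re module. The addresses are placeholders. Because the expression lists only uppercase ranges and matching is case-sensitive, the (?i) modifier is added in the second pattern to accept lowercase letters.

```python
import re

email = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"

# As written, only uppercase addresses match.
upper = re.search(email, "USER@EXAMPLE.COM")
lower = re.search(email, "user@example.com")

# With (?i) at the start, case no longer matters.
any_case = re.search(r"(?i)" + email, "Contact: jane.doe@example.com.")

print(upper.group())     # USER@EXAMPLE.COM
print(lower)             # None
print(any_case.group())  # jane.doe@example.com
```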