Regular Functions

Applies To: WatchGuard Advanced EPDR

Regular functions work separately on each row in an SQL query. With regular functions, the result of the function on a row does not depend on the other rows.

In the Advanced SQL Query tool, regular functions have these characteristics:

  • Strong Typing — Unlike standard SQL, WatchGuard Endpoint Security does not make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you must use type conversion functions.
  • Common Subexpression Elimination — All expressions in a query that have the same AST (the same record or same result of syntactic parsing) are considered to have identical values. These expressions are concatenated and executed once. Identical subqueries are also eliminated this way.
  • Types of Results — All functions return a single value as the result (not multiple values, and not zero values). The type of result is usually defined only by the types of arguments, not by the values.
  • Constants — For simplicity, some functions can only work with constants for some arguments. For example, the right argument of the LIKE operator must be a constant. Almost all functions return a constant for constant arguments. The exception is functions that generate random numbers. The now function returns different values for queries that were run at different times. The result is considered a constant, because constancy is only important within a single query. A constant expression is also considered a constant (for example, the right half of the LIKE operator can be constructed from multiple constants).
  • NULL Processing — If at least one of the arguments of the function is NULL, the function result is also NULL, except in functions where it is specified otherwise.
  • Constancy — Functions cannot change the values of their arguments. Any changes are returned as the result. Therefore, the result of calculating separate functions does not depend on the order in which the functions are written in the query.
  • Error Handling — Some functions can generate an exception if the data is invalid. In this case, the query is canceled and returns an error message to the client.
  • Argument Evaluation — In almost all programming languages, some arguments might not be evaluated with some operators, such as &&, ||, and ?:. In the Advanced SQL Query tool, the arguments of functions (operators) are always evaluated. This is because whole parts of the columns are evaluated at the same time instead of calculating each row separately.

These are the most important functions:

Arithmetic Functions

For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it is a floating-point number. If there are not enough bits, the highest bit type is taken.

Arithmetic functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.

Function Description

plus(a, b)

a + b operator

Calculates the sum of two numbers. You can also add integer numbers with a date (Date) or date and time (DateTime). In the case of a date, when you add an integer, it corresponds to the number of added days. For a date with time, when you add an integer, it corresponds to the number of added seconds.

minus(a, b)

a - b operator

Calculates the difference between two numbers. The result is always signed. You can also calculate integer numbers from a date (Date) or date with time (DateTime).

divide (a, b)

a / b operator

Calculates the quotient of the numbers. The result type is always a floating-point type. It is not integer division. For integer division, use the intDiv function. When you divide by zero, you get inf, -inf, or nan.

intDiv (a, b)

Calculates the quotient of the numbers. Divides into integers and rounds down (by the absolute value). An exception is thrown when it divides by zero or when it divides a minimal negative number by minus one.

intDivOrZero (a, b)

Differs from intDiv because it returns zero when divided by zero or when a minimal negative number is divided by minus one.

module (a, b)

a % b operator

Calculates the remainder after division. If arguments are floating-point numbers, they are pre-converted to integers. The remainder is taken in the same sense as C++. Truncated division is used for negative numbers. An exception is thrown when divided by zero or when a minimal negative number is divided by minus one.

negate(a)

-a operator

Calculates a number with the reverse sign. The result is always signed.

abs (a)

Calculates the absolute value of the number (a). That is, if a < 0, it returns -a. For unsigned types, it does not do anything. For signed integer types, it returns an unsigned number.

gcd (a, b)

Returns the greatest common divisor of the numbers. An exception is generated when divided by zero or when a minimal negative number is divided by minus one.

lcm(a, b)

Returns the least common multiple of the numbers. An exception is generated when divided by zero or when a minimal negative number is divided by minus one.

Comparison Functions

Comparison functions always return 0 or 1 (Uint8). These types can be compared:

  • Numbers
  • Character strings (String) and fixed-length character strings (FixedString(N))
  • Dates (Date)
  • Dates with times (DateTime)

For example, you cannot compare a date with a string. You must use a function to convert the string to a date, or vice versa.

Strings are compared by bytes. A shorter string is smaller than all strings that start with it and that contain at least one more character.

The comparison operators are:

  • Equalsa = b and a == b
  • NotEqualsa != b and a <> b
  • Lessa < b
  • Greatera > b
  • LessOrEqualsa <= b
  • GreaterOrEqualsa >= b

Logical Functions

Logical functions accept any numeric types, but return a UInt8 number equal to 0 or 1.

Zero as an argument is considered "false," while any non-zero value is considered "true".

  • ANDAND

  • OROR

  • NotNOT

  • XorXOR

Type Conversion Functions

The basic supported conversions are:

  • Conversion to unsigned data types — toUInt8, toUInt16, toUInt32, toUInt64

  • Conversion to signed data types — toInt8, toInt16, toInt32, toInt64, toFloat32, toFloat64, toDate, toDateTime

  • Conversion to zero if error — toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero, toDateOrZero, toDateTimeOrZero

  • Conversion to null if error — toUInt8OrNull, toUInt16OrNull, toUInt32OrNull, toUInt64OrNull, toInt8OrNull, toInt16OrNull, toInt32OrNull, toInt64OrNull, toFloat32OrNull, toFloat64OrNull, toDateOrNull, toDateTimeOrNull

These are more complex data type conversions:

Function Description

toDecimal32(value, S), toDecimal64(value, S), toDecimal128(value, S)

This function converts value to decimal of precision S. The value can be a number or a string. The S parameter specifies the number of decimal places.

toString

This function converts numbers, strings (but not fixed strings), dates (Dates), and dates with times (DateTime). All these functions accept one argument.

When the function converts to or from a string, the value is formatted or parsed with the same rules as for the tab-separated format (TSV). If the string cannot be parsed, an exception is thrown and the request is canceled.

When the function converts dates to numbers or vice versa, the date corresponds to the number of days since the beginning of the Unix time (1/1/1970).

These are the date and date-with-time formats for the toDate/toDateTime functions:

YYYY-MM-DD

YYYY-MM-DD hh: mm: ss

As an exception, when the function converts from UInt32, Int32, UInt64, or Int64 numeric types to Date, and the number is greater than or equal to 65536, then the number is interpreted as a Unix timestamp (and not as the number of days). This supports writing toDate (unix_timestamp), which otherwise would cause an error and require writing toDate (toDateTime (unix_timestamp)).

Conversion between a date (Date) and date with time (DateTime) is performed naturally (that is, adds a null time or deletes the time).

Conversion between numeric types uses the same rules as assignments between different numeric types in C++.

Additionally, the toString function of the DateTime argument can take a second String argument that contains the name of the time zone.

toFixedString(s, N)

This function converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is passed with null bytes to the right. If the string has more bytes than N, an exception is thrown.

toStringCutToZero(s)

This function accepts a String or FixedString argument. It returns the String with the content truncated at the first zero byte found.

reinterpretAsUInt8, reinterpretAsUInt16, reinterpretAsUInt32, reinterpretAsUInt64

reinterpretAsInt8, reinterpretAsInt16, reinterpretAsInt32, reinterpretAsInt64

reinterpretAsFloat32, reinterpretAsFloat64

reinterpretAsDate, reinterpretAsDateTime

These functions accept a string and interpret the bytes placed at the beginning of the string as a number (little endian). If the string is not long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date (Date) is interpreted as the number of days since the beginning of Unix time (1/1/1970), and a date with time (DateTime) is interpreted as the number of seconds since the beginning of Unix time.

reinterpretAsString

This function accepts a number or date (Date) or date with time (DateTime), and returns a string containing bytes that represent the corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a string that is one byte long.

reinterpretAsFixedString

This function accepts a number, date (Date) or date with time (DateTime), and returns a FixedString that contains bytes that represent a corresponding value in little endian format. Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.

CAST(x, t)

This function converts x to the t data type.

toIntervalYear, toIntervalQuarter, toIntervalMonth, toIntervalWeek, toIntervalDay, toIntervalHour, toIntervalMinute, toIntervalSecond

This function converts a Number type argument to an Interval type (duration). The Interval type is very useful because you can use this type of data to perform arithmetic operations directly with Date or DateTime.

parseDateTimeBestEffort

This function parses a Number type argument to a Date or DateTime type. Unlike toDate and toDateTime, parseDateTimeBestEffort can return a more complex date format.

parseDateTimeBestEffortOrNull

This function parses a Number type argument to a Date or DateTime type. Unlike toDate and toDateTime, parseDateTimeBestEffort can return a more complex date format. Also, it returns null when it encounters a date format that cannot be processed.

parseDateTimeBestEffortOrZero

This function parses a Number type argument to a Date or DateTime type. Unlike toDate and toDateTime, parseDateTimeBestEffort can return a more complex date format. Also, it returns zero date or zero date time when it encounters a date format that cannot be processed.

Functions to Work with Dates and Times

All functions that work with the date and time and have a logical use for the time zone can accept a second optional time zone argument. Only time zones that differ from UTC by a whole number of hours are supported.

Function Description

toTimeZone

Converts time (Date) or date and time (DateTime) to the specified time zone.

toYear

Converts a date (Date) or date with time (DateTime) to a UInt16 number with the year number.

toQuarter

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the quarter number.

toMonth

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the month number (1-12).

toDayOfYear

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the day of the year (1-366).

toDayOfMonth

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the day of the month (1-31).

toDayOfWeek

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the day of the week (Monday is 1, and Sunday is 7).

toHour

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the hour in 24-hour time (0-23). This function assumes that when clocks move ahead for daylight saving time, it is by one hour and occurs at 2 AM. If clocks move back, it is by one hour and occurs at 3 AM.

toMinute

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the minute of the hour (0-59).

toSecond

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the number of the second in the minute (0-59). Leap seconds are not accounted for.

toUnixTimestamp

Converts a date (Date) or date with time (DateTime) to a Unix time stamp.

toStartOfYear

Rounds down a date (Date) or date with time (DateTime) to the first day of the year. It returns the date (Date).

toStartOfISOYear

Rounds down a date (Date) or date with time (DateTime) to the first day of ISO year. It returns the date (Date).

toStartOfQuarter

Rounds down a date (Date) or date with time (DateTime) to the first day of the quarter (1 January, 1 April, 1 July, or 1 October). It returns the date (Date).

toStartOfMonth

Rounds down a date (Date) or date with time (DateTime) to the first day of the month. It returns the date (Date).

toMonday

Rounds down a date (Date) or date with time (DateTime) to the nearest Monday. It returns the date (Date).

toStartOfDay

Rounds down a date (Date) or date with time (DateTime) to the start of the day.

toStartOfHour

Rounds down a date (Date) or date with time (DateTime) to the start of the hour.

toStartOfMinute

Rounds down a date (Date) or date with time (DateTime) to the start of the minute.

toStartOfFiveMinute

Rounds down a date (Date) or date with time (DateTime) to the start of the five-minute interval.

toStartOfTenMinutes

Rounds down a date (Date) or date with time (DateTime) to the start of the ten-minute interval.

toStartOfFifteenMinutes

Rounds down a date (Date) or date with time (DateTime) to the start of the fifteen-minute interval.

toTime

Converts a date (Date) or date with time (DateTime) to a certain fixed date, and preserves the time.

toRelativeYearNum

Converts a date (Date) or date with time (DateTime) to the number of the year, from a certain fixed point in the past.

toRelativeQuarterNum

Converts a date (Date) or date with time (DateTime) to the number of the quarter, from a certain fixed point in the past.

toRelativeMonthNum

Converts a date (Date) or date with time (DateTime) to the number of the month, from a certain fixed point in the past.

toRelativeWeekNum

Converts a date (Date) or date with time (DateTime) to the number of the week, from a certain fixed point in the past.

toRelativeDayNum

Converts a date (Date) or date with time (DateTime) to the number of the day, from a certain fixed point in the past.

toRelativeHourNum

Converts a date (Date) or date with time (DateTime) to the number of the hour, from a certain fixed point in the past.

toRelativeMinuteNum

Converts a date (Date) or date with time (DateTime) to the number of the minute, starting from a certain fixed point in the past.

toRelativeSecondNum

Converts a date (Date) or date with time (DateTime) to the number of the second, from a certain fixed point in the past.

toISOYear

Converts a date (Date) or date with time (DateTime) to a UInt16 number with the ISO year number.

toISOWeek

Converts a date (Date) or date with time (DateTime) to a UInt8 number with the ISO week number.

now

Accepts zero arguments. It returns the current time at the time of function execution. This function returns a constant.

today

Accepts zero arguments. It returns the current date at the time of function execution. The same as toDate (now ()).

yesterday

Accepts zero arguments. It returns yesterday's date at the time of function execution. The same as today () - 1.

timeSlot

Rounds the time to the half hour.

toYYYYMM

Converts a date (Date) or date with time (DateTime) to a UInt32 number with the year and month number (YYYY * 100 + MM).

toYYYYMMDD

Converts a date (Date) or date with time (DateTime) to a UInt32 number with the year and month number (YYYY * 10000 + MM * 100 + DD).

toYYYYMMDDhhmmss

Converts a date (Date) or date with time (DateTime) to a UInt64 number with the year and month number (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).

addYears, addMonths, addWeeks, addDays, addHours, addMinutes, addSeconds, addQuarters

This function adds a Date/DateTime interval to a Date/DateTime variable and then returns the Date/DateTime.

subtractYears, subtractMonths, subtractWeeks, subtractDays, subtractHours, subtractMinutes, subtractSeconds, subtractQuarters

This function subtracts a Date/DateTime interval to a Date/DateTime variable and then returns the Date/DateTime.

dateDiff('unit', t1, t2, [timezone])

Returns the difference between two times expressed in unit, for example hours. t1 and t2 can be Date or DateTime. If timezone is specified, it is applied to both arguments. If not, time zones from data types t1 and t2 are used. If the time zones are not the same, the result is unspecified.

Supported unit values: second, minute, hour, day, week, month, quarter, year.

formatDateTime(Time, Format[, Timezone])

This function formats Time according to the format specified in the Format string. For more information about the format parameters, go to %%.

Format Codes for the formatDateTime Function

This table describes the format codes for the formatDateTime function with examples:

Code Description Example

%C

Year divided by 100 and truncated to integer (00-99).

20

%d

Day of the month, zero-padded (01-31).

02

%D

Short MM/DD/YY date, equivalent to %m/%d/%y.

01/07/2023

%e

Day of the month, space-padded ( 1-31)

2

%F

Short YYYY-MM-DD date, equivalent to %Y-%m-%d.

2023-01-07

%H

Hour in 24h format (00-23).

22

%I

Hour in 12h format (01-12).

10

%j

Day of the year (001-366).

007

%m

Month as a decimal number (01-12).

01

%M

Minute (00-59).

33

%n

New-line character. '\n'.

 

%p

AM or PM.

PM

%R

24-hour HH:MM time, equivalent to %H:%M.

22:33

%S

Second (00-59).

44

%t

Horizontal-tab character (‘\t').

 

%T

ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S.

22:33:44

%u

ISO 8601 weekday as number with Monday as 1 (1-7).

2

%V

ISO 8601 week number (01-53).

01

%w

Weekday as a decimal number with Sunday as 0 (0-6).

2

%y

Year, last two digits (00-99).

23

%Y

Year

2023

%%

A % sign.

 

Functions to Work with Strings

This table describes the Strings functions:

Function Description

empty

Returns 1 for an empty string or 0 for a non-empty string. The result type is UInt8. A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.

notEmpty

Returns 0 for an empty string or 1 for a non-empty string. The result type is UInt8. The function also works for arrays.

length

Returns the length of a string in bytes (not in characters, and not in code points). The result type is UInt64.

lengthUTF8

Returns the length of a string in Unicode code points (not in characters), when the string contains a set of bytes that make up UTF-8 encoded text. If the string does not meet the UTF-8 requirement, it does not generate an exception. The result type is UInt64.

char_length, CHAR_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not generate an exception. The result type is UInt64.

character_length, CHARACTER_LENGTH

Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it does not generate an exception. The result type is UInt64.

lower, lcase

Converts ASCII Latin symbols in a string to lowercase.

upper, ucase

Converts ASCII Latin symbols in a string to uppercase.

lowerUTF8

Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lowercase of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.

upperUTF8

Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text. It does not detect the language. If the length of the UTF-8 byte sequence is different for upper and lowercase of a code point, the result could be incorrect. If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.

isValidUTF8

Returns 1, if the set of bytes constitutes valid UTF-8-encoded text, otherwise it returns 0.

reverse

Reverses the string (as a sequence of bytes).

reverseUTF8

Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing a UTF-8 text. Otherwise, it does not throw an exception.

concat(s1, s2, ...)

Concatenates the strings listed in the arguments, without a separator.

concatAssumeInjective(s1, s2, ...)

Same as concat. The difference is that you must make sure that concat (s1, s2, s3) -> s4 is injective. It is used to optimize the GROUP BY clause.

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

Returns a substring that starts with the byte from the offset index that is length bytes long. Character indexing starts from one (as in standard SQL). The offset and length arguments must be constants.

substringUTF8(s, offset, length)

The same as substring, but for Unicode code points. It works under the assumption that the string contains a set of bytes that represent a UTF-8 encoded text. If this assumption is not met, it does not generate an exception.

appendTrailingCharIfAbsent(s, c)

If the s string contains characters, but does not contain the c character at the end, it appends the c character to the end.

convertCharset(s, from, to)

Returns the string s that was converted from the encoding in from to the encoding in to.

base64Encode(s)

Encodes the s string into base64.

base64Decode(s)

Decodes the base64-encoded string s into its original string. In case of failure, it generates an exception.

tryBase64Decode(s)

Decodes the base64-encoded string s into its original string. In case of failure, it returns an empty string.

endsWith(s, suffix)

Returns 1 if the string ends with the specified suffix, otherwise it returns 0.

startsWith(s, prefix)

Returns 1 if the string starts with the specified prefix, otherwise it returns 0.

trimLeft(s)

Returns a string that removes the whitespace characters on the left side.

trimRight(s)

Returns a string that removes the whitespace characters on the right side.

trimBoth(s)

Returns a string that removes the whitespace characters on both sides.

Functions to Search Strings

Search is case-sensitive by default in all these functions. There are separate variants for case insensitive search. This table describes the search functions:

Function Description

position(haystack, needle), locate(haystack, needle)

Searches for the substring needle in the string haystack. It returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found.

For a case-insensitive search, use the function positionCaseInsensitive.

positionUTF8(haystack, needle)

The same as position, but the position is returned in Unicode code points when the string contains a set of bytes that represent a UTF-8 encoded text. If this requirement is not met, it does not generate an exception.

For a case-insensitive search, use the function positionCaseInsensitiveUTF8.

multiSearchFirstPosition(haystack, [needle1, needle2, ..., needlen])

The same as position but returns the leftmost offset of the string haystack that matches some of the needle.

For a case-insensitive search and a search in UTF-8 format, use these functions: multiSearchFirstPositionCaseInsensitive

multiSearchFirstPositionUTF8

multiSearchFirstPositionCaseInsensitiveUTF8

multiSearchFirstIndex(haystack, [needle1, needle2, ..., needlen])

Returns the index i (starting from 1) of the leftmost found needle in the string haystack and 0 otherwise.

For a case-insensitive search and a search in UTF-8 format, use these functions: multiSearchFirstIndexCaseInsensitive multiSearchFirstIndexUTF8 multiSearchFirstIndexCaseInsensitiveUTF8

multiSearchAny(haystack, [needle1, needle2, ..., needlen])

Returns 1, if at least one string needle matches the string haystack and 0 otherwise.

For a case-insensitive search or/and in UTF-8 format, use these functions: multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.

In all multiSearch * functions, the number of needle parameters must be fewer than 28.

match(haystack, pattern)

Checks whether the string matches the pattern regular expression. The syntax of the re2 regular expressions is more limited than the syntax of the Perl regular expressions.

Returns 0 if it does not match, or returns 1 if it matches.

The backslash symbol (\) is used to escape the regular expression. The same symbol is used to escape string literals. To escape the symbol in a regular expression, you must write two backslashes (\\) in a string literal.

The regular expression works with the string as if it were a set of bytes. The regular expression cannot contain null bytes. For patterns to search for substrings in a string, it is better to use LIKE or position because they work faster.

multiMatchAny(haystack, [pattern1, pattern2, ..., patternn])

Checks whether the string matches the pattern regular expression. It returns 0 if none of the regular expressions match and 1 if any of the patterns match. For patterns to search substrings in a string, it is better to use multiSearchAny because it works faster.

The length of any of the haystack strings must be less than 232 bytes, otherwise an exception is generated.

multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternn])

Checks whether the string matches the pattern regular expression. It returns 0 if none of the regular expressions match and 1 if any of the patterns match. It returns any index that matches the haystack.

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternn])

The same as multiMatchAny, but it returns 1 if any pattern matches the haystack within a constant edit distance. This function is in experimental mode and can be veryslow.

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternn])

The same as multiFuzzyMatchAny, but it returns any index that matches the haystack within a constant edit distance.

multiFuzzyMatch * functions do not support UTF-8 regular expressions. They are treated as bytes.

extract(haystack, pattern)

Extracts a fragment of a string using a regular expression. If haystack does not match the pattern regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.

like(haystack, pattern), haystack LIKE pattern operator

Checks whether a string matches a simple regular expression. The regular expression can contain theses symbols:

  • ‘%’ — Indicates any quantity of any bytes (including zero characters).

  • ‘_’ — Indicates any one byte.

Use the backslash (\) to escape metasymbols.

The backslash symbol (\) is used to escape the regular expression. The same symbol is used to escape string literals. To escape the symbol in a regular expression, you must write two backslashes (\\) in a string literal.

For regular expressions such as %needle%, the code is more optimal and works as fast as the position function. For other regular expressions, the code is the same as for the match function.

notLike(haystack, pattern), haystack NOT LIKE pattern operator

Filter records that do not match a specified pattern. The regular expression can contain theses symbols:

  • ‘%’ — Indicates any quantity of any bytes (including zero characters).

  • ‘_’ — Indicates any one byte.

Use the backslash (\) to escape metasymbols.

Functions to Replace Strings

This table describes the functions to replace strings:

Function Description

replaceOne(haystack, pattern, replacement)

Replaces the first occurrence, if it exists, of the pattern substring in haystack with the replacement substring. The pattern and replacement must be constants.

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

Replaces all occurrences of the pattern substring in haystack with the replacement substring.

replaceRegexpOne(haystack, pattern, replacement)

Replacement using the pattern regular expression. A re2 regular expression. It replaces only the first occurrence, if it exists. A pattern can be specified as replacement. This pattern can include substitutions \0 - \9. The substitution \0 includes the entire regular expression. Substitutions \1 - \9 correspond to the subpattern numbers. To use the \ character in a template, use another \ to escape it. Remember that a string literal requires an extra escape.

For example, this function will copy a string ten times:

SELECT replaceRegexpOne ('Hello, World!', '. *', '\\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0) AS res

Result:

Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! Hello, World! ¦

replaceRegexpAll(haystack, pattern, replacement)

Replacement using the pattern regular expression. It replaces all occurrences.

Example:

SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res

HHeelllloo,, WWoorrlldd!!

If a regular expression worked on an empty substring, the replacement is not made more than once.

Example:

SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res

here: Hello, World!

regexpQuoteMeta(s)

The function adds a backslash before some predefined characters in the string. Predefined characters: '0', '\', '|', '(', ')', '^', '$', '.', '[', ']', '?', '*' , '+', '{', ':', '-'.

This implementation is different from re2. It escapes zero byte as \0 instead of \x00, and escapes only required characters.

Mathematical Functions

All these functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but it might not match the machine representable number nearest to the corresponding real number.

Function Description

e()

Returns a Float64 number that is close to the number e.

pi()

Returns a Float64 number that is close to the number π.

exp(x)

Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.

log(x), ln(x)

Accepts a numeric argument and returns a Float64 number close to the natural logarithm of the argument.

exp2(x)

Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.

log2(x)

Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.

exp10(x)

Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.

log10(x)

Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.

sqrt(x)

Accepts a numeric argument and returns a Float64 number close to the square root of the argument.

cbrt(x)

Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.

erf(x)

If x is non-negative, then erf is the probability that a random variable having a normal distribution with standard deviation s takes the value that is separated from the expected value by more than x.

erfc(x)

Accepts a numeric argument and returns a Float64 number close to 1 - erf (x), but without loss of precision for large x values.

lgamma(x)

The logarithm of the gamma function.

tgamma(x)

Gamma function.

sin(x)

Sine.

cos(x)

Cosine.

tan(x)

Tangent.

asin(x)

Arc sine.

acos(x)

Arc cosine.

atan(x)

Arc tangent.

pow(x, y), power(x, y)

Takes two numeric arguments x and y and returns a Float64 number close to x to the power of y.

intExp2

Accepts a numeric argument and returns a UInt64 number close to 2 to the power of x.

intExp10

Accepts a numeric argument and returns a UInt64 number close to 10 to the power of x.

Rounding Functions

Function Description

floor(x[, N])

Returns the largest round number that is less than or equal to x. A round number is a multiple of 1/10N, or the nearest number of the appropriate data type if 1 / 10N is not exact. N is an integer constant, optional parameter. By default, it is zero. This rounds the number to an integer. N can be negative.

Examples:

floor (123.45, 1) = 123.4

floor (123.45, -1) = 120.

x is any numeric type. The result is a number of the same type. For integer arguments, it makes sense to round with a negative N value. For a non-negative N, the function does not do anything.

ceil(x[, N]), ceiling(x[, N])

Returns the smallest round number that is greater than or equal to x. In every other way, it is the same as the floor function.

round(x[, N])

Rounds x to a specified number of decimal places (N). It rounds to the nearest even integer. When the given number has equal distance to surrounding numbers, the function returns the number that has the closest even digit.

Parameters:

  • x — The number to round. It can be any expression returning the numeric data type.
  • N — An integer value.
    • If N > 0 then the function rounds the value to the right of the decimal point.
    • If N < 0 then the function rounds the value to the left of the decimal point.
    • If N = 0 then the function rounds the value to integer. In this case, the argument can be omitted.

The returned value is the rounded number of the same type as the input number.

roundToExp2(num)

Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) degree of two.

Random Number Generation Functions

Non-cryptographic generators of pseudo-random numbers are used. All the functions accept zero arguments or one argument. If an argument is passed, it can be any type, and its value is not used for anything. The purpose of this argument is to prevent common subexpression elimination, so that two different instances of the same function return different columns with different random numbers.

Function Description

rand

Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type numbers. It uses a linear congruential generator.

rand64

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers. It uses a linear congruential generator.

randConstant

Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.

Encoding Functions

This table describes encoding functions:

Function Description

hex

Accepts arguments of types: String, UInt, Date, or DateTime. It returns a string containing the hexadecimal representation of the argument. It uses uppercase letters A-F. It does not use 0x prefixes or h suffixes. For character strings, all bytes are simply encoded as two hexadecimal numbers. Numbers are converted to big endian format. Date is encoded as the number of days since the beginning of Unix time. DateTime is encoded as the number of seconds since the beginning of Unix time.

unhex(str)

Accepts a string containing any number of hexadecimal digits, and returns a string that contains the corresponding bytes. It supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the younger half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, an exception is not generated. If you want to convert the result to a number, you can use the reverse and reinterpretAsType functions.

bitmaskToList(num)

Accepts an integer. It returns a string that contains the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.

Functions to Work with URLs

This table describes functions that work with URLs:

Function Description

protocol

Returns the protocol (for example, http, ftp, mailto, imap).

domain

Returns the domain.

domainWithoutWWW

Returns the domain and removes no more than one 'www.' from the beginning of it, if present.

topLevelDomain

Returns the top-level domain. For example, .com.

firstSignificantSubdomain

Returns the first significant subdomain.

  • The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'.
  • Otherwise, it is a third-level domain.

cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.

path

Returns the path. The path does not include the query string.

pathFull

Returns the path that includes query string and fragment.

Example: /top/news.html?page=2#comments

queryString

Returns the query string. The query string does not include these:

  • Initial question mark (?),
  • #
  • Anything after #.

fragmenttext

Returns the fragment identifier. It does not include the initial hash # symbol.

queryStringAndFragment

Returns the query string and fragment identifier.

extractURLParameter(URL, name)

Returns the value of the name parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function assumes that the parameter name is encoded in the URL exactly the same way as in the passed argument.

extractURLParameters(URL)

Returns an array of name=value strings that correspond to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)

Returns an array of name strings that correspond to the names of URL parameters. The values are not decoded in any way.

URLHierarchy(URL)

Returns an array that contains the URL, truncated at the end by the symbols /,? in the path and querystring. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

Example:

URLPathHierarchy('https://example.com/browse/CONV-6788') =
[
'/browse/',
'/browse/CONV-6788'
]

URLPathHierarchy(URL)

Returns the same result as URLHierarchy(URL), but without the protocol and host in the result. The /’ element (root) is not included.

decodeURLComponent(URL)

Returns the decoded URL.

cutWWW

Removes no more than one 'www.' from the beginning of the URL domain, if present.

cutQueryString

Removes the query string. The question mark is also removed.

cutFragment

Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment

Removes the query string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)

Removes the 'name' URL parameter, if present. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

Functions to Work with IP Addresses

This table describes functions that work with IP addresses:

Function Description

IPv4NumToString (num)

Takes a UInt32 number and interprets it as an IPv4 address in big endian. It returns a string that contains the corresponding IPv4 address in the format A.B.C.D (dot-separated numbers in decimal form).

IPv4StringToNum(s)

The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.

IPv4NumToStringClassC(num)

Similar to IPv4NumToString, but uses xxx instead of the last octet.

IPv6NumToString(x)

Accepts a FixedString(16) value that contains the IPv6 address in binary format. It returns a string that contains this address in text format. IPv6-mapped IPv4 addresses are output in the format :: ffff: 111.222.33.44.

IPv6StringToNum(s)

The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes. HEX can be uppercase or lowercase.

IPv4ToIPv6(x)

Takes a UInt32 number and interprets it as an IPv4 address in big endian. It returns a FixedString(16) value containing the IPv6 address in binary format.

cutIPv6(x, bitsToCutForIPv6, bitsToCutForIPv4)

Accepts a FixedString(16) value containing the IPv6 address in binary format. It returns a string that contains the address of the specified number of bits removed in text format.

IPv4CIDRtoIPv4Range(ipv4, cidr)

Accepts an IPv4 and an UInt8 value containing the CIDR. It returns a tuple with two IPv4 values that contains the lower range and the higher range of the subnet.

IPv6CIDRtoIPv6Range(ipv6, cidr)

Accepts an IPv6 and an UInt8 value containing the CIDR. It returns a tuple with two IPv6 values that contains the lower range and the higher range of the subnet.

Functions to Work with Nullable Arguments

Function Description

isNull(x)

Checks whether the argument is NULL.

Parameters:

  • x — A value with a non-compound data type.

Returned value:

  • 1 — If x is NULL.
  • 0 — If x is not NULL.

isNotNull (x)

Parameters:

  • x — A value with a non-compound data type.

Returned value:

  • 0 — If x is NULL.
  • 1 — If x is not NULL.

coalesce(x,...)

  • Parameters — Any number of parameters of a non-compound type. All parameters must be compatible by data type.
  • Returned values — The first non-NULL argument. NULL, if all arguments are NULL.

ifNull

Returns an alternative value if the main argument is NULL.

ifNull (x, alt)

Parameters:

  • x — The value to check for NULL.
  • alt — The value that the function returns if x is NULL.

Returned values:

  • The value x, if x is not NULL.
  • The value alt, if x is NULL.

nullIf(x,y)

Returns NULL if the arguments are equal.

Parameters:

  • x, y — Values for comparison. They must be compatible types, or the solution generates an exception.

Returned values:

  • NULL, if the arguments are equal.
  • The x value, if the arguments are not equal.

assumeNotNull(x)

Parameters:

  • x — The original value.

Returned values:

  • The original value from the non-Nullable type, if it is not NULL.
  • The default value for the non-Nullable type if the original value was NULL.

toNullable(x)

Converts the argument type to Nullable.

Parameters:

  • x — The value of any non-compound type.

Returned value:

  • The input value with a non-Nullable type.

Related Topics

Advanced Query SQL Syntax

Aggregate Functions

Select Clause Syntax

Supported Data Types