Select Clause Syntax
Applies To: WatchGuard Advanced EPDR
This topic describes the general syntax of an SQL SELECT statement.
SELECT [DISTINCT] expr_list
[FROM [db.] table | (subquery) | table_function]
[SAMPLE sample_coeff]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list]
[LIMIT [n, ]m]
[UNION ALL...]
[LIMIT n BY columns]
You can use these clauses in an SQL query:
- FROM Clause
- SAMPLE Clause
- JOIN Clause
- WHERE Clause
- PREWHERE Clause
- GROUP BY Clause
- LIMIT N BY Columns Clause
- HAVING Clause
- ORDER BY Clause
- SELECT Clause
- DISTINCT Clause
- LIMIT m Clause
- UNION ALL Clause
- IN Operators
- The Asterisk Symbol (*)
FROM Clause
The FROM clause specifies the source to read data from (for example, a table, subquery, or JOIN clause). You must specify subqueries in parentheses. Unlike standard SQL, you do not need to specify a synonym after a subquery. For compatibility purposes, you can write ‘name AS’ after a subquery, but the specified name will not be used.
SAMPLE Clause
The SAMPLE clause processes the queries for approximated data.
When you use the SAMPLE clause, the query is not performed on all of the data, but only on a percentage of the data (sample). For example, if you have to calculate statistics for a number of events, it is enough to execute the query on 10% of all the events and then multiply the result by 10.
Approximated query processing can be useful in these cases:
- When you want to speed up result collection.
- When your raw data is not accurate, so approximation does not noticeably degrade the quality of results.
These are the features of data sampling:
- It is a deterministic mechanism. The result of the same query is always the same.
- Works consistently for different tables. For tables with a single sampling key, a sample with the same coefficient always selects the same subset of possible data. This means that you can use the sample in subqueries in the IN clause. You can also use the JOIN clause to join samples.
- Reads less data from a disk. You must specify the sampling key correctly.
Syntax for the SAMPLE clause:
Syntax | Description |
---|---|
SAMPLE k |
Number from 0 to 1. The query is run on a k percentage of data. For example, SAMPLE 0.1 runs the query on 10% of data. |
SAMPLE n |
n is a sufficiently large integer. The query is run on a sample of at least n rows (but not significantly more than this). For example, SAMPLE 10000000 runs the query on a minimum of 10,000,000 rows. |
SAMPLE k OFFSET m |
k and m are numbers from 0 to 1. The query is run on a sample of k percentage of the data. The data used for the sample is offset by m percentage. |
JOIN Clause
JOIN indicates a join operation in relational algebra that combines columns from one or more tables and creates a new group that can be stored in a table or be used as is. To create a new table, JOIN uses the common values in multiple tables and then combine the columns.
These are the supported types of the JOIN clause:
- INNER JOIN (or JOIN) — Compares each row in table A with rows in table B to find all pairs of rows that satisfy the join-predicate specified in the ON clause. When the join-predicate matches non-NULL values, column values for each matched pair of rows of tables A and B are combined into a result row.
- LEFT JOIN (or LEFT OUTER JOIN) — Always contains all rows of the "left" table (A), even if the join-condition does not match rows in the "right" table (table B). This means that if the ON clause matches 0 (zero) rows in table B (for a given row in table A), the join will still return a row in the result (for that row)—but with NULL in each column from table B.
- RIGHT JOIN (or RIGHT OUTER JOIN) — Always contains all rows of the "right" table (table B), even if the join-condition does not match rows in the "left" table (table A). This means that if the ON clause matches 0 (zero) rows in table A (for a given row in table B), the join will still return a row in the result (for that row)—but with NULL in each column from table A.
- FULL JOIN (or FULL OUTER JOIN) — Combines the effect of LEFT JOIN and RIGHT JOIN. Where rows in the full outer joined tables do not match, the result set will have NULL values for every column of the table that lacks a matching row. For those rows that do match, a single row is produced in the result set that contains columns populated from both tables.
- CROSS JOIN (or , ) — Returns the Cartesian product of rows from tables in the join. It produces rows which combine each row from the first table with each row from the second table.
- ANY or ALL modifier — If ALL is specified and the right table has several matching rows, the data is multiplied by the number of rows. This is the normal behavior of a JOIN clause in standard SQL. If ANY is specified and the right table has several matching rows, only the first one found is joined. If the right table has only one matching row, the results of ANY and ALL are the same.
WHERE Clause
If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators. This expression is used to filter data before all the transformations included in the statement.
PREWHERE Clause
This clause has the same meaning as the WHERE clause. The difference is the data that is read from the table. With PREWHERE, the columns necessary to execute the clause are read first. Then, the columns that are needed to run the rest of the query are read when the PREWHERE expression is true.
PREWHERE filters data more efficiently and reads less data from the disk for query execution.
GROUP BY Clause
This clause groups results by one or more columns. For grouping, Endpoint Security interprets NULL as a value.
If the WITH TOTALS modifier is specified, another row is calculated. This row has key columns that contain default values (zeros or empty lines), and columns of aggregate functions with the values calculated across all the rows (the Total values). This extra row is only produced in JSON, TabSeparated, and Pretty formats. In JSON formats, this row is output as a separate Totals field. In TabSeparated formats, the row comes after the main result, preceded by an empty row. In Pretty formats, the row is output as a separate table after the main result.
You can use WITH TOTALS in subqueries, that includes subqueries in the JOIN clause (in this case, the respective total values are combined).
Caution: Unlike MySQL (and standard SQL), the GROUP BY statement does not support positional arguments. For example, GROUP BY 1, 2 is interpreted as group by constant (that is, all rows in one).
LIMIT N BY Columns Clause
This clause selects the first N rows for each group of columns. LIMIT N BY is not related to LIMIT. They can both be used in the same query. LIMIT N BY can contain any number of columns or expressions.
HAVING Clause
This clause enables you to filter the aggregation results produced by GROUP BY. It is similar to the WHERE clause, but the difference is that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it. You cannot use HAVING if aggregation is not performed.
Caution: Unlike MySQL (and conforming to standard SQL), the GROUP BY statement does not support positional arguments.
ORDER BY Clause
The ORDER BY clause contains a list of expressions, which can each be attributed with a DESC (descending) or ASC (ascending) modifier to determine the sorting direction. If the direction is not specified, ASC is assumed. The sorting direction applies to a single expression, not to the entire list.
Rows that have identical values for the list of sorting expressions are output in an arbitrary order, which can also be different each time. If the ORDER BY clause is omitted, the order of the rows is also undefined, and might be different each time.
COLLATE
When you sort by string values, you can use COLLATE to specify the alphabet you want to use. For example, use ORDER BY SearchPhrase COLLATE 'tr' to sort by keyword in ascending order, with the Turkish alphabet, case insensitive. This assumes that the strings are UTF-8 encoded.
You can specify COLLATE or not for each expression independently. If you specify ASC or DESC, specify COLLATE after it. When you use COLLATE, sorting is always case-insensitive.
We recommend that you use COLLATE only for final sorting of a small number of rows, because sorting with COLLATE is less efficient than normal sorting by bytes.
NaN and NULL Sorting Order:
- With the NULLS FIRST modifier — First NULL, then NaN, then other values.
- With the NULLS LAST modifier — First the values, then NaN, then NULL.
- Default — The same as with the NULLS LAST modifier.
When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. For ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than other numbers.
SELECT Clause
Expressions specified in the SELECT clause are calculated after all the operations in the clauses described above finish. If expressions in the SELECT clause contain aggregate functions, then the solution processes aggregate functions and the expressions used as their arguments during the GROUP BY aggregation. These expressions work as if they apply to separate rows in the result.
DISTINCT Clause
If you specify DISTINCT, only a single row remains out of all the sets of fully matching rows in the result. It is possible to obtain the same result by applying GROUP BY across the same set of values as specified as SELECT clause, without the use of any aggregate functions. The DISTINCT clause has these differences from the GROUP BY clause:
-
DISTINCT can be applied together with GROUP BY.
-
When you omit ORDER BY and define LIMIT, the query stops immediately after the required number of different rows are read.
-
Data blocks are output as they are processed, without waiting for the entire query to finish.
DISTINCT works with NULL as if NULL were a specific value. In other words, in the DISTINCT results, different combinations with NULL occur only once.
LIMIT m Clause
This clause selects the first m rows from the result.
LÍMIT n, m selects the first m rows from the result after skipping the first n rows. The LIMIT m OFFSET n syntax is equivalent. n and m must be non-negative integers.
If there is no ORDER BY clause that explicitly sorts results, the choice of rows for the result might be arbitrary and non-deterministic.
UNION ALL Clause
You can use this clause to combine any number of queries. Only UNION ALL is supported. UNION (UNION DISTINCT) is not supported. If you need UNION DISTINCT, you can type SELECT DISTINCT from a subquery that contains UNION ALL. Queries that are part of UNION ALL can run simultaneously, and their results can be mixed together.
The structure of results (the number and type of columns) must match for the queries, but the column names can differ. In this case, the column names for the final result are taken from the first query. Type casting is performed for unions. For example, if two queries are combined with the same field with non-Nullable and Nullable types from a compatible type, the resulting UNION ALL has a Nullable type field.
You cannot enclose queries that are part of UNION ALL in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
IN Operators
This section describes the IN, NOT IN, GLOBAL IN, and GLOBAL NOT IN operators.
The left side of the IN operator is either a single column or a tuple. For example SELECT UserID IN (123, 456) FROM ...
If the left side is a single column that is in the index, and the right side is a set of constants, then the system uses the index to process the query.
The right side of the operator can be a set of constant expressions, a set of tuples, the name of a database table, or a SELECT subquery in brackets. The subquery can specify more than one column to filter tuples. For example:
SELECT (CounterID, UserID) IN (SELECT CounterID, UserID FROM ...) FROM ....
The columns to the left and right of the IN operator must have the same type.
The IN operator and the subquery can occur in any part of the query, including in aggregate functions and lambda functions.
NULL Processing
The IN operator assumes that the result of an operation with NULL is always equal to 0, regardless of whether NULL is on the right or left side of the operator. NULL values are not included in any dataset, do not correspond to each other, and cannot be compared.
The Asterisk Symbol (*)
You can include an asterisk in any part of a query instead of an expression. When the query is analyzed, the asterisk is expanded to a list of all table columns.
You can use an asterisk in these cases:
- To create a table dump.
- For tables that contain only a few columns. When there is strong filtration on a small number of columns, use PREWHERE.
- To get information about the columns in a table. In this case, set LIMIT 1. We recommend that you use the DESC TABLE query.
- In subqueries because columns that are not needed for the external query are excluded from subqueries.
In all other cases, to prevent performance issues, we do not recommend that you use the asterisk.