Skip to main content
Version: 4.0.4

SQL Reference

Apache Kylin relies on Apache Calcite to parse and optimize the SQL statements. As an OLAP engine, Kylin supports SELECT statements, while doesn't support others like INSERT, UPDATE and DELETE operations in SQL, so Kylin's SQL grammer is a subset of Apache Calcite. This page lists the SQL grammar, the functions and the basic data types that Kylin supports. You can also check Calcite SQL reference for more detailed info.

Grammar​

QUERY SYNTAX SELECT γ€€STATEMENT γ€€EXPRESSION SUBQUERY JOIN γ€€INNER JOIN γ€€LEFT JOIN UNION UNION ALL

Function​

COUNT γ€€COUNT(COLUMN) γ€€COUNT(*) COUNT_DISTINCT MAX MIN PERCENTILE SUM TOP_N WINDOW γ€€ROW_NUMBER γ€€AVG γ€€RANK γ€€DENSE_RANK γ€€FIRST_VALUE γ€€LAST_VALUE γ€€LAG γ€€LEAD γ€€NTILE γ€€CASE WHEN γ€€CAST SUSTRING COALESCE STDDEV_SUM INTERSECT_COUNT INTERSECT_VALUE

Data Type​

DATA TYPE

Sql Execution Plan​

You can using explain plan for to displays execution plan, such as:

explain plan for select count(*) from KYLIN_SALES

The front-end displays the string of a one line execution plan. It is better to export the execution plan through the result export to view it.

QUERY SYNTAX​

statement:
| query

query:
values
| WITH withItem [ , withItem ]* query
| {
select
| selectWithoutFrom
| query UNION [ ALL | DISTINCT ] query
| query INTERSECT [ ALL | DISTINCT ] query
}
[ ORDER BY orderItem [, orderItem ]* ]
[ LIMIT { count | ALL } ]
[ OFFSET start { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ count ] { ROW| ROWS } ]

withItem:
name
['(' column [, column ]* ')' ]
AS '(' query ')'

orderItem:
expression [ ASC | DESC ]οΌ» NULLS FIRST |NULLS LAST οΌ½

select:
SELECT [ ALL | DISTINCT]
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[ WHERE booleanExpression ]
[ GROUP BY { groupItem [, groupItem ]* }]
[ HAVING booleanExpression ]
[ WINDOW windowName AS windowSpec [,windowName AS windowSpec ]* ]

selectWithoutFrom:
SELECT [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }

projectItem:
expression [ [ AS ] columnAlias ]
| tableAlias . *

tableExpression:
tableReference [, tableReference ]*
| tableExpression [ NATURAL ]οΌ»( LEFT | RIGHT | FULL ) [ OUTER ] οΌ½ JOIN tableExpression [ joinCondition ]

joinCondition:
ON booleanExpression
| USING '(' column [, column ]* ')'

tableReference:
tablePrimary
[ matchRecognize ]
[ [ AS ] alias [ '(' columnAlias [,columnAlias ]* ')' ] ]

tablePrimary:
[ [ catalogName . ] schemaName . ] tableName
'(' TABLE [ [ catalogName . ] schemaName. ] tableName ')'
| [ LATERAL ] '(' query ')'
| UNNEST '(' expression ')' [ WITH ORDINALITY ]
| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]*')' ')'

values:
VALUES expression [, expression ]*

groupItem:
expression
| '('')'
| '('expression [, expression ]* ')'
| GROUPING SETS '(' groupItem [, groupItem ]* ')'

windowRef:
windowName
| windowSpec

windowSpec:
[windowName ]
'('
[ ORDER BY orderItem [, orderItem ]* ]
[ PARTITION BY expression [, expression]* ]
[
RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
| ROWS numericExpression { PRECEDING | FOLLOWING }
]
')'

SELECT​

STATEMENT​

SELECT chooses the data from the table. COUNT is used for quantitative statistics. DISTINCT filters out duplicate results. AS is used to alias tables or columns. FROM identifies the table being queried. JOIN is used to connect two tables to get the desired data. WHERE is used to specify the standard of selection. LIKE is used to search for a specified pattern in a column in a WHERE clause. BETWEEN ... AND is used to select a range of data between two values. AND and OR is used to filter records based on more than one condition. GROUP BY groups the result by the given expression(s). HAVING filters rows after grouping. ORDER BY sorts the result based on the given expression, usually uses with TOPN function. LIMIT limits the number of rows returned by the query.

Example:

SELECT COUNT(*) FROM kylin_sales;

SELECT COUNT(DISTINCT seller_id) FROM kylin_sales;

SELECT seller_id, COUNT(1) FROM kylin_sales GROUP BY seller_id;

SELECT lstg_format_name, SUM(price) AS gmv, COUNT(DISTINCT seller_id) AS dist_seller FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name HAVING COUNT(DISTINCT seller_id) > 50;

SELECT lstg_format_name FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE NOT(lstg_format_name NOT LIKE '%ab%') GROUP BY lstg_format_name;

SELECT kylin_cal_dt.cal_dt FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE kylin_cal_dt.cal_dt BETWEEN DATE '2013-01-01' AND DATE '2013-06-04';

SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 AND COUNT(*)>72;

SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NOT NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 OR COUNT(*)>20;

SELECT lstg_format_name, SUM(price) AS gmv, COUNT(1) AS trans_cnt FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name ORDER BY lstg_format_name LIMIT 10;

EXPRESSION​

An expression in a SELECT statement. All columns in a table may be selected using *. Example:

  1. *
  2. ID AS VALUE
  3. VALUE + 1

SUBQUERY​

Example:

SELECT cal_dt ,sum(price) AS sum_price FROM (SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt INNER JOIN kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id) t GROUP BY cal_dt;

JOIN​

INNER JOIN​

The INNER JOIN keyword returns rows when there is at least one match in the table.

Example:

SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt;

LEFT JOIN​

The LEFT JOIN keyword returns all rows from the left table (kylin_sales), even if there are no matching rows in the right table (kylin_category_groupings).

Example:

SELECT seller_id FROM kylin_sales LEFT JOIN kylin_category_groupings AS kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id WHERE lstg_format_name='FP-GTC' GROUP BY seller_id LIMIT 20;

UNION​

The UNION operator is used to combine the result sets of two or more SELECT statements. Note that the SELECT statement inside UNION must have the same number of columns. Columns must also have similar data types. At the same time, the order of the columns in each SELECT statement must be the same. By default, the UNION operator picks a different value. If you allow duplicate values, use UNION ALL.

Example:

SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt < DATE '2012-08-01' GROUP BY seller_id UNION (SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt > DATE '2012-12-01' GROUP BY seller_id);

UNION ALL​

The UNION ALL command is almost equivalent to the UNION command, but the UNION ALL command lists all values.

Example:

SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_a WHERE trans_id <> 1 UNION ALL SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_b;

COUNT​

Returns the number of rows matching the specified criteria.

COUNT(COLUMN)​

Example:

SELECT COUNT(seller_id) FROM kylin_sales;

COUNT(*)​

Example:

SELECT COUNT(*) FROM kylin_sales;

COUNT_DISTINCT​

Example:

SELECT COUNT(DISTINCT seller_id) AS DIST_SELLER FROM kylin_sales;

MAX​

Returns the maximum value in a column. NULL values are not included in the calculation. Example:

SELECT MAX(lstg_site_id) FROM kylin_sales;

MIN​

Returns the minimum value in a column. NULL values are not included in the calculation. Example:

SELECT MIN(lstg_site_id) FROM kylin_sales;

PERCENTILE​

Example:

SELECT seller_id, PERCENTILE(price, 0.5) FROM kylin_sales GROUP BY seller_id;

SELECT seller_id, PERCENTILE_APPROX(price, 0.5) FROM kylin_sales GROUP BY seller_id;

SUM​

Returns the total number of numeric columns. Example:

SELECT SUM(price) FROM kylin_sales;

TOP_N​

Example:

SELECT SUM(price) AS gmv
FROM kylin_sales
INNER JOIN kylin_cal_dt AS kylin_cal_dt
ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
INNER JOIN kylin_category_groupings
ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id
WHERE kylin_cal_dt.cal_dt between DATE '2013-09-01' AND DATE '2013-10-01' AND (lstg_format_name='FP-GTC' OR 'a' = 'b')
GROUP BY kylin_cal_dt.cal_dt;

SELECT kylin_sales.part_dt, seller_id
FROM kylin_sales
INNER JOIN kylin_cal_dt AS kylin_cal_dt
ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
INNER JOIN kylin_category_groupings
ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id
AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id
GROUP BY
kylin_sales.part_dt, kylin_sales.seller_id ORDER BY SUM(kylin_sales.price) DESC LIMIT 20;

WINDOW​

The WINDOW function performs the calculation on a set of table rows associated with the current row. NOTE: OVER clause is necessary for window functions.

ROW_NUMBER​

Example:

SELECT lstg_format_name, SUM(price) AS gmv, ROW_NUMBER() OVER() FROM kylin_sales GROUP BY lstg_format_name;

AVG​

Returns the average of the numeric columns. NULL values are not included in the calculation. Example:

SELECT lstg_format_name, AVG(SUM(price)) OVER(PARTITION BY lstg_format_name) FROM kylin_sales GROUP BY part_dt, lstg_format_name;

RANK​

Example:

SELECT part_dt, lstg_format_name, SUM(price), RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;

DENSE_RANK​

Example:

SELECT part_dt, lstg_format_name, SUM(price), DENSE_RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "dense_rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;

FIRST_VALUE​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "first" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LAST_VALUE​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "current" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LAG​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LEAD​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LEAD(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "next" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

NTILE​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, NTILE(4) OVER (PARTITION BY lstg_format_name ORDER BY part_dt) AS "quarter" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

CASE WHEN​

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, (CASE LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) WHEN 0.0 THEN 0 ELSE SUM(price)/LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) END) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

CAST​

The keyword RANGE, INTERVAL to specify a range. PRECEDING means the first few days (second/minute/hour/month/year). FOLLOWING means the next few days (second/minute/hour/month/year). Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY PRECEDING) AS "prev 3 days", LAST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY FOLLOWING) AS "next 3 days" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

SUBSTRING​

Example:

SELECT SUBSTRING(lstg_format_name, 1) FROM kylin_sales;

COALESCE​

Example:

SELECT COALESCE(lstg_format_name, '888888888888') FROM kylin_sales;

STDDEV_SUM​

Example: The first query is shortcut of the second query. stddev_sum is a UDAF which introduced in KYLIN-3361.

select A, stddev_sum(sampling_dim, m)
from T
group by A
select A, stddev(SUM_M)
from (
select A, sampling_dim, sum(m) as SUM_M
from T
group by A, sampling_dim
) a
group by A

INTERSECT_COUNT​

INTERSECT_COUNT is used to calculate the retention rate. The measure to be calculated have defined precisely count distinct measure. Example 1: Refer to intersect_count

select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

Example 1: Refer to KYLIN-4314

select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

INTERSECT_VALUE​

INTERSECT_COUNT returns the bitmap details of the retained value. The measure to be calculated have defined precisely count distinct measure. Example:

select city, version,
intersect_value(uuid, dt, array['20161014']) as first_day,
intersect_value(uuid, dt, array['20161015']) as second_day,
intersect_value(uuid, dt, array['20161016']) as third_day,
intersect_value(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_value(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

DATA TYPE​

| ---------- | ---------- | ---------- | ---------- | -------------------- | | ANY | CHAR | VARCHAR | STRING | BOOLEAN | | BYTE | BINARY | INT | SHORT | LONG | | INTEGER | TINYINT | SMALLINT | BIGINT | TIMESTAMP | | FLOAT | REAL | DOUBLE | DECIMAL | DATETIME | | NUMERIC | DATE | TIME | | |