RAWmeasure function is use to query the detail data on the measure column in Kylin.
We design the cube desc is the
DT,SITE_ID columns as dimensions, and
SUM(ITEM_COUNT) as measure. So, the base cuboid data will like this:
|Rowkey of base cuboid||SUM(ITEM_COUNT)|
For the first row in the base cuboid data, Kylin can extract the dimension column values
2016-05-01,0 from the HBase Rowkey, and in the measure cell will store the measure function’s aggregated results
300, we can’t get the raw value
200 which before the aggregation on the
The RAW function is use to make the SQL:
SELECT DT,SITE_ID,ITEM_COUNT FROM FACT_TABLE
to return the correct result:
How to use
- Choose the Kylin version
- Like the above case, we can make the
DT,SITE_IDas dimensions, and
- After the cube build, you can use the SQL to query the raw data:
SELECT DT,SITE_ID,ITEM_COUNT FROM FACT_TABLE WHERE SITE_ID = 0
The column which define
RAW measure will be encoded with dictionary by default. So, you must know you data’s cardinality and distribution characteristics.
- As far as possible to define the value uniform distribution column to dimensions, this will make the measure cell value size more uniform and avoid data skew.
- If choose the ultra high cardinality column to define
RAWmeasure, you can try the following to avoid the dictionary build error:
- Cut a big segment into several segments, if you were trying to build a large data set at once;
kylin.dictionary.max.cardinalityin conf/kylin.properties to a bigger value (default is 5000000).
To be improved
- Now, the maximum storage 1M values of
RAWmeasure in one cuboid. If exceed 1M values, it will throw
BufferOverflowExceptionin the cube build. This will be optimized in the later release.
- Only dimension column can use in
RAWmeasure column is not support.
- Custom one aggregation function RAW implement, the function’s return type depends on the column type.
- Make the RAW aggregation function to save the column raw data in the base cuboid data.
- The HBase value cell will store the dictionary id of the raw data to save space.
- The SQL which contains the RAW measure column will be routed to the base cuboid query.
- Extract the raw data from base cuboid data with dimension values to assemble into a complete row when query.