Skip to main content
Version: 3.1.3

Sample Dataset

Kylin binary package contains a sample dataset for testing. It consists of five tables, including the fact table which has 10,000 rows. Because of the small data size, it is convenient to carry out as a test in the virtual machine. You can import the Kylin built-in sample data into Hive using executable script.

Import Sample Dataset into Hive

The script is sample.sh. you can find it under $KYLIN_HOME/bin.

$KYLIN_HOME/bin/sample.sh

Once the script is complete, execute the following commands to enter Hive. Then you can confirm whether the tables are imported successfully.

hive

By default, the script imports 5 tables into Hive's default database. You can check the tables imported into Hive or query some tables:

hive> use default;
hive> show tables;
hive> select count(*) from kylin_sales;

Tip: If you need to import the table to the specified database in Hive, you can modify the configuration item kylin.source.hive.database-for-flat-table in the Kylin configuration file $KYLIN_HOME/conf/kylin.properties to the specified Hive database.

Table Introduction

Kylin supports both star schema and snowflake data model. In this manual, we will use a typical snowflake data model as our sample data set which contains five tables:

  • KYLIN_SALES This is the fact table, it contains detail information of sales orders. Each row holds information such as the seller, the commodity classification, the amount of orders, the quantity of goods, etc. Each row corresponds to a transaction.
  • KYLIN_CATEGORY_GROUPINGS This is a dimension table, it represents details of commodity classification, such as, name of commodity category, etc.
  • KYLIN_CAL_DT This is another dimension table which extends information of dates, such as beginning date of the year, beginning date of the month, beginning date of the week.
  • KYLIN_ACCOUNT This is the user account table. Each row represents a user who could be a buyer and/or a seller of a specific transaction, which links to KYLIN_SALES through the BUYER_ID or SELLER_ID.
  • KYLIN_COUNTRY This is the country dimension table linking to KYLIN_ACCOUNT.

The five tables together constitute the structure of the entire snowflake data model. Below is a relational diagram of them.

Sample Table

Data Dictionary

The generated hive tables contains too many columns which may confused you. So the following tables will list some key columns which referred in Kylin's Model/Cube, and describe the underlying business meaning of them.

TableFieldDescription
KYLIN_SALESTRANS_IDOrder ID
KYLIN_SALESPART_DTOrder Date
KYLIN_SALESLEAF_CATEG_IDID Of Commodity Category
KYLIN_SALESLSTG_SITE_IDSite ID
KYLIN_SALESSELLER_IDAccount ID Of Seller
KYLIN_SALESBUYER_IDAccount ID Of Buyer
KYLIN_SALESPRICEOrder Amount
KYLIN_SALESITEM_COUNTThe Number Of Purchased Goods
KYLIN_SALESLSTG_FORMAT_NAMEOrder Transaction Type
KYLIN_SALESOPS_USER_IDSystem User ID
KYLIN_SALESOPS_REGIONSystem User Region
KYLIN_CATEGORY_GROUPINGSUSER_DEFINED_FIELD1User Defined Fields 1
KYLIN_CATEGORY_GROUPINGSUSER_DEFINED_FIELD3User Defined Fields 3
KYLIN_CATEGORY_GROUPINGSUPD_DATEUpdate Date
KYLIN_CATEGORY_GROUPINGSUPD_USERUpdate User
KYLIN_CATEGORY_GROUPINGSMETA_CATEG_NAMELevel 1 Category
KYLIN_CATEGORY_GROUPINGSCATEG_LVL2_NAMELevel 2 Category
KYLIN_CATEGORY_GROUPINGSCATEG_LVL3_NAMELevel 3 Category
KYLIN_CAL_DTCAL_DTDate
KYLIN_CAL_DTWEEK_BEG_DTWeek Beginning Date
KYLIN_CAL_DTMONTH_BEG_DTMonth Beginning Date
KYLIN_CAL_DTYEAR_BEG_DTYear Beginning Date
KYLIN_ACCOUNTACCOUNT_IDID Number Of Account
KYLIN_ACCOUNTACCOUNT_COUNTRYCountry ID Where Account Resides
KYLIN_ACCOUNTACCOUNT_BUYER_LEVELBuyer Account Level
KYLIN_ACCOUNTACCOUNT_SELLER_LEVELSeller Account Level
KYLIN_ACCOUNTACCOUNT_CONTACTContact of Account
KYLIN_COUNTRYCOUNTRYCountry ID
KYLIN_COUNTRYNAMEDescriptive Name Of Country