Skip to main content

Sample dataset

Kylin is embedded with a standard SSB dataset (approximately 5.9 MB) for testing or trying out different functions. This SSB dataset contains 5 tables and 1 view. LINEORDER serves as a central fact table with 60,175 rows of data.

Introduction

The following table lists the 5 tables and 1 view of SSB sample dataset.

TableTypeDescription
LINEORDERFact tableContain detailed information about sales orders. Each row holds order information such as customer, supplier, order amount, and order date.
P_LINEORDERView based on fact tableContain details about sales orders and a pre-calculated row (V_REVENUE) with same transaction records as in LINEORDER.
CUSTOMERDimension tableContain customer information, such as customer name, customer address, and customer city.
SUPPLIERDimension tableContain supplier information, such as supplier name, supplier address, and supplier city.
DATESDimension tableContain information about the dates of 7 years, such as beginning date of the year, beginning date of the month, and beginning date of the week.
PARTDimension tableContain part information, such as part name, part category, part color, and part type.

The 5 tables together constitute the structure of the entire star data model. Below is an entity-relationship (E-R) diagram.

Entity-relationship diagram

Join Relationships:

P_LINEORDER INNER JOIN DATES ON P_LINEORDER.LO_ORDERDATE=DATES.D_DATEKEY
P_LINEORDER INNER JOIN CUSTOMER ON P_LINEORDER.LO_CUSTKEY=CUSTOMER.C_CUSTKEY
P_LINEORDER INNER JOIN PART ON P_LINEORDER.LO_PARTKEY=PART.P_PARTKEY
P_LINEORDER INNER JOIN SUPPLIER ON P_LINEORDER.LO_SUPPKEY=SUPPLIER.S_SUPPKEY

Import and check sample dataset

  1. Log on to the server command line, and run the following command to import the SSB sample dataset:

    $KYLIN_HOME/bin/sample.sh

    [!NOTE]

    Replace KYLIN_HOME with the actual path of Kylin.

  2. To check sample dataset:

    1. In the terminal, run hive command to enter Hive CLI.

    2. Run the following commands sequentially to check information about databases and tables.

      ## List all databases
      show databases;
      ## Enter database SSB
      use ssb;
      ## List all tables in database SSB
      show tables;
      ## Query the number of records in table SUPPLIER
      select count(*) from SUPPLIER;

Appendix: Tables and columns

LINEORDER

ColumnDescription
LO_ORDERKEYOrder ID
LO_CUSTKEYCustomer ID
LO_PARTKEYPart ID
LO_SUPPKEYSupplier ID
LO_ORDERDATEOrder date
LO_ORDERPRIORITYOrder priority
LO_SHIPPRIORITYShip priority
LO_LINENUMBERCompound primary key: L_ORDERKEY, L_LINENUMBER
LO_QUANTITYNumber of purchased goods
LO_EXTENDEDPRICEExtended price of order
LO_ORDTOTALPRICETotal price of order
LO_DISCOUNTOrder discount
LO_REVENUEOrder revenue
LO_SUPPLYCOSTSupplier cost
LO_TAXTax
LO_COMMITDATECommit date
LO_SHIPMODEShip mode

CUSTOMER

ColumnDescription
C_CUSTKEYCustomer ID
C_NAMECustomer name
C_ADDRESSCustomer address
C_CITYCustomer city
C_NATION_PREFIXNation prefix
C_NATIONCustomer nation
C_REGIONCustomer region
C_PHONECustomer phone number
C_MKTSEGMENTMarket segment

SUPPLIER

ColumnDescription
S_SUPPKEYSupplier ID
S_NAMESupplier name
S_ADDRESSSupplier address
S_CITYSupplier city
S_NATION_PREFIXNation prefix
S_NATIONSupplier nation
S_REGIONSupplier region
S_PHONESupplier phone number

DATES

ColumnDescription
D_DATEKEYDate ID
D_DATEDate
D_DAYOFWEEKDay of week
D_MONTHMonth
D_YEARYear
D_YEARMONTHNUMNum of year and month
D_YEARMONTHYear and month
D_DAYNUMINWEEKNum of days in a week
D_DAYNUMINMONTHNum of days in a month
D_DAYNUMINYEARNum of days in a year
D_MONTHINYEARNum of months in a year
D_WEEKNUMINYEARNum of weeks in a year
D_SELLINGSEASONSelling season
D_LASTDAYINWEEKFLLast day in one fiscal week
D_LASTDAYINMONTHFLLast day in one fiscal month
D_HOLIDAYFLHoliday in one fiscal year
D_WEEKDAYFLWeekday in one fiscal year

PART

ColumnDescription
P_PARTKEYPart ID
P_NAMEPart name
P_MFGRPart manufacturer
P_CATEGORYPart category
P_BRANDPart brand
P_COLORPart color
P_TYPEPart type
P_SIZEPart size
P_CONTAINERPart container