Version: 5.0.0

Prerequisite

To ensure system performance and stability, we recommend you run Kylin on a dedicated Hadoop cluster.

Prior to installing Kylin, please check the following prerequisites are met.

Supported Hadoop Distributions

Kylin has been verified to run on the following Hadoop distributions.

  • Apache Hadoop 3.2.1

Kylin depends on the following components. Please make sure they are available on each server.

  • Hive
  • HDFS
  • Yarn
  • ZooKeeper

Prepare Environment

First, make sure you allocate sufficient resources for the environment. Please refer to Prerequisites for detailed resource requirements for Kylin. Also make sure that HDFS, YARN, Hive, ZooKeeper, and the other components are running normally without any warnings.

Additional configuration required for Apache Hadoop version

Add the following two configurations in $KYLIN_HOME/conf/kylin.properties:

  • kylin.env.apache-hadoop-conf-dir: the Hadoop conf directory in the Hadoop environment
  • kylin.env.apache-hive-conf-dir: the Hive conf directory in the Hadoop environment
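
As a rough sketch, the two items could be added with a small shell helper that skips keys already present in the file. The helper name set_property_if_missing and the paths /etc/hadoop/conf and /etc/hive/conf are assumptions made for illustration; they are not part of Kylin or of this guide.

```shell
# Append key=value to a properties file unless the key is already present.
# Hypothetical helper for illustration; not shipped with Kylin.
set_property_if_missing() {
  local file="$1" key="$2" value="$3" pattern
  # Escape dots so they match literally in the grep pattern.
  pattern="^$(printf '%s' "$key" | sed 's/\./\\./g')="
  if grep -q "$pattern" "$file" 2>/dev/null; then
    echo "skip: $key already set"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
    echo "added: $key"
  fi
}

# Example usage (replace the placeholder paths with your cluster's directories):
#   props="$KYLIN_HOME/conf/kylin.properties"
#   set_property_if_missing "$props" kylin.env.apache-hadoop-conf-dir /etc/hadoop/conf
#   set_property_if_missing "$props" kylin.env.apache-hive-conf-dir /etc/hive/conf
```

Checking for an existing key first keeps the file free of duplicate entries if the helper is run more than once.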

Jar package required by Apache Hadoop version

With Apache Hadoop 3.2.1, you also need to provide the MySQL JDBC driver in the environment where Kylin runs.

Download the MySQL 8.0 JDBC driver from https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar and place it in the $KYLIN_HOME/lib/ext directory.
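
The download and copy can be sketched as a single shell function. It assumes that curl is installed and that KYLIN_HOME is set; the function name is hypothetical.

```shell
# Download the MySQL JDBC driver into $KYLIN_HOME/lib/ext.
# Hypothetical helper; assumes curl is available and KYLIN_HOME is set.
install_mysql_jdbc_driver() {
  local version="8.0.30"
  local jar="mysql-connector-java-${version}.jar"
  local url="https://repo1.maven.org/maven2/mysql/mysql-connector-java/${version}/${jar}"
  local dest="${KYLIN_HOME:?KYLIN_HOME is not set}/lib/ext"
  # -f: fail on HTTP errors, -S: show errors, -L: follow redirects
  mkdir -p "$dest" && curl -fSL -o "$dest/$jar" "$url" && echo "installed: $dest/$jar"
}

# Example usage:
#   install_mysql_jdbc_driver
```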

Java Environment

Kylin requires:

  • The default JDK version of your environment must be 8 (JDK 1.8.0_162 or a later update release).

You can use the following command to check the JDK version of your existing environment:

java -version

For JDK 8, the reported version string starts with 1.8.0.
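
The version requirement can also be checked in a script. The sketch below assumes the usual JDK 8 version string layout (1.8.0_<update>, optionally followed by a suffix); both function names are made up for illustration.

```shell
# Print "ok" if the version string is JDK 8 with update 162 or later,
# otherwise print "unsupported". Hypothetical helper for illustration.
check_jdk8_version() {
  local version="$1" update
  case "$version" in
    1.8.0_*)
      update="${version#1.8.0_}"      # keep the part after "1.8.0_"
      update="${update%%[!0-9]*}"     # drop any suffix such as "-b10"
      if [ -n "$update" ] && [ "$update" -ge 162 ]; then
        echo "ok"
        return 0
      fi
      ;;
  esac
  echo "unsupported"
  return 1
}

# Extract the quoted version from `java -version`, which prints to stderr.
current_jdk_version() {
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Example usage:
#   check_jdk8_version "$(current_jdk_version)"
```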

Account Authority

The Linux account running Kylin must have the required access permissions to the cluster. These permissions include:

  • Read/Write permission of HDFS
  • Create/Read/Write permission of Hive table

The steps below use the account KyAdmin as an example to verify that the user has access to the Hadoop cluster:

  1. Verify the user has HDFS read and write permissions

    Assuming the HDFS storage path for model data is /kylin, set it in conf/kylin.properties as:

    kylin.env.hdfs-working-dir=/kylin

    The storage folder must be created and the required permissions granted. You may have to switch to the HDFS administrator (usually the hdfs user) to do this:

    su hdfs
    hdfs dfs -mkdir /kylin
    hdfs dfs -chown KyAdmin /kylin
    hdfs dfs -mkdir /user/KyAdmin
    hdfs dfs -chown KyAdmin /user/KyAdmin

    Then, as the KyAdmin user, verify the read and write permissions:

    hdfs dfs -put <any_file> /kylin
    hdfs dfs -put <any_file> /user/KyAdmin
  2. Verify the KyAdmin user has Hive read and write permissions

    Suppose you want to store a Hive table t1 in the Hive database kylinDB, and t1 contains two fields: id and name.

    Then verify the Hive permissions:

    #hive
    hive> show databases;
    hive> use kylinDB;
    hive> show tables;
    hive> insert into t1 values(1, "kylin");
    hive> select * from t1;
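
The HDFS part of the check above can be scripted so that it cleans up after itself. This is a minimal sketch, assuming the hdfs command is on the PATH and the script runs as the account that will start Kylin (KyAdmin in the example above). The function name is hypothetical.

```shell
# Write a small probe file into an HDFS directory, then remove it again.
# Hypothetical helper; run it as the account that will start Kylin.
verify_hdfs_write() {
  local dir="$1" probe name
  probe="$(mktemp)"
  name="$(basename "$probe")"
  if hdfs dfs -put "$probe" "$dir/$name" >/dev/null \
     && hdfs dfs -rm -skipTrash "$dir/$name" >/dev/null; then
    rm -f "$probe"
    echo "OK: $dir is writable"
    return 0
  fi
  rm -f "$probe"
  echo "FAIL: cannot write to $dir"
  return 1
}

# Example usage:
#   verify_hdfs_write /kylin
#   verify_hdfs_write /user/KyAdmin
```

Error messages from hdfs stay visible on stderr, so a permission problem is reported together with the FAIL line.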

Prepare Metadata DB

Kylin requires a configured metastore.

We recommend using PostgreSQL 10.7 as the metastore, which is provided in our package. Please refer to Use PostgreSQL as Metastore (Default) for installation steps and details.

If you want to use your own PostgreSQL database, the supported versions are below:

  • PostgreSQL 9.1 or above

You can also use MySQL, but we currently do not provide a MySQL installation package or JDBC driver, so you need to prepare both yourself before setting up Kylin. Please refer to Use MySQL as Metastore for installation steps and details. The supported MySQL database versions are below:

  • MySQL 5.1-5.7
  • MySQL 5.7 (recommended)

Prepare ZooKeeper

The following steps can be used to quickly verify the connectivity between ZooKeeper and Kylin after Kerberos is enabled.

  1. Find the ZooKeeper working directory on the node where the ZooKeeper Client is deployed

  2. Add or modify the Client section in the conf/jaas.conf file:

    Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/path/to/keytab_assigned_to_kylin"
    storeKey=true
    useTicketCache=false
    principal="principal_assigned_to_kylin";
    };
  3. export JVMFLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf"

  4. bin/zkCli.sh -server ${kylin.env.zookeeper-connect-string}

  5. Verify that the ZooKeeper node can be viewed normally, for example: ls /

  6. Remove the Client section added in step 2, and clear the environment variable declared in step 3 with unset JVMFLAGS

If you downloaded ZooKeeper from an unofficial source, consult your operations and maintenance personnel before performing the steps above.
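
One possible variation on the steps above is to generate the Client section into a temporary JAAS file instead of editing conf/jaas.conf, which leaves nothing to clean up in the ZooKeeper configuration afterwards. This is only a sketch: the function name, the temporary file path, and the keytab and principal values are placeholders.

```shell
# Print a JAAS Client section for the given keytab and principal.
# Hypothetical helper for illustration.
print_jaas_client() {
  local keytab="$1" principal="$2"
  printf 'Client {\n'
  printf '  com.sun.security.auth.module.Krb5LoginModule required\n'
  printf '  useKeyTab=true\n'
  printf '  keyTab="%s"\n' "$keytab"
  printf '  storeKey=true\n'
  printf '  useTicketCache=false\n'
  printf '  principal="%s";\n' "$principal"
  printf '};\n'
}

# Example session (paths, principal, and connect string are placeholders):
#   print_jaas_client /path/to/keytab_assigned_to_kylin principal_assigned_to_kylin > /tmp/kylin-zk-jaas.conf
#   JVMFLAGS="-Djava.security.auth.login.config=/tmp/kylin-zk-jaas.conf" \
#     bin/zkCli.sh -server <zookeeper-connect-string> ls /
#   rm -f /tmp/kylin-zk-jaas.conf
```

Setting JVMFLAGS as a prefix to the zkCli.sh command limits it to that single invocation, so no unset is needed afterwards.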

Network Port Requirements

Kylin needs to communicate with different components. The following are the ports that need to be opened to Kylin. This table only includes the default configuration of the Hadoop environment, and does not include the configuration differences between Hadoop platforms.

| Component | Port | Function | Required |
| --- | --- | --- | --- |
| SSH | 22 | SSH port for connecting to the virtual machine where Kylin is located | Y |
| Kylin | 7070 | Kylin access port | Y |
| Kylin | 7443 | Kylin HTTPS access port | N |
| HDFS | 8020 | RPC port on which HDFS receives client connections | Y |
| HDFS | 50010 | HDFS DataNode access and data transmission port | Y |
| Hive | 10000 | HiveServer2 access port | N |
| Hive | 9083 | Hive Metastore access port | Y |
| ZooKeeper | 2181 | ZooKeeper access port | Y |
| Yarn | 8088 | Yarn Web UI access port | Y |
| Yarn | 8090 | Yarn Web UI HTTPS access port | N |
| Yarn | 8050 / 8032 | Yarn ResourceManager communication port | Y |
| Spark | 4041 | Kylin query engine Web UI default port | Y |
| Spark | 18080 | Spark History Server port | N |
| Spark | (1024, 65535] | Ports used by the Spark Driver and Executors, assigned randomly | Y |
| InfluxDB | 8086 | InfluxDB HTTP port | N |
| InfluxDB | 8088 | InfluxDB RPC port | N |
| PostgreSQL | 5432 | PostgreSQL access port | Y |
| MySQL | 3306 | MySQL access port | Y |
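
To check from the Kylin node whether a required port is reachable, a sketch like the following can help. It relies on the /dev/tcp feature of bash and on the timeout command from GNU coreutils; the function name and the host name in the usage example are placeholders.

```shell
# Report whether a TCP port on a host accepts connections.
# Hypothetical helper; requires bash (for /dev/tcp) and timeout.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed"
    return 1
  fi
}

# Example usage (replace hadoop-host with a real host name):
#   for p in 8020 9083 2181 8088; do check_port hadoop-host "$p"; done
```

A port reported as closed may be blocked by a firewall or may simply have no service listening, so the result is a starting point for diagnosis rather than a verdict.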

Hadoop Cluster Resource Allocation

To ensure Kylin works efficiently, please ensure the Hadoop cluster configurations satisfy the following conditions:

  • yarn.nodemanager.resource.memory-mb larger than 8192 MB
  • yarn.scheduler.maximum-allocation-mb larger than 4096 MB
  • yarn.scheduler.maximum-allocation-vcores larger than 5
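
These thresholds can be checked against yarn-site.xml with a short script. The sketch below assumes the common layout in which each <name> element is followed by its <value> element, and it does not look up Hadoop's built-in defaults for properties missing from the file. Both function names are hypothetical.

```shell
# Print the value of a property from a Hadoop XML configuration file.
# Hypothetical helper; assumes <value> follows the matching <name>.
get_yarn_property() {
  awk -v n="$2" '
    index($0, "<name>" n "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }' "$1"
}

# Check that a property is strictly greater than a minimum.
check_greater_than() {
  local file="$1" name="$2" min="$3" value
  value="$(get_yarn_property "$file" "$name")"
  if [ -n "$value" ] && [ "$value" -gt "$min" ]; then
    echo "OK   $name=$value (> $min)"
  else
    echo "FAIL $name=${value:-unset} (needs > $min)"
    return 1
  fi
}

# Example usage (the file location is a placeholder):
#   conf="$HADOOP_CONF_DIR/yarn-site.xml"
#   check_greater_than "$conf" yarn.nodemanager.resource.memory-mb 8192
#   check_greater_than "$conf" yarn.scheduler.maximum-allocation-mb 4096
#   check_greater_than "$conf" yarn.scheduler.maximum-allocation-vcores 5
```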

If you need to run Kylin in a sandbox or other virtual machine environment, please make sure the virtual machine environment has the following resources:

  • No less than 4 processors

  • Memory is no less than 10 GB

  • The value of the configuration item yarn.nodemanager.resource.cpu-vcores is no less than 8

We recommend the following hardware configuration to install Kylin:

  • 16 vCore, 64 GB memory
  • At least 500GB disk
  • For network port requirements, please refer to the Network Port Requirements chapter.

We recommend using the following versions of the Linux operating system:

  • Ubuntu 18.04 and above (LTS version recommended)
  • Red Hat Enterprise Linux 6.4 and above
  • CentOS 6.4 and above

We recommend the following client configuration:

  • Operating System: macOS / Windows 7 and above
  • RAM: 8 GB or above
  • Browser version:
    • Chrome 45 or above
    • Internet Explorer 11 or above