Version: 5.0.0

Single Node Mode

1. Prepare installation environment

Create a KYLIN_HOME path and a Linux account for Kylin.

Example
  • The installation location is /usr/local/
  • Linux account to run Kylin is KyAdmin.

Follow the Prerequisites and finish the environment preparation.
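For example, preparing the account and location might look like the following minimal sketch. It assumes the KyAdmin account and /usr/local location from the example above; adapt both to your environment.

Example

# Create the Linux account that will run Kylin (KyAdmin is the example name)
sudo useradd -m KyAdmin
# Verify that the installation location exists
ls -ld /usr/local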

2. Download Kylin binary package.

Please download the official release binary from the Download Page.

If you want to package from source code, please refer to How To Package.
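From a shell, fetching the package might look like the sketch below. The mirror URL is an assumption for illustration only; always copy the actual link from the Download Page.

Example

# Replace the URL with the link taken from the Download Page
wget https://downloads.apache.org/kylin/apache-kylin-[Version]/apache-kylin-[Version].tar.gz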

3. Copy and unzip Kylin package to your target server.

Example

cd /usr/local
tar -zxvf apache-kylin-[Version].tar.gz

The decompressed directory should be exported as $KYLIN_HOME on your server.

Example

export KYLIN_HOME=/usr/local/apache-kylin-[Version]/
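To keep the variable set across sessions, you can also persist it in the shell profile of the account running Kylin. This assumes a bash login shell; adjust for your shell.

Example

# Persist KYLIN_HOME for future sessions (assumes bash)
echo 'export KYLIN_HOME=/usr/local/apache-kylin-[Version]/' >> ~/.bashrc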

4. Download Kylin built-in Spark

Example

bash $KYLIN_HOME/sbin/download-spark-user.sh

There will be a spark directory under $KYLIN_HOME.
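A quick way to confirm the download succeeded is to list the directory and ask the bundled Spark for its version; spark-submit --version is standard Spark tooling.

Example

ls $KYLIN_HOME/spark
$KYLIN_HOME/spark/bin/spark-submit --version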

5. Configure Metadata DB.

You can use either MySQL or PostgreSQL as the metadata DB.

For production environments, we recommend setting up a dedicated metadata DB to ensure reliability.
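As an illustration, creating a dedicated metadata database on PostgreSQL might look like this. The database name kylin and the postgres superuser are assumptions; a MySQL setup would use an analogous CREATE DATABASE statement.

Example

# Create a dedicated metadata database (names are placeholders)
sudo -u postgres psql -c "CREATE DATABASE kylin;"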

6. Create a working directory on HDFS and grant permissions.

The default working directory is /kylin. Also ensure that the Linux account running Kylin has access to its home directory on HDFS. In addition, create the directory /kylin/spark-history to store the Spark log files.

hadoop fs -mkdir -p /kylin
hadoop fs -chown KyAdmin /kylin
hadoop fs -mkdir -p /kylin/spark-history
hadoop fs -chown KyAdmin /kylin/spark-history

You can modify the working directory in $KYLIN_HOME/conf/kylin.properties.

Example

kylin.env.hdfs-working-dir=hdfs://${nameservice}/kylin

Note

If you do not have permission to create /kylin/spark-history, you can configure kylin.engine.spark-conf.spark.eventLog.dir and kylin.engine.spark-conf.spark.history.fs.logDirectory to point to a directory you can access.
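For example, pointing both properties at a directory your account can write to might look like this; the path is a placeholder.

Example

kylin.engine.spark-conf.spark.eventLog.dir=hdfs://${nameservice}/tmp/kylin/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://${nameservice}/tmp/kylin/spark-history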

Quick Configuration

In the conf directory under the root directory of the installation package, configure the parameters in the file kylin.properties as follows:

  1. According to the PostgreSQL configuration, set the following metadata parameters. Be sure to replace the corresponding {metadata_name}, {host}, {port}, {user}, and {password} values; the maximum allowed length of {metadata_name} is 28 characters.

    Example

    kylin.metadata.url={metadata_name}@jdbc,driverClassName=org.postgresql.Driver,url=jdbc:postgresql://{host}:{port}/kylin,username={user},password={password}

    For more PostgreSQL configuration, please refer to Use PostgreSQL as Metastore. For MySQL configuration, please refer to Use MySQL as Metastore; a sketch of the MySQL connection string follows the note below.

    Note

    Please name {metadata_name} using only letters, numbers, or underscores, and start the name with a letter.
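    By analogy with the PostgreSQL line above, a MySQL connection string would look roughly like this. Treat it as a hedged sketch and confirm the exact driver class and options in Use MySQL as Metastore.

    Example

    kylin.metadata.url={metadata_name}@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://{host}:{port}/kylin,username={user},password={password}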

  2. When executing jobs, Kylin submits the build tasks to YARN. Replace {queue_name} in the following parameter with the queue you actually use to require that build tasks be submitted to the specified queue.

    Example

    kylin.engine.spark-conf.spark.yarn.queue={queue_name}

  3. Configure ZooKeeper.

    Kylin uses ZooKeeper for service discovery; please refer to Service Discovery for more details.

    Configure the property in ${KYLIN_HOME}/conf/kylin.properties(.override).

    Example

    kylin.env.zookeeper-connect-string=10.1.2.1:2181,10.1.2.2:2181,10.1.2.3:2181

    If you use ACLs for ZooKeeper, set the following configuration:

    • kylin.env.zookeeper-acl-enabled: Whether to enable ZooKeeper ACL. Disabled by default.
    • kylin.env.zookeeper.zk-auth: The user, password, and authentication method used by ZooKeeper. Empty by default.
    • kylin.env.zookeeper.zk-acl: The ACL permission setting. The default value is world:anyone:rwcda, which lets all users perform all operations.
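    Put together, an ACL setup might look like the following sketch. The digest scheme and the ADMIN/KYLIN credentials are placeholders, and the exact id format for kylin.env.zookeeper.zk-acl should be checked against the ZooKeeper ACL documentation.

    Example

    kylin.env.zookeeper-acl-enabled=true
    kylin.env.zookeeper.zk-auth=digest:ADMIN:KYLIN
    kylin.env.zookeeper.zk-acl=digest:ADMIN:rwcda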

    If you need to encrypt kylin.env.zookeeper.zk-auth, you can do it like this:

    1. Run the following command in ${KYLIN_HOME}; it will print the encrypted value.
    Example

    ./bin/kylin.sh org.apache.kylin.tool.general.CryptTool -e AES -s <value>

    2. Add the property in ${KYLIN_HOME}/conf/kylin.properties(.override):

    kylin.env.zookeeper.zk-auth=ENC('${encrypted_value}')
  4. Configure Gluten. Apache Gluten is required by the internal table feature, and it is enabled by default. Add the following config to your ${KYLIN_HOME}/conf/kylin.properties(.override):

    Example

    # Gluten for query
    kylin.storage.columnar.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.storage_configuration.disks.hdfs.endpoint=hdfs://${nameservice}/
    # Gluten for build
    kylin.engine.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.storage_configuration.disks.hdfs.endpoint=hdfs://${nameservice}/

  5. Configure Query & Build Cluster on Spark Standalone

    # Query on Spark Standalone
    kylin.storage.columnar.spark-conf.spark.master=spark://${SPARK_MASTER_HOST}:7077
    kylin.storage.columnar.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.hdfs.libhdfs3_conf={path for hdfs-site.xml}
    kylin.storage.columnar.spark-conf.spark.gluten.sql.columnar.executor.libpath={path for libch.so}
    kylin.storage.columnar.spark-conf.spark.executorEnv.LD_PRELOAD={path for libch.so}
    kylin.storage.columnar.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.reuse_disk_cache=false  # a worker is not guaranteed to run only one app
    kylin.storage.columnar.spark-conf.spark.gluten.sql.executor.jar.path={path for gluten.jar}
    # Build on Spark Standalone
    kylin.engine.spark-conf.spark.master=spark://${SPARK_MASTER_HOST}:7077
    kylin.engine.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.hdfs.libhdfs3_conf={path for hdfs-site.xml}
    kylin.engine.spark-conf.spark.gluten.sql.columnar.executor.libpath={path for libch.so}
    kylin.engine.spark-conf.spark.executorEnv.LD_PRELOAD={path for libch.so}
    kylin.engine.spark-conf.spark.gluten.sql.columnar.backend.ch.runtime_config.reuse_disk_cache=false  # a worker is not guaranteed to run only one app
    kylin.engine.spark-conf.spark.gluten.sql.driver.jar.path={path for gluten.jar}
    kylin.engine.spark-conf.spark.gluten.sql.executor.jar.path={path for gluten.jar}
  6. (Optional) Configure Spark client node information. Since Spark is started in yarn-client mode, if the IP of the Kylin node is not configured in the hosts file of the Hadoop cluster, please add the following configurations in kylin.properties:

    kylin.storage.columnar.spark-conf.spark.driver.host={hostIp}
    kylin.engine.spark-conf.spark.driver.host={hostIp}

You can modify the {hostIp} according to the following example:

kylin.storage.columnar.spark-conf.spark.driver.host=10.1.3.71
kylin.engine.spark-conf.spark.driver.host=10.1.3.71
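If you are unsure which IP to use, one way to look it up on the Kylin node is shown below; verify that the printed address is the one reachable from the Hadoop cluster.

Example

# Print this host's IP address (check it is reachable from the cluster)
hostname -i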