Apache Kylin | Deploy Kylin on Kubernetes

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, it facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support and tools are widely available.

Apache Kylin is an open source, distributed analytical data warehouse for big data. Deploy Kylin on Kubernetes cluster will reduce cost of maintenance and extension.

Pre-requirements

A hadoop cluster.
A K8s cluster, with sufficient system resources.
kylin-client image.
An Elasticsearch cluster(maybe optional).

How to build docker image

Hadoop-client image

What is a hadoop-client docker image and why do we need this?

As we all know, the node you want to deploy Kylin should contain Hadoop dependencies(jars and configuration files), these dependencies let you have access to Hadoop Services, such as HDFS, HBase, Hive, which are needed by Apache Kylin. Unfortunately, each Hadoop distribution(CHD or HDP etc.) has its own specific jars. So, we can build specific images for specific Hadoop distributions, which will make image management task easier. This will have the following two benefits:

Someone who has more knowledge on Hadoop can do this work, and let kylin users build their Kylin image base on provided Hadoop-Client image.
Upgrade Kylin will be much easier.

Build Step
- Prepare and modify Dockerfile(If you are using other hadoop distribution, please consider build an image yourself).
- Place Spark binary(such as spark-2.3.2-bin-hadoop2.7.tgz) into dir provided-binary.
- Run build-image.sh to build the image.

Kylin-client image

What is a kylin-client docker image?

kylin-client is a docker image which based on hadoop-client, it will provide the flexibility of upgrade of Apache Kylin.

Build Step

Place Kylin binary(such as apache-kylin-3.0.1-bin-cdh57.tar.gz) and uncompress it into current dir.
Modify Dockerfile , change the value of KYLIN_VERSION and the name of base image(hadoop-client).
Run build-image.sh to build image.

How to deploy kylin on kubernetes

Here let’s take a look at how to deploy a kylin cluster which connects to CDH 5.7.

1 kubenetes/template/production/example/deployment is the working directory.

2 Update hadoop configuration files (kubenetes/template/production/example/config/hadoop) and filebeat’s configuration file.

3 Create statefulset and service for memcached.

Apply kubernetes objects.

$ kubectl apply -f memcached/
service/cache-svc created
statefulset.apps/kylin-memcached created

Check the hostname of cache service.

$ kubectl run -it--image=busybox:1.28.4--rm--restart=Never sh -n test-dns
If you don't see a command prompt, try pressing enter.
/ # nslookup cache-svc.kylin-example.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:   cache-svc.kylin-example.svc.cluster.local
Address 1: 192.168.11.44 kylin-memcached-0.cache-svc.kylin-example.svc.cluster.local
/ #

4 Create statefulset and service for Apache Kylin.
- Modify memcached configuration.

// modify memcached hostname(session sharing)
// memcachedNodes="n1:kylin-memcached-0.cache-svc.kylin-example.svc.cluster.local:11211"
$ vim ../config/tomcat/context.xml
// modify memcached hostname(query cache)
// kylin.cache.memcached.hosts=kylin-memcached-0.cache-svc.kylin-example.svc.cluster.local:11211
$ vim ../config/kylin-job/kylin.properties
$ vim ../config/kylin-query/kylin.properties

Create the configMap

$ kubectl create configmap -n kylin-example hadoop-config \
--from-file=../config/hadoop/core-site.xml \
--from-file=../config/hadoop/hdfs-site.xml \
--from-file=../config/hadoop/yarn-site.xml \
--from-file=../config/hadoop/mapred-site.xml \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example hive-config \
--from-file=../config/hadoop/hive-site.xml \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example hbase-config \
--from-file=../config/hadoop/hbase-site.xml \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example kylin-more-config \
--from-file=../config/kylin-more/applicationContext.xml \
--from-file=../config/kylin-more/ehcache.xml \
--from-file=../config/kylin-more/ehcache-test.xml \
--from-file=../config/kylin-more/kylinMetrics.xml \
--from-file=../config/kylin-more/kylinSecurity.xml \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example kylin-job-config  \
--from-file=../config/kylin-job/kylin-kafka-consumer.xml \
--from-file=../config/kylin-job/kylin_hive_conf.xml \
--from-file=../config/kylin-job/kylin_job_conf.xml \
--from-file=../config/kylin-job/kylin_job_conf_inmem.xml \
--from-file=../config/kylin-job/kylin-server-log4j.properties \
--from-file=../config/kylin-job/kylin-spark-log4j.properties \
--from-file=../config/kylin-job/kylin-tools-log4j.properties \
--from-file=../config/kylin-job/kylin.properties \
--from-file=../config/kylin-job/setenv.sh \
--from-file=../config/kylin-job/setenv-tool.sh \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example kylin-query-config  \
--from-file=../config/kylin-query/kylin-kafka-consumer.xml \
--from-file=../config/kylin-query/kylin_hive_conf.xml \
--from-file=../config/kylin-query/kylin_job_conf.xml \
--from-file=../config/kylin-query/kylin_job_conf_inmem.xml \
--from-file=../config/kylin-query/kylin-server-log4j.properties \
--from-file=../config/kylin-query/kylin-spark-log4j.properties \
--from-file=../config/kylin-query/kylin-tools-log4j.properties \
--from-file=../config/kylin-query/kylin.properties \
--from-file=../config/kylin-query/setenv.sh \
--from-file=../config/kylin-query/setenv-tool.sh \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example filebeat-config  \
--from-file=../config/filebeat/filebeat.yml \
--dry-run -o yaml | kubectl apply -f -
$ kubectl create configmap -n kylin-example tomcat-config  \
--from-file=../config/tomcat/server.xml \
--from-file=../config/tomcat/context.xml \
--dry-run -o yaml | kubectl apply -f -

Deploy Kylin’s Job server

$ kubectl apply -f kylin-job/
service/kylin-job-svc created
statefulset.apps/kylin-job created

Deploy Kylin’s Query server
$ kubectl apply -f kylin-query/ service/kylin-query-svc created statefulset.apps/kylin-query created

5 Kylin Service

Visit Web UI
- http://${HOSTNAME}:30012/kylin for Query Server

6 Clean up

$ kubectl delete -f memcached/
$ kubectl delete -f kylin-query/
$ kubectl delete -f kylin-job/

Troubleshooting

Check logs of specific pod
// Output of : sh kylin.sh start $ kubectl logs kylin-job-0 kylin -n kylin-example $ kubectl logs -f kylin-job-0 kylin -n kylin-example
Attach to a specific pod, say “kylin-job-0”.
$ kubectl exec -it kylin-job-0 -n kylin-example-- bash
Check failure reasons of specific pod
$ kubectl get pod kylin-job-0 -n kylin-example -o yaml
If you don’t have an Elasticsearch cluster or not interested in log collection, please remove filebeat container in both kylin-query-stateful.yaml and kylin-job-stateful.yaml.
If you want to check the details or want to have a discussion, please read or comment on KYLIN-4447 Kylin on kubernetes in production env .
Find the provided docker image at: DockerHub: : apachekylin/kylin-client