Kylin organizes all of its metadata (including cube descriptions and instances, projects, inverted index descriptions and instances, jobs, tables and dictionaries) as a hierarchical file system. However, Kylin stores it in HBase rather than in a normal file system. If you check your Kylin configuration file (kylin.properties), you will find a line like this:
## The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
This indicates that the metadata is saved in an HTable named kylin_metadata. You can scan this HTable in the HBase shell to inspect it.
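To avoid hard-coding the table name, a small shell function can read it from kylin.properties. This is a minimal sketch. The function name kylin_metadata_table is hypothetical, and it assumes the name@hbase form shown above, with the property on a single uncommented line.

```shell
# Hypothetical helper (not part of Kylin): print the metadata HTable name
# found in a kylin.properties file, e.g. "kylin_metadata@hbase" -> "kylin_metadata".
kylin_metadata_table() {
  local props="$1" url
  # Take the last uncommented kylin.metadata.url assignment.
  url=$(grep -E '^[[:space:]]*kylin\.metadata\.url=' "$props" | tail -n 1 | cut -d= -f2-)
  [ -n "$url" ] || return 1
  # Drop everything from the first '@' on.
  printf '%s\n' "${url%%@*}"
}
```

Usage example, assuming the file lives under $KYLIN_HOME/conf: `echo "scan '$(kylin_metadata_table "$KYLIN_HOME/conf/kylin.properties")', {LIMIT => 5}" | hbase shell` shows the first few rows of the metadata table.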
The Kylin metastore uses resource root path + resource name + resource suffix as the key (the rowkey in HBase) to store metadata. Refer to the following table for the key layout:
|Resource root path|Resource name|Resource suffix|
|/execute_output|/job id-step index| |
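The key layout can be expressed as a tiny shell helper. This is an illustrative sketch. kylin_resource_key is a hypothetical name, not a Kylin command. The /cube_desc example assumes the root path and .json suffix seen in the fetch example further down, and the job id is made up.

```shell
# Hypothetical helper (not part of Kylin): join root path, resource name and
# suffix into a metastore key, i.e. the HBase rowkey.
kylin_resource_key() {
  local root="$1" name="$2" suffix="${3:-}"
  # Names are passed without a leading slash; the helper inserts it.
  printf '%s/%s%s\n' "$root" "$name" "$suffix"
}

# Examples (the job id is made up):
#   kylin_resource_key /cube_desc kylin_sales_cube .json  -> /cube_desc/kylin_sales_cube.json
#   kylin_resource_key /execute_output my_job_id-2        -> /execute_output/my_job_id-2
```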
Kylin stores metadata in HBase in byte format. To view some metadata, you can run:
./bin/metastore.sh list /path/to/store/metadata
to list the entities stored under the specified path, and then run:
./bin/metastore.sh cat /path/to/store/entity/metadata
to view the metadata of a single entity.
Backup metadata with binary package
Sometimes you need to back up Kylin’s metadata store from HBase to your local file system.
In such cases, assuming you’re on the Hadoop CLI (or sandbox) where you deployed Kylin, go to KYLIN_HOME and run:
./bin/metastore.sh backup
to dump your metadata to a folder under KYLIN_HOME/meta_backups. The folder is named after the current time, using the pattern KYLIN_HOME/meta_backups/meta_year_month_day_hour_minute_second.
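Because the folder name encodes the timestamp, scripts can pick the most recent backup by name. This is a minimal sketch. It assumes the zero-padded naming seen in the meta_2016_06_10_20_24_50 example used later on this page, and latest_kylin_backup is a hypothetical helper.

```shell
# Hypothetical helper (not part of Kylin): print the newest meta_* backup folder.
# Assumes zero-padded meta_YYYY_MM_DD_hh_mm_ss names, so name order is time order.
latest_kylin_backup() {
  local dir="${1:-${KYLIN_HOME:-.}/meta_backups}" latest="" d
  for d in "$dir"/meta_*; do
    [ -d "$d" ] || continue   # skips the literal pattern when nothing matches
    latest="$d"               # glob results are sorted, so the last one wins
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Example: ls "$(latest_kylin_backup)"
```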
In addition, you can run:
./bin/metastore.sh fetch /path/to/store/metadata
to dump metadata selectively. For example, run:
./bin/metastore.sh fetch /cube_desc/
to fetch all cube description metadata, or run:
./bin/metastore.sh fetch /cube_desc/kylin_sales_cube.json
to fetch the metadata of a single cube description.
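To fetch several cube descriptions in one go, you can wrap the fetch command in a loop. This is only a sketch. fetch_cube_descs and the METASTORE variable are hypothetical conveniences. METASTORE defaults to bin/metastore.sh under KYLIN_HOME, or under the current directory when KYLIN_HOME is unset. The sketch also assumes each cube description lives at /cube_desc/<name>.json, as in the example above.

```shell
# Path to metastore.sh; override METASTORE to point elsewhere.
METASTORE="${METASTORE:-${KYLIN_HOME:-.}/bin/metastore.sh}"

# Hypothetical helper (not part of Kylin): fetch /cube_desc/<name>.json
# for each cube name given, stopping at the first failure.
fetch_cube_descs() {
  local cube
  for cube in "$@"; do
    "$METASTORE" fetch "/cube_desc/${cube}.json" || return 1
  done
}

# Example: fetch_cube_descs kylin_sales_cube
```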
Restore metadata with binary package
If your metadata store gets corrupted and you want to restore a previous backup:
First, reset the metadata store. This deletes everything in Kylin’s metadata store in HBase, so make sure you have a backup first:
./bin/metastore.sh reset
Then upload the backup metadata to Kylin’s metadata store:
./bin/metastore.sh restore $KYLIN_HOME/meta_backups/meta_xxxx_xx_xx_xx_xx_xx
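Resetting wipes the HBase metadata store, so it helps to script the reset and restore steps together behind a guard. This is a hedged sketch. kylin_full_restore and METASTORE are hypothetical names. The guard only checks that the backup folder exists and is non-empty. It does not check that the contents are valid.

```shell
# Path to metastore.sh; override METASTORE to point elsewhere.
METASTORE="${METASTORE:-${KYLIN_HOME:-.}/bin/metastore.sh}"

# Hypothetical wrapper (not part of Kylin).
kylin_full_restore() {
  local backup="$1"
  # Guard: never reset unless the backup folder exists and has content.
  if [ ! -d "$backup" ] || [ -z "$(ls -A "$backup")" ]; then
    echo "backup folder missing or empty: $backup" >&2
    return 1
  fi
  "$METASTORE" reset || return 1             # deletes all Kylin metadata in HBase
  "$METASTORE" restore "$backup" || return 1 # uploads the backup
}

# Example: kylin_full_restore "$KYLIN_HOME/meta_backups/meta_xxxx_xx_xx_xx_xx_xx"
```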
Restore metadata selectively (Recommended)
If only a few metadata files have changed, the administrator can restore just those files instead of overwriting all of the metadata. Compared with a full restore, this approach is more efficient and safer, so it is recommended.
Create a new empty directory, and then create subdirectories in it that mirror the locations of the metadata files to restore. For example, to restore a cube instance, create a “cube” subdirectory:
mkdir /path/to/restore_new
mkdir /path/to/restore_new/cube
Copy the metadata file to be restored to this new directory:
cp meta_backups/meta_2016_06_10_20_24_50/cube/kylin_sales_cube.json /path/to/restore_new/cube/
At this point, you can modify/fix the metadata manually.
Restore from this directory:
cd $KYLIN_HOME
./bin/metastore.sh restore /path/to/restore_new
Only the files in this folder are uploaded to the Kylin metastore. After the restore finishes, click the Reload Metadata button on the Web UI to flush the cache.
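You can script the mkdir and cp steps above for several files at once. This is a minimal sketch. It assumes that relative paths inside the backup folder mirror the metastore layout, as with cube/kylin_sales_cube.json. prepare_selective_restore is a hypothetical helper, and it stops if the target directory already has content.

```shell
# Hypothetical helper (not part of Kylin): copy selected files from a full
# backup into a new directory, keeping their relative paths.
prepare_selective_restore() {
  local backup="$1" target="$2" rel
  shift 2
  # The target must be new or empty, as the steps above require.
  if [ -d "$target" ] && [ -n "$(ls -A "$target")" ]; then
    echo "target is not empty: $target" >&2
    return 1
  fi
  for rel in "$@"; do
    mkdir -p "$target/$(dirname "$rel")" || return 1
    cp "$backup/$rel" "$target/$rel" || return 1
  done
}

# Example:
#   prepare_selective_restore "$KYLIN_HOME/meta_backups/meta_2016_06_10_20_24_50" \
#       /path/to/restore_new cube/kylin_sales_cube.json
#   cd "$KYLIN_HOME" && ./bin/metastore.sh restore /path/to/restore_new
```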
Backup/restore metadata in development env
When developing or debugging Kylin, you typically have a dev machine with an IDE and a backend sandbox. Usually you write code and run test cases on the dev machine. It would be troublesome to put a binary package in the sandbox every time you want to check the metadata. A helper class called SandboxMetastoreCLI lets you download and upload metadata locally on your dev machine. Follow its usage information and run it in your IDE.
Cleanup unused resources from metadata store
Over time, some resources, such as dictionaries and table snapshots, become useless (for example, when cube segments are dropped or merged), but they still take up space. You can run a command to find them and clean them up from the metadata store:
First, run a check. This is safe because it does not change anything. You can set the number of days to keep metadata resources with the “--jobThreshold 30” option (30 is the default; you can change it to any number):
./bin/metastore.sh clean --jobThreshold 30
The resources that would be dropped are listed.
Next, add the “--delete true” parameter to clean up those resources. Before doing this, make sure you have made a backup of the metadata store:
./bin/metastore.sh clean --delete true --jobThreshold 30
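You can wrap the check-then-delete routine so that deletion needs an explicit confirmation and always follows a fresh backup. This is a sketch under assumptions. kylin_meta_clean and METASTORE are hypothetical names. It uses only the clean and backup commands of metastore.sh.

```shell
# Path to metastore.sh; override METASTORE to point elsewhere.
METASTORE="${METASTORE:-${KYLIN_HOME:-.}/bin/metastore.sh}"

# Hypothetical wrapper (not part of Kylin): $1 = days to keep (default 30),
# $2 = "yes" to actually delete.
kylin_meta_clean() {
  local days="${1:-30}" confirm="${2:-no}"
  # Step 1: check only; lists what would be dropped, changes nothing.
  "$METASTORE" clean --jobThreshold "$days" || return 1
  if [ "$confirm" != "yes" ]; then
    echo "dry run only; pass 'yes' as the second argument to delete" >&2
    return 0
  fi
  # Step 2: back up the metadata store, then delete.
  "$METASTORE" backup || return 1
  "$METASTORE" clean --delete true --jobThreshold "$days"
}

# Example: kylin_meta_clean 30        (review the list)
#          kylin_meta_clean 30 yes    (back up, then delete)
```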