Putting Things Where They Belong Using New TTL Moves — Altinity
By Vitaliy Zakaznikov
Multi-volume storage is crucial in many use cases. It helps reduce storage costs and improves query performance by allowing the most critical application data to be placed on the fastest storage devices. Monitoring data is a classic use case: the value of the data degrades rapidly over time. The last day, last week, last month, and previous year of data have very different access patterns, which in turn correspond to different storage needs.
Suitable placement of data based on its age is therefore of great importance. ClickHouse TTL moves now provide a mechanism to achieve this. The best part about ClickHouse time-based TTL moves is that they are intuitive and correspond directly to human notions of calendar time. TTL moves make it much easier to set up multi-volume storage that corresponds to business requirements.
In previous blog articles, we introduced the multi-volume storage capability in ClickHouse. The multi-volume feature significantly increases ClickHouse server capacity. It enables tiered storage by providing storage policies to arrange disks into volumes and set up relationships between them. However, storage policies by themselves do not provide much flexibility in controlling where ClickHouse places the data. Either the user has to do it manually using ALTER TABLE [db.]table MOVE PART|PARTITION commands, or rely on the move factor parameter to enable simple distribution of data between volumes based on the used space ratio.
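For reference, a manual move looks like the sketch below; the partition value and the disk and volume names are illustrative and must match your own storage configuration:

```sql
-- Move a single partition to a named disk by hand
ALTER TABLE [db.]table MOVE PARTITION '2017' TO DISK 'slow_disk'
-- or to a named volume
ALTER TABLE [db.]table MOVE PARTITION '2017' TO VOLUME 'slow_volume'
```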
In this article, we will look at the new table TTL moves that allow a user to define expressions that can move data automatically to a specific disk or volume that the user has specified inside a storage configuration. The new TTL moves greatly enhance multi-volume storage capability and provide the fine-grain control that is needed to put multi-volume storage capability into full use.
MergeTree is currently the only family of engines that supports TTL expressions. ClickHouse first added support for TTL expressions to enable automatic delete mutations. A TTL expression is simply an SQL expression that must evaluate to the Date or DateTime data type. The expression can specify explicit time intervals using the INTERVAL keyword or use the toInterval conversion functions. For example,
TTL date_time + INTERVAL 1 MONTH
TTL date_time + INTERVAL 15 HOUR
or, using the toInterval conversion functions or a column value as the interval length, we could have the following expressions:
TTL date_time + toIntervalMonth(ttl)
TTL date_time + toIntervalHour(ttl)
TTL date_time + INTERVAL ttl MONTH
TTL date_time + INTERVAL ttl HOUR
We can assign these expressions to a table; when assigned, they are known as table TTL expressions. A table can have only one expression for delete, and multiple expressions for the automatic move of parts to disks or volumes. For example, assuming we have a storage policy that defines a volume slow_volume and a disk slow_disk, the table TTL expressions could look like the following:
TTL date_time + INTERVAL 1 MONTH DELETE,
date_time + INTERVAL 1 WEEK TO VOLUME 'slow_volume',
date_time + INTERVAL 2 WEEK TO DISK 'slow_disk';
A word of caution: because ClickHouse first added support for delete TTL expressions, if a TO VOLUME or TO DISK clause is not specified, then the DELETE clause is assumed. Therefore, we recommend that you always be explicit and use the DELETE clause to identify which TTL expression you want to use for deletions.
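To illustrate the pitfall (the table and column names below are hypothetical), the first definition silently deletes expired rows because DELETE is assumed, while the second makes the intent explicit:

```sql
-- Implicit: with no TO VOLUME/TO DISK/DELETE clause, DELETE is assumed
CREATE TABLE events_implicit (
    event_date DateTime,
    value UInt64
) ENGINE = MergeTree()
ORDER BY event_date
TTL event_date + INTERVAL 1 MONTH;

-- Explicit: the same behavior, but the intent is obvious to the reader
CREATE TABLE events_explicit (
    event_date DateTime,
    value UInt64
) ENGINE = MergeTree()
ORDER BY event_date
TTL event_date + INTERVAL 1 MONTH DELETE;
```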
Setting Up the Demo
If you haven’t looked at multi-volume storage yet or played around with TTL delete or move expressions, we recommend you use the latest ClickHouse version, 20.3 at the time of writing. We will use it for the rest of this article.
ClickHouse client version 20.3.2 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.3.2 revision 54433.
For demo purposes, we will use the OnTime database (USA civil flight data from 1987 to 2018). As a shortcut, we will download and use pre-made partitions.
$ curl -O https://clickhouse-datasets.s3.yandex.net/ontime/partitions/ontime.tar
$ tar xvf ontime.tar -C /var/lib/clickhouse # path to ClickHouse data directory
$ # check permissions of unpacked data, fix if required
$ sudo service clickhouse-server restart
$ clickhouse-client --query "select count(*) from datasets.ontime"
Of course, without multi-volume storage, TTL move expressions do not make sense. Therefore, we will simulate multiple storage devices by creating different folders that represent mounted storage devices with distinct speed and capacity.
Adding Storage Configuration Without Server Restart
If we check right now, our server uses only the default disk. We can find this out by looking into the system.disks table:
:) select * from system.disks
We need more storage and can add new disks without restarting the server; we have recently added this feature to ClickHouse, so let's give it a try. The storage configuration is defined by placing storage.xml into our /etc/clickhouse-server/config.d folder.
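The file below is a sketch of what such a storage.xml could look like. The disk names, the policy name 'tiered', and the paths are assumptions chosen to match the 'fast', 'med', and 'slow' volumes used later in this article (the /data/fast path also matches the part path shown further down):

```xml
<yandex>
  <storage_configuration>
    <disks>
      <!-- each disk points at a folder simulating a mounted device -->
      <fast_disk><path>/data/fast/</path></fast_disk>
      <med_disk><path>/data/med/</path></med_disk>
      <slow_disk><path>/data/slow/</path></slow_disk>
    </disks>
    <policies>
      <!-- hypothetical policy name; referenced from the table definition -->
      <tiered>
        <volumes>
          <fast><disk>fast_disk</disk></fast>
          <med><disk>med_disk</disk></med>
          <slow><disk>slow_disk</disk></slow>
        </volumes>
      </tiered>
    </policies>
  </storage_configuration>
</yandex>
```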
If we reload the configuration using the SYSTEM RELOAD CONFIG command, then we should see the new disks in the system.disks table.
:) SYSTEM RELOAD CONFIG
:) select * from system.disks
We can retrieve the storage policy by peeking into the system.storage_policies table:
:) select * from system.storage_policies
Creating a Table That Uses TTL Moves
Let’s now look at how TTL moves can be defined when creating a new table. We will use the flight data starting from 2010. We’ll keep the data for the last three years on the fast volume; data between 3 and 5 years old goes to the med volume, data between 5 and 7 years old goes to the slow storage, and anything older we want to delete. We can achieve such a scheme with the following table definition.
CREATE TABLE ontime_chunk
ENGINE = MergeTree()
PARTITION BY Year
ORDER BY FlightDate
TTL FlightDate TO VOLUME 'fast',
    FlightDate + INTERVAL 3 YEAR TO VOLUME 'med',
    FlightDate + INTERVAL 5 YEAR TO VOLUME 'slow',
    FlightDate + INTERVAL 7 YEAR DELETE
SETTINGS storage_policy = 'tiered' -- use the policy that defines the 'fast', 'med', and 'slow' volumes
AS SELECT * FROM datasets.ontime
WHERE Year >= 2010
0 rows in set. Elapsed: 116.659 sec. Processed 55.09 million rows, 40.13 GB (472.19 thousand rows/s., 344.00 MB/s.)
Note that old functional style syntax for MergeTree tables is not supported with TTL moves.
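To sanity-check that the TTL clauses were recorded, you can inspect the stored table definition; the TTL expressions should appear in the DDL:

```sql
SHOW CREATE TABLE ontime_chunk
```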
Once the table is created and populated, we can look at where the table parts are stored by querying the system.parts table. The query below gives us some basic statistics.
SELECT partition, disk_name, count() AS parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND (table = 'ontime_chunk')
GROUP BY partition, disk_name
ORDER BY partition DESC
As shown above, ClickHouse applied the TTL moves on insert, and the parts are almost where we expect them to be. Why only almost? Because TTL delete expressions are a different beast. ClickHouse does not evaluate them during insert, and thus we still see data on the slow disk for the years 2010 through 2013 that we want to delete.
Digression About TTL Deletes
ClickHouse does not handle TTL moves and deletes in the same manner. I have included a TTL delete expression in the example above on purpose to make this point. The reason is that TTL deletes cause an expensive mutation operation, so they may be considerably more expensive than TTL moves, which just move parts between disks. Keep this in mind when working with TTLs.
But given that most users will use TTL deletes and moves together, it is essential to point out that ClickHouse controls the frequency of TTL deletes with the merge_with_ttl_timeout MergeTree table setting. It defines the minimum time, in seconds, before a merge with TTL can be repeated, and by default it is set to 24 hours. In practice this means that a TTL delete is performed at most once every 24 hours on only one partition, unless a background merge occurs. So in the worst case, ClickHouse will delete only one partition matching the TTL delete expression every 24 hours.
This behavior might not be ideal, so if you want TTL delete expressions to perform deletes faster, you can modify the merge_with_ttl_timeout setting for the table. For example, we can set it to one hour as follows.
ALTER TABLE [db.]table MODIFY SETTING merge_with_ttl_timeout = 3600
Now you should see that ClickHouse deletes parts according to your TTL delete expression every hour. Of course, when your table is not too big, you could force deletes using the OPTIMIZE TABLE [db.]table FINAL statement. However, for large tables it is not recommended.
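If you are unsure what the server-wide default is, the current value of this MergeTree setting can be checked with a read-only query:

```sql
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'merge_with_ttl_timeout'
```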
Adding TTL Moves To An Existing Table
We’ve seen how one can create a table with TTL moves defined up-front. However, if you already have a table, or you would like to change existing TTL move expressions, you have to resort to the ALTER TABLE [db.]table MODIFY TTL command.
Note that when modifying TTL expressions you have to re-list all of them: all the move expressions, and the delete expression if present.
Let’s re-use the table above and change its TTL expressions. We now want everything except the data for the last three years to be placed on the slow volume, or deleted at some point in time if it is older than seven years.
ALTER TABLE ontime_chunk
MODIFY TTL FlightDate TO VOLUME 'fast', FlightDate + toIntervalYear(3) TO VOLUME 'slow', FlightDate + toIntervalYear(7) DELETE
0 rows in set. Elapsed: 0.037 sec.
That was fast! But wait, did it move anything? No, it didn't. The new TTL expressions are assigned only to new parts, either when an insert comes in or when a new part is created as a result of a background merge. For the existing parts, the new TTL can be applied by materializing TTLs using the ALTER TABLE [db.]table MATERIALIZE TTL statement. If we execute it on our table, the command returns quickly.
ALTER TABLE ontime_chunk
    MATERIALIZE TTL
0 rows in set. Elapsed: 0.044 sec.
What it does is simply rewrite the ttl.txt files that you can find inside each part's folder. For example, we can take a look at what a random part has:
$ sudo cat /data/fast/data/default/ontime_chunk/2017_113_113_0/ttl.txt
ttl format version: 1
After you execute the MATERIALIZE TTL command, ClickHouse will start moving the parts to their new locations in the next background cycle. For our example, it did not take long. Looking again into the system.parts table, I find that the parts got moved to their new places and some got deleted because of a background merge.
SELECT partition, disk_name, count() AS parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND (table = 'ontime_chunk')
GROUP BY partition, disk_name
ORDER BY partition DESC
Some How’s and When’s
After looking at how you can use TTL move expressions, let’s look at how and when ClickHouse evaluates them.
How Are TTL Moves Evaluated?
ClickHouse evaluates TTL move expressions using a dedicated pool of background threads. The behavior of the pool is controlled by the following parameters, which can be defined either in config.xml or in a separate configuration file inside the config.d folder.
Here is the list of parameters and their current default values:
- background_move_pool_size: 8
- background_move_processing_pool_thread_sleep_seconds: 10
- background_move_processing_pool_thread_sleep_seconds_random_part: 1.0
- background_move_processing_pool_thread_sleep_seconds_if_nothing_to_do: 0.1
- background_move_processing_pool_task_sleep_seconds_when_no_work_min: 10
- background_move_processing_pool_task_sleep_seconds_when_no_work_max: 600
- background_move_processing_pool_task_sleep_seconds_when_no_work_multiplier: 1.1
- background_move_processing_pool_task_sleep_seconds_when_no_work_random_part: 1.0
For testing purposes, we use the following configuration file to enable instant moves.
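A sketch of such a configuration file is shown below; the file name is hypothetical, and the values are deliberately aggressive so that moves start almost immediately rather than being server defaults:

```xml
<!-- e.g. config.d/background_move_pool.xml -->
<yandex>
    <!-- worker threads check for new move tasks every second -->
    <background_move_processing_pool_thread_sleep_seconds>1</background_move_processing_pool_thread_sleep_seconds>
    <!-- never back off for more than a second when there is no work -->
    <background_move_processing_pool_task_sleep_seconds_when_no_work_max>1</background_move_processing_pool_task_sleep_seconds_when_no_work_max>
</yandex>
```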
This simple configuration highlights the two most important parameters that you might want to adjust. The background_move_processing_pool_task_sleep_seconds_when_no_work_max parameter defines the maximum time the pool can sleep when there is no work (no moves) to do. By default, ClickHouse sets it to 600 seconds, which means that after a TTL move expression is triggered, the actual move can take up to 10 minutes to start. The time it takes for the move to complete depends on the number of parts that ClickHouse needs to move and on disk I/O performance. The background_move_processing_pool_thread_sleep_seconds parameter defines the number of seconds a worker thread sleeps before picking up another task.
Based on these parameters, when the background move process pool wakes up, it scans TTL expressions for all the parts and determines if anything needs moving.
Note that when moving parts to a disk or a volume, the background move processing pool checks the constraints defined by the storage policy. If ClickHouse can’t move some part according to the TTL move expression, then the move will be attempted at a later time.
When Are TTL Moves Evaluated?
ClickHouse evaluates TTL move expressions in the following cases:
- when you insert a part
- when a background move pool processing task wakes up
- after ClickHouse creates a new part as a result of a background merge process
- when a replica downloads a new part
Some Known Corner Cases
Nothing is perfect, so here is a list of some known corner cases related to TTL moves that you should keep in mind.
- There is no SQL statement to force a TTL move without a parts merge.
- There are differences in behavior between TTL moves and deletes.
- Multi-threaded movement of a large number of parts within the same physical disk can hurt performance due to the I/O bottleneck.
In this article, we have looked at the new TTL move feature and how it extends the usage of storage policies. With TTL moves, ClickHouse has a powerful tool to manage how data is stored using multi-volume storage. There are still some corner cases that we are working to iron out. However, that should not keep you from trying out storage policies and TTL moves to see how they can save you money by significantly reducing your storage costs. The new TTL moves will help you put things where they belong.
Originally published at https://www.altinity.com on March 23, 2020.