Taking a Closer Look at ClickHouse® Live View Tables
by Vitaliy Zakaznikov
In a couple of previous blog posts, I have introduced Live Views tables and covered basic usage. Now, in this post, we will take a closer look at Live View tables. Specifically, we will look at the options available for the WATCH
query, then introduce temporary Live Views, as well as look at the new JSONEachRowWithProgress
format.
Setting Things Up
As before we will be using the 19.16.3 release and will use two instances of clickhouse-client
. Don't forget that we will need to enable theallow_experimental_live_view
setting in both sessions.
ClickHouse client version 19.16.3.6 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.16.3 revision 54427.:) SET allow_experimental_live_view = 1
To keep the setup to the minimum, we will again use a very simple source table
CREATE TABLE myints (a Int32) Engine=Memory
and pre-populate it with some initial data
INSERT INTO myints VALUES (1),(2),(3)
and use it in a simple Live View table.
CREATE LIVE VIEW lv AS SELECT sum(a) FROM myints
WATCH
Query Options
The WATCH
query enables us to watch a live stream of query results. Without any options, the query returns the current query result and infinitely waits for changes to the query result.
:) WATCH lvWATCH lv┌─sum(a)─┬─_version─┐
│ 6 │ 1 │
└────────┴──────────┘
↙ Progress: 1.00 rows, 16.00 B (8.93 rows/s., 142.95 B/s.)
To abort the WATCH
query you can use Ctrl-C
.
WATCH
QUERY LIMIT
However, an infinite stream of query result updates in not always necessary. Sometimes you just want to get a fixed number of changes, one of the most common cases being just the need to get the next update. To control the number of query result updates that you would like to receive you can use the LIMIT
option where the value of the LIMIT
can be 0
to N
. The value of 0
indicates that the WATCH
query should not wait for any new query results and therefore will return immediately once query is evaluated. One way to think of LIMIT 0
is as if you are setting the non-blocking flag.
Let’s give it a try and see that the WATCH
query with the LIMIT 0
returns immediately.
:) WATCH lv LIMIT 0WATCH lv
LIMIT 0┌─sum(a)─┬─_version─┐
│ 6 │ 1 │
└────────┴──────────┘1 rows in set. Elapsed: 0.002 sec. user-node :)
However, using the WATCH
query with the LIMIT
set to 0
is not very useful as you might as well use the SELECT
instead.
The most common value of LIMIT
is just 1
. In many use cases you do not want to receive an infinite number of query result changes but only the next one. With the LIMIT
set to 1
you will receive the current query result and will be notified only once when the query result changes.
Again, let’s give a try. Execute WATCH
query in one client and insert one more row into our source table for the WATCH
query to complete.
Once you have the WATCH
query running you can insert more data as follows.
INSERT INTO myints VALUES (4)
The client that has the WATCH
query running should show the following.
:) WATCH lv LIMIT 1WATCH lv
LIMIT 1┌─sum(a)─┬─_version─┐
│ 6 │ 1 │
└────────┴──────────┘
┌─sum(a)─┬─_version─┐
│ 10 │ 2 │
└────────┴──────────┘2 rows in set. Elapsed: 20.346 sec.
As you can see, the WATCH
query only provided one update as expected. You might ask why is LIMIT 1
very common? The answer is simple: it allows you to get notified the next time the query result has changed. One such use case is watching a table that stores log messages. In this case, we can use count()
for our Live View. A separate SELECT
query can be issued once we detect that the count()
has changed, indicating that there are new log messages available to read from our source table.
WATCH
QUERY EVENTS
A short form of the WATCH
query can be constructed if you specify EVENTS
after the table name. In this case, instead of the query result, you will just get the latest query result version.
Note that the stored query will still be evaluated every time source table data changes.
Let’s try it and confirm that with the EVENTS
we only get the current query result version without the query result itself.
:) WATCH lv EVENTS LIMIT 0WATCH lv EVENTS
LIMIT 0┌─version─┐
│ 2 │
└─────────┘
The EVENTS
is useful when you only want to get short notifications that you can use as triggers about changes to the query result without looking at the actual result.
Temporary Live Views
In some use cases, for example in a dynamic web application, Live Views can be created dynamically to display specific live data to the user. However, it is common in such cases that the application can exit unexpectedly without cleaning up its Live Views. For such cases, a TEMPORARY
Live View can be used.
A TEMPORARY
Live View can be created using the CREATE TEMPORARY LIVE VIEW
statement. This statement will create a Live View which will be automatically deleted when no users are watching the table after the timeout specified by the temporary_live_view_timeout
setting. By default the value of temporary_live_view_timeout
setting is set to five seconds.
It is important to note that compared to a regular temporary table, a TEMPORARY
Live View table lives outside the scope of a single session. This is done because sometimes clients that create Live Views can disappear and reappear due to various factors such as unstable network connection but as long as the client comes back within the timeout period the Live View table does not have to be recreated.
Because a TEMPORARY
Live View table lives outside the scope of a single session, it means that multiple clients can use the same temporary Live View. The Live View table will automatically be deleted after the timeout once all clients stop watching the table.
Let’s see TEMPORARY
Live View in action by creating one
CREATE TEMPORARY LIVE VIEW temp_lv AS SELECT sum(a) FROM myints
and immediately running WATCH temp_lv
.
:) WATCH temp_lvWATCH temp_lv┌─sum(a)─┬─_version─┐
│ 10 │ 1 │
└────────┴──────────┘
Note that the table will be deleted in five seconds so you need to be quick.
Now, abort the WATCH
query using Ctrl-C
and try to re-execute WATCH
query after five seconds. You should see the following error.
:) WATCH temp_lvWATCH temp_lvReceived exception from server (version 19.16.3):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.temp_lv doesn't exist..
As you can see, the Live View table has been deleted because the timeout after the last active user has elapsed. The fact that temporary Live Views are automatically deleted when no more clients are watching it helps with reducing resource consumption on the server. Remember, Live Views store their partial results in memory to always have the latest result ready for any client who might ask for it but it does come at a cost. First, similarly to the Materialized Views, Live View’s stored query is executed on each insert block and therefore inserts into the source table are blocked until processing completes. Second, mergeable blocks for the query result are kept in memory and therefore each active Live View will consume memory. The amount of memory depends on the query result. Therefore, keeping the number of Live Views to a minimum will help speed up inserts and will reduce memory consumption on the server.
NAMING TEMPORARY LIVE VIEWS
In a use case where temporary Live Views are created dynamically and are shared between different clients the question of naming temporary Live Views can arise. The recommended solution to this problem is to use the SHA1
of the query that is used to create the Live View table as a part of the table name.
For example, a temporary Live View for the SELECT 1
query can be created using the lv_<SHA1>
convention with the addition of the IF NOT EXISTS
clause.
CREATE TEMPORARY LIVE VIEW IF NOT EXISTS lv_42364a017b73ef516a0eca9827e6fa00623257ee AS SELECT 1
Using this convention will allow other clients to easily reuse this Live View or create a new one if it does not already exist.
Using Live Views Over HTTP Interface
The most common way that clients connect to Live View tables is via an HTTP interface. The HTTP interface provides ease of use and portability across many systems and programming languages and it is especially useful for web applications.
Let’s see how we can use an HTTP interface to watch our previously created lv
table using the curl
utility. Note, that for the WATCH
query to work over an HTTP interface we must not forget to set the allow_experimental_live_view
setting for the connection.
echo 'WATCH lv LIMIT 0' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10 2
Now, let’s re-run the same query but with the LIMIT
set to 1
.
echo 'WATCH lv LIMIT 1' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10 2
As you can see the connection remains open and curl
does not return. Instead, the connection will block until the query result changes. So let's insert another row into our source table so that a new query result is pushed to our HTTP client.
INSERT INTO myints VALUES (5)
You should observe that your curl
command exits and the result should be as below.
$ echo 'WATCH lv LIMIT 1' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10 2
15 3
$
Wait, did you see that? ClickHouse has just pushed a notification to the outside world! No longer do you have to re-send the same query over and over again to your database server and bother it with unnecessary requests. With Live View tables, ClickHouse can now notify you that query result has changed and provide you with the new result at the same time.
Keeping Long-Lived Connections Alive
Because watching Live View tables requires a long-lived connection, Live View tables support heartbeats to keep the connection alive. The interval between heartbeats is set using the live_view_heartbeat_interval
setting. By default, the live_view_heartbeat_interval
is set to fifteen seconds. Every heartbeat interval the WATCH
query will provide a progress notification that you can observe when using clickhouse-client
. Let's start a WATCH
query again in the clickhouse-client
and this time we will pay attention to the progress notifications that are displayed below the query result.
:) WATCH lvWATCH lv┌─sum(a)─┬─_version─┐
│ 15 │ 3 │
└────────┴──────────┘
↗ Progress: 1.00 rows, 16.00 B (9.70 rows/s., 155.27 B/s.)
Do you see that arrow ↗
before the Progress
message? Pay attention and you should see that the arrow keeps turning once in a while. How often does it turn? You guessed it, every live_view_heartbeat_interval
seconds or whenever a new query result is returned. These periodic progress events are what the Live View table uses to keep the long-lived connection alive.
New JSONEachRowWithProgress
Format
When Live View tables were added to ClickHouse a new output format was added as well. The new format is JSONEachRowWithProgress
and is used to facilitate using Live View tables over the HTTP interface using the HTTP long polling technique.
Let’s again use curl
command to watch our Live View. This time lets use the JSONEachRowWithProgress
format.
First, execute the WATCH
query with the LIMIT 0
.
$ echo 'WATCH lv LIMIT 0 FORMAT JSONEachRowWithProgress' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"row":{"sum(a)":"15","_version":"3"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
$
Now re-execute the same query but using the old JSONEachRow
format so that we can compare them.
$ echo 'WATCH lv LIMIT 0 FORMAT JSONEachRow' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"sum(a)":"15","_version":"3"}
$
If you compare the two outputs there should be no surprises. The JSONEachRowWithProgress
differ from the old JSONEachRow
by having extra progress
messages added to the output. But why do we need this new output format? The answer is simple, to keep the long-lived HTTP connections alive.
Let’s see how it works when our WATCH
query is blocked waiting for new query results. Now we will use the LIMIT 1
to block and wait for the new result.
We should see the following
$ echo 'WATCH lv LIMIT 1 FORMAT JSONEachRowWithProgress' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"row":{"sum(a)":"15","_version":"3"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
and then every live_view_heartbeat_interval
seconds we should see progress
messages added to the output.
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
...
These progress messages are what keeps the long-lived HTTP connection alive until the query result changes. Without them, the HTTP connection would just timeout.
Conclusion
We have taken a closer look at Live View tables. We looked at the WATCH
query options, introduced temporary Live Views, and looked at the new JSONEachRowWithProgress
format. All these pieces should enable you to start putting Live View tables into practice. As always until the next time, happy viewing using Live View tables!
Originally published at https://www.altinity.com on December 5, 2019.