A Magical Mystery Tour of the LowCardinality Data Type

The Tour Begins

:) select OriginCityName, count() from ontime group by OriginCityName order by count() desc limit 10; ┌─OriginCityName────────┬──count()─┐ 
│ Chicago, IL │ 10872578 │
│ Atlanta, GA │ 9279569 │
│ Dallas/Fort Worth, TX │ 7760200 │
│ Houston, TX │ 5898651 │
│ Los Angeles, CA │ 5804789 │
│ New York, NY │ 5283856 │
│ Denver, CO │ 5199842 │
│ Phoenix, AZ │ 4891313 │
│ Washington, DC │ 4252095 │
│ San Francisco, CA │ 4027466 │
└───────────────────────┴──────────┘
10 rows in set. Elapsed: 2.089 sec. Processed 172.34 million rows, 3.82 GB (82.51 million rows/s., 1.83 GB/s.)`
:) select OriginCityName, count(), uniq(FlightNum), sum(Distance) from ontime where OriginCityName = 'Chicago, IL' 
group by OriginCityName order by count() desc;
┌─OriginCityName─┬──count()─┬─uniq(FlightNum)─┬─sum(Distance)─┐
│ Chicago, IL │ 10872578 │ 7765 │ 8143093140 │ └────────────────┴──────────┴─────────────────┴───────────────┘
1 rows in set. Elapsed: 2.197 sec. Processed 172.34 million rows, 4.00 GB (78.45 million rows/s., 1.82 GB/s.)
:) SELECT column, any(type),
sum(column_data_compressed_bytes) compressed,
sum(column_data_uncompressed_bytes) uncompressed,
sum(rows)
FROM system.parts_columns
WHERE (table = 'ontime') AND active AND (column LIKE '%CityName')
GROUP BY column
ORDER BY column ASC
:) ALTER TABLE ontime MODIFY COLUMN OriginCityName LowCardinality(String); 0 rows in set. Elapsed: 19.258 sec.
10 rows in set. Elapsed: 0.595 sec. Processed 172.34 million rows, 281.33 MB (289.75 million rows/s., 472.99 MB/s.)
1 rows in set. Elapsed: 1.475 sec. Processed 172.34 million rows, 460.89 MB (116.87 million rows/s., 312.54 MB/s.)
1 rows in set. Elapsed: 1.064 sec. Processed 172.34 million rows, 549.77 MB (161.98 million rows/s., 516.74 MB/s.)

Under The Hood

LowCardinality vs Enum

Conclusion

--

--

--

Altinity is the leading enterprise provider for ClickHouse — a fast open-source column-store analytic database. Now on Cloud: altinity.com/cloud-database

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

User Experience for Data Science: User Interview

Step by Step to Visualize Music Genres with Spotify API

Python codes for types of Classification Algorithms

Traffic Logics of the BIG 4 live broadcast platforms in China: Douyin, Kuaishou, Taobao and Tencent

The Genetic Differences Between European Fire Ants & BC Fire Ants

Determining a Quote’s Source Using Scikit-Learn

Facebook’s Prophet Model for Time Series Forecasting

4 Data Pipeline Practices You (Probably) Didn’t Know About

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
AltinityDB

AltinityDB

Altinity is the leading enterprise provider for ClickHouse — a fast open-source column-store analytic database. Now on Cloud: altinity.com/cloud-database

More from Medium

Using Kubernetes to orchestrate dbt on Snowflake with Github actions

The whys of documentation

Complete Czech Republic COVID data in your pocket!

Automated Data Pipeline Testing using Great Expectations