How to calculate the ratio of empty/zero values in every column in a table
How to calculate the ratio of empty/zero values in every column in a table
If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.
This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:
For example, we ran this query above on the environmental sensors dataset table named sensors which has over 20B rows and 19 columns:
Here is response:
From the results above:
- the
sensor_idcolumns is not sparse at all. In fact, every row has a non-zero value - the
sensor_typeis only sparse about 15.9% of the time - the
P0column is very sparse: 99.9% of the values are zero - the
pressurecolumn is quite sparse at 83% - and
temperaturecolumn has 53.2% of its values missing or zero
Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!