Tuesday, 24 September 2024

How Data Compression Works in the Column Store of the SAP HANA Database #SAP HANA

Data compression in the SAP HANA database's column store significantly improves storage efficiency and performance. The column store employs a range of compression techniques that substantially reduce data size, optimizing both memory usage and processing times.

With SAP column store tables, compression ratios of up to 11 times can be achieved, making the column store a viable cost-saving option for data storage in the HANA database. Such high compression ratios are critical to the performance of data-intensive applications.

The column store enables efficient compression of data, which reduces the costs associated with keeping data in main memory. This efficiency is essential for managing large datasets typical of enterprise applications.

Columnar data storage facilitates highly efficient compression, primarily because most columns contain only a small number of distinct values relative to the number of rows. This structure also allows for targeted data access, improving read efficiency and query performance, particularly with large data sets.
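
A quick way to see this effect on a live system is to compare the number of stored values with the number of distinct values per column. The following is only a sketch: MYSCHEMA and SALES are placeholder names, and the monitoring view M_CS_COLUMNS with its "COUNT" and DISTINCT_COUNT columns is assumed to be available as documented for loaded column tables.

  -- Few distinct values relative to the row count means a small dictionary
  -- and short value IDs, i.e. high compression potential.
  SELECT column_name,
         "COUNT"        AS row_count,        -- number of values stored
         distinct_count AS dictionary_size   -- number of distinct values
    FROM m_cs_columns
   WHERE schema_name = 'MYSCHEMA'            -- placeholder schema
     AND table_name  = 'SALES'               -- placeholder table
   ORDER BY "COUNT" * 1.0 / NULLIF(distinct_count, 0) DESC;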

Administrators can monitor compression performance and trigger optimizations as needed. By understanding the compression ratios and structures in place, they can ensure that the database operates at peak efficiency, maintaining optimal space usage and improving access speeds.
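
A hedged example of such monitoring, aggregating the estimated uncompressed size against the actual memory footprint per table (MYSCHEMA is a placeholder; UNCOMPRESSED_SIZE in M_CS_COLUMNS is an engine estimate and is reported only for columns currently loaded into memory):

  SELECT table_name,
         ROUND(SUM(uncompressed_size)    / 1024 / 1024) AS uncompressed_mb,
         ROUND(SUM(memory_size_in_total) / 1024 / 1024) AS compressed_mb,
         ROUND(SUM(uncompressed_size) * 1.0
               / NULLIF(SUM(memory_size_in_total), 0), 1) AS compression_ratio
    FROM m_cs_columns
   WHERE schema_name = 'MYSCHEMA'            -- placeholder schema
   GROUP BY table_name
   ORDER BY uncompressed_mb DESC;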

Data in column tables can be compressed in two ways:

  • Dictionary compression

This default method of compression is applied to all columns. Distinct column values are mapped to consecutive numbers (value IDs), so that instead of the actual value, the typically much smaller number is stored (a worked sketch follows this list).

  • Advanced compression

Each column can be further compressed using different compression methods, namely prefix encoding, run length encoding (RLE), cluster encoding, sparse encoding, and indirect encoding. The SAP HANA database uses compression algorithms to determine which type of compression is most appropriate for a column. Columns with the PAGE LOADABLE attribute are compressed with the NBit algorithm only.
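
To make the dictionary idea concrete, here is a small hypothetical illustration; the table sales_demo is made up, and the real dictionary is built internally by the engine rather than through SQL. The final query shows which compression method the database chose per column, assuming the COMPRESSION_TYPE column of M_CS_COLUMNS as documented:

  CREATE COLUMN TABLE sales_demo (region NVARCHAR(20), amount DECIMAL(10,2));
  INSERT INTO sales_demo VALUES ('EMEA', 100.00);
  INSERT INTO sales_demo VALUES ('EMEA', 250.00);
  INSERT INTO sales_demo VALUES ('APJ',   80.00);
  INSERT INTO sales_demo VALUES ('EMEA', 120.00);

  -- Rebuild the value-ID mapping that dictionary compression performs
  -- internally: each distinct value gets a consecutive number, and the
  -- column vector stores only these small numbers.
  SELECT region, DENSE_RANK() OVER (ORDER BY region) - 1 AS value_id
    FROM (SELECT DISTINCT region FROM sales_demo) d;

  -- Once the table has a compressed main storage (i.e. after a delta merge),
  -- COMPRESSION_TYPE shows the method chosen per column, e.g. DEFAULT,
  -- PREFIXED, RLE, CLUSTERED, SPARSE or INDIRECT.
  SELECT column_name, compression_type
    FROM m_cs_columns
   WHERE table_name = 'SALES_DEMO';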

Compression is automatically calculated and optimized as part of the delta merge operation. If you create an empty column table, no compression is applied initially as the database cannot know which method is most appropriate. As you start to insert data into the table and the delta merge operation starts being executed at regular intervals, data compression is automatically (re)evaluated and optimized.
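
Both steps can also be triggered manually. The statements below follow the syntax documented in the SAP HANA SQL reference; the table name is a placeholder:

  MERGE DELTA OF "MYSCHEMA"."SALES";  -- move the delta storage into main storage

  -- Re-evaluate compression explicitly; 'FORCE' recompresses even if the
  -- current compression is considered good enough.
  UPDATE "MYSCHEMA"."SALES" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'YES');
  UPDATE "MYSCHEMA"."SALES" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE');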

Automatic compression optimization is controlled by the parameter active in the optimize_compression section of the indexserver.ini configuration file. This parameter must have the value yes.
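
To verify or set the parameter, something along the following lines should work (standard configuration-change syntax; the SYSTEM layer is just one option):

  -- Check the effective setting:
  SELECT file_name, layer_name, section, key, value
    FROM m_inifile_contents
   WHERE section = 'optimize_compression' AND key = 'active';

  -- Enable it at the SYSTEM layer:
  ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('optimize_compression', 'active') = 'yes' WITH RECONFIGURE;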


Cost Functions for Optimize Compression

The cost functions for optimize compression are located in the optimize_compression section of the service configuration (e.g. indexserver.ini):

  • auto_decision_func - used if the compression optimization is triggered by MergeDog (auto merge)
  • smart_decision_func - used if it is triggered by SmartMerge (application-driven smart merge)
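
The currently configured cost functions can be inspected the same way, for example:

  SELECT key, value
    FROM m_inifile_contents
   WHERE file_name = 'indexserver.ini'
     AND section   = 'optimize_compression'
     AND key IN ('auto_decision_func', 'smart_decision_func');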


Note: 

  1. Advanced compression is applied only to the main storage of column tables. As the delta storage is optimized for write operations, it has only dictionary compression applied.
  2. If the standard method for initiating a delta merge of the table is disabled (AUTO_MERGE_ON column in the system view TABLES is set to FALSE), automatic compression optimization is implicitly disabled as well. This is the case even if the AUTO_OPTIMIZE_COMPRESSION_ON column is set to TRUE in the system view TABLES. It is necessary to disable auto merge if the delta merge operation of the table is being controlled by a smart merge triggered by the application. For more information, see the section on merge motivations.
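
To check and control these flags for a specific table, the following sketch can be used (placeholder names again; ALTER TABLE ... DISABLE AUTOMERGE is the documented statement for handing merge control to the application):

  -- Check the merge-related flags in the TABLES system view:
  SELECT table_name, auto_merge_on, auto_optimize_compression_on
    FROM tables
   WHERE schema_name = 'MYSCHEMA' AND table_name = 'SALES';

  -- Hand delta merge control to an application-driven smart merge:
  ALTER TABLE "MYSCHEMA"."SALES" DISABLE AUTOMERGE;
  -- ALTER TABLE "MYSCHEMA"."SALES" ENABLE AUTOMERGE;  -- revert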