Tuesday, 21 March 2023

Type of tables SAP HANA database - Row store and column store tables

SAP HANA - Row store and column store  tables  # Basis Admin

Application developers who want to use the database can focus on the application logic and they never bothering about where the data is stored, what is done in
order to guarantee transactional consistency.  But as database administrator or Basis admins i will say we should know basic of it.

1: Tables: 

Tables are the most fundamental data structure in any relational database. tables are the central object in SAP HANA. AS SAP Basis administrator we know about below point 

Tables in general consist of at least one column, every column needs to have a distinct name in this table. 
Across tables, column names can be used multiple times with different data type definitions and meanings, 
 Although the actual table name does not have any functional relevance, choosing good names can make all the difference when it comes to understanding your
application and the data design.

though tables are made up of a set of columns, these columns could not exist in the database on their own; they can exist only in the compound of a table

Types of Tables in HANA System: 

Row and column store table: 

























1: Row Tables: 


 SAP HANA, like most other database systems, offers multiple different types of tables. The data may be stored by row or by column. Row store table record data row by row over many pages. Most RDBMS use one of these types, and row store data representation is the most common choice. HANA database is exception to this rule. It allows either a row store or a column store representation for its tables as well as for the conversion of a table from one type to the other.

There are several other types of tables available in the system for special use cases, such as temporary tables, flexible schema tables, or history tables

Limitations of row store tables : 

1: Row store tables cannot be partitioned
2: SAP HANA offers no compression for row store tables
3: Columns in row store tables cannot be accessed independently and in parallel
4: Row store tables cannot be displaced from memory 
5: If Row store table failed to load in memory then system can't start.
6: cannot be directly used as data sources

Considering all above limitation it preferred to use column store table, Hence we check parameter " default_table_type" its default set at column. Mean when table get create in HANA database it will column store 


For row store tables, two different index types are available: a classic b-tree index implementation and a cpb+-tree index  (compressed prefix) . To review runtime information of indexes on row store tables, 
can be used monitoring view M_RS_INDEXES .
Indexes on row store tables are not saved to the persistency but rebuilt when the table is loaded into memory. This happens during index server process startup

When to Use

● The application needs to process only one single record at one time (many selects and /or updates of single records).
● The application typically needs to access the complete record.
● The columns contain mainly distinct values so compression rate would be low.
● Neither aggregations nor fast searching are required.
● The table has a small number of rows (for example, configuration tables).


Column Store Tables


Column store tables are the major table data structure used in SAP HANA.  As the information is already stored in separate data structures, SAP HANA can directly access the columns requested by an SQL query without needing to read and decode other column data

SAP HANA column store tables actually can only store about two billion records per partition or table, depending on whether the table is partitioned or not. 

tables that are organized in columns are optimized for high-performing read operations while still providing good performance for write operations. 

Efficient data compression is applied to save memory and speed up searches and calculations.

Table partitioning is available for only for Column tables not row store 

Column tables have several advantages:
Higher data compression rates

Columnar data storage allows for highly efficient compression. Especially if the column is sorted, there will be ranges of the same values in contiguous memory, so compression methods such as run length encoding or cluster encoding can be used more effectively.

Higher performance for column operations


With columnar data organization, operations on single columns, such as searching or aggregations, can be implemented as loops over an array stored in contiguous memory locations. Such an operation has high spatial locality and efficiently utilizes the CPU caches.
In addition, highly efficient data compression not only saves memory but also increases speed.

Elimination of additional indexes


In many cases, columnar data storage eliminates the need for additional index structures since storing data in columns already works like having a built-in index for each column: The column-scanning speed of the in-memory column store and the compression mechanisms (especially dictionary compression) already allow read operations with very high performance. In many cases, it will not be required to have additional index structures. Eliminating indexes reduces memory size, can improve write performance, and reduces development efforts. However, this does not mean that indexes are not used at all in SAP HANA. Primary key fields always have an index and it is possible to create additional indexes, if required. In addition, full text indexes are used to support full-text search.

Elimination of materialized aggregates


Thanks to its column-scanning speed, the column store makes it possible to calculate aggregates on large amounts of data on the fly with high performance. This eliminates the need for materialized aggregates in many cases. Eliminating materialized aggregates has several advantages. It simplifies data model and aggregation logic, which makes development and maintenance more efficient; it allows for a higher level of concurrency because write operations do not require exclusive locks for updating aggregated values; and it ensures that the aggregated values are always up-to-date (materialized aggregates are sometimes updated only at scheduled times).

Parallelization


Column-based storage also simplifies parallel execution using multiple processor cores. In a column store data is already vertically partitioned. That means operations on different columns can easily be processed in parallel.

When to Use


● Calculations are typically executed on individual or a small number of columns.
● The table is searched based on the values of a few columns.
● The table has a large number of columns.
● The table has a large number of rows and columnar operations are required (aggregate, scan, and so on)
● High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows).


3: Document Store for JSON artifacts


the Document Store for JSON artifacts is fully integrated into the SAP HANA database architecture. The document collections of the Document Store are also classified as a type of table and they can be created, updated and read by SQL. The Document Store allows native operation on JSON, for example, filtering, aggregation, and joining JSON documents with HANA column or row store tables.


4: Temporary tables are there for data that is not needed beyond the context of a session or transaction and thus does not need to be written to the persistence
layer. 

3: History tables allow running transactions against the state of the table at a time in the past. 

4: the DUMMY table is an auxiliary table in the database for the convenience of database developers. 







Thanks 
Rupesh Chavan