What is Data Provisioning?
In SAP HANA, data provisioning also covers data exposure, where SAP HANA consumes data from external sources without necessarily retaining it. Whether the data is retained in SAP HANA depends on the data provisioning tool you choose. Today’s business applications are powered by a rich variety of data types (transactional, spatial, text, graphics, and so on) arriving at various rates of consumption, from continuous, real-time sensor data to periodic batch loads of bulk data. SAP HANA can consume all of these.
You cannot avoid the acronym ETL when describing SAP HANA data provisioning. ETL stands for Extract, Transform, Load, and the various SAP HANA data provisioning tools provide ETL capabilities to varying extents.
Extract
The first part of an ETL process involves extracting the data from the source systems. In many cases, this is the most challenging aspect of ETL, because extracting data correctly sets the stage for all subsequent processes. A good data provisioning tool should be able to extract data from any data source, of any data type, at any time, and with good performance.
Transform
The transform stage applies a series of rules or functions to the data extracted from the source system to derive the data for loading into the target system. Depending on the project requirements, some data sources require very little data manipulation, if any, and some sources require significant cleaning and enhancement. Often data comes from many sources, so harmonization is also required to make sure that the loaded data appears as one in the target system.
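The cleaning and harmonization described above can be sketched in a few lines. This is an illustrative example, not any specific tool's API; the field names and country mapping are assumptions chosen for the sketch.

```python
# Illustrative transform step: harmonize customer records from two
# hypothetical sources so they appear as one in the target system.

def transform(rows):
    """Clean and harmonize raw rows into a common target format."""
    # Harmonization rule: different sources encode country differently.
    country_map = {"DE": "Germany", "Deutschland": "Germany"}
    out = []
    for row in rows:
        name = row["name"].strip().title()                         # cleaning
        country = country_map.get(row["country"], row["country"])  # harmonization
        out.append({"name": name, "country": country})
    return out

# Two hypothetical sources with inconsistent formatting.
source_a = [{"name": " anna schmidt ", "country": "DE"}]
source_b = [{"name": "BEN MILLER", "country": "Germany"}]

unified = transform(source_a + source_b)
```

After the transform, both rows share one spelling of the country and one name format, which is exactly the "appears as one" property the load phase needs.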
Load
The load phase loads the data into the target system. The target system should be able to handle delta loads, where only the changes since the last load are applied.
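A delta load is commonly tracked with a watermark: only rows modified after the last load are applied, and the watermark then advances. The sketch below shows this idea with a plain dictionary standing in for the target table; the row structure and timestamps are illustrative assumptions.

```python
# Illustrative delta load: only rows changed since the last load
# (tracked by a watermark) are applied to the target.

def delta_load(source_rows, target, last_load_ts):
    """Upsert rows modified after last_load_ts; return the new watermark."""
    new_watermark = last_load_ts
    for row in source_rows:
        if row["modified"] > last_load_ts:
            target[row["id"]] = row          # insert or update by key
            new_watermark = max(new_watermark, row["modified"])
    return new_watermark

source = [
    {"id": 1, "value": "old", "modified": 10},   # unchanged since last load
    {"id": 2, "value": "new", "modified": 25},   # changed -> will be loaded
]
target = {1: {"id": 1, "value": "old", "modified": 10}}

watermark = delta_load(source, target, last_load_ts=10)
```

Only row 2 crosses the network, which is the whole point of delta loads on large tables.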
Methods of Data Provisioning for SAP HANA
At present, there are many different methods of data provisioning for SAP HANA.
These methods are as follows:
SLT — SAP LT Replication Server for SAP HANA
SLT works with both SAP and non-SAP source systems and supports the same databases that SAP supports for the SAP Business Suite (as they use the same database libraries).
These include SAP HANA, SAP ASE, SAP MaxDB, Microsoft SQL Server, IBM DB2 (on all platforms), Oracle, and even the old Informix. The method that SLT uses for real-time replication is trigger-based: it creates triggers in the source systems on the tables it replicates. This can be a problem for some database administrators; banks, for example, do not like triggers on their critical legacy production banking systems. In that case, you should instead look at SAP Replication Server, which only reads the database log files and has even less impact on the source systems.
The big advantage of SLT is that it can read and use pool and cluster tables from older SAP systems. In the past, due to database limitations, SAP had to use pool and cluster tables: tables within tables. Because the ABAP data dictionary is platform and database independent, it is inherently different from the database data dictionary. Using this fact, SAP could create a single table (a pool or cluster table) in the database, which would then unpack into many separate tables in the ABAP data dictionary. In ABAP you might have 100 tables, but only one table in the database. If we read the database log file, as SAP Replication Server does, we might never find the ABAP table, because the SAP system “hides” it away inside the pool or cluster table.
As SLT uses an ABAP stack, it uses the ABAP data dictionary. As a result, it can read the contents of the pool and cluster tables. You can filter or perform a simple transformation of data as you load the data into SAP HANA.
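Trigger-based change capture, the mechanism SLT relies on, can be illustrated with a small sketch. Here sqlite3 stands in for the source database; the table and column names are invented for the example. A trigger on the source table writes every change into a logging table, which a replicator then reads and applies to the target.

```python
import sqlite3

# Source database (sqlite3 stands in for the real source system).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.execute("CREATE TABLE change_log (id INTEGER, amount REAL)")

# The trigger captures every insert into the logging table --
# this is the intrusion into the source system that some DBAs dislike.
src.execute("""
    CREATE TRIGGER orders_ins AFTER INSERT ON orders
    BEGIN
        INSERT INTO change_log VALUES (NEW.id, NEW.amount);
    END
""")

src.execute("INSERT INTO orders VALUES (1, 99.5)")

# The replicator reads the log and applies the changes to the target.
target = dict(src.execute("SELECT id, amount FROM change_log"))
```

The trade-off is visible in the sketch: capture is precise and real-time, but the source schema now carries extra objects (the trigger and the log table).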
SAP Data Services
If you have bad quality data going into your reporting system, you can expect bad quality data in your reports.
SAP Data Services is a good ETL tool. The “E” phase involves extracting data from the source systems. Data Services can read many data sources, even obscure ones like old COBOL copybooks. It can also read SAP systems, including via extractors. It cannot, however, use all the SAP extractors. For example, if you need extractors that require activation, then DXC would be a better tool to use.
In the “T" phase of ETL, Data Services can clean your data. There is no better SAP tool for doing this than Data Services. This eliminates the bad quality portion of your reporting results.
Finally, in the “L” phase of ETL, Data Services loads the data into SAP HANA.
SDA - Smart Data Access
A few years ago, people wanted to put all of their data into a single data warehouse-type environment and analyze it there. This is sometimes called a data lake, and it is a physical data warehouse: all the data is physically copied into it.
There are a few problems with this:
• You have to duplicate all your data, so your data footprint doubles.
• You have to copy all that data across the network, which clogs and slows it.
• You need more storage.
• You double your cost and effort.
• The data in the new data warehouse is never quite up to date, so it is never fully consistent with the source.
• A single data warehouse system cannot handle all the different data requirements, such as unstructured data, graph data, key-value pairs, spatial data, and so on.
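Smart Data Access takes the opposite approach to physically copying everything: the data stays in the remote system and is exposed as a virtual table that is queried in place. The conceptual sketch below illustrates that idea in plain Python; the `VirtualTable` class and the sample data are invented for the example and do not model any actual SDA API.

```python
# Conceptual sketch of data federation: a "virtual table" fetches rows
# from the remote system on demand instead of keeping a local copy.

class VirtualTable:
    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote   # callable standing in for the remote source

    def query(self, predicate):
        # Every query goes to the remote system; nothing is stored locally.
        return [row for row in self.fetch_remote() if predicate(row)]

# Hypothetical remote system that keeps holding the data.
remote_data = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
sales = VirtualTable(lambda: remote_data)

eu_rows = sales.query(lambda r: r["region"] == "EU")

# A change in the remote system is visible on the next query --
# there is no stale local copy to refresh.
remote_data.append({"id": 3, "region": "EU"})
```

Because nothing is duplicated, the storage, network, and staleness problems in the list above simply do not arise; the cost moves to query time instead.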
SRS - SAP Replication Server
The big focus of SAP Replication Server is on low-impact, real-time replication. SAP Replication Server works with both SAP and non-SAP source systems and supports many databases, including SAP ASE, Microsoft SQL Server, IBM DB2, and Oracle. The method that SRS uses for real-time replication is log-based: it reads the log files of the source systems and has very little impact on them.
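Log-based replication can be sketched as replaying an append-only change log against a target, in order. The log entries and table names below are invented for the illustration; the point is that the source tables themselves carry no triggers and see no extra load.

```python
# Sketch of log-based replication: the replicator only reads the
# source database's transaction log and replays it against a target.

# Hypothetical transaction log: (operation, table, row) entries.
transaction_log = [
    ("INSERT", "customers", {"id": 1, "name": "Anna"}),
    ("INSERT", "customers", {"id": 2, "name": "Ben"}),
    ("UPDATE", "customers", {"id": 1, "name": "Anne"}),
]

def replay(log, target):
    """Apply each logged change to the target, in order."""
    for op, table, row in log:
        # Inserts and updates both become an upsert keyed by id.
        target.setdefault(table, {})[row["id"]] = row
    return target

replica = replay(transaction_log, {})
```

Because ordering is preserved, the later UPDATE wins and the replica converges on the source's state without ever touching the source tables.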
SDI - Smart Data Integration
The same advantages and disadvantages of SRS apply to SDI. This is because SDI is similar to SAP Replication Server, implemented inside SAP HANA when using the log file adapters. You buy this as an additional feature for SAP HANA, and as such it has separate pricing.
SDQ - Smart Data Quality
The same advantages and disadvantages of SAP Data Services apply to SDQ. This is because SDQ is essentially Data Services functionality implemented inside SAP HANA.
SDS - Smart Data Streaming
SDS is based on the technology of Sybase ESP (Event Stream Processor). It processes high-velocity streams of events, such as sensor data, as they arrive.
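The defining idea of event stream processing is that events are aggregated as they arrive, rather than being stored first and queried later. A minimal sketch of one common pattern, a sliding-window average over sensor readings, is shown below; the window size and readings are illustrative assumptions, not SDS syntax.

```python
from collections import deque

# Sketch of stream processing: a continuous sliding-window average
# computed as each event arrives.

def sliding_average(events, window_size=3):
    """Yield the running average of the last `window_size` readings."""
    window = deque(maxlen=window_size)   # old readings fall out automatically
    for value in events:
        window.append(value)
        yield sum(window) / len(window)

sensor_readings = [10, 20, 30, 40]
averages = list(sliding_average(sensor_readings))
# -> [10.0, 15.0, 20.0, 30.0]
```

Each incoming reading immediately produces an updated result, which is what distinguishes streaming from periodic batch loads.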
DXC - Direct Extractor Connection
The big draw of DXC is when you want to use complex extractors from SAP systems to load data into SAP HANA, especially extractors that require activation, which Data Services might not be able to use. The main business case is when you would like to get data, such as financial data, from an SAP Business Suite system.
Flat files and Excel worksheets
Flat files or Excel worksheets are the simplest way to provide data to SAP HANA.
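A flat-file load boils down to parsing delimited text into typed rows. The sketch below uses Python's standard csv module; the file content and column names are invented for the example, and io.StringIO stands in for a file on disk.

```python
import csv
import io

# Sketch of a flat-file load: parse a CSV and turn each line into a
# typed row ready for insertion into a target table.

flat_file = io.StringIO("id,product,qty\n1,Widget,5\n2,Gadget,3\n")

rows = [
    {"id": int(r["id"]), "product": r["product"], "qty": int(r["qty"])}
    for r in csv.DictReader(flat_file)
]
```

The simplicity is the appeal: no triggers, no log readers, no extractors, just a file and a parser, which is why flat files remain the lowest-effort entry point into SAP HANA.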