
Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


Datenbanksysteme in Business, Technologie und Web (BTW) P-144, 327-346 (2009).

Gesellschaft für Informatik, Bonn
2009


Copyright © Gesellschaft für Informatik, Bonn


Formalizing ETL jobs for incremental loading of data warehouses

Th. Jörg and St. Dessloch

Abstract


Extract-transform-load (ETL) tools are primarily designed for loading data warehouses, i.e., for performing physical data integration. When the operational data sources change, the data warehouse becomes stale. To keep its data timely, the data warehouse is refreshed periodically. The naive approach of simply reloading the entire data warehouse is inefficient, because typically only a small fraction of the source data changes within a loading cycle. It is therefore desirable to capture these changes at the operational data sources and refresh the data warehouse incrementally, an approach known as incremental loading. Incremental loading requires dedicated ETL jobs. We are not aware of any ETL tool that helps to automate this task; so far, incremental load jobs are handcrafted by ETL programmers, which makes their development costly and error-prone. In this paper, we present an approach to the automated derivation of incremental load jobs based on equational reasoning. We review existing Change Data Capture techniques and discuss the limitations of the different approaches. We further review existing loading facilities for data warehouse refreshment. We then provide transformation rules for the derivation of incremental load jobs. We stress that the derived jobs rely on existing Change Data Capture techniques, existing loading facilities, and existing ETL execution platforms.
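
The difference between a full reload and an incremental load can be illustrated with a minimal sketch. It is not taken from the paper: the job `etl_job`, the row layout `(id, amount)`, and the functions `full_reload` and `incremental_load` are hypothetical names chosen for illustration. The sketch assumes set semantics, a job that consists only of a row-wise filter and an injective mapping, and captured changes that are available as a set of inserted rows and a set of deleted rows. Under these assumptions the job distributes over set union and difference, so applying it to the captured changes alone yields the same warehouse state as a full reload.

```python
# Hypothetical example; names and row layout are assumptions, not the paper's.

def etl_job(rows):
    """Row-wise job: keep rows with a positive amount and convert it to cents.

    The row id is preserved, so the mapping is injective on rows.
    """
    return {(row_id, amount * 100) for row_id, amount in rows if amount > 0}


def full_reload(source):
    """Naive refresh: recompute the warehouse from the complete source."""
    return etl_job(source)


def incremental_load(warehouse, inserted, deleted):
    """Incremental refresh: apply the job to the captured changes only.

    Deleted source rows are transformed and removed from the warehouse,
    then inserted source rows are transformed and added.
    """
    return (warehouse - etl_job(deleted)) | etl_job(inserted)


# Initial source state and initial (full) load.
source_old = {(1, 10), (2, -5), (3, 7)}
warehouse = full_reload(source_old)

# Captured changes: row 3 is updated (delete + insert) and row 4 is new.
deleted = {(3, 7)}
inserted = {(3, 8), (4, 2)}
source_new = (source_old - deleted) | inserted

# Both refresh strategies are expected to agree on the new warehouse state.
refreshed = incremental_load(warehouse, inserted, deleted)
print(refreshed == full_reload(source_new))
```

The incremental variant touches only the three changed rows instead of the whole source. Jobs that contain joins or aggregations do not distribute over the changes this simply, which is where derivation rules of the kind the abstract announces become relevant.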



ISBN 978-3-88579-238-3

