Data Science Glossary E

Extract, Transform, Load (ETL)

This is a generic term for the process of extracting data sets from existing data sources, optionally performing some transformations on them, and usually loading them into an enterprise data warehouse (EDW).

Transformations can include operations such as cleaning, (re-)categorising, imputing missing values, aggregating, denormalising, inserting indicator flags, grouping (e.g. assigning a time dimension) and joining with other data sets.
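As an illustration only, the three stages can be sketched in a few lines of Python using the standard library. Everything specific here is an assumption made for the example: the sample CSV data, the column names, the monthly_sales table and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse. The transform step shows cleaning (normalising region names), imputing a missing amount with the mean, recording an indicator flag for imputed rows, assigning a month as the time dimension, and aggregating.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source data; a real job would read from files, APIs or databases.
RAW_CSV = """order_id,region,amount,order_date
1, north ,100.0,2023-01-15
2,South,,2023-01-20
3,north,50.0,2023-02-03
4,SOUTH,70.0,2023-02-11
"""


def extract(text):
    """Extract: read the CSV source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, impute, flag, assign a time dimension, aggregate."""
    known = [float(r["amount"]) for r in rows if r["amount"].strip()]
    mean_amount = sum(known) / len(known)  # used to impute missing amounts

    totals = defaultdict(lambda: {"total": 0.0, "imputed_rows": 0})
    for r in rows:
        region = r["region"].strip().lower()  # cleaning: trim and normalise case
        month = r["order_date"][:7]           # time dimension: YYYY-MM
        missing = not r["amount"].strip()
        amount = mean_amount if missing else float(r["amount"])
        totals[(region, month)]["total"] += amount              # aggregation
        totals[(region, month)]["imputed_rows"] += int(missing)  # indicator flag

    return [
        (region, month, v["total"], v["imputed_rows"])
        for (region, month), v in sorted(totals.items())
    ]


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_sales "
        "(region TEXT, month TEXT, total REAL, imputed_rows INTEGER)"
    )
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order against the given connection."""
    load(transform(extract(RAW_CSV)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for a warehouse
    run_etl(connection)
    for row in connection.execute(
        "SELECT * FROM monthly_sales ORDER BY region, month"
    ):
        print(row)
```

Keeping extract, transform and load as separate functions mirrors the way the dedicated tools structure a job, and makes each stage easy to test or replace on its own.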

In smaller companies this is often done in a bespoke manner by data engineers, but many proprietary enterprise tools are also in use, such as Informatica, SQL Server Integration Services, SAP Data Services and Talend.

There are also free/open source tools, such as Apache Camel and Pentaho, that may be used.

In the cloud, Google has Google Cloud Dataflow, Microsoft has Azure Data Factory and AWS has AWS Glue.