Call us: +49 89 45 23 89 66
eMail us: contact@30digits.com

Today’s Information Landscape

IT infrastructure and the accompanying data repositories have been growing since the 1980s and have become an increasingly vital part of our everyday business lives. That means we potentially have 30 years' worth of data (which keeps increasing) and many legacy systems. Add to this any growth from mergers or acquisitions, and the information landscape starts to get quite cluttered. With global data infrastructure and now cloud-based applications and repositories, one can get quite overwhelmed.

What is ETL? / Extract Transform Load Defined

This is where ETL starts to show its value. It allows one to move, consolidate, and even reduce that mountain of data into the form and location where it can serve you best. The benefits add up quickly when one considers the cost of maintaining all the different repositories, archiving unnecessary data, and the administration around it all.

So, what is ETL exactly? It is an acronym for Extract, Transform, Load, which essentially means to pull the data from one spot, clean it up, and put it in another. Let's break down each step, though, to get a clearer picture.
  • Extract – Here we have any number and kind of repository. This could be a database, CMS, DMS, file system, wiki, intranet, mail program, you name it. All of these store data in one manner or another. They all have various ways to access them and typically a security model, such as an ACL (access control list), associated with them. The extract phase is all about getting at this data and, importantly, its metadata and security details.

  • Transform – In this phase there are three important things that need to happen. The first is a mapping of the fields or values from the system currently containing the data to the system that will contain the data. The second is that each of these values is normalized. Normalization is the process of turning one format for a piece of data, such as a date expressed as “January 1st, 2011”, into another format, such as “1/1/2011”. The key here is that all the different values from many repositories end up in the single format of the end system. The third aspect is enriching the data to get more value out of it. Here a number of processes can be applied, ranging from performing calculations on fields to identifying valuable entities such as people, places, or companies in the flowing text.

  • Load – The final phase is about pushing data into the final system, or even back into the original system after it has been cleansed and enriched. This can be done on an ongoing basis, over a transition period, or as a one-off switch. The way the data is loaded varies from system to system, but essentially it is fed in via some form of API in the native format of the target system.
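The three steps above can be illustrated with a minimal Python sketch. It is not tied to any particular product: the field names in `FIELD_MAP`, the list of source date formats, and the in-memory lists standing in for the source and target repositories are all assumptions made for illustration. It also assumes an English locale for month names.

```python
import re
from datetime import datetime

# Hypothetical mapping: source field name -> target field name.
FIELD_MAP = {"doc_title": "title", "created": "created_on", "owner": "author"}

# Date formats the (assumed) source repositories might use.
# "%B" parses month names in the current locale; English is assumed here.
SOURCE_DATE_FORMATS = ["%B %d, %Y", "%Y-%m-%d", "%d.%m.%Y"]


def normalize_date(value):
    """Normalize a date string to the single target format M/D/YYYY."""
    # Drop ordinal suffixes so "January 1st, 2011" becomes "January 1, 2011".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", value.strip())
    for fmt in SOURCE_DATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return f"{parsed.month}/{parsed.day}/{parsed.year}"
    raise ValueError(f"unrecognized date format: {value!r}")


def extract(repository):
    """Extract: yield a copy of each record from the source repository."""
    for record in repository:
        yield dict(record)


def transform(record):
    """Transform: map fields, normalize values, then enrich the record."""
    # 1. Mapping: keep only known fields, renamed for the target system.
    mapped = {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}
    # 2. Normalization: bring the date into the target format.
    mapped["created_on"] = normalize_date(mapped["created_on"])
    # 3. Enrichment: a simple calculated field.
    mapped["title_length"] = len(mapped["title"])
    return mapped


def load(records, target):
    """Load: push each transformed record into the target system."""
    for record in records:
        target.append(record)


def run_etl(repository, target):
    """Run extract, transform, and load in sequence; return the target."""
    load((transform(r) for r in extract(repository)), target)
    return target
```

In a real project the `extract` and `load` functions would talk to the APIs of the actual source and target systems, and the enrichment step could be far richer (entity extraction, for example) than the single calculated field shown here.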

ETL Tool Benefits

If you have not already gone through the process of ETL, you may be asking, “Why don’t I just move the data myself?” or “How does a tool help me?” The answers have to do with the complexity and size of your challenge. If you have one small database and want to move its data to another small database of a different type, and the tables and any documents stored in them are simple and straightforward, you would probably be best off doing this yourself. If you have many and varied types of repositories with any number of document types and patterns of metadata, this project could take years without assistance from the proper tools.

That is why 30 Digits has created an Extractor framework and particular Extractors for multiple repositories. This allows one to easily take care of each of the ETL steps in a structured and streamlined manner.
  • Extract – The Extractors access the repositories in their native language. They can be easily adjusted to pull the data out as fast as possible while respecting the load allowed on the source system, which is most likely still in production.

  • Transform – Mapping of each of the document types and fields is done simply and quickly with a visualization. The normalization is also done through the graphical user interface, allowing the simple entry of one or more source formats and selection of the desired format. It also has many advanced features for performing more complex operations, either for enrichment or normalization, which can be applied with a few clicks as opposed to custom scripts having to be developed, tested, and rewritten for each case. The entity extraction is also done by simply identifying the types of entities to be gleaned from the data.

  • Load – This is probably the step that is simplified the most, as the Extractor speaks the native language of the target repository and feeds the data to it continually or in batches. It can also do this over an encrypted stream should security during transmission be an issue. Writing XML, CSV, or other formats to disk, instead or even simultaneously, is a simple configuration option.

On top of the features and benefits mentioned, there are a host of other features and extensive logging to ensure a streamlined, quality-controlled process that turns your ETL effort into a clear and definable project with guaranteed success.
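As a tool-independent illustration of the Load ideas above (feeding data in batches, pacing the work to limit system load, and writing CSV to disk at the same time), here is a minimal Python sketch. It does not represent the 30 Digits Extractor or its configuration: `load_in_batches`, the `push_batch` callback, and the `pause_seconds` throttle are hypothetical names chosen for this example.

```python
import csv
import io
import time
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_batches(records, push_batch, csv_file, fieldnames,
                    batch_size=100, pause_seconds=0.0):
    """Push records to a target in batches and mirror them to CSV.

    `push_batch` is a hypothetical callback standing in for whatever
    API the target system offers. Returns the number of records loaded.
    """
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    total = 0
    for batch in batches(records, batch_size):
        push_batch(batch)          # feed the target system
        writer.writerows(batch)    # write the same records as CSV
        total += len(batch)
        time.sleep(pause_seconds)  # crude throttle between batches
    return total


if __name__ == "__main__":
    # Stand-ins for a real target system and a real file on disk.
    received = []
    buffer = io.StringIO()
    rows = [{"id": i, "title": f"doc {i}"} for i in range(5)]
    count = load_in_batches(rows, received.append, buffer,
                            ["id", "title"], batch_size=2)
    print(count, "records loaded in", len(received), "batches")
```

A fixed pause between batches is the simplest possible throttle; a production tool would more likely adapt its pace to the measured load on the source or target system.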

See the Extractors section for detailed information on individual Extractors, those supported, and the file types handled.

To learn more or see a demo, just fill out the contact form on the side or contact us directly. We look forward to hearing from you.
Copyright © 2011 30 Digits GmbH. All rights reserved.
