Call us: +49 89 45 23 89 66
eMail us: contact@30digits.com

Linguistic Tools

When dealing with content, whether from the ETL perspective or from the search perspective, one eventually hits the limits of what can be gleaned from the meta information and inherent structure of the content alone. To go a step further, one needs to delve into the world of linguistics. There are two main areas where 30 Digits has dug in and improved what we can gather from data and how we can then use it. In the field of Information Access and Retrieval, these areas are called Lemmatisation and Entity Extraction, the latter sometimes known as Named Entity Recognition (NER).

A brief explanation of both follows. This section will be expanded in the near future, so come back soon to find out more.

Lemmatisation

As a German-based company with roots in search, we have a unique perspective on this problem. In English and many other languages, lemmatisation is not a major concern, because stemming handles most of the same cases in a simpler way. For German, this is not the case. Verbs and nouns can inflect and be combined into compounds in so many ways that stemming algorithms do not deliver comprehensive and accurate results.
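
To make the limits of stemming concrete, here is a minimal sketch of a naive suffix-stripping stemmer. The suffix list and the function name `naive_stem` are assumptions chosen for illustration; this is not a real stemming algorithm and not 30 Digits code. It handles a regular plural but misses a plural with an umlaut change and an irregular verb form.

```python
# Toy suffix-stripping stemmer. The suffix list is illustrative only.
SUFFIXES = ("ern", "en", "er", "es", "e", "n")


def naive_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least three letters."""
    lowered = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


# Regular plural: both forms reduce to the same stem.
print(naive_stem("Kind"), naive_stem("Kinder"))

# Plural with umlaut change: "Haus" and "Häuser" end up with different stems.
print(naive_stem("Haus"), naive_stem("Häuser"))

# Irregular verb: "gehen" and "ging" share no common stem at all.
print(naive_stem("gehen"), naive_stem("ging"))
```

In the last two cases, a search for one form would not find documents containing the other. Suffix stripping alone cannot close that gap.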

Thus 30 Digits has its own tools and dictionary for identifying terms and their base forms. These can then be used at access or search time to ensure complete results. The most common application is improving search results, typically for eCommerce sites, which depend on users finding all relevant products and offerings without having to type the exact phrase used in the product description. The beauty of these lemmatisation tools is that they are easily updated to handle specialized industry terms and new lingo.
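
The idea of dictionary-based lemmatisation at search time can be sketched as follows. Everything here is hypothetical: the `LEMMAS` dictionary, the function names, and the product data are invented for illustration and say nothing about how the 30 Digits tools are actually built. Tokenisation is plain whitespace splitting, which a real system would replace.

```python
from collections import defaultdict

# Hypothetical hand-written dictionary: surface form -> base form (lemma).
LEMMAS = {
    "häuser": "haus",
    "schuhe": "schuh",
    "schuhen": "schuh",
    "ging": "gehen",
}


def lemmatise(token: str) -> str:
    """Return the base form of a token, or the lowercased token if unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lemma to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemmatise(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing the lemma of every query term."""
    hits = [index.get(lemmatise(term), set()) for term in query.split()]
    return set.intersection(*hits) if hits else set()


# Invented product catalogue.
products = {"p1": "Rote Schuhe", "p2": "Roter Schuh", "p3": "Blaue Hose"}
index = build_index(products)

# The query form "Schuhen" appears in no product, yet both shoe products match.
print(search(index, "Schuhen"))

# Handling a new industry term is a dictionary update, not a code change.
LEMMAS["sneakers"] = "sneaker"
```

Because the same `lemmatise` function is applied to both the indexed text and the query, the user does not have to type the exact form used in the product description.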

Entity Extraction

Entity Extraction is, in essence, the ability to pick out entities such as people, places, and organizations from free-flowing text and tag each with a specific category. Our Entity Extraction tools simply plug in to our Extractor Framework, allowing for easy enrichment of data from any data source. The most common application is tagging data from web sites. For example, a pharmaceutical company might monitor a number of sites and watch for mentions of certain drugs and product names.
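
As a rough illustration of the tagging step, the sketch below finds known names in a text and labels each with a category. It uses a simple lookup list (a gazetteer) as a stand-in; production entity extraction typically relies on more than exact lookup. The `GAZETTEER` entries, the `Entity` class, and the function names are assumptions made for this example, and the sketch does not use or represent the Extractor Framework API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str       # the text as it appeared in the source
    category: str   # the category assigned to it
    start: int      # character offset where the match begins
    end: int        # character offset just past the match


# Hypothetical gazetteer: known name -> category.
GAZETTEER = {
    "Aspirin": "DRUG",
    "Ibuprofen": "DRUG",
    "Bayer": "ORGANIZATION",
    "Munich": "PLACE",
}

# Lowercased name -> canonical name, for case-insensitive lookup.
CANONICAL = {name.lower(): name for name in GAZETTEER}


def build_pattern(names) -> re.Pattern:
    """Build one whole-word regex; longer names first so they win over prefixes."""
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(GAZETTEER)


def extract_entities(text: str) -> list[Entity]:
    """Return every gazetteer match in the text, in order of appearance."""
    entities = []
    for match in PATTERN.finditer(text):
        name = CANONICAL[match.group(1).lower()]
        entities.append(
            Entity(match.group(1), GAZETTEER[name], match.start(1), match.end(1))
        )
    return entities


# Invented page text standing in for a monitored web site.
page = "Bayer said aspirin sales in Munich rose."
for entity in extract_entities(page):
    print(entity)
```

For the monitoring scenario, the gazetteer would hold the drug and product names of interest, and each monitored page would be passed through `extract_entities`.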

To learn more or see a demo, just fill out the contact form on the side or contact us directly. We look forward to hearing from you.
Copyright © 2011 30 Digits GmbH. All rights reserved.
