Web Extractor


Hotels, restaurants, and more: we can harvest the web and normalize the content, no matter how different the sites are.


Most Hospitality and Travel businesses are built on providing their clients with the best deals and reviews on demand. They are experts in their respective industries and specialists in semantic technology. The challenge is how to gather the immense amount of data they need from the Internet.

For this data to have any real value, Hotel and Restaurant information is required on a global scale, together with all of the related reviews, responses and ratings. These web pages are spread throughout the Internet, and there may be millions of them to traverse. Every site is designed differently, with its own structure and styling. The reviews and responses are all unique pieces of information that need to be extracted and stored individually while still maintaining their relationship to where they came from. There are no standards for how this type of information is displayed online or for the methods used to rate an experience.

The 30 Digits Web Extractor was born to solve issues like these and many more. The Web Extractor was originally designed to address the problems that the Hospitality and Travel semantic industry encounters. The core emphasis of the Extractor is web data harvesting: being able to follow complex links and dig out exactly the essential pieces of data. Unearthing the relevant core information from a page saves significant time and boosts content quality. Whether it is navigating links on the page, filling in forms or building links from data on the page, the Extractor leaps over these obstacles.

These were the founding features of the Extractor, upon which it has continued to evolve. By making full use of regular expressions, matching data patterns becomes child's play. Content and images that are easy for the human eye to recognize but often elusive to machines no longer pose a problem.
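To illustrate the kind of pattern matching described above, here is a minimal sketch in Python. It is not the Web Extractor's own code or configuration; the sample markup, the field names and the function `extract_fields` are all assumptions made for this example.

```python
import re

# Hypothetical review snippet; the markup and class names are illustrative only.
SAMPLE = (
    '<div class="review">'
    '<span class="rating">4.5 / 5</span>'
    '<span class="date">2013-06-18</span>'
    '</div>'
)

# A score followed by its scale, e.g. "4.5 / 5" or "8/10".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)")
# An ISO-style date, e.g. "2013-06-18".
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_fields(html):
    """Pull the first rating and the first date out of a page fragment."""
    rating = RATING_RE.search(html)
    date = DATE_RE.search(html)
    return {
        "score": float(rating.group(1)) if rating else None,
        "scale": int(rating.group(2)) if rating else None,
        "date": date.group(0) if date else None,
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Fields that cannot be found come back as `None`, so a page with a different layout produces an incomplete record rather than an error.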

Different sites have different points of reference or scales; there is no commonality between them. Ratings can be shown as numbers, symbols, or images. Dates can be written in numbers or text, and their format is affected by the language of a site. All of these pieces of data have to be standardized and made to conform before they have meaning in comparison to each other.
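As a rough sketch of what such standardization can look like, the Python example below maps ratings from different scales onto a common 0 to 1 range and rewrites a few numeric date formats as ISO dates. The function names, the star symbol and the list of date formats are assumptions for illustration, not the Extractor's actual rules; dates written with month names would additionally need a per-language lookup, which is omitted here.

```python
from datetime import datetime


def normalize_rating(score, scale_max, scale_min=0.0):
    """Map a score on [scale_min, scale_max] onto the range 0..1."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (score - scale_min) / (scale_max - scale_min)


def stars_to_score(symbols, filled="★"):
    """Count filled symbols in a rating shown as characters, e.g. '★★★★☆'."""
    return symbols.count(filled)


# Assumed numeric formats, tried in order. Day-first is assumed for the
# slash format, so US-style month-first dates would need their own handling.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(normalize_rating(4, 5), normalize_rating(8, 10))
    print(normalize_rating(stars_to_score("★★★★☆"), 5))
    print(normalize_date("18.06.2013"))
```

Once every rating sits on the same range and every date uses the same format, values collected from different sites can be compared directly.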

The vast quantity of information all over the Internet pursued by Hospitality and Travel businesses can now be harvested and analyzed to help customers make more informed decisions. Hotel and Restaurant owners can see how they are viewed and react to the social perception of their products and services.

