Web Extractor


From smart metering to microgeneration to electric vehicles, we know the sites, the content, and how to keep you informed on the latest trends and innovations.


Competitive intelligence teams, spread across multiple departments within an organization, need to know what’s happening as soon as it happens. With the influx of information on the internet, sorting through what is relevant and what’s not is a time-consuming task. Business intelligence teams spend most of their time on this task, compiling information, and not enough time analyzing it. The results: the latest energy news topics MISSED, new technology developments MISSED, and marketing movements of your competitors MISSED. All of these scenarios and more are the by-product of not having all the relevant information at the click of a button.

By combining the web extractor with search (IDS) technologies, 30 Digits have been able to provide a one-stop-shop competitive intelligence tool for the energy market, Advantage4Energy. This intelligent search and collaboration solution provides organizations in the energy industry with the tools to locate, collect, investigate, manage and share information effectively and efficiently. The Advantage4Energy solution is fed by the web extractor. More than 100 sources (websites) are crawled at various times of the day to provide the highest quality content to the end user.

The spidered sites cover areas such as the following:

  • Energy Specific News
  • General News
  • Green Energy
  • Regulation
  • Business Sectors
  • Suppliers
  • Market Research
  • Innovative Developments
  • Comparison Sites
  • Generators
  • Distributors

Crawling so many sites on a regular basis is only possible thanks to the extractor’s scheduler. The scheduler manages the running of multiple lists of sources at different times of the day and with different frequencies / repetitions. This ensures that the latest data is crawled, indexed / written, and available when you need it.
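To make the idea of scheduled source lists concrete, here is a minimal Python sketch of how a next crawl time could be computed for a list of sources. It is an illustration only, not the extractor’s actual scheduler: the `SourceList` class, the `next_run` function, the list name and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SourceList:
    """Hypothetical description of one scheduled list of sources."""
    name: str
    sources: list          # websites crawled together in one run
    first_run: datetime    # time of the first crawl
    interval: timedelta    # time between repetitions


def next_run(job, now):
    """Return the first scheduled crawl time at or after `now`."""
    if now <= job.first_run:
        return job.first_run
    elapsed = now - job.first_run
    # Ceiling division: number of whole intervals needed to reach `now`.
    periods = -(-elapsed // job.interval)
    return job.first_run + periods * job.interval


# Assumed example: a news list crawled every four hours from 06:00.
energy_news = SourceList(
    name="energy-news",
    sources=["https://example.com/news"],
    first_run=datetime(2024, 1, 1, 6, 0),
    interval=timedelta(hours=4),
)
print(next_run(energy_news, datetime(2024, 1, 1, 9, 30)))  # 2024-01-01 10:00:00
```

Sorting several such lists by their `next_run` value would give the order in which they are due.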

Delivering great content on a regular basis is only useful if end users can find what’s relevant to them. To answer this call, the extractor has been set up with several lists of keywords provided by energy industry experts and expanded by us. These lists allow the spider to tag every document it extracts with a reference to these words if they are found within the extracted document. Here are a few examples:

  • Countries
    • United Kingdom
    • United States
    • India

  • Cities
    • London
    • New York
    • Washington
  • Energy Organizations
    • Ofgem
    • National Grid
    • EDF Energy

  • Energy Types
    • Gas
    • Oil
    • Electricity

This means that every extracted document that contains the word “electricity”, or any synonym for electricity, will have the tag “electricity”, enabling it to be easily found in the future.
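The tagging step described above could be sketched roughly as follows in Python. This is a simplified illustration, not the extractor’s own implementation: the `KEYWORD_LISTS` structure, the `tag_document` function and the synonyms shown are assumptions made for the example.

```python
import re

# Hypothetical keyword lists: list name -> tag -> terms (the tag plus synonyms).
KEYWORD_LISTS = {
    "Energy Types": {
        "electricity": ["electricity", "electric power"],
        "gas": ["gas", "natural gas"],
    },
    "Countries": {
        "United Kingdom": ["United Kingdom", "UK", "Great Britain"],
    },
}


def tag_document(text, keyword_lists=KEYWORD_LISTS):
    """Return {list name: [tags]} for every tag whose terms occur in `text`."""
    tags = {}
    for list_name, entries in keyword_lists.items():
        for tag, terms in entries.items():
            # Match any term as a whole word or phrase, ignoring case.
            pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.setdefault(list_name, []).append(tag)
    return tags


print(tag_document("Ofgem reviews electric power prices in the UK"))
# {'Energy Types': ['electricity'], 'Countries': ['United Kingdom']}
```

Here the document never contains the word “electricity” itself, yet it receives the “electricity” tag because the synonym “electric power” was found.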

Maintaining the highest quality controls is a manageable task when the web extractor has the QualityChecker plug-in installed. Setting up this plug-in makes it easy to provide technical support with email alerting, general overviews and configurable individual business rules for each source.
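As an illustration of what a per-source business rule might look like, the following Python sketch checks one crawl against a simple rule and collects alert messages. It does not reflect the QualityChecker plug-in’s real configuration or API: `SourceRule`, `check_crawl` and both thresholds are hypothetical, and sending the alerts by email is left out.

```python
from dataclasses import dataclass


@dataclass
class SourceRule:
    """Hypothetical quality rule configured for a single source."""
    min_documents: int = 1        # fewest documents a crawl should return
    max_empty_ratio: float = 0.2  # largest tolerated share of empty documents


def check_crawl(source, documents, rule):
    """Return a list of alert messages for one crawl of `source`."""
    alerts = []
    if len(documents) < rule.min_documents:
        alerts.append(
            f"{source}: only {len(documents)} documents, "
            f"expected at least {rule.min_documents}"
        )
    if documents:
        empty = sum(1 for d in documents if not d.strip())
        if empty / len(documents) > rule.max_empty_ratio:
            alerts.append(f"{source}: {empty} of {len(documents)} documents are empty")
    return alerts


# Assumed example: a source that should yield at least five documents per crawl.
rule = SourceRule(min_documents=5, max_empty_ratio=0.5)
for message in check_crawl("example-news", ["Some article text", "", ""], rule):
    print(message)
```

Any messages returned could then be passed to whatever alerting channel is in place.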
