Coin Nup Forum
Deep Web: a Primer.
June 4, 2012
What is the Deep Web?
The Deep Web is a complex concept, and it essentially comprises two categories of data.
The first category of the Deep Web is basically any information that is not easy to obtain through standard searching.
This could include Twitter or Facebook posts, links buried many layers down in a dynamic page, or results that sit so far down the standard search results that typical users will never find them.
The second category is the larger of the two, and represents a vast repository of information that is not accessible to standard search engines.
It comprises content found in websites, databases, and other sources.
Often it is only accessible through a custom query directed at individual websites, which cannot be accomplished by a simple surface web search.
The Deep Web isn’t found in a single location.
It consists of both structured and unstructured content – a huge amount of which is found in databases.
This content is created by researchers, analysts, and automated processing systems at an array of institutions throughout the world.
All of the content is housed in different systems, with different structures, at physical locations that can be as far apart as New York and Hong Kong.
BrightPlanet has patented the technology to automate custom queries that target thousands of Deep Web sources simultaneously.
Our solutions find topic-specific content, regardless of how it is stored, and provide highly qualified, relevant results for research, analysis, tracking, and monitoring, all in real time.
How is the Deep Web different from the Surface Web?
Deep Web content typically has to be retrieved with a query sent directly to the source, and standard search engines aren’t built to do that.
These tools (think Google, Yahoo!, and Bing) obtain their results in one of two ways.
First, authors may submit their own Web pages for direct listing.
Direct listing accounts for a small fraction of surface Web results and means those search tools are often forced to find their own information.
Search engines do this by performing a “crawl” or “spider”, following one hypertext link to another.
This process takes the pages and puts them into an index that the engine can refer to during future searches.
Simply stated, the crawler starts searching for hyperlinks on a page.
If that crawler finds one that leads to another document, it records the link and schedules that new page for later crawling.
Search engine crawlers extend their indexes further and further from their starting points, like ripples flowing across a pond, in an effort to find everything available.
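The crawl described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is implemented: the in-memory `TOY_WEB` dictionary, its page names, and the `crawl` function are all hypothetical stand-ins for real pages and hyperlinks.

```python
from collections import deque

# Hypothetical toy "web": each page name maps to the hyperlinks found on it.
TOY_WEB = {
    "home": ["news", "about"],
    "news": ["story-1", "home"],
    "about": [],
    "story-1": ["news"],
    "archive-2010": [],  # this page exists, but no other page links to it
}


def crawl(start, web):
    """Breadth-first crawl: follow links outward from `start`, indexing each page once."""
    index = []            # pages in the order they were crawled
    scheduled = {start}   # pages already queued, so none is crawled twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        index.append(page)
        for link in web.get(page, []):
            if link not in scheduled:
                scheduled.add(link)   # record the link ...
                queue.append(link)    # ... and schedule that page for later crawling
    return index


if __name__ == "__main__":
    print(crawl("home", TOY_WEB))
```

Starting from "home", the crawl spreads outward one link at a time and never reaches "archive-2010", because nothing links to it.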
But due to the limitations inherent in crawler searches, they will never find all the content that exists.
Thus, to be discovered, “surface” Web pages must be static and linked to other pages.
That requirement excludes much of the Deep Web, which includes dynamic content retrieved from a database.
How large is the Deep Web?
While some early estimates put the size of the Deep Web at 4,000-5,000 times larger than the Surface Web, the changing dynamic of how information is accessed and presented means that the Deep Web is growing exponentially and at a rate that defies quantification.
In the earliest days of the Web, there were relatively few documents and sites.
It was a manageable task to post all documents as static pages; since results were persistent and constantly available they could easily be crawled by conventional search engines.
Today, information is published on the Web in a different way.
This is especially true for dynamic content, larger sites, or traditional information providers moving their content to the Internet.
The sheer volume of these sites requires the information to be managed through automated systems with databases.
The contents of these databases are hidden in plain sight from standard search engines since they often require a query to produce results.
Some of these sites may have hundreds of pages that can be reached by navigation, but thousands more that can be found only by searching.
Think of a major news site, like CNN.com.
You would not be able to follow links from their homepage to find a page from two years ago, but you would be able to search for that page because it is stored and available in their database.
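The difference between following links and querying a database can be shown with a small Python sketch. Everything here is made up for illustration: `LINKED_PAGES`, `ARTICLE_DB`, the article titles, and both functions are hypothetical and do not describe how CNN.com or any real site works.

```python
# Hypothetical news site: a few pages linked from the homepage,
# plus an article database that also holds older stories.
LINKED_PAGES = {
    "home": ["today-1", "today-2"],
    "today-1": [],
    "today-2": [],
}

ARTICLE_DB = [
    {"id": "today-1", "title": "Markets open higher"},
    {"id": "today-2", "title": "Storm warning issued"},
    {"id": "old-2010", "title": "Election results from two years ago"},
]


def reachable_by_links(start, pages):
    """Return every page a link-following crawler can reach from `start`."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(pages.get(page, []))
    return seen


def search(query, db):
    """Return the ids of articles whose title contains `query` (case-insensitive)."""
    q = query.lower()
    return [row["id"] for row in db if q in row["title"].lower()]


if __name__ == "__main__":
    print(sorted(reachable_by_links("home", LINKED_PAGES)))
    print(search("election", ARTICLE_DB))
```

The old article never shows up in the link crawl, yet a direct query against the site's own database returns it, which is the sense in which such content is hidden in plain sight.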
The evolution of the Web to a database-centric design has been gradual and largely unnoticed.
Many Internet information professionals have noted the importance of searchable databases.
But BrightPlanet’s Deep Web white paper is the first to comprehensively define and quantify this category of Web content.
Is the Deep Web the same as the “invisible Web”?
In a word, yes.
But “invisible” implies that you’ll never see it.
That’s why we prefer “Deep Web”, because the information is there if you have the right technology to find it.
As early as 1994, Dr. Jill Ellsworth first coined the phrase “invisible Web” to refer to information that was publicly available, but not being returned by conventional search engines.
But that is just a semantic difference that doesn’t address the core issue.
The real problem is the spidering and crawling technology used by conventional search engines that returns links based on popularity, not content.
This same Big Data content is only clearly and readily available if different technology, such as the suite of BrightPlanet solutions, is used to access it.
Take a look at our online Deep Web University to read more posts on all things related to the Deep Web and data harvest technologies.
Tags: big data, deep web search, invisible web.