Vertical search: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
moved "DARPA's Memex program" section from Memex article, now subsection of "Domain-specific search"
Removed what essentially was an advertising link
Tags: Mobile edit Mobile web edit
 
(17 intermediate revisions by 16 users not shown)
Line 1: Line 1:
{{Short description | Segmented search engine with specific content areas}}
{{original research|date=September 2012}}
{{original research|date=September 2012}}
A '''vertical search''' engine is distinct from a general [[web search engine]], in that it focuses on a specific segment of online content. They are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the [[Library_of_Congress#Digitization|Library of Congress]], [[Mocavo.com|Mocavo]], [[Nuroa]], [[Trulia]] and [[Yelp, Inc.|Yelp]].
A '''vertical search''' engine is distinct from a general [[web search engine]], in that it focuses on a specific segment of online content. They are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the [[Library_of_Congress#Digitization|Library of Congress]], [[Mocavo.com|Mocavo]], [[Nuroa]], [[Trulia]], and [[Yelp, Inc.|Yelp]].


In contrast to general web search engines, which attempt to [[search engine indexing|index]] large portions of the [[World Wide Web]] using a [[web crawler]], vertical search engines typically use a [[focused crawler]] which attempts to index only relevant web pages to a pre-defined topic or set of topics. Some vertical search sites focus on individual verticals, while other sites include multiple vertical searches within one search engine.
In contrast to general web search engines, which attempt to [[search engine indexing|index]] large portions of the [[World Wide Web]] using a [[web crawler]], vertical search engines typically use a [[focused crawler]] which attempts to index only relevant web pages to a pre-defined topic or set of topics. Some vertical search sites focus on individual verticals, while other sites include multiple vertical searches within one search engine.
Line 10: Line 11:
* Support of specific unique user tasks.
* Support of specific unique user tasks.


Vertical search can be viewed as similar to [[enterprise search]] where the domain of focus is the enterprise, such as a company, government or other organization. In 2013, consumer price comparison websites with integrated vertical search engines such as [[FindTheBest]] drew large rounds of venture capital funding, indicating a growth trend for these applications of vertical search technology.<ref name=TechCrunch1>{{cite web|last=Rao|first=Leena|title=Data-Driven Comparison Shopping Platform FindTheBest Raises $11M From New World, Kleiner Perkins And Others|url=https://techcrunch.com/2013/03/05/data-driven-comparison-shopping-platform-findthebest-raises-11m-from-new-world-kleiner-perkins-and-others/|publisher=TechCrunch|accessdate=27 May 2013}}</ref><ref name=TechCrunch2>{{cite web|last=HO|first=VICTORIA|title=Asian Price Comparison Site Save 22 Gets Angel Round Of "Mid Six Figures"|url=https://techcrunch.com/2013/05/11/asian-price-comparison-site-save-22-gets-angel-round-of-mid-six-figures/|accessdate=27 May 2013}}</ref>
Vertical search can be viewed as similar to [[enterprise search]] where the domain of focus is the enterprise, such as a company, government or other organization. In 2013, consumer price comparison websites with integrated vertical search engines such as [[FindTheBest]] drew large rounds of venture capital funding, indicating a growth trend for these applications of vertical search technology.<ref name=TechCrunch1>{{cite web|last=Rao|first=Leena|title=Data-Driven Comparison Shopping Platform FindTheBest Raises $11M From New World, Kleiner Perkins And Others|date=5 March 2013|url=https://techcrunch.com/2013/03/05/data-driven-comparison-shopping-platform-findthebest-raises-11m-from-new-world-kleiner-perkins-and-others/|publisher=TechCrunch|access-date=27 May 2013|archive-date=1 June 2013|archive-url=https://web.archive.org/web/20130601012944/http://techcrunch.com/2013/03/05/data-driven-comparison-shopping-platform-findthebest-raises-11m-from-new-world-kleiner-perkins-and-others/|url-status=live}}</ref><ref name=TechCrunch2>{{cite web|last=HO|first=VICTORIA|title=Asian Price Comparison Site Save 22 Gets Angel Round Of "Mid Six Figures"|date=11 May 2013|url=https://techcrunch.com/2013/05/11/asian-price-comparison-site-save-22-gets-angel-round-of-mid-six-figures/|access-date=27 May 2013|archive-date=7 June 2013|archive-url=https://web.archive.org/web/20130607041002/http://techcrunch.com/2013/05/11/asian-price-comparison-site-save-22-gets-angel-round-of-mid-six-figures/|url-status=live}}</ref>


== Domain-specific search ==
== Domain-specific search ==
Line 20: Line 21:
</blockquote>
</blockquote>


Any general search engine would be indexing all the pages and searches in a breadth-first manner to collect documents. The spidering in domain-specific search engines more efficiently searches a small subset of documents by focusing on a particular set. Spidering accomplished with a reinforcement-learning framework has been found to be three times more efficient than [[breadth-first search]].<ref>
In the domain-specific setting one can combine the [[tf-idf]] approach implemented via an [[inverse index]] with [[semantic]] approaches of semantic headers and [[semantic skeleton]]s. Instead of most frequent keywords, a set of entities is extracted from a portion of text to be matched against a potential question. This allows much more flexibility due to real-time reasoning capabilities while matching questions and answers in the form of semantic headers.<ref>
{{cite journal |last=Galitsky |first=Boris
|title=Building a Repository of Background Knowledge Using Semantic Skeletons|publisher=AAAI
|journal = AAAI Spring Symposium: Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering
|year=2006
}}</ref>

Any general search engine would be indexing all the pages and searches in a breadth-first manner to collect documents. The spidering in domain-specific search engines more efficiently searches a small subset of documents by focusing on a particular set. Spidering accomplished with a reinforcement-learning framework has been found to be three times more efficient than breadth-first search.<ref>
{{cite journal |last=McCallum |first=Andrew
{{cite journal |last=McCallum |first=Andrew
|title=A Machine Learning Approach to Building Domain-Specific Search Engines|journal = IJCAI|volume = 99|pages=662–667|year=1999
|title=A Machine Learning Approach to Building Domain-Specific Search Engines|journal = IJCAI|volume = 99|pages=662–667|year=1999
Line 34: Line 28:


=== DARPA's Memex program ===
=== DARPA's Memex program ===
In early 2014, the Defense Advanced Research Projects Agency ([[DARPA]]) released a statement on their website outlining the preliminary details of the "Memex program", which aims at developing new search technologies overcoming some limitations of text-based search.<ref name="DARPA PR 20140209">{{cite press release |title=Memex Aims to Create a New Paradigm for Domain-Specific Search |url=http://www.darpa.mil/newsevents/releases/2014/02/09.aspx |publisher=[[DARPA]] |date=February 9, 2014 |access-date=February 11, 2015 |archive-url=https://web.archive.org/web/20150211034344/http://www.darpa.mil/newsevents/releases/2014/02/09.aspx |archive-date=February 11, 2015 |url-status=dead }}</ref> DARPA wants the Memex technology developed in this research to be usable for search engines that can search for information on the [[Deep Web (search indexing)|Deep Web]] – the part of the Internet that is largely unreachable by commercial search engines like [[Google]] or [[Yahoo]]. DARPA's website describes that "The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests".<ref>{{Cite web|url=http://www.darpa.mil/program/memex|title=Memex (Domain-Specific Search)|website=www.darpa.mil|access-date=2016-09-21}}</ref> As reported in a 2015 ''[[Wired (magazine)|Wired]]'' article, the search technology being developed in the Memex program "aims to shine a light on the [[dark web]] and uncover patterns and relationships in online data to help law enforcement and others track illegal activity".<ref>{{cite journal |title=Darpa Is Developing a Search Engine for the Dark Web |author=Kim Zetter |journal=[[Wired (magazine)|Wired]] |date=February 2, 2015 |url=https://www.wired.com/2015/02/darpa-memex-dark-web/}}</ref> DARPA intends for the program to replace the centralized procedures used by commercial search engines, stating that the "creation of a new 
domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content".<ref name=":1" /> In their description of the program, DARPA explains the program's name as a tribute to Bush's original Memex invention, which served as an inspiration.<ref name="DARPA PR 20140209" />
In early 2014, the Defense Advanced Research Projects Agency ([[DARPA]]) released a statement on their website outlining the preliminary details of the "Memex program", which aims at developing new search technologies overcoming some limitations of text-based search.<ref name="DARPA PR 20140209">{{cite press release |title=Memex Aims to Create a New Paradigm for Domain-Specific Search |url=http://www.darpa.mil/newsevents/releases/2014/02/09.aspx |publisher=[[DARPA]] |date=February 9, 2014 |access-date=February 11, 2015 |archive-url=https://web.archive.org/web/20150211034344/http://www.darpa.mil/newsevents/releases/2014/02/09.aspx |archive-date=February 11, 2015 |url-status=dead }}</ref> DARPA wants the Memex technology developed in this research to be usable for search engines that can search for information on the [[Deep Web (search indexing)|Deep Web]] – the part of the Internet that is largely unreachable by commercial search engines like [[Google]] or [[Yahoo]]. DARPA's website describes that "The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests".<ref>{{Cite web|url=http://www.darpa.mil/program/memex|title=Memex (Domain-Specific Search)|website=www.darpa.mil|access-date=2016-09-21|archive-date=2016-09-16|archive-url=https://web.archive.org/web/20160916000705/http://www.darpa.mil/program/memex|url-status=live}}</ref> As reported in a 2015 ''[[Wired (magazine)|Wired]]'' article, the search technology being developed in the Memex program "aims to shine a light on the [[dark web]] and uncover patterns and relationships in online data to help law enforcement and others track illegal activity".<ref>{{cite magazine |title=Darpa Is Developing a Search Engine for the Dark Web |author=Kim Zetter |magazine=[[Wired (magazine)|Wired]] |date=February 2, 2015 |url=https://www.wired.com/2015/02/darpa-memex-dark-web/ 
|access-date=November 19, 2020 |archive-date=June 29, 2023 |archive-url=https://web.archive.org/web/20230629024214/https://www.wired.com/2015/02/darpa-memex-dark-web/ |url-status=live }}</ref> DARPA intends for the program to replace the centralized procedures used by commercial search engines, stating that the "creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content".<ref name=":1" /> In their description of the program, DARPA explains the program's name as a tribute to Bush's original Memex invention, which served as an inspiration.<ref name="DARPA PR 20140209" />

In April 2015, it was announced parts of Memex would be open sourced.<ref>{{cite web |url=https://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/ |title=Watch Out Google, DARPA Just Open Sourced All This Swish 'Dark Web' Search Tech |website= |author=Forbes |date=April 17, 2015 |accessdate=April 20, 2015}}</ref> Modules were available for download.<ref name=":1">{{cite web |url=http://opencatalog.darpa.mil/MEMEX.html |title=Memex (Domain-Specific Search) |publisher=DARPA |author= |date= |accessdate=April 20, 2015 |archive-url=https://web.archive.org/web/20150610162436/http://opencatalog.darpa.mil/MEMEX.html |archive-date=June 10, 2015 |url-status=dead }}</ref>


In April 2015, it was announced parts of Memex would be open sourced.<ref>{{cite web |url=https://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/ |title=Watch Out Google, DARPA Just Open Sourced All This Swish 'Dark Web' Search Tech |author=Forbes |website=[[Forbes]] |date=April 17, 2015 |access-date=April 20, 2015 |archive-date=April 20, 2015 |archive-url=https://web.archive.org/web/20150420003210/http://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/ |url-status=live }}</ref> Modules were available for download.<ref name=":1">{{cite web |url=http://opencatalog.darpa.mil/MEMEX.html |title=Memex (Domain-Specific Search) |publisher=DARPA |access-date=April 20, 2015 |archive-url=https://web.archive.org/web/20150610162436/http://opencatalog.darpa.mil/MEMEX.html |archive-date=June 10, 2015 |url-status=dead }}</ref>


==References==
==References==

Latest revision as of 04:21, 7 February 2024

A vertical search engine is distinct from a general web search engine in that it focuses on a specific segment of online content. Vertical search engines are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the Library of Congress, Mocavo, Nuroa, Trulia, and Yelp.

In contrast to general web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a focused crawler, which attempts to index only web pages relevant to a pre-defined topic or set of topics. Some vertical search sites focus on individual verticals, while other sites include multiple vertical searches within one search engine.
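
As a rough illustration of how a focused crawler might decide which pages to index, the following minimal sketch scores each page against a topic vocabulary and keeps only pages above a cutoff. The term list, the `relevance` and `should_index` functions, and the 0.4 threshold are all hypothetical choices made for this example; real focused crawlers generally rely on trained classifiers rather than a fixed keyword set.

```python
import re

# Hypothetical topic vocabulary for a "travel" vertical (illustrative only).
TOPIC_TERMS = {"flight", "hotel", "airport", "itinerary", "booking"}


def relevance(text, topic_terms=TOPIC_TERMS):
    """Return the fraction of topic terms that occur in the page text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def should_index(text, threshold=0.4):
    """Index a page only if it covers enough of the topic vocabulary.

    The threshold is an assumed value chosen for this example.
    """
    return relevance(text) >= threshold


# Two made-up pages: one on topic, one off topic.
pages = {
    "a": "Compare flight and hotel booking prices before you travel.",
    "b": "Ten recipes for a quick weeknight dinner.",
}
indexed = [name for name, text in pages.items() if should_index(text)]
```

Here only page `a` would be indexed, since it mentions three of the five assumed topic terms while page `b` mentions none.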

Benefits

Vertical search offers several potential benefits over general search engines:

  • Greater precision due to limited scope,
  • Use of domain knowledge, including taxonomies and ontologies,
  • Support for specific, unique user tasks.

Vertical search can be viewed as similar to enterprise search where the domain of focus is the enterprise, such as a company, government or other organization. In 2013, consumer price comparison websites with integrated vertical search engines such as FindTheBest drew large rounds of venture capital funding, indicating a growth trend for these applications of vertical search technology.[1][2]

Domain-specific search

Domain-specific verticals focus on a specific topic. John Battelle describes this in his book The Search (2005):

Domain-specific search solutions focus on one area of knowledge, creating customized search experiences, that because of the domain's limited corpus and clear relationships between concepts, provide extremely relevant results for searchers.[3]

A general search engine indexes all pages, collecting documents by crawling in a breadth-first manner. Spidering in a domain-specific search engine is more efficient because it concentrates on a small, relevant subset of documents. Spidering accomplished with a reinforcement-learning framework has been found to be three times more efficient than breadth-first search.[4]
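
The difference between breadth-first and focused spidering can be sketched on a toy link graph. The example below is not the reinforcement-learning method of the cited study; it substitutes a simple best-first heuristic in which a hypothetical `link_score` function stands in for a learned estimate of how promising a link is. The graph, the page names, and the set of on-topic pages are invented for illustration.

```python
import heapq
from collections import deque

# Hypothetical toy web: page -> outgoing links (illustrative data only).
LINKS = {
    "home": ["news", "sports", "cars"],
    "news": ["weather", "politics"],
    "sports": ["football", "tennis"],
    "cars": ["sedans", "trucks"],
    "weather": [], "politics": [], "football": [], "tennis": [],
    "sedans": [], "trucks": [],
}
# Pages assumed to belong to an automotive vertical.
ON_TOPIC = {"cars", "sedans", "trucks"}


def link_score(page):
    """Stand-in for a learned estimate of how promising a link is."""
    return 1.0 if page in ON_TOPIC else 0.0


def crawl(start, best_first):
    """Return the order in which pages are fetched."""
    order, seen = [], {start}
    counter = 0  # tie-breaker: equal scores are fetched in discovery order
    frontier = [(0.0, counter, start)] if best_first else deque([start])
    while frontier:
        if best_first:
            _, _, page = heapq.heappop(frontier)
        else:
            page = frontier.popleft()
        order.append(page)
        for nxt in LINKS[page]:
            if nxt not in seen:
                seen.add(nxt)
                if best_first:
                    counter += 1
                    # Negate the score because heapq is a min-heap.
                    heapq.heappush(frontier, (-link_score(nxt), counter, nxt))
                else:
                    frontier.append(nxt)
    return order


def fetches_until_all_on_topic(order):
    """Number of fetches needed before every on-topic page has been fetched."""
    return max(order.index(p) for p in ON_TOPIC) + 1


bfs_cost = fetches_until_all_on_topic(crawl("home", best_first=False))
focused_cost = fetches_until_all_on_topic(crawl("home", best_first=True))
```

On this toy graph the best-first crawl fetches all three on-topic pages within its first 4 requests, whereas the breadth-first crawl needs 10. The numbers only illustrate the mechanism and say nothing about the efficiency gain reported in the literature.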

DARPA's Memex program

In early 2014, the Defense Advanced Research Projects Agency (DARPA) released a statement on its website outlining the preliminary details of the "Memex program", which aims to develop new search technologies that overcome some limitations of text-based search.[5] DARPA wants the Memex technology developed in this research to be usable by search engines that can find information on the Deep Web – the part of the Internet that is largely unreachable by commercial search engines like Google or Yahoo. DARPA's website states that "The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests".[6] As reported in a 2015 Wired article, the search technology being developed in the Memex program "aims to shine a light on the dark web and uncover patterns and relationships in online data to help law enforcement and others track illegal activity".[7] DARPA intends for the program to replace the centralized procedures used by commercial search engines, stating that the "creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content".[8] In its description of the program, DARPA explains the program's name as a tribute to Vannevar Bush's original Memex invention, which served as an inspiration.[5]

In April 2015, it was announced that parts of Memex would be open-sourced.[9] Modules were made available for download.[8]

References

  1. ^ Rao, Leena (5 March 2013). "Data-Driven Comparison Shopping Platform FindTheBest Raises $11M From New World, Kleiner Perkins And Others". TechCrunch. Archived from the original on 1 June 2013. Retrieved 27 May 2013.
  2. ^ Ho, Victoria (11 May 2013). "Asian Price Comparison Site Save 22 Gets Angel Round Of "Mid Six Figures"". TechCrunch. Archived from the original on 7 June 2013. Retrieved 27 May 2013.
  3. ^ Battelle, John (2005). The Search: How Google and its Rivals Rewrote the Rules of Business and Transformed Our Culture. New York: Portfolio.
  4. ^ McCallum, Andrew (1999). "A Machine Learning Approach to Building Domain-Specific Search Engines". IJCAI. 99: 662–667. CiteSeerX 10.1.1.88.3818.
  5. ^ a b "Memex Aims to Create a New Paradigm for Domain-Specific Search" (Press release). DARPA. February 9, 2014. Archived from the original on February 11, 2015. Retrieved February 11, 2015.
  6. ^ "Memex (Domain-Specific Search)". www.darpa.mil. Archived from the original on 2016-09-16. Retrieved 2016-09-21.
  7. ^ Zetter, Kim (February 2, 2015). "Darpa Is Developing a Search Engine for the Dark Web". Wired. Archived from the original on June 29, 2023. Retrieved November 19, 2020.
  8. ^ a b "Memex (Domain-Specific Search)". DARPA. Archived from the original on June 10, 2015. Retrieved April 20, 2015.
  9. ^ Forbes (April 17, 2015). "Watch Out Google, DARPA Just Open Sourced All This Swish 'Dark Web' Search Tech". Forbes. Archived from the original on April 20, 2015. Retrieved April 20, 2015.