How Search Engines Work
Internet search tools fall into two camps: search engines, such as HotBot and AltaVista, and online directories, such as Yahoo and Lycos. The difference between the two is related to how they compile their site listings. Of course, there are exceptions to every rule. Some search utilities, such as Ask Jeeves, combine the search engine and directory approaches into a single package, hoping to provide users with the best of both worlds.
In directory-based search services, the Web site listings are compiled manually. For example, the ever-popular Yahoo dedicates staff resources to accept site suggestions from users, review and categorize them, and add them to a specific directory on the Yahoo site.
You can usually submit your Web site simply by filling out an online form. On Yahoo, for example, you'll find submission information at www.yahoo.com/docs/info/include.html. Because human intervention is necessary to process, verify, and review submission requests, expect a delay before your site secures a spot in a directory-based search service.
On the flip side, search engines completely automate the compilation process, removing the human component entirely.
A software robot, called a spider or crawler, automatically fetches sites all over the Web, reading pages and following associated links. By design, a spider will return to a site periodically to check for new pages and changes to existing pages.
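To make the spider's fetch-and-follow loop concrete, here is a minimal sketch in Python that uses only the standard library. The seed URL, page limit, and helper names (LinkExtractor, crawl) are illustrative assumptions rather than any particular engine's design; a production spider would also honor robots.txt, throttle its requests, and schedule periodic revisits.

```python
# A minimal sketch of a crawler's fetch-and-follow loop (standard library only).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=10):
    """Breadth-first crawl starting from seed_url; returns a {url: html} map."""
    queue = deque([seed_url])
    seen = {seed_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to download or decode
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Calling crawl("http://example.com", max_pages=5), for instance, would return a small {url: html} mapping that the indexing step described next can consume.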
Results from spidering are recorded in the search engine's index or catalog. Given the wealth of information available on the Internet, it is not surprising that indexes grow to very large sizes. For example, the AltaVista index has recently been increased to top out at 350 million pages. This may seem like a mammoth number, but by all estimates it still represents less than 35 percent of all pages on the Web.
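One simple way to picture such an index is as an inverted index that maps every word to the set of pages containing it. The sketch below is an assumption built on the pages mapping returned by the crawl sketch above; a real catalog also records term positions, frequencies, and link data, among much else.

```python
# A minimal sketch of building an inverted index from crawled pages:
# each word maps to the set of URLs whose text contains it.
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages is a {url: text} mapping; returns a {word: set of URLs} index."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index
```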
Because of the depth and breadth of information being indexed, there is usually a delay, sometimes up to several weeks, between the time a site has been "spidered" and when it appears in a search index. Until this two-step process has been completed, a site remains unavailable to search queries.
Finally, the heart of each search engine is an algorithm that matches keyword queries against the information in the index, ranking results in the order the algorithm deems most relevant.
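As a rough illustration of that matching step, the toy function below scores pages by how often they contain the query's words and returns the highest-scoring URLs first. It reuses the hypothetical tokenize and build_index output from the sketch above; a commercial ranking algorithm weighs far more signals and is closely guarded.

```python
# A toy ranking pass over the inverted index sketched earlier: pages earn
# one point per occurrence of each query term they contain.
from collections import Counter


def search(query, index, pages, top_n=10):
    """Return up to top_n URLs ranked by a simple keyword-match score."""
    scores = Counter()
    for term in tokenize(query):            # tokenize() from the index sketch above
        for url in index.get(term, set()):  # pages known to contain this term
            scores[url] += tokenize(pages[url]).count(term)
    return [url for url, _ in scores.most_common(top_n)]
```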
Because the spiders, resulting indexes, and search algorithms of each search engine differ, so do the search results and rankings across the various search engines. This explains why a top 10 site in HotBot may not appear near the top of AltaVista when the same keyword search criterion is entered.
In addition, many, but not all, search utilities also reference metatags (invisible HTML tags within documents that describe their content) as a way to control how content is indexed. As a result, proper use of metatags throughout a site can also boost search engine ranking.
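For illustration, the snippet below shows one way a spider might read the description and keywords metatags from a page, again using only the Python standard library. The sample_page markup is invented for the example, and which tags a given engine actually honors varies from service to service.

```python
# A sketch of extracting description and keywords metatags from a page.
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect the content of the description and keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""


# Invented sample markup for demonstration purposes.
sample_page = (
    '<html><head>'
    '<meta name="keywords" content="search engine, spider, index">'
    '<meta name="description" content="How search engines work.">'
    '</head><body>...</body></html>'
)

reader = MetaTagReader()
reader.feed(sample_page)
print(reader.meta)
# {'keywords': 'search engine, spider, index', 'description': 'How search engines work.'}
```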