用Python标准库修改搜索引擎获取结果

Python标准库在长时间的使用中需要不断的学习。下面我们就看看如何才能更好的掌握相关的技术信息。希望对大家之后的使用和学习有所帮助。下面的就是想大家介绍下相关的使用方法。

公司主营业务：成都网站设计、成都做网站、移动网站开发等业务。帮助企业客户真正实现互联网宣传，提高企业的竞争能力。创新互联是一支青春激扬、勤奋敬业、活力青春激扬、勤奋敬业、活力澎湃、和谐高效的团队。公司秉承以“开放、自由、严谨、自律”为核心的企业文化，感谢他们对我们的高要求，感谢他们从不同领域给我们带来的挑战，让我们激情的团队有机会用头脑与智慧不断的给客户带来惊喜。创新互联推出东光免费做网站回馈大家。

我输入的关键字作为地址参数传递给某个程序，这个程序就会返回一个页面，上面包括顶部（logo和搜索UI）／结果部分／底部（版权信息部分），我们要得到的就是中间结果部分，这个可以用Python标准库的urllib中的urlopen方法得到整个页面的字符串，然后再解析这些字符串，完全有办法把中间结果部分抽取出来，抽出着串字符串，加上自己的头部和顶部和底部，那样搜索小偷的雏形就大概完成了，下面先写个测试代码。

 
 
 
  
  
  [code]   
  
  
  # Search Thief   
  
  
  # creator: Singo   
  
  
  # date: 2007-8-24   
  
  
  import urllib   
  
  
  import re   
  
  
  class SearchThief:   
  
  
  " " "the google thief " " "   
  
  
  global path,targetURL   
  
  
  path = "pages\\ "   
  
  
  # targetURL = "http://www.google.cn/search?complete=1&hl=zh-CN&q= "   
  
  
  targetURL = "http://www.baidu.com/s?wd= "   
  
  
  def __init__(self,key):   
  
  
  self.key = key   
  
  
  def getPage(self):   
  
  
  webStr = urllib.urlopen(targetURL+self.key).read() # get the page string form the url   
  
  
  self.setPageToFile(webStr)   
  
  
  def setPageToFile(self,webStr):   
  
  
  rereSetStr = re.compile( "\r ")   
  
  
  self.key = reSetStr.sub( " ",self.key) # replace the string "\r "   
  
  
  targetFile = file(path+self.key+ ".html ", "w ") # open the file for "w "rite   
  
  
  targetFile.write(webStr)   
  
  
  targetFile.close()   
  
  
  print "done "   
  
  
  inputKey = raw_input( "Enter you want to search --> ")   
  
  
  obj = SearchThief(inputKey)   
  
  
  obj.getPage()   
  
  
  [/code]

这里只是要求用户输入一个关键字，然后向搜索引擎提交请求，把返回的页面保存到一个目录下，这只是一个测试的例子，如果要做真正的搜索小偷，完全可以不保存这个页面，把抽取出来的字符串加入到我们预先设计好的模板里面，直接以web的形式显示在客户端，那样就可以实现利用盗取某些搜索引擎的结果并构造新的页面呈现。

看一下百度搜索结果页的源码，在搜索结构的那个table标签前面有个

的标签，我们可以根据这个标签得到下移两行的结果集，于是增加一个方法。

 
 
 
  
  
  getResultStr()   
  
  
  [code]   
  
  
  def getResultStr(self,webStr):   
  
  
  webStrwebStrList = webStr.read().split( "\r\n ")   
  
  
  line = webStrList.index( "

")+2 # get the line from "

既然得到结果列表，那么我们要把这个结果列表放到自己定义的页面里面，我们可以说这个页面叫模板：

[code]
< http-equivhttp-equiv= "Content-Type " content= "text/html; charset=gb2312 " />
SuperSingo搜索-%title%

[b]reCreatePage()：[/b]
[code]
def reCreatePage(self,resultStr):
demoStr = urllib.urlopen(demoPage).read() # get the demo page string
rereTitle = re.compile( "%title% ")
demoStr = reTitle.sub(self.key,demoStr) # re set the page title
rereResult = re.compile( "%result% ")
demoStr = reResult.sub(resultStr,demoStr) # re set the page result
return demoStr
[/code]

[code]
# the main programme
# creator: Singo
# date: 2007-8-24
import urllib
import re
class SearchThief:
" " "the google thief " " "
global path,targetURL,demoPage
path = "pages\\ "
# targetURL = "http://www.google.cn/search?complete=1&hl=zh-CN&q= "
targetURL = "http://www.baidu.com/s?wd= "
demoPage = path+ "__demo__.html "
def __init__(self,key):
self.key = key
def getPage(self):
webStr = urllib.urlopen(targetURL+self.key) # get the page string form the url
webStr = self.getResultStr(webStr) # get the result part
webStr = self.reCreatePage(webStr) # re create a new page
self.setPageToFile(webStr)
def getResultStr(self,webStr):
webStrwebStrList = webStr.read().split( "\r\n ")
line = webStrList.index( " ")+2 # get the line from " " move 2 line
resultStr = webStrList[line]
return resultStr
def reCreatePage(self,resultStr):
demoStr = urllib.urlopen(demoPage).read() # get the demo page string
rereTitle = re.compile( "%title% ")
demoStr = reTitle.sub(self.key,demoStr) # re set the page title
rereResult = re.compile( "%result% ")
demoStr = reResult.sub(resultStr,demoStr) # re set the page result
return demoStr
def setPageToFile(self,webStr):
rereSetStr = re.compile( "\r ")
self.key = reSetStr.sub( " ",self.key) # replace the string "\r "
targetFile = file(path+self.key+ ".html ", "w ") # open the file for "w "rite
targetFile.write(webStr)
targetFile.close()
print "done "
inputKey = raw_input( "Enter you want to search --> ")
obj = SearchThief(inputKey)
obj.getPage()
[/code]