Using Mechanize to simulate natural browser behavior when interacting with web pages.
2012-12-31 20:32
    import re
    import mechanize

    br = mechanize.Browser()
    br.open("http://www.example.com/")
    # follow second link with element text matching regular expression
    response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
    assert br.viewing_html()
    print br.title()
    print response1.geturl()
    print response1.info()  # headers
    print response1.read()  # body

    br.select_form(name="order")
    # Browser passes through unknown attributes (including methods)
    # to the selected HTMLForm.
    br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
    # Submit current form.  Browser calls .close() on the current response on
    # navigation, so this closes response1
    response2 = br.submit()

    # print currently selected form (don't call .submit() on this, use br.submit())
    print br.form

    response3 = br.back()  # back to cheese shop (same data as response1)
    # the history mechanism returns cached response objects
    # we can still use the response, even though it was .close()d
    response3.get_data()  # like .seek(0) followed by .read()
    response4 = br.reload()  # fetches from server

    for form in br.forms():
        print form
    # .links() optionally accepts the keyword args of .follow_/.find_link()
    for link in br.links(url_regex="python.org"):
        print link
        br.follow_link(link)  # takes EITHER Link instance OR keyword args
        br.back()
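The `nr` argument in `follow_link(text_regex=..., nr=1)` is zero-based, which is why the comment says "second link". The sketch below imitates that selection using only the standard library. It is not mechanize's implementation: `pick_link` and the sample link texts are hypothetical, it assumes `re.search`-style matching, and it returns `None` when there is no match, whereas mechanize raises an error.

```python
import re


def pick_link(link_texts, text_regex, nr=0):
    """Return the nr-th (zero-based) text that matches text_regex, or None."""
    pattern = re.compile(text_regex)
    matches = [text for text in link_texts if pattern.search(text)]
    return matches[nr] if nr < len(matches) else None


# Hypothetical link texts, in the order they might appear on a page.
texts = ["Home", "cheese shop", "Contact", "cheese  shop (annex)", "cheeseshop"]

first = pick_link(texts, r"cheese\s*shop")         # nr defaults to 0
second = pick_link(texts, r"cheese\s*shop", nr=1)  # the second match
print(first)
print(second)
```

Because `\s*` also matches zero whitespace characters, "cheeseshop" counts as a match too; it would be selected with `nr=2`.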
You can control the browser’s policy through the methods of mechanize.Browser’s base class, mechanize.UserAgent. For example:
    import logging
    import sys

    import mechanize

    br = mechanize.Browser()
    # Explicitly configure proxies (Browser will attempt to set good defaults).
    # Note the userinfo ("joe:password@") and port number (":3128") are optional.
    br.set_proxies({"http": "joe:password@myproxy.example.com:3128",
                    "ftp": "proxy.example.com",
                    })
    # Add HTTP Basic/Digest auth username and password for HTTP proxy access.
    # (equivalent to using "joe:password@..." form above)
    br.add_proxy_password("joe", "password")
    # Add HTTP Basic/Digest auth username and password for website access.
    br.add_password("http://example.com/protected/", "joe", "password")
    # Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML).
    br.set_handle_equiv(False)
    # Ignore robots.txt.  Do not do this without thought and consideration.
    br.set_handle_robots(False)
    # Don't add Referer (sic) header
    br.set_handle_referer(False)
    # Don't handle Refresh redirections
    br.set_handle_refresh(False)
    # Don't handle cookies
    br.set_cookiejar(None)
    # Supply your own mechanize.CookieJar (NOTE: cookie handling is ON by
    # default: no need to do this unless you have some reason to use a
    # particular cookiejar)
    cj = mechanize.CookieJar()
    br.set_cookiejar(cj)
    # Log information about HTTP redirects and Refreshes.
    br.set_debug_redirects(True)
    # Log HTTP response bodies (ie. the HTML, most of the time).
    br.set_debug_responses(True)
    # Print HTTP headers.
    br.set_debug_http(True)

    # To make sure you're seeing all debug output:
    logger = logging.getLogger("mechanize")
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)

    # Sometimes it's useful to process bad headers or bad HTML:
    response = br.response()  # this is a copy of response
    headers = response.info()  # currently, this is a mimetools.Message
    headers["Content-type"] = "text/html; charset=utf-8"
    response.set_data(response.get_data().replace("<!---", "<!--"))
    br.set_response(response)
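The three logging lines near the end matter because the `set_debug_*` switches only emit log records; nothing shows up until the "mechanize" logger has a handler and a low enough level. The following is a rough, standard-library-only sketch of that wiring. `attach_debug_handler` is a hypothetical helper, and the logged messages are stand-ins rather than real mechanize output.

```python
import io
import logging


def attach_debug_handler(stream, name="mechanize"):
    """Attach a stream handler to the named logger and enable INFO output."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger, handler


buffer = io.StringIO()
logger, handler = attach_debug_handler(buffer)

# Stand-in messages (hypothetical text) of the kind a library might log.
logger.info("redirect to %s", "http://www.example.com/next")
logger.debug("this DEBUG message is below the INFO threshold")

logger.removeHandler(handler)  # clean up so later code is unaffected
captured = buffer.getvalue()
print(captured)
```

Writing to an in-memory buffer instead of `sys.stdout` is only a convenience for inspecting the result; the handler setup is otherwise the same as in the snippet above.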
mechanize exports the complete interface of urllib2:
    import mechanize

    response = mechanize.urlopen("http://www.example.com/")
    print response.read()
When using mechanize, anything you would normally import from urllib2 should be imported from mechanize instead.