Java HttpClient returns a 403 error when accessing some web pages
2014-03-19 15:38
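Some websites probably filter out this kind of "web collector" client. Setting request headers so that your client looks like a browser should get you past the filter.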
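In short, you need to set the request headers.
The code is as follows:
Cookies, HTTP headers and other details to watch when simulating a browser with httpclient
commons-httpclient is an open-source Apache project that provides a pure-Java HTTP client. With it you can easily send HTTP requests, receive HTTP responses, manage cookies automatically, and so on.
The contact-list library needs the following features: automatic cookie management, setting HTTP headers, sending HTTP requests, receiving HTTP responses, following HTTP redirects, and logging HTTP requests/responses. The implementation of each is explained below:
1. Automatic cookie management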
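    public EmailImporter(String email, String password, String encoding) {
        ......
        client = new HttpClient();
        // handle cookies the way a browser would
        client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
        // send all cookies in a single Cookie header
        client.getParams().setParameter("http.protocol.single-cookie-header", true);
    }

This sets the HttpClient cookie policy to CookiePolicy.BROWSER_COMPATIBILITY, which means the Java client will handle cookies automatically the way a browser does. You can of course also adjust cookies by hand at run time. For example, before logging in to Hotmail, a cookie carrying the current time has to be set:

    client.getState().addCookie(new Cookie("login.live.com", "CkTst", "G" + new Date().getTime()));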
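However, httpclient does not seem to provide a way to delete individual cookies, so I added two cookie-management methods: one keeps only the named cookies, the other deletes the named cookies:

    // Needs java.util.ArrayList, java.util.Arrays and java.util.List.

    // Keep only the cookies whose names are listed in cookieNames.
    protected void retainCookies(String[] cookieNames) {
        // contains() replaces Arrays.binarySearch, which is only correct on a sorted array
        List<String> names = Arrays.asList(cookieNames);
        Cookie[] cookies = client.getState().getCookies();
        ArrayList<Cookie> retainCookies = new ArrayList<Cookie>();
        for (Cookie cookie : cookies) {
            if (names.contains(cookie.getName())) {
                retainCookies.add(cookie);
            }
        }
        client.getState().clearCookies();
        client.getState().addCookies(retainCookies.toArray(new Cookie[0]));
    }

    // Delete the cookies whose names are listed in cookieNames and keep the rest.
    protected void removeCookies(String[] cookieNames) {
        List<String> names = Arrays.asList(cookieNames);
        Cookie[] cookies = client.getState().getCookies();
        ArrayList<Cookie> retainCookies = new ArrayList<Cookie>();
        for (Cookie cookie : cookies) {
            if (!names.contains(cookie.getName())) {
                retainCookies.add(cookie);
            }
        }
        client.getState().clearCookies();
        client.getState().addCookies(retainCookies.toArray(new Cookie[0]));
    }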
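For readers who are not on commons-httpclient, the same keep/delete-by-name idea can be sketched with the cookie classes that ship with the JDK (java.net.CookieStore and java.net.HttpCookie). This is only an illustration under assumptions: the class name CookieFilterSketch and its methods are hypothetical, and it assumes every cookie was added to the store under the same URI that is passed in for removal.

```java
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; not part of commons-httpclient or the original article.
final class CookieFilterSketch {

    /** Removes every cookie whose name is NOT listed in keepNames. */
    static void retainCookies(CookieStore store, URI uri, String... keepNames) {
        Set<String> keep = new HashSet<>(Arrays.asList(keepNames));
        // getCookies() may be a read-only view of the store, so iterate over a copy while removing.
        for (HttpCookie cookie : new ArrayList<>(store.getCookies())) {
            if (!keep.contains(cookie.getName())) {
                store.remove(uri, cookie);
            }
        }
    }

    /** Removes every cookie whose name IS listed in dropNames. */
    static void removeCookies(CookieStore store, URI uri, String... dropNames) {
        Set<String> drop = new HashSet<>(Arrays.asList(dropNames));
        for (HttpCookie cookie : new ArrayList<>(store.getCookies())) {
            if (drop.contains(cookie.getName())) {
                store.remove(uri, cookie);
            }
        }
    }

    /** Returns the names of the cookies currently in the store, sorted alphabetically. */
    static List<String> names(CookieStore store) {
        return store.getCookies().stream()
                .map(HttpCookie::getName)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

A store can be obtained with new CookieManager().getCookieStore(). Using a set lookup here sidesteps the sorted-array requirement that a binary search would impose on the list of names.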
2. Setting HTTP headers:
Setting the HTTP headers makes the mail server believe it is talking to a browser, which avoids the risk of the request being refused:

    private void setHeaders(HttpMethod method) {
        method.setRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;");
        method.setRequestHeader("Accept-Language", "zh-cn");
        // a browser User-Agent string is the header that matters most for avoiding a 403
        method.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3");
        // encoding is the field passed to the EmailImporter constructor
        method.setRequestHeader("Accept-Charset", encoding);
        method.setRequestHeader("Keep-Alive", "300");
        method.setRequestHeader("Connection", "Keep-Alive");
        method.setRequestHeader("Cache-Control", "no-cache");
    }
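The same header trick can be sketched without commons-httpclient, using the JDK's own HttpURLConnection. This is a minimal sketch under assumptions: the class BrowserHeadersSketch, its method names and the User-Agent string are illustrative choices, not something the article or any library prescribes, and a site may still answer 403 for other reasons (IP filtering, missing cookies, and so on).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names and header values are illustrative assumptions.
final class BrowserHeadersSketch {

    /** Builds a browser-like header set; the Referer is optional. */
    static Map<String, String> browserHeaders(String referer) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Any current desktop browser User-Agent string would do here.
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8");
        headers.put("Cache-Control", "no-cache");
        if (referer != null) {
            headers.put("Referer", referer);
        }
        return headers;
    }

    /** Copies the browser-like headers onto a connection that has not been opened yet. */
    static void apply(HttpURLConnection conn, String referer) {
        browserHeaders(referer).forEach(conn::setRequestProperty);
    }

    /** Issues a GET with the headers applied and returns the HTTP status code. */
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        apply(conn, null);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Comparing the status returned by fetchStatus with and without the apply call is a quick way to check whether a given site's 403 really depends on the headers.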
In addition, set the Referer value on both GET and POST requests, and set Content-Type on POST requests:

    protected String doPost(String actionUrl, NameValuePair[] params, String referer) throws HttpException, IOException {
        ......
        method.setRequestHeader("Referer", referer);
        method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        ......
    }
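To make the Content-Type value concrete, here is a rough sketch of what an application/x-www-form-urlencoded body looks like when it is built by hand with the JDK. The class FormBodySketch and the field names in main are made up for illustration, and the two-argument URLEncoder.encode(String, Charset) overload requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; field names below are illustrative only.
final class FormBodySketch {

    /** Joins name=value pairs with '&', percent-encoding both names and values as UTF-8. */
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            body.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("login", "user@example.com");
        params.put("passwd", "p&ss word");
        // prints login=user%40example.com&passwd=p%26ss+word
        System.out.println(encode(params));
    }
}
```

If the server receives a body in this shape without the matching Content-Type header, many frameworks will not parse the form fields, which is why the header is set explicitly on POST.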
3. Sending HTTP requests and receiving HTTP responses. contact-list only uses GET and POST requests, and I wrapped both in simple helpers (readInputStream is a helper that is not shown here):

    protected String doGet(String url, String referer) throws HttpException, IOException {
        GetMethod method = new GetMethod(url);
        setHeaders(method);
        method.setRequestHeader("Referer", referer);
        // log request
        client.executeMethod(method);
        String responseStr = readInputStream(method.getResponseBodyAsStream());
        // log response
        method.releaseConnection();
        lastUrl = method.getURI().toString();
        return responseStr;
    }

    protected String doPost(String actionUrl, NameValuePair[] params, String referer) throws HttpException, IOException {
        PostMethod method = new PostMethod(actionUrl);
        setHeaders(method);
        method.setRequestHeader("Referer", referer);
        method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        method.setRequestBody(params);
        // log request
        client.executeMethod(method);
        String responseStr = readInputStream(method.getResponseBodyAsStream());
        // log response
        method.releaseConnection();
        if (method.getResponseHeader("Location") != null) {
            // do redirect: the code from step 4 goes here and returns the redirected page
        } else {
            lastUrl = method.getURI().toString();
            return responseStr;
        }
    }

4. HTTP redirects. There are mainly two kinds. One follows the Location HTTP header:

    // Goes in the "do redirect" branch of doPost. getResponseHost is a helper that is not shown here.
    // The original snippet called doGet with a single argument; passing the POST URL as the
    // referer is an assumption made so that the call matches the two-argument doGet above.
    String location = method.getResponseHeader("Location").getValue();
    if (location.startsWith("http")) {
        return doGet(location, actionUrl);
    } else {
        // a relative Location is assumed to start with "/"
        return doGet("http://" + getResponseHost(method) + location, actionUrl);
    }

The other follows a window.location.replace call in JavaScript.
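Both kinds of redirect target can be worked out with plain JDK classes. The sketch below is an illustration, not the article's implementation: RedirectSketch and its methods are hypothetical names, URI.resolve is used in place of string concatenation so that absolute, root-relative and path-relative Location values are all handled, and the regular expression only recognises a window.location.replace call whose argument is a simple quoted string literal.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of commons-httpclient or the original article.
final class RedirectSketch {

    // Matches window.location.replace("...") or window.location.replace('...').
    private static final Pattern JS_REPLACE = Pattern.compile(
            "window\\.location\\.replace\\(\\s*[\"']([^\"']+)[\"']\\s*\\)");

    /** Resolves a Location header value against the URL of the request that produced it. */
    static String resolveLocation(String requestUrl, String location) {
        // URI.create throws IllegalArgumentException if either string is not a valid URI.
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Extracts the target of a JavaScript redirect from a response body, if one is present. */
    static Optional<String> findJsRedirect(String html) {
        Matcher matcher = JS_REPLACE.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

For example, resolveLocation("http://example.com/mail/login", "/inbox") gives "http://example.com/inbox", while an absolute Location value is returned unchanged. The result of findJsRedirect can be passed through resolveLocation as well, since the script may contain a relative URL.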
5. Logging requests and responses, which is very important for debugging (the "/n" in the original listing is corrected to the newline escape "\n"):

    private void logGetRequest(GetMethod method) throws URIException {
        logger.debug("do get request: " + method.getURI().toString());
        logger.debug("header:\n" + getHeadersStr(method.getRequestHeaders()));
        logger.debug("cookie:\n" + getCookieStr());
    }

    private void logGetResponse(GetMethod method, String responseStr) throws URIException {
        logger.debug("do get response: " + method.getURI().toString());
        logger.debug("header:\n" + getHeadersStr(method.getResponseHeaders()));
        logger.debug("body:\n" + responseStr);
    }

    private void logPostRequest(PostMethod method) throws URIException {
        logger.debug("do post request: " + method.getURI().toString());
        logger.debug("header:\n" + getHeadersStr(method.getRequestHeaders()));
        logger.debug("body:\n" + getPostBody(method.getParameters()));
        logger.debug("cookie:\n" + getCookieStr());
    }

    private void logPostResponse(PostMethod method, String responseStr) throws URIException {
        logger.debug("do post response: " + method.getURI().toString());
        logger.debug("header:\n" + getHeadersStr(method.getResponseHeaders()));
        logger.debug("body:\n" + responseStr);
    }

    // One "Name: value" line per header.
    private String getHeadersStr(Header[] headers) {
        StringBuilder builder = new StringBuilder();
        for (Header header : headers) {
            builder.append(header.getName()).append(": ").append(header.getValue()).append("\n");
        }
        return builder.toString();
    }

    // One "name:value" line per POST parameter.
    private String getPostBody(NameValuePair[] postValues) {
        StringBuilder builder = new StringBuilder();
        for (NameValuePair pair : postValues) {
            builder.append(pair.getName()).append(":").append(pair.getValue()).append("\n");
        }
        return builder.toString();
    }

    // One "domain:name=value;path;expiry;secure;" line per cookie held by the client.
    private String getCookieStr() {
        Cookie[] cookies = client.getState().getCookies();
        StringBuilder builder = new StringBuilder();
        for (Cookie cookie : cookies) {
            builder.append(cookie.getDomain()).append(":")
                    .append(cookie.getName()).append("=").append(cookie.getValue()).append(";")
                    .append(cookie.getPath()).append(";")
                    .append(cookie.getExpiryDate()).append(";")
                    .append(cookie.getSecure()).append(";\n");
        }
        return builder.toString();
    }