Parsing pages with HttpClient + jsoup
2012-03-24 15:03
Steps:
1. Set the URL: HttpPost httpPost = new HttpPost(url); // url is a String
// When the parameters are already part of the URL, use a GET request instead: HttpGet httpget = new HttpGet(url);
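2. Set the parameters (not needed when using HttpGet):
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair(arg0, arg0Value)); // parameter name and value, both Strings
params.add......
// The second argument is the charset used to URL-encode the form body
httpPost.setEntity(new UrlEncodedFormEntity(params, "GB2312"));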
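To illustrate what the "GB2312" argument changes, here is a minimal sketch that percent-encodes one form field with the JDK's java.net.URLEncoder, which writes the same kind of application/x-www-form-urlencoded text. It does not use HttpClient; the class name FormEncodingSketch, the helper encodeField and the field name "q" are made up for this example, and it assumes the JDK ships the GB2312 charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingSketch {

    // Hypothetical helper: encodes a single name=value pair in the given charset.
    public static String encodeField(String name, String value, String charset) {
        try {
            return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException ex) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d"; // the single character "中"
        // The same character becomes different bytes depending on the charset.
        System.out.println(encodeField("q", value, "GB2312")); // prints q=%D6%D0
        System.out.println(encodeField("q", value, "UTF-8"));  // prints q=%E4%B8%AD
    }
}
```

The server decodes the body with its own charset, so the charset passed to UrlEncodedFormEntity has to match what the target site expects.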
3. Execute the request:
HttpClient httpClient = new DefaultHttpClient();
HttpResponse rps0 = httpClient.execute(httpPost);
// The status code tells you whether the request succeeded; carry out the next steps inside the if block
int resStatu = rps0.getStatusLine().getStatusCode(); // status code
if (resStatu == HttpStatus.SC_OK) {
}
4. Get the HTML:
HttpEntity entity0 = rps0.getEntity();
String html = EntityUtils.toString(entity0);
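EntityUtils.toString(entity) decodes the body with the charset declared in the response's Content-Type header and falls back to ISO-8859-1 when none is declared, which is the usual source of garbled Chinese text. The sketch below shows one way to read the charset out of a Content-Type value with a regular expression. It is plain JDK code; the class ContentTypeCharsetSketch and the helper charsetFromContentType are hypothetical names, not part of HttpClient.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentTypeCharsetSketch {

    // Matches charset=xxx or charset="xxx", case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the declared charset, or the fallback if there is none.
    public static String charsetFromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=GB2312", "ISO-8859-1")); // prints GB2312
        System.out.println(charsetFromContentType("text/html", "ISO-8859-1"));                 // prints ISO-8859-1
    }
}
```

If the site's encoding is known in advance, the two-argument overload EntityUtils.toString(entity0, "GBK") supplies a default charset for responses that do not declare one.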
5. Close the connection:
httpClient.getConnectionManager().shutdown();
6. Parse the HTML:
Document doc = Jsoup.parse(html);
7. Miscellaneous
If the HTML comes back garbled, it has to be re-decoded:
Document doc = Jsoup.parse(html);
Element e = doc.getElementsByTag("meta").first();
String text;
if (e != null) {
    String charset;
    String content = e.attr("content"); // attr() returns "" when the attribute is missing
    if (content.contains("=")) {
        // e.g. content="text/html; charset=gb2312"
        charset = content.substring(content.indexOf("=") + 1).trim();
    } else if (!e.attr("charset").isEmpty()) {
        // e.g. <meta charset="gb2312">
        charset = e.attr("charset");
    } else {
        charset = "GBK";
    }
    System.out.println(charset);
    // Recover the raw bytes from the ISO-8859-1 string, then decode them with the page's charset
    text = new String(html.getBytes("ISO-8859-1"), charset);
} else {
    // If the page's encoding cannot be found, default to GBK
    text = new String(html.getBytes("ISO-8859-1"), "GBK");
}
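The re-decoding trick in step 7 works because ISO-8859-1 maps every byte value to exactly one character, so getBytes("ISO-8859-1") returns the original bytes unchanged. The round trip below is a minimal sketch of that idea using only the JDK; the class RedecodeSketch and the helper redecode are hypothetical names, and it assumes the JDK provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeSketch {

    // Hypothetical helper: undoes a wrong ISO-8859-1 decoding and decodes with the real charset.
    public static String redecode(String garbled, String charsetName) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1); // recovers the original bytes
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // "中文"
        // Simulate a GBK page whose bytes were decoded as ISO-8859-1.
        byte[] wire = original.getBytes(Charset.forName("GBK"));
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals(original));                  // prints false
        System.out.println(redecode(garbled, "GBK").equals(original)); // prints true
    }
}
```

This only recovers the text when the wrong decoding was ISO-8859-1. If the bytes were first decoded with a charset such as UTF-8, invalid sequences have already been replaced and the original bytes are lost.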