
Web Crawling: Using Packet Capture and Requesting Data via Form Parameters

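2017-08-08 11:37
Recently, someone copied my blog posts and uploaded them directly to platforms such as Baidu Wenku.

This is an original blog post, provided for technical study only. Copying it and uploading it to platforms such as Baidu Wenku without permission is prohibited. If you repost it, please cite the address (link) of this post.

If you need the source code, please contact me.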

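On some sites, packet capture reveals the real address that serves the data, yet requesting that address with HttpClient returns empty data. What should you do then? The site below is used as an example.

Site URL: https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5

The data on the page is shown in the following screenshot: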



Packet capture shows that the data is returned as JSON, and it also reveals the real request URL, as shown in the following screenshot:



The real request URL is: https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?

Requesting this URL on its own returns empty data, as shown in the following screenshot:



The returned data is:

{"pageCount":0,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":0,"asstId":null,"addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":0,"statementPrefix":"getPageKeyBranch","totalSize":0,"labFeatureList":null,"keyNum":null,"data":[],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
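A crawler can recognize this empty response without inspecting it by hand: in the sample above, totalSize is 0 and data is an empty array. The following is a minimal sketch, assuming the response always has the shape shown above. The class and method names are hypothetical, and the regular expression stands in for a real JSON parser, which would be the sturdier choice in production code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: checks whether a response of the shape shown above carries no records. */
final class EmptyResultCheck {

    // Matches the numeric value of the first "totalSize" field in the text.
    private static final Pattern TOTAL_SIZE =
            Pattern.compile("\"totalSize\"\\s*:\\s*(\\d+)");

    /** Returns the totalSize value, or -1 if the field is missing. */
    static int totalSize(String json) {
        Matcher m = TOTAL_SIZE.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Treats a response as empty when totalSize is missing or not positive. */
    static boolean isEmpty(String json) {
        return totalSize(json) <= 0;
    }

    public static void main(String[] args) {
        // Abridged versions of an empty and a non-empty response.
        String empty = "{\"pageCount\":0,\"totalSize\":0,\"data\":[]}";
        String full = "{\"pageCount\":1,\"totalSize\":1,\"data\":[{\"keyNum\":1}]}";
        System.out.println(isEmpty(empty)); // true
        System.out.println(isEmpty(full));  // false
    }
}
```

A check like this lets the crawler flag requests that reached the server but were missing a required parameter, instead of silently storing empty records.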


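To solve this, go back to the packet capture: the request also carries a form parameter. Based on that observation, the following program can be written: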

package navi.main;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * @author Qian Yang, School of Management, Hefei University of Technology
 * @email 1563178220@qq.com
 */
public class Test {

    public static void main(String[] args) throws Exception {
        String newUrl = "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?";
        HttpPost post = new HttpPost(newUrl);
        // Request headers: optional, they are not the key part
        post.addHeader(new BasicHeader("Cookie",
                "JSESSIONID=0000qty6OnqsYHgBdc3VKzr4zbI:1a5s8ura0"));
        post.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        post.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        post.addHeader("Host", "las.cnas.org.cn");
        post.addHeader("Accept", "*/*");
        post.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
        post.addHeader("X-Requested-With", "XMLHttpRequest");
        post.addHeader("Referer", "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5");
        post.addHeader("Origin", "https://las.cnas.org.cn");
        // Form parameter: this is the key part and cannot be omitted
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        list.add(new BasicNameValuePair("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5"));
        post.setEntity(new UrlEncodedFormEntity(list, StandardCharsets.UTF_8));
        // CloseableHttpClient (HttpClient 4.3+) replaces the deprecated DefaultHttpClient;
        // try-with-resources closes the client and the response.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = client.execute(post)) {
            // Decode as UTF-8 so the Chinese fields in the JSON are read correctly
            String responseString = EntityUtils.toString(httpResponse.getEntity(), StandardCharsets.UTF_8);
            System.out.println(responseString);
        }
    }
}
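The form entity in the program above ends up on the wire as a plain application/x-www-form-urlencoded string in the POST body. In this example the value of asstId is the same string as the baseInfoId query parameter of the page URL. The sketch below shows, using only the Java standard library, how that body could be derived from the page URL. It is a sketch under assumptions: the class and method names are hypothetical, and it assumes that asstId equals baseInfoId for other pages of this site as well, which the example here does not confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds the form body for the POST request from the page URL. */
final class FormBodySketch {

    // Captures the value of the baseInfoId query parameter.
    private static final Pattern BASE_INFO_ID =
            Pattern.compile("[?&]baseInfoId=([^&#]+)");

    /** Returns the baseInfoId value from a page URL, or null if it is absent. */
    static String baseInfoId(String pageUrl) {
        Matcher m = BASE_INFO_ID.matcher(pageUrl);
        return m.find() ? m.group(1) : null;
    }

    // Percent-encodes one key or value as UTF-8 (spaces become '+').
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Joins the fields as key=value pairs separated by '&', in insertion order. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp"
                + "?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5";
        Map<String, String> fields = new LinkedHashMap<>();
        // Assumption: the form field asstId carries the page's baseInfoId.
        fields.put("asstId", baseInfoId(pageUrl));
        System.out.println(encode(fields)); // asstId=3ee5aa672cbf44d0a2d9906b2bae70c5
    }
}
```

Comparing the printed string with the form data shown in the browser's packet capture is a quick way to confirm that the program sends what the page sends.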


The program returns the following data:

{"pageCount":1,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":1,"asstId":"3ee5aa672cbf44d0a2d9906b2bae70c5","addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":1,"statementPrefix":"getPageKeyBranch","totalSize":1,"labFeatureList":null,"keyNum":null,"data":[{"remark":null,"addpost":null,"isModify":null,"keyNum":1,"labFeatureList":[{"baseInfoId":null,"branchId":null,"createBy":null,"createTs":null,"feature":"101001","id":"1974ed78b9a8409ba1ddd9dbc349098c","isModify":null,"labfeatureId":"1974ed78b9a8409ba1ddd9dbc349098c","other":null,"otherEn":null,"sourceId":null,"sqlUpdateType":null,"updateBy":null,"updateTs":null}],"mainactivity":"177001, 177003, 177004, 177005","mainActivityOther":null,"remarkEn":null,"labfeature":null,"addEn":"Bioassay and Safety Assessment Building, No.1500, Zhangheng Road, Zhangjiang Hi-Tech Park, Pudong New District, Shanghai, China","addCn":"上海市浦东新区张江高科技园区张衡路1500号生物与安全检测楼","postCode":"201203","addcode":null,"asstId":null,"mainActivityOtherEn":null,"labFeatureJson":"[{\"feature\":\"101001\"}]","nameCn":"上海市检测中心生物与安全检测实验室","primaryRecommend":null,"branchId":"3b00ef1f777247e1a2abd6e4b51ea1a8"}],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
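Once the response contains data, the fields of interest (for example postCode, addEn, or branchId) can be pulled out of the JSON text. The sketch below uses a regular expression from the standard library so that it runs without extra dependencies; the class and method names are hypothetical, and it only returns the first string-valued occurrence of a field, still in its JSON-escaped form. For anything beyond a quick check, a real JSON parser is the safer tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads string fields from a response of the shape shown above. */
final class BranchFieldSketch {

    /**
     * Returns the first string value of the given field, or null when the
     * field is absent or only appears with a null value.
     */
    static String firstString(String json, String field) {
        // "field" : "value", where value may contain backslash escapes.
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(field) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Abridged from the response above: the record sits inside "data",
        // while the top-level fields of the same name are null.
        String json = "{\"asstId\":\"3ee5aa672cbf44d0a2d9906b2bae70c5\","
                + "\"data\":[{\"postCode\":\"201203\",\"asstId\":null,"
                + "\"branchId\":\"3b00ef1f777247e1a2abd6e4b51ea1a8\"}],"
                + "\"postCode\":null}";
        System.out.println(firstString(json, "postCode")); // 201203
        System.out.println(firstString(json, "branchId")); // 3b00ef1f777247e1a2abd6e4b51ea1a8
    }
}
```

Because null-valued fields are skipped, the top-level "postCode":null does not hide the value inside the data array.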