Fetching web page content with fsockopen
2009-08-27 15:33
The following function fetches a page's content over a raw socket opened with fsockopen:
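function get_page_content($url, $maxRedirects = 5){
    // Strip the scheme; only plain HTTP is handled here (eregi_replace no longer exists in PHP 7+).
    $url = preg_replace('#^http://#i', '', trim($url));
    // Split "host[:port]/path" into its parts.
    $temp = explode('/', $url);
    $host = array_shift($temp);
    $path = '/'.implode('/', $temp);
    $temp = explode(':', $host);
    $host = $temp[0];
    $port = isset($temp[1]) ? (int)$temp[1] : 80;
    // $errno and $errstr are filled in by reference; call-time "&" is not allowed in modern PHP.
    $fp = @fsockopen($host, $port, $errno, $errstr, 30);
    if (!$fp){
        return ''; // connection failed; $errno / $errstr describe why
    }
    fputs($fp, "GET $path HTTP/1.1\r\nHost: $host\r\nAccept: */*\r\nReferer: $url\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\r\nConnection: Close\r\n\r\n");
    // Read until the server closes the connection.
    $Content = '';
    while (!feof($fp)){
        $Content .= fread($fp, 4096);
    }
    fclose($fp);
    // Redirect: follow a 301, but only a limited number of times to avoid endless recursion.
    if (preg_match("/^HTTP\/\d\.\d 301 Moved Permanently/i", $Content)){
        if ($maxRedirects > 0 && preg_match("/Location:(.*?)\r\n/is", $Content, $murl)){
            return get_page_content(trim($murl[1]), $maxRedirects - 1);
        }
    }
    // Read content: on 200 OK, drop the headers and keep the body.
    // Note: an HTTP/1.1 server may send the body with chunked transfer encoding, which is not decoded here.
    if (preg_match("/^HTTP\/\d\.\d 200 OK/i", $Content)){
        // The content type is extracted but not used further in this function.
        $contentType = preg_match("/Content-Type:(.*?)\r\n/is", $Content, $murl) ? trim($murl[1]) : '';
        $Content = explode("\r\n\r\n", $Content, 2);
        $Content = isset($Content[1]) ? $Content[1] : '';
    }
    return $Content;
}
print_r(get_page_content('www.google.cn'));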
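
The function above handles the raw response with a few regular expressions. As a rough sketch of the same step done a little more carefully, the helpers below split a raw HTTP response string into status code, headers, and body, and undo chunked transfer encoding when the server uses it. The names split_http_response and decode_chunked are hypothetical and are not part of the original post; chunk extensions and trailers are ignored, so treat this as an illustration under those assumptions rather than a complete HTTP parser.

```php
<?php
// Hypothetical helper (not from the original post): splits a raw HTTP
// response string into status code, headers, and body.
function split_http_response($raw) {
    // Headers and body are separated by the first blank line.
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $body = isset($parts[1]) ? $parts[1] : '';

    // The status line looks like "HTTP/1.1 200 OK".
    $status = 0;
    $statusLine = array_shift($headerLines);
    if (preg_match('#^HTTP/\d\.\d (\d{3})#', $statusLine, $m)) {
        $status = (int)$m[1];
    }

    // Header names are case-insensitive, so store them lowercased.
    $headers = array();
    foreach ($headerLines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    // Undo chunked transfer encoding if the server used it.
    if (isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false) {
        $body = decode_chunked($body);
    }
    return array('status' => $status, 'headers' => $headers, 'body' => $body);
}

// Hypothetical helper: decodes a chunked body.
// Chunk extensions are skipped and trailers are ignored.
function decode_chunked($data) {
    $out = '';
    $pos = 0;
    while (($eol = strpos($data, "\r\n", $pos)) !== false) {
        // Each chunk starts with its size in hexadecimal, optionally
        // followed by ";extension".
        $sizeField = explode(';', substr($data, $pos, $eol - $pos));
        $size = (int)hexdec(trim($sizeField[0]));
        if ($size === 0) {
            break; // a zero-length chunk marks the end of the body
        }
        $out .= substr($data, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip chunk data and its trailing CRLF
    }
    return $out;
}
```

One way to use it would be to pass the string accumulated by the fread loop to split_http_response and then branch on the returned status and headers (for example the location header on a redirect) instead of matching the raw text.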