Parsing HTML tags with PHP Simple HTML DOM Parser
2009-04-24 08:24
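I tried out PHP Simple HTML DOM Parser for parsing HTML pages and found it quite good: it builds a DOM tree that makes it easy to work with the content of an HTML document, which makes it handy for scraping. An example follows; you can also download the archive from SourceForge and look at the examples inside it: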
Scraping data with PHP Simple HTML DOM Parser
PHP Simple HTML DOM Parser, written for PHP 5+, lets you manipulate HTML very easily. Because it tolerates invalid HTML, it is more robust than PHP scripts that rely on complicated regular expressions to extract information from web pages. Before extracting anything, create a DOM from either a URL or a file. The following script extracts links and images from a website:
// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');
// Extract links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
// Extract images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
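The same extraction can be tried without a network connection. The following is a minimal sketch, assuming simple_html_dom.php from the parser's package sits next to the script; the sample markup and the $links and $images arrays are made up for illustration, and str_get_html() is the parser's function for building a DOM from a string instead of a URL.

```php
<?php
// Assumption: simple_html_dom.php from the parser's package is in the same directory.
include 'simple_html_dom.php';

// Hypothetical sample markup, used here instead of a live URL.
$markup = '<p><a href="/docs">Docs</a> <a href="/news">News</a><img src="logo.png"></p>';
$html = str_get_html($markup);

// Collect the href attribute of every link.
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}

// Collect the src attribute of every image.
$images = array();
foreach ($html->find('img') as $element) {
    $images[] = $element->src;
}

echo implode("\n", $links), "\n";
echo implode("\n", $images), "\n";
```

Collecting the values into arrays rather than echoing them directly makes it easier to filter or de-duplicate them before output.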
The parser can also be used to modify HTML elements:
// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=simple]', 0)->innertext = 'Foo';
// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
To retrieve the content without any tags, read the plaintext property:
echo file_get_html('http://www.yahoo.com/')->plaintext;
The parser's package files (http://simplehtmldom.sourceforge.net/) include scraping examples for Digg, IMDb, and Slashdot. Let’s create one that extracts the first 10 results (titles only) for the keyword “php” from Google:
$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';
// Create DOM from URL
$html = file_get_html($url);
// Match all 'a' tags whose class attribute equals 'l'
foreach($html->find('a[class=l]') as $key => $info)
{
    echo ($key + 1).'. '.$info->plaintext."<br />\n";
}
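The loop above prints every matching link, and the class name it relies on depends on the page's markup at the time of writing. As a rough sketch of how the result count could be capped, the hypothetical helper extract_titles() below slices the array returned by find(); the helper name, its parameters, and the sample markup are assumptions for illustration, not part of the parser.

```php
<?php
// Assumption: simple_html_dom.php from the parser's package is in the same directory.
include 'simple_html_dom.php';

// Hypothetical helper: plain text of at most $limit elements matching $selector.
function extract_titles($html, $selector, $limit)
{
    $titles = array();
    // find() returns an array of matched elements, so array_slice() can cap the count.
    foreach (array_slice($html->find($selector), 0, $limit) as $element) {
        $titles[] = trim($element->plaintext);
    }
    return $titles;
}

// Hypothetical markup standing in for a results page.
$sample = '<a class="l" href="/1">First</a><a class="l" href="/2">Second</a>'
        . '<a class="x" href="/3">Other</a><a class="l" href="/4">Third</a>';
$titles = extract_titles(str_get_html($sample), 'a[class=l]', 2);

foreach ($titles as $key => $title) {
    echo ($key + 1) . '. ' . $title . "\n";
}
```

Passing the selector and the limit as parameters keeps the page-specific details in one place, so only the call needs to change when the target markup does.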
NOTE: Make sure to include the parser before calling any of its functions:
include 'simple_html_dom.php';
For more information on using the parser, see the ‘PHP Simple HTML DOM Parser’ manual. To download the package files, use the following URL: http://sourceforge.net/project/showfiles.php?group_id=218559 .