
Extracting All Hyperlinks from a Web Page

2008-10-31 13:40

/**
@version 1.01 2004-06-04
@author Cay Horstmann
*/

import java.io.*;
import java.net.*;
import java.util.regex.*;

/**
This program displays all URLs in a web page by
matching a regular expression that describes the
<a href=...> HTML tag. Start the program as
java HrefMatch URL
*/
public class HrefMatch
{
   public static void main(String[] args)
   {
      try
      {
         // get URL string from command line or use default
         String urlString;
         if (args.length > 0) urlString = args[0];
         else urlString = "http://java.sun.com";

         // open reader for URL
         InputStreamReader in = new InputStreamReader(new URL(urlString).openStream());

         // read contents into string buffer
         StringBuilder input = new StringBuilder();
         int ch;
         while ((ch = in.read()) != -1) input.append((char) ch);
         in.close();

         // search for all occurrences of pattern: an <a href=...> tag whose value is
         // either a double-quoted string or an unquoted run of non-space characters
         String patternString = "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>";
         Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
         Matcher matcher = pattern.matcher(input);

         while (matcher.find())
         {
            int start = matcher.start();
            int end = matcher.end();
            String match = input.substring(start, end);
            System.out.println(match);
         }
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      catch (PatternSyntaxException e)
      {
         e.printStackTrace();
      }
   }
}
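
HrefMatch prints each matching tag in full. If only the link target is wanted, the parenthesized group in the pattern can be read with matcher.group(1). The following is a minimal sketch of that idea, not part of the original program: the class name HrefValues and the method extractHrefs are hypothetical, and it runs against an inline HTML string instead of a downloaded page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, shown for illustration only.
public class HrefValues
{
   // Same tag shape as in HrefMatch; group 1 captures the href value.
   private static final Pattern HREF = Pattern.compile(
      "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>", Pattern.CASE_INSENSITIVE);

   /** Returns the href values found in html, with surrounding double quotes removed. */
   public static List<String> extractHrefs(CharSequence html)
   {
      List<String> result = new ArrayList<>();
      Matcher matcher = HREF.matcher(html);
      while (matcher.find())
      {
         String value = matcher.group(1);
         // strip the quotes when the value was written as "..."
         if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\""))
            value = value.substring(1, value.length() - 1);
         result.add(value);
      }
      return result;
   }

   public static void main(String[] args)
   {
      // made-up sample input: one quoted and one unquoted href
      String html = "<p><a href=\"morephp.html\">PHP</a> <A HREF=index7.html>more</A></p>";
      for (String href : extractHrefs(html)) System.out.println(href);
   }
}
```

As in HrefMatch, the pattern expects href to be the only attribute of the tag, so a tag such as <a href="x" class="y"> is not matched. For real-world pages an HTML parser is usually a more robust choice than a regular expression.
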
Running HrefMatch:

java HrefMatch http://www.128kj.com
<a href="linktype.jsp?type=1">
<a href="articleType_flex.html">
<a href="articleType_flex.html">
<a href="morephp.html">
<a href="morephp.html">
<a href="morejava.html">
<a href="morejava.html">
<a href="moreajax.html">
<a href="moreajax.html">
<a href="morejsp.html">
<a href="morejsp.html">
<a href="morejavascript.html">
<a href="morejavascript.html">
<a href="morexml.html">
<a href="morexml.html">
<a href="morecss.html">
<a href="morecss.html">
<a href="index7.html">
<a href="morePic.html">
<a href="moreFeibiao.html">
<a href="article/article45/logo/index0.html">
<a href="article/article46/bannerpic/index0.html">
<a href="moreTem.html">
<a href="moreTubiao.html">
<a href="moreJspSrc.html">
<a href="morePhpSrc.html">
<a href="article/article47/bg/index0.html">
<a href="morehtml.html">
<a href="morehtml.html">
<a href="morephotoshop.html">
<a href="morephotoshop.html">
<a href="http://www.miibeian.gov.cn">
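
The output above contains repeated entries and mostly relative links. A possible post-processing step is to resolve each href against the page URL and drop duplicates while keeping the original order. The sketch below is an illustration under stated assumptions, not part of the original program: the class ResolveLinks and the method resolveAll are hypothetical names, the href values are assumed to be already extracted from the tags, and the base URL is assumed to end with a slash because java.net.URI resolves relative references against the base path.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, shown for illustration only.
public class ResolveLinks
{
   /** Resolves each href against baseUrl and removes duplicates, preserving first-seen order. */
   public static Set<String> resolveAll(String baseUrl, List<String> hrefs)
   {
      URI base = URI.create(baseUrl);
      Set<String> result = new LinkedHashSet<>();
      for (String href : hrefs)
         result.add(base.resolve(href).toString()); // absolute hrefs are returned unchanged
      return result;
   }

   public static void main(String[] args)
   {
      // a few href values taken from the sample output
      List<String> hrefs = Arrays.asList(
         "morephp.html",
         "morephp.html",
         "article/article45/logo/index0.html",
         "http://www.miibeian.gov.cn");
      for (String url : resolveAll("http://www.128kj.com/", hrefs)) System.out.println(url);
   }
}
```

URI.create throws IllegalArgumentException for strings that are not valid URIs (for example, an href containing a space), so input from arbitrary pages would need to be validated or escaped first.
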