LintCode Url Parser
2016-06-26 09:25
Original problem: http://www.lintcode.com/en/problem/url-parser/
Parse an HTML page and extract the URLs in it.
Hint: use a regex to parse the HTML.
Example
Given the following HTML page:
<html>
  <body>
    <div>
      <a href="http://www.google.com" class="text-lg">Google</a>
      <a href="http://www.facebook.com" style="display:none">Facebook</a>
    </div>
    <div>
      <a href="https://www.linkedin.com">Linkedin</a>
      <a href = "http://github.io">LintCode</a>
    </div>
  </body>
</html>
You should return the URLs in it:
[ "http://www.google.com", "http://www.facebook.com", "https://www.linkedin.com", "http://github.io" ]
Approach: regular expressions. The key is handling all the odd edge cases.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlParser {
    // Separate patterns per quote style (unused):
    // Pattern pattern1 = Pattern.compile("(href\\s*=\\s*\")([^\"]*?)(\")", Pattern.CASE_INSENSITIVE);
    // Pattern pattern2 = Pattern.compile("(href\\s*=\\s*')([^']*?)(')", Pattern.CASE_INSENSITIVE);

    // Group 2 captures the URL; the surrounding quote may be ", ' or absent.
    Pattern pattern = Pattern.compile("(href\\s*=\\s*[\"']?)([^\"'\\s>]*)([\"'>\\s])", Pattern.CASE_INSENSITIVE);

    /**
     * @param content source code
     * @return a list of links
     */
    public List<String> parseUrls(String content) {
        // Write your code here
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(content);
        match(matcher, results);
        return results;
    }

    private void match(Matcher matcher, List<String> results) {
        while (matcher.find()) {
            String url = matcher.group(2);
            // Skip empty href values and in-page anchors such as "#top".
            if (url.length() == 0 || url.startsWith("#")) continue;
            results.add(url);
        }
    }
}
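To make the "odd edge cases" concrete, here is a small standalone sketch that runs the same regular expression over a few hand-written inputs: single quotes, an unquoted value, an uppercase HREF, spaces around the equals sign, an in-page anchor, and an empty href. The class name HrefEdgeCases, the method name extract, and the sample HTML are made up for illustration and are not part of the LintCode problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the solution above.
class HrefEdgeCases {
    private static final Pattern HREF = Pattern.compile(
            "(href\\s*=\\s*[\"']?)([^\"'\\s>]*)([\"'>\\s])",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(String content) {
        List<String> results = new ArrayList<>();
        Matcher matcher = HREF.matcher(content);
        while (matcher.find()) {
            String url = matcher.group(2);
            // Same filter as the solution: drop empty values and "#..." anchors.
            if (url.isEmpty() || url.startsWith("#")) {
                continue;
            }
            results.add(url);
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String html =
                "<a href=\"http://www.google.com\" class=\"text-lg\">Google</a>"  // double quotes
              + "<a href = 'https://www.linkedin.com'>Linkedin</a>"               // single quotes, spaces around =
              + "<a HREF=http://github.io>LintCode</a>"                           // uppercase, unquoted
              + "<a href=\"#top\">Top</a>"                                        // anchor, skipped
              + "<a href=\"\">Empty</a>";                                         // empty, skipped
        for (String url : extract(html)) {
            System.out.println(url);
        }
    }
}
```

One limitation of this pattern to keep in mind: because group 2 stops at the first quote, whitespace, or ">", a URL that contains a space or an apostrophe inside a double-quoted value is cut short.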