Adding a "friend value" to crawled URLs
2016-11-18 16:34
```python
def compute_ranks(graph):
    print(graph)  # debug output: the link graph being ranked
    d = 0.8  # damping factor
    numloops = 10

    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages  # every page starts with an equal share

    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            # each page `node` that links to `page` gives it an equal share of its own rank
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node] / len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks

cache = {
    'http://udacity.com/cs101x/urank/index.html': """<html> <body>
<h1>Dave's Cooking Algorithms</h1>
<p> Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body> </html>
""",
    'http://udacity.com/cs101x/urank/zinc.html': """<html> <body>
<h1>The Zinc Chef</h1>
<p> I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>. </p>
<p> For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body> </html>
""",
    'http://udacity.com/cs101x/urank/nickel.html': """<html> <body>
<h1>The Nickel Chef</h1>
<p> This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html"> best Hummus recipe! </a>
</body> </html>
""",
    'http://udacity.com/cs101x/urank/kathleen.html': """<html> <body>
<h1> Kathleen's Hummus Recipe </h1>
<p> <ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body> </html>
""",
    'http://udacity.com/cs101x/urank/arsenic.html': """<html> <body>
<h1> The Arsenic Chef's World Famous Hummus Recipe </h1>
<p> <ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body> </html>
""",
    'http://udacity.com/cs101x/urank/hummus.html': """<html> <body>
<h1> Hummus Recipe </h1>
<p> <ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body> </html>
""",
}

def crawl_web(seed):
    # returns index, and graph of outlinks (page -> pages it links to)
    tocrawl = [seed]
    crawled = []
    graph = {}  # <url>: [list of pages it links to]
    index = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph

def get_page(url):
    # "fetch" a page from the local cache instead of the network
    if url in cache:
        return cache[url]
    else:
        return None

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def union(a, b):
    # append to list a every element of b that is not already in it
    for e in b:
        if e not in a:
            a.append(e)

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def add_to_index(index, keyword, url):
    if keyword in index:
        index[keyword].append(url)
    else:
        index[keyword] = [url]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    else:
        return None

index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
ranks = compute_ranks(graph)
print(ranks)
```
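The last video adds an algorithm that gives every crawled URL a value, its "friend value" (in the code this is what compute_ranks returns, a simplified form of PageRank). After watching it I still didn't fully understand it.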
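Reading the loop in compute_ranks line by line, each of its numloops = 10 passes appears to apply the update below. The notation is my own, not the course's: N is npages, d is the damping factor 0.8, L(q) is graph[q] (the pages q links to), and r_t(p) is ranks[page] after t passes.

$$
r_0(p) = \frac{1}{N}, \qquad
r_{t+1}(p) = \frac{1-d}{N} + d \sum_{q \,:\, p \in L(q)} \frac{r_t(q)}{|L(q)|}
$$

As a worked step, with N = 6 pages in cache, the first pass for kathleen.html (linked from index.html, which has 5 outlinks, and from nickel.html, which has 1) gives:

$$
r_1(\text{kathleen}) = \frac{0.2}{6} + 0.8\left(\frac{1/6}{5} + \frac{1/6}{1}\right) = \frac{1}{30} + 0.8 \cdot 0.2 \approx 0.1933
$$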
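My rough understanding: the more often a page is linked from other pages, the bigger its friend value and the more easily it is found by search. More precisely, each page passes its own value on to the pages it links to, split evenly among them, so one link from a high-value page that links to few others can count for more than several links from low-value pages.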
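To check that intuition against the post's own pages, here is a small sketch. It copies the link graph out of cache by hand (assuming it matches what crawl_web builds) and compares a naive inlink count, computed by a hypothetical helper count_inlinks, with compute_ranks rewritten for Python 3.

```python
BASE = 'http://udacity.com/cs101x/urank/'

# Link graph copied by hand from the <a href> tags in the post's `cache`
# (page -> pages it links to); assumed to equal the graph crawl_web returns.
graph = {
    BASE + 'index.html': [BASE + 'hummus.html', BASE + 'arsenic.html',
                          BASE + 'kathleen.html', BASE + 'nickel.html',
                          BASE + 'zinc.html'],
    BASE + 'zinc.html': [BASE + 'nickel.html', BASE + 'arsenic.html'],
    BASE + 'nickel.html': [BASE + 'kathleen.html'],
    BASE + 'kathleen.html': [],
    BASE + 'arsenic.html': [BASE + 'nickel.html'],
    BASE + 'hummus.html': [],
}


def count_inlinks(graph):
    """Hypothetical helper: the naive 'friend value', i.e. how many pages link here."""
    counts = {page: 0 for page in graph}
    for links in graph.values():
        for target in links:
            counts[target] = counts.get(target, 0) + 1
    return counts


def compute_ranks(graph, d=0.8, numloops=10):
    """Same update as the post's compute_ranks, without the debug print."""
    npages = len(graph)
    ranks = {page: 1.0 / npages for page in graph}
    for _ in range(numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node]:
                    # node splits its own rank evenly among its outlinks
                    newrank += d * ranks[node] / len(graph[node])
            newranks[page] = newrank
        ranks = newranks
    return ranks


counts = count_inlinks(graph)
ranks = compute_ranks(graph)
for page in sorted(graph, key=ranks.get, reverse=True):
    print('%-15s inlinks=%d  rank=%.4f' % (page[len(BASE):], counts[page], ranks[page]))
```

On these six pages nickel.html has the most inlinks (3), yet kathleen.html (2 inlinks) ends up with the higher rank, because nickel.html hands its whole value to its single outlink. arsenic.html also has 2 inlinks but ranks well below kathleen.html, since its linkers either split their vote five ways (index.html) or have little value to pass on (zinc.html). So the friend value depends on who links to a page, not only on how many pages do.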
The full code is above; I'll come back to it when I want to review it.
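The post's intuition also says a higher friend value makes a page easier to find by search. The code above never combines the two, so the following is only one possible way to do it: sort the results of a lookup by rank. ranked_lookup is a hypothetical helper, and the toy index and scores below are made up purely to show the ordering. Because add_to_index appends the URL again whenever a word repeats on a page, the sketch deduplicates first.

```python
def ranked_lookup(index, ranks, keyword):
    """Hypothetical helper: pages containing `keyword`, highest rank first.

    `index` maps a word to a list of URLs (shaped like the post's add_to_index output);
    `ranks` maps a URL to a score (shaped like compute_ranks' output).
    """
    urls = index.get(keyword)
    if not urls:
        return []
    # The same URL may appear several times if the word repeats on a page.
    unique_urls = set(urls)
    # Pages missing from `ranks` are treated as rank 0.
    return sorted(unique_urls, key=lambda url: ranks.get(url, 0.0), reverse=True)


# Toy inputs with made-up page names and scores, only to show the ordering.
toy_index = {'hummus': ['a', 'b', 'a', 'c'], 'tahini': ['c']}
toy_ranks = {'a': 0.05, 'b': 0.30, 'c': 0.12}

print(ranked_lookup(toy_index, toy_ranks, 'hummus'))   # ['b', 'c', 'a']
print(ranked_lookup(toy_index, toy_ranks, 'falafel'))  # []
```

Unlike lookup, this returns an empty list rather than None for an unknown word, which spares the caller a None check.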
Course link again: Intro to Computer Science (计算机科学导论), https://cn.udacity.com/course/intro-to-computer-science--cs101
Some of the videos have no Chinese subtitles, which is a real pain.
I also came across someone's notes on Jianshu (简书), which are worth a look: http://www.jianshu.com/p/229936a65a35