
Analyzing the Geographic Distribution of Web Log Visitors with the Taobao IP Library

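2016-06-18 14:23
Web access logs record each visitor's IP address. By looking up where each IP is registered, you can tally the geographic distribution of visits, down to the province and city level.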

Taobao API endpoint: http://ip.taobao.com/service/getIpInfo.php?ip=14.215.177.38 (change the trailing IP as needed).
For example, querying 14.215.177.38 returns the following:
{"code":0,"data":
{"country":"\u4e2d\u56fd",
"country_id":"CN",
"area":"\u534e\u5357",
"area_id":"800000",
"region":"\u5e7f\u4e1c\u7701",
"region_id":"440000",
"city":"\u5e7f\u5dde\u5e02",
"city_id":"440100",
"county":"",
"county_id":"-1",
"isp":"\u7535\u4fe1",
"isp_id":"100017",
"ip":"14.215.177.38"}
}
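The response is a JSON object, which becomes a dictionary once parsed. code is the query status (0 for success, 1 for failure), and data holds the details: country, area, province (region), city, and ISP. Chinese text is written as Unicode escapes such as \u4e2d; a JSON parser or any Unicode-to-Chinese converter will decode it.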
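To make the response format concrete, here is a minimal Python 3 sketch that parses the sample body shown above without touching the network. The helper name parse_ip_info is hypothetical and used only for illustration; the shape of a failed response (code 1) is assumed from the status description, not observed.

```python
import json

# Sample response body, copied from the example above (raw string keeps the \u escapes)
SAMPLE = r'''{"code":0,"data":{"country":"\u4e2d\u56fd","country_id":"CN","area":"\u534e\u5357","area_id":"800000","region":"\u5e7f\u4e1c\u7701","region_id":"440000","city":"\u5e7f\u5dde\u5e02","city_id":"440100","county":"","county_id":"-1","isp":"\u7535\u4fe1","isp_id":"100017","ip":"14.215.177.38"}}'''


def parse_ip_info(body):
    """Return the 'data' dict of a lookup response, or None if code != 0."""
    result = json.loads(body)  # json.loads decodes the \uXXXX escapes for us
    if result.get("code") != 0:
        return None
    return result["data"]


info = parse_ip_info(SAMPLE)
print(info["country"], info["region"], info["city"], info["isp"])
```

Because the JSON parser decodes the escapes itself, no separate unicode-escape step is needed on the individual fields.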

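Goal: determine the province of each visiting IP (all foreign IPs are grouped together) and compute each province's share of visits. The IPs from the log are first preprocessed and saved in "count IP" format, one pair per line.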
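One possible way to produce the "count IP" file is sketched below in Python 3. It assumes the client IP is the first whitespace-separated field of each log line, as in the common Nginx/Apache access log layout; the function names and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_ips(log_lines):
    """Count hits per client IP, assuming the IP is the first field of each line."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def format_counts(counts):
    """Render the counts as 'hits IP' lines, most frequent first."""
    return ["%d %s" % (hits, ip) for ip, hits in counts.most_common()]


# Made-up log lines, for illustration only
sample_log = [
    '14.215.177.38 - - [18/Jun/2016:14:23:01 +0800] "GET / HTTP/1.1" 200 612',
    '14.215.177.38 - - [18/Jun/2016:14:23:05 +0800] "GET /a HTTP/1.1" 200 100',
    '8.8.8.8 - - [18/Jun/2016:14:24:00 +0800] "GET / HTTP/1.1" 200 612',
]
for row in format_counts(count_ips(sample_log)):
    print(row)
```

Writing these rows to ips.txt gives the input that the script reads. Deduplicating first also means each distinct IP is looked up only once.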



The code is as follows:
#!/usr/bin/env python3
# coding: utf-8
import json
import urllib.request

# Taobao lookup endpoint (the address given above); the IP is appended to it
bs_url = "http://ip.taobao.com/service/getIpInfo.php?ip="

# Global dict holding the final statistics, in the form
# {'province': {'IP': hits, ...}, ...}
region_dic = {}


# Look up one IP and record it in region_dic under its province
def get_data(ip, weight=1):
    response = urllib.request.urlopen(bs_url + ip)
    # Parse the body as JSON instead of eval(): this is safer, and the
    # \uXXXX escapes are decoded into text automatically
    result = json.loads(response.read().decode('utf-8'))

    # Check the status before touching 'data', which holds no fields on failure
    if result['code'] != 0:
        print("request error")
        return

    data = result['data']
    if data['country_id'] == 'CN':
        region = data['region']
    else:
        # All foreign IPs are grouped together
        region = "Overseas"
    # Create the province entry on first sight, then store this IP's hit count
    region_dic.setdefault(region, {})[ip] = int(weight)
    # data also carries 'city', 'area', 'country' and 'isp' for finer breakdowns


if __name__ == '__main__':
    count = 0
    # The IPs to analyze are stored in a file, one "hits IP" pair per line
    with open('ips.txt', 'r') as fo:
        for line in fo:
            if not line.strip():
                continue
            weight, ip = line.strip().split()
            get_data(ip, weight)
            count += int(weight)

    print('Total:')
    for region, stats in region_dic.items():
        times = sum(stats.values())
        # Multiply by 100 so the value matches the percent sign
        print("%s: %.2f %%" % (region, times * 100 / count))
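The final percentage step can also be pulled out into a small pure function, which makes it easy to check without calling the API. This is a Python 3 sketch under the assumption that the statistics have the {'province': {'IP': hits}} shape described above; region_shares and the sample counts are hypothetical.

```python
def region_shares(region_dic):
    """Return {region: percentage of all hits}, given {region: {ip: hits}}."""
    totals = {region: sum(stats.values()) for region, stats in region_dic.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing recorded, so there are no shares to report
    return {region: hits * 100 / grand_total for region, hits in totals.items()}


# Made-up counts, for illustration only
example = {
    "Guangdong": {"14.215.177.38": 3, "14.215.177.39": 1},
    "Overseas": {"8.8.8.8": 1},
}
for region, share in sorted(region_shares(example).items(), key=lambda kv: -kv[1]):
    print("%s: %.2f %%" % (region, share))
```

Note one design difference: this version divides by the hits that were actually recorded, so IPs whose lookup failed do not dilute the percentages, whereas the script's count also includes lines whose lookup returned an error.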


Output:




Note: other IP lookup APIs are also available:
Sina API: http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=js&ip=14.215.177.38
Tags: Taobao, IP library, API calls