
A Python script for checking JSON logs

2018-01-05 22:23
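At work we often receive JSON-formatted logs from other departments or other companies. Problems such as malformed JSON or missing fields usually surface only while we are parsing the logs. We then have to ask their developers to fix the output and check it again, and the back-and-forth wastes time and leaves us reacting to problems. So I wrote a Python script that can be sent to the developers for self-checking before delivery, which saves trouble later.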

JSON format check

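The first step is to parse each JSON string with Python. Nested JSON is often concatenated incorrectly, for example with extra double quotes around an inner object. Such problems can be caught with the following code: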

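import json
import sys

for line in sys.stdin:
    try:
        data = json.loads(line.strip())  # line is the JSON string to check
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        print("Malformed JSON, please check ===>", line)
        continue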
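The malformed nested JSON mentioned above typically comes from building the outer string by hand. A minimal sketch, assuming the producer side is also Python (the names `inner`, `bad`, `good` and `is_valid_json` are illustrative only): wrapping an already-serialized inner object in quotes yields invalid JSON, while letting `json.dumps` serialize the outer object escapes the inner quotes correctly.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON (illustrative helper, not from the original script)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


inner = json.dumps({"d": 4})  # inner object already serialized: '{"d": 4}'

# Hand-built outer string: the inner quotes are left unescaped, so the result is invalid
bad = '{"a": 1, "js": "' + inner + '"}'

# Serializing the outer object lets json.dumps escape the inner quotes
good = json.dumps({"a": 1, "js": inner})

print(is_valid_json(bad))   # False
print(is_valid_json(good))  # True
```

The consumer can then recover the nested object with a second `json.loads` on the `js` field.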


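Date format check

Sometimes the date format does not follow the agreed layout. It can be checked as follows (`data` and `line` come from the previous step):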

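import time

try:
    dt = data['dt']
    time.strptime(dt, "%Y-%m-%d %H:%M:%S")
except (KeyError, TypeError, ValueError):  # missing, not a string, or wrong format
    print("Malformed date-time field (dt), please check ===>", line)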
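One subtlety of the check above: `strptime` also accepts fields without zero padding, so a value such as `2017-1-2 3:4:5` passes. If the logs must follow the exact `YYYY-MM-DD HH:MM:SS` layout, one possible sketch (the helper `is_strict_dt` is hypothetical, not part of the original script) pairs a regular expression for the layout with `datetime.strptime` for calendar validity:

```python
import re
from datetime import datetime

# Exact zero-padded layout; [0-9] avoids matching non-ASCII Unicode digits
DT_LAYOUT = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")


def is_strict_dt(value):
    """True if value is a str in 'YYYY-MM-DD HH:MM:SS' form and a real calendar date/time."""
    if not isinstance(value, str) or not DT_LAYOUT.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")  # rejects e.g. month 13 or Nov 31
        return True
    except ValueError:
        return False


for sample in ["2017-11-02 11:11:11", "2017-11-02", "2017-1-2 3:4:5", "2017-11-31 11:11:11"]:
    print(sample, is_strict_dt(sample))
```

The four samples give True, False, False, False: the last one has the right layout, but November has no 31st.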


Forbidding Chinese characters

Some fields must not contain Chinese characters. This can be checked as follows:

import re

zh_pattern = re.compile('[\u4e00-\u9fa5]+')  # common Chinese (CJK) characters
z = data.get('z')
if z is None or zh_pattern.search(str(z)):
    print("z field is missing or contains Chinese, please use English ===>", line)
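The character class `[\u4e00-\u9fa5]` only covers the common CJK Unified Ideographs, so full-width punctuation such as `，` (U+FF0C) or `。` (U+3002) is not caught. If the real requirement for such fields is "ASCII only" (an assumption, the original only forbids Chinese), a minimal sketch is to reject any non-ASCII character instead. Both helper names below are hypothetical:

```python
import re

ZH_PATTERN = re.compile("[\u4e00-\u9fa5]+")  # same range as the check above


def contains_chinese(text):
    """True if text contains a character in the U+4E00..U+9FA5 range."""
    return ZH_PATTERN.search(text) is not None


def is_ascii_only(text):
    # str.isascii() exists from Python 3.7; this form also works on older versions
    return all(ord(ch) < 128 for ch in text)


for s in ["hello", "中", "hello，"]:
    print(repr(s), contains_chinese(s), is_ascii_only(s))
```

Here `"hello，"` is not flagged by `contains_chinese` but is rejected by `is_ascii_only`.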


Required-field check

The log must contain several fields, none of which may be empty. This can be checked as follows:

check = ['a', 'b', 'c']  # required fields

missing = [tag for tag in check
           if tag not in data or data[tag] is None or data[tag] == '']
if missing:
    print(",".join(missing), "field(s) missing or empty, please check ===>", line)
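The condition above deliberately compares against `None` and `''` rather than testing truthiness, so a field whose value is `0` or `false` still counts as present. The same rule as a small reusable function, as a sketch (the name `missing_fields` is hypothetical):

```python
def missing_fields(data, required):
    """Return the required keys that are absent, null, or an empty string, in order."""
    return [tag for tag in required
            if tag not in data or data[tag] is None or data[tag] == '']


record = {"a": 0, "b": "", "c": None, "d": False}
print(missing_fields(record, ["a", "b", "c", "d", "e"]))  # ['b', 'c', 'e']
```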


Putting all of the above together, the complete script (check.py) is:

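#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import re
import sys
import time

ZH_PATTERN = re.compile('[\u4e00-\u9fa5]+')  # common Chinese (CJK) characters
CHECK = ['a', 'b', 'c']                       # other required fields


def checklog():
    cnt = json_error = dt_error = zh_error = miss_error = 0

    for line in sys.stdin:
        if not line.strip():
            continue
        line = "".join(i for i in line if ord(i) > 31)  # strip control characters, incl. the trailing newline
        cnt += 1

        # JSON format: the line must parse, and the top level must be an object
        try:
            data = json.loads(line.strip())
            if not isinstance(data, dict):
                raise ValueError("top-level JSON value is not an object")
        except ValueError:
            print("Malformed JSON, please check ===>", line)
            json_error += 1
            continue

        # dt field: must exist and match "YYYY-MM-DD HH:MM:SS"
        try:
            time.strptime(data['dt'], "%Y-%m-%d %H:%M:%S")
        except (KeyError, TypeError, ValueError):
            print("Malformed date-time field (dt), please check ===>", line)
            dt_error += 1

        # z field: must exist and must not contain Chinese
        z = data.get('z')
        if z is None or ZH_PATTERN.search(str(z)):
            print("z field is missing or contains Chinese, please use English ===>", line)
            zh_error += 1

        # other required fields: present, not null, not an empty string
        missing = [tag for tag in CHECK
                   if tag not in data or data[tag] is None or data[tag] == '']
        if missing:
            print(",".join(missing), "field(s) missing or empty, please check ===>", line)
            miss_error += 1

    print('===================Done===================')
    print('Checked %d log lines in total: %d with malformed JSON, %d with a bad dt field, '
          '%d with a bad or missing z field, %d missing other required fields'
          % (cnt, json_error, dt_error, zh_error, miss_error))


if __name__ == '__main__':
    # Read and write UTF-8 regardless of the system locale (Python 3.7+)
    sys.stdin.reconfigure(encoding='utf-8')
    sys.stdout.reconfigure(encoding='utf-8')
    checklog()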


Test it with the following file (t.txt):

{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"b":2,"c":3,"js":{"d":4}}
{"dt":"2017-11-02","z":"hello","a":1,"b":2,"c":3,"js":{"d":4}}
{"dt":"2017-11-02 11:11:11","z":"中","a":1,"b":2,"c":3,"js":{"d":4}}
{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"b":2,"c":3,"js":"{"d":4}"}
{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"js":{"d":4}}


which produces the following output:

cat t.txt | python check.py
Malformed date-time field (dt), please check ===> {"dt":"2017-11-02","z":"hello","a":1,"b":2,"c":3,"js":{"d":4}}
z field is missing or contains Chinese, please use English ===> {"dt":"2017-11-02 11:11:11","z":"中","a":1,"b":2,"c":3,"js":{"d":4}}
Malformed JSON, please check ===> {"dt":"2017-11-02 11:11:11","z":"hello","a":1,"b":2,"c":3,"js":"{"d":4}"}
b,c field(s) missing or empty, please check ===> {"dt":"2017-11-02 11:11:11","z":"hello","a":1,"js":{"d":4}}
===================Done===================
Checked 5 log lines in total: 1 with malformed JSON, 1 with a bad dt field, 1 with a bad or missing z field, 1 missing other required fields
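Because `checklog()` reads `sys.stdin` and prints directly, it is awkward to test automatically. One possible restructuring, as a sketch (the functions `check_line` and `summarize` are hypothetical and not part of the original script), returns error labels per line so the counts can be asserted, here on the five sample lines from t.txt:

```python
import json
import re
import time
from collections import Counter

ZH_PATTERN = re.compile("[\u4e00-\u9fa5]+")  # same range as check.py
REQUIRED = ["a", "b", "c"]                  # same required fields as check.py


def check_line(line):
    """Return the list of error labels for one log line (empty list means OK)."""
    try:
        data = json.loads(line)
        if not isinstance(data, dict):
            raise ValueError("not a JSON object")
    except ValueError:
        return ["json"]  # the remaining checks need a parsed object
    errors = []
    try:
        time.strptime(data["dt"], "%Y-%m-%d %H:%M:%S")
    except (KeyError, TypeError, ValueError):
        errors.append("dt")
    z = data.get("z")
    if z is None or ZH_PATTERN.search(str(z)):
        errors.append("z")
    if any(tag not in data or data[tag] is None or data[tag] == "" for tag in REQUIRED):
        errors.append("missing")
    return errors


def summarize(lines):
    """Count lines and error labels over all non-blank lines."""
    counts = Counter()
    for line in lines:
        clean = "".join(ch for ch in line if ord(ch) > 31).strip()  # same control-char filter
        if clean:
            counts["total"] += 1
            counts.update(check_line(clean))
    return counts


SAMPLE = [
    '{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"b":2,"c":3,"js":{"d":4}}',
    '{"dt":"2017-11-02","z":"hello","a":1,"b":2,"c":3,"js":{"d":4}}',
    '{"dt":"2017-11-02 11:11:11","z":"中","a":1,"b":2,"c":3,"js":{"d":4}}',
    '{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"b":2,"c":3,"js":"{"d":4}"}',
    '{"dt":"2017-11-02 11:11:11","z":"hello","a":1,"js":{"d":4}}',
]
print(summarize(SAMPLE))
```

With these five lines, `total` is 5 and each of the four error labels is counted once, in line with the summary shown above.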