
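ELK: Logstash 2.2 mutate plugin [translation + practice]

2016-05-17 17:24
Official documentation

Contents

Syntax
Test data
Configuration options
The mutate plugin transforms fields: it can rename, remove, replace, and modify them. It is one of the most frequently used plugins.

For example:

You have used a Grok expression to split a Tomcat log into fields, and you want to convert the status code, byte size, or response time to integers;
You have used a regular expression to split a log into fields, but the values mix upper and lower case, which makes them awkward to search and aggregate in Elasticsearch. This plugin can convert the field contents to lowercase.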
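Both scenarios map directly onto mutate options. This is a minimal sketch against the Tomcat fields used later in this article (statusCode, bytes, reqTime, request). It assumes those fields already exist, for example from grok:

[code]mutate {
    # scenario 1: numeric strings become integers
    convert => {
        "statusCode" => "integer"
        "bytes"      => "integer"
        "reqTime"    => "integer"
    }
    # scenario 2: normalize case before the event is indexed
    lowercase => [ "request" ]
}
[/code]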

Syntax

All options of this plugin must be wrapped in a mutate block, as shown below:

[code]mutate {}

[/code]
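Because mutate is a filter plugin, the block goes inside the filter section of the pipeline.

One thing is worth knowing before the examples. As I understand the plugin, the operations inside a single mutate block run in a fixed internal order, not in the order you write them. That order is roughly rename, update, replace, convert, gsub, uppercase, lowercase, strip, split, join, merge, with add_field, add_tag, remove_field and remove_tag applied last. When one step depends on another, use consecutive mutate blocks, which run in sequence.

A minimal sketch follows. ip_parts is a hypothetical field name, and clientip comes from the Grok expression in the test data section:

[code]filter {
    mutate {
        add_field => { "ip_parts" => "%{clientip}" }
    }
    # add_field is applied only after the block's other operations,
    # so splitting the new field has to happen in a later mutate block
    mutate {
        split => { "ip_parts" => "." }
    }
}
[/code]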

The available configuration options are listed in the table below:

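Setting | Input type | Required | Default
add_field | hash | No | {}
add_tag | array | No | []
convert | hash | No | (none)
gsub | array | No | (none)
join | hash | No | (none)
lowercase | array | No | (none)
merge | hash | No | (none)
periodic_flush | boolean | No | false
remove_field | array | No | []
remove_tag | array | No | []
rename | hash | No | (none)
replace | hash | No | (none)
split | hash | No | (none)
strip | array | No | (none)
update | hash | No | (none)
uppercase | array | No | (none)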

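Of these, add_field, remove_field, add_tag, and remove_tag are common options shared by all Logstash filter plugins. They take effect only after the plugin has processed the event successfully. Logstash calls this stage "filter", but it does much more than filtering.

A tag is a marker you attach to an event when you expect to process it further later on. Logstash keeps a built-in tags array that holds every tag added along the way, whether Logstash added it or you did. For example, if a grok parse fails, Logstash itself adds a _grokparsefailure tag. In the output stage you can then choose to do nothing with logs that failed to parse.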
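For example, a conditional on the tags array can keep unparsed events out of the output. This is a minimal sketch that reuses the stdout/rubydebug output from the examples in this article. `_grokparsefailure` is the tag grok adds by default when a match fails:

[code]output {
    # only events without the grok failure tag reach stdout
    if "_grokparsefailure" not in [tags] {
        stdout {
            codec => rubydebug
        }
    }
}
[/code]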

Field options, by contrast, operate on fields. For example, you may want to create a new field from existing ones. More on this later.




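You may also notice that every option in the table above is either a verb or a verb-object phrase. As you might guess, you can think of each option as a Ruby function (the plugin is written in Ruby), with its arguments following the "=>" (you would do the same if you were writing the code). The first argument is always the field, that is, the field you want the function to act on. Everything from the second argument onward is the function's actual parameters.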
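Here is a small sketch of that convention, using fields from the test data below. For a hash-valued option, the key is the field and the value is the argument. For an array-valued option such as lowercase, each element is just a field name:

[code]mutate {
    # hash: "field" => argument (here, the separator to split on)
    split => { "clientip" => "." }
    # array: a list of fields, no further arguments
    lowercase => [ "http_method" ]
}
[/code]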

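What is a field? Suppose you parse a Tomcat log: after splitting an access log line, you put the client IP, byte size, response time, and so on into named variables. Those variables are fields.

Each option is described below.



Test data

Suppose we have the following Tomcat access log: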

[code]192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/goLogin" "" 8080 200 1692 23 "http://10.1.8.193:8080/goMain" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/common/jquery-1.10.2.min.js" "" 8080 304 - 67 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/css/common/login.css" "" 8080 304 - 75 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/system/login.js" "" 8080 304 - 53 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
[/code]

It was produced by the following Tomcat configuration:

[code]<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t %m &quot;%U&quot; &quot;%q&quot; %p %s %b %D &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;" />
[/code]

Parsing this log with the following Grok expression:

[code]%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}

[/code]

produces the following result:

[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T08:26:07.794Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

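Note the data types of the fields after splitting. port, statusCode, bytes, and reqTime should (ideally) be numbers, but they are strings for now. Conversion is covered later, and all the examples below build on this setup.

Configuration options

add_field

The value is a hash, i.e. key-value pairs, for example add_field => {"field1"=>"value1" "field2"=>"value2"}. Multiple entries in a Logstash hash are separated by whitespace, not commas.

The default is an empty hash, {}.

Adds new fields.

Example: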

[code]input {
    stdin {
    }
}

filter {
    grok {
        match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
    }
    mutate {
        add_field=>{
            "SayHi"=>"Hello , %{clientip}"
        }
    }
}

output {
    stdout {
        codec=>rubydebug
    }
}
[/code]

Note the mutate block. Parsing the Tomcat access log above with this configuration gives the following result:


[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T04:52:02.031Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "SayHi" => "Hello , 192.168.6.25"
}
[/code]

You will see an extra SayHi field. Its name is hard-coded here, but the name can be dynamic too. If you change


[code]"SayHi"=>"Hello , %{clientip}"

[/code]

to:


[code]"another_%{clientip}"=>"Hello , %{clientip}"

[/code]

you will get the following result:


[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T06:38:04.427Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "another_192.168.6.25" => "Hello , 192.168.6.25"
}
[/code]

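This example is not very realistic, but it shows that the values of existing fields can generate both the name and the value of a new field.


The example above adds only one field. You can also add several at once: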


[code]add_field=>{
    "another_%{clientip}"=>"Hello , %{clientip}"
    "another_%{http_method}"=>"Hello, %{http_method}"
}
[/code]

add_tag

Value type is array.

The default is an empty array, [].

Adds new tags.

Example:

[code]mutate {
    add_tag=>[
        "foo_%{clientip}"
    ]
}
[/code]

You will get the following result:


[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T06:48:43.278Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "tags" => [
        [0] "foo_192.168.6.25"
    ]
}
[/code]

As with add_field, you can add several tags at once.
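For example, this sketch adds two tags in one go. The second tag name, tomcat_access, is only a hypothetical label. With the test data, the tags array would hold "foo_192.168.6.25" and "tomcat_access":

[code]mutate {
    add_tag => [ "foo_%{clientip}", "tomcat_access" ]
}
[/code]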


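Note that add_tag takes an array [], not a hash {}.


convert

Value type is hash.

There is no default value.

Converts a field's value to another data type.

When converting to boolean, the accepted values are:

true, t, yes, y, and 1 (converted to true)

false, f, no, n, and 0 (converted to false)

You can also convert to integer, float, or string.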
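Here is a sketch that mixes target types in one hash. reqTime comes from the test data. is_secure is a hypothetical field that would hold one of the boolean strings listed above. The comments sit outside the hash because, as far as I know, Logstash's config grammar allows only whitespace between hash entries:

[code]mutate {
    # reqTime: "23" becomes 23.0
    # is_secure: hypothetical field; "yes" would become true
    convert => {
        "reqTime"   => "float"
        "is_secure" => "boolean"
    }
}
[/code]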

Example:

[code]mutate {
    #convert=>["reqTime","integer","statusCode","integer","bytes","integer"]
    convert=>{"port"=>"integer"}
}
[/code]

convert can be written in two ways: as an array, read in pairs, or as a hash. The result is:

[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T09:06:25.360Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => 8080,
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]


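Note that the port value is no longer in double quotes: it is now an integer.

The value types of mutate options are deliberately simple: either a hash (key-value pairs) or an array. For example, convert=>["reqTime","integer","statusCode","integer"] is read in pairs, the first element being the field and the second the target type, with no nesting or compound types. The author's intent seems to be simplicity. Complex data types may look convenient, but they come at a cost, and simple types are fine as long as the convention is agreed on. Many Logstash plugins and their options follow the same approach.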
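For instance, the two convert settings below should be equivalent. This is a sketch, and each form is shown in its own mutate block only so they can be compared. Note the different separators: array elements are separated by commas, while hash entries are separated by whitespace.

[code]# array form: elements are read in (field, type) pairs
mutate {
    convert => [ "reqTime", "integer", "statusCode", "integer" ]
}
# hash form: one "field" => "type" entry per field
mutate {
    convert => {
        "reqTime"    => "integer"
        "statusCode" => "integer"
    }
}
[/code]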



gsub

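Value type is array.

There is no default value.

Replaces parts of a string, using either a regular expression or a plain string. It only works on strings. If the field is not a string, nothing happens and no error is raised.

The value is an array read in groups of three: the field name, the string (or regular expression) to match, and the replacement string.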
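Here is a sketch with two groups of three, run against the test data. The first group strips the double quotes that %{QS:referer} keeps around the referer. The second replaces every "/" in request with "_". The single-quoted string avoids escaping the double quote. These patterns are illustrative, not taken from the official docs.

[code]mutate {
    gsub => [
        # field, regex to match, replacement
        "referer", '^"|"$', "",
        "request", "/", "_"
    ]
}
[/code]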

Example: when parsing Tomcat logs, the resource byte size may be "-". You need to replace "-" with 0 and then convert the field to a number with convert.

[code]input {
    stdin {
    }
}

filter {
    grok {
        match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
    }
    # mutate runs its operations in a fixed internal order, not the order written,
    # and convert comes before gsub; a separate, earlier block makes gsub run first
    mutate {
        gsub=>["bytes","-","0"]
    }
    mutate {
        convert=>["port","integer","reqTime","integer","statusCode","integer","bytes","integer"]
    }
}

output {
    stdout {
        codec=>rubydebug
    }
}
[/code]

The result is:

[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/js/common/jquery-1.10.2.min.js\" \"\" 8080 304 - 67 \"http://10.1.8.193:8080/goLogin\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T09:17:21.745Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/js/common/jquery-1.10.2.min.js\"",
    "request_query" => "\"\"",
    "port" => 8080,
    "statusCode" => 304,
    "bytes" => 0,
    "reqTime" => 67,
    "referer" => "\"http://10.1.8.193:8080/goLogin\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

join

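Value type is hash.

There is no default value.

Joins an array with a separator. If the field is not an array, nothing is done.

Example:

[code]filter {
    mutate {
        join =>{"fieldname"=>","}
    }
}
[/code]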
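join is the natural counterpart of split. This sketch runs on the test data: it splits clientip on "." and then joins the pieces back with "-", turning 192.168.6.25 into 192-168-6-25. The two blocks are there only to make the sequence explicit.

[code]filter {
    mutate {
        split => { "clientip" => "." }
    }
    mutate {
        join => { "clientip" => "-" }
    }
}
[/code]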


lowercase and uppercase

Value type is array.

There is no default value.

Converts strings to lowercase or uppercase.

Example:

[code]filter {
    mutate {
        lowercase =>["fieldname"]
    }
}
[/code]

Example:

[code]filter {
    mutate {
        uppercase =>["fieldname"]
    }
}
[/code]
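Coming back to the use case in the introduction, this sketch lowercases the mixed-case text fields from the test data before they reach Elasticsearch. Which fields to include is up to you.

[code]mutate {
    lowercase => [ "request", "userAgent" ]
}
[/code]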

merge

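Value type is hash.

There is no default value.

Merges two fields of arrays or hashes. String fields are automatically turned into arrays, which gives three cases:

array and string: can be merged, and the result is an array

string and string: can be merged, and the result is a two-element array

array and hash: cannot be merged

Example: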

[code]mutate {
    add_field=>{"arr_clientip"=>"%{clientip}"}
    add_field=>{"arrmstr_clientip"=>"%{clientip}"}
    add_field=>{"arrmarr_clientip"=>"%{clientip}"}
    #merge=>{"merge_clientip"=>"clientip"}
}
mutate {
    split=>{"arr_clientip"=>"."}
    split=>{"arrmstr_clientip"=>"."}
    split=>{"arrmarr_clientip"=>"."}
}
mutate {
    merge=>{"arrmstr_clientip"=>"clientip"}
    merge=>{"arrmarr_clientip"=>"arr_clientip"}
}
[/code]

The value of the field after => is merged into the field before it.


The result is:


[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-18T02:53:35.671Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "arr_clientip" => [
        [0] "192",
        [1] "168",
        [2] "6",
        [3] "25"
    ],
    "arrmstr_clientip" => [
        [0] "192",
        [1] "168",
        [2] "6",
        [3] "25",
        [4] "192.168.6.25"
    ],
    "arrmarr_clientip" => [
        [0] "192",
        [1] "168",
        [2] "6",
        [3] "25",
        [4] "192",
        [5] "168",
        [6] "6",
        [7] "25"
    ]
}
[/code]

periodic_flush

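Value type is boolean.

The default is false.

Calls the filter's flush method at regular intervals. Optional.

remove_field

Value type is array.

The default is an empty array, [].

Removes fields.

Example: remove the message field.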

[code]mutate {
    remove_field=>["message"]
}
[/code]

The result is:

[code]{
    "@version" => "1",
    "@timestamp" => "2016-05-18T02:04:16.879Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

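The message field is gone. message holds the raw log line, so keeping it means every log is stored twice: once before splitting and once after.

You can also remove several fields at once.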
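For example, this sketch drops the raw line along with identd and auth, the two fields that are always "-" in this test data. Whether you want to drop them depends on your own logs.

[code]mutate {
    remove_field => [ "message", "identd", "auth" ]
}
[/code]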

remove_tag

Value type is array.

The default is an empty array, [].

Removes tags.

Example:

[code]filter {
    mutate {
        remove_tag =>["foo_%{somefield}"]
    }
}
[/code]

You can also remove several tags at once:


[code]filter {
    mutate {
        remove_tag =>["foo_%{somefield}","sad_unwanted_tag"]
    }
}
[/code]

rename

Value type is hash.

There is no default value.

Renames one or more fields.

Example:

[code]input {
    stdin {
    }
}

filter {
    grok {
        match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
    }
    mutate {
        rename=>{"clientip"=>"host"}
    }
}

output {
    stdout {
        codec=>rubydebug
    }
}
[/code]

The result is:

[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-17T09:29:44.018Z",
    "host" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

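In Grok the client IP was called clientip, but mutate renamed it to host. The event already had a host field (vcyber), and the rename overwrote it with the client IP.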
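To rename several fields at once, list one entry per field in the hash. In this sketch the new names client_ip and status are only illustrative. Choosing names that do not clash with existing fields such as host avoids losing data.

[code]mutate {
    rename => {
        "clientip"   => "client_ip"
        "statusCode" => "status"
    }
}
[/code]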

replace

Value type is hash.

There is no default value.

Replaces the value of a field with a new value.

Example:

[code]input {
    stdin {
    }
}

filter {
    grok {
        match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
    }
    mutate {
        replace=>{"message"=>"%{clientip}: My new Message."}
    }
}

output {
    stdout {
        codec=>rubydebug
    }
}
[/code]

The result is:

[code]{
    "message" => "192.168.6.25: My new Message.",
    "@version" => "1",
    "@timestamp" => "2016-05-18T01:55:34.566Z",
    "host" => "vcyber",
    "clientip" => "192.168.6.25",
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

The value of the message field has changed.

split

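Value type is hash.

There is no default value.

Splits a string into an array using a separator. It only works on strings.

Example: split the client IP into an array on the period.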

[code]mutate {
    split=>{"clientip"=>"."}
}
[/code]

The result is:


[code]{
    "message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
    "@version" => "1",
    "@timestamp" => "2016-05-18T01:58:40.687Z",
    "host" => "vcyber",
    "clientip" => [
        [0] "192",
        [1] "168",
        [2] "6",
        [3] "25"
    ],
    "identd" => "-",
    "auth" => "-",
    "timestamp" => "24/Apr/2016:01:25:53 +0800",
    "http_method" => "GET",
    "request" => "\"/goLogin\"",
    "request_query" => "\"\"",
    "port" => "8080",
    "statusCode" => "200",
    "bytes" => "1692",
    "reqTime" => "23",
    "referer" => "\"http://10.1.8.193:8080/goMain\"",
    "userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
[/code]

strip

Value type is array.

There is no default value.

Strips leading and trailing whitespace from fields.

Example:

[code]filter {
    mutate {
        strip =>["field1","field2"]
    }
}
[/code]

update

Value type is hash.

There is no default value.

Update an existing field with a new value. If the field does not exist, then no action will be taken.

Example:

[code]filter {
    mutate {
        update =>{"sample"=>"My new message"}
    }
}
[/code]
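The difference from replace is what happens when the field is missing. This sketch assumes the test events above, which have no sample field. update leaves the event untouched, while replace (as I understand its behavior) sets the field regardless, so the second block adds sample.

[code]filter {
    # no "sample" field in the test events, so update does nothing
    mutate {
        update => { "sample" => "My new message" }
    }
    # replace writes the field whether or not it already exists
    mutate {
        replace => { "sample" => "My new message" }
    }
}
[/code]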