DRUID: Pitfalls of Adding a MiddleManager Node
2017-09-08 16:01
Problem background: Our consumer ingests data from Kafka and produces it to Druid through the embedded Tranquility library.
Problem description: After new MiddleManager nodes were added to the production Druid cluster, some tasks reported SUCCESS, yet their data never made it into HDFS.
Inspecting the peon logs on the new node, we found the following:
...(snip)...
2017-09-04T00:56:03,285 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-09-04T00:56:03,286 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-09-04T00:56:03,286 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
...(snip)...
2017-09-04T02:09:03,305 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2017-09-04T02:09:03.305Z","service":"druid/middleManager","host":"AtsbDruidMid1:8101","version":"0.10.0","metric":"ingest/handoff/count","value":0,"dataSource":"mobileTopic-1m","taskId":["index_realtime_mobileTopic-1m_2017-09-04T01:00:00.000Z_1_0"]}]
2017-09-04T02:09:03,305 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2017-09-04T02:09:03.305Z","service":"druid/middleManager","host":"AtsbDruidMid1:8101","version":"0.10.0","metric":"ingest/sink/count","value":0,"dataSource":"mobileTopic-1m","taskId":["index_realtime_mobileTopic-1m_2017-09-04T01:00:00.000Z_1_0"]}]
2017-09-04T02:09:03,305 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2017-09-04T02:09:03.305Z","service":"druid/middleManager","host":"AtsbDruidMid1:8101","version":"0.10.0","metric":"ingest/events/messageGap","value":1504490943305,"dataSource":"mobileTopic-1m","taskId":["index_realtime_mobileTopic-1m_2017-09-04T01:00:00.000Z_1_0"]}]
2017-09-04T02:10:00,003 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-09-04T02:10:00,004 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-09-04T02:10:00,004 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2017-09-04T02:10:02,864 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2017-09-04T02:10:02.864Z","service":"druid/middleManager","host":"AtsbDruidMid1:8101","version":"0.10.0","metric":"segment/scan/pending","value":0}]
2017-09-04T02:15:00,000 INFO [timed-shutoff-firehose-0] io.druid.data.input.FirehoseFactory - Closing delegate firehose.
2017-09-04T02:15:00,000 INFO [timed-shutoff-firehose-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Firehose closing.
2017-09-04T02:15:00,001 INFO [timed-shutoff-firehose-0] io.druid.server.metrics.EventReceiverFirehoseRegister - Unregistering EventReceiverFirehoseMetric for service [druid:firehose:mobileTopic-1m-01-0001-0000]
2017-09-04T02:15:00,001 INFO [timed-shutoff-firehose-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[druid:firehose:mobileTopic-1m-01-0001-0000]
2017-09-04T02:15:00,002 INFO [timed-shutoff-firehose-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Unannouncing service[DruidNode{serviceName='druid:firehose:mobileTopic-1m-01-0001-0000', host='AtsbDruidMid1', port=8101}]
2017-09-04T02:15:00,373 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Persisting remaining data.
2017-09-04T02:15:00,374 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[mobileTopic-1m]
2017-09-04T02:15:00,376 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Finishing the job.
2017-09-04T02:15:00,376 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Shutting down...
2017-09-04T02:15:00,377 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Job done!
2017-09-04T02:15:00,377 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_mobileTopic-1m_2017-09-04T01:00:00.000Z_1_0] status changed to [SUCCESS].
2017-09-04T02:15:00,381 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { "id" : "index_realtime_mobileTopic-1m_2017-09-04T01:00:00.000Z_1_0", "status" : "SUCCESS", "duration" : 4737152 }
2017-09-04T02:15:00,388 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@615439f7].
2017-09-04T02:15:00,388 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@22e2266d]
2017-09-04T02:15:00,388 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/AtsbDruidMid1:8101]
2017-09-04T02:15:00,403 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@45bbc52f].
2017-09-04T02:15:00,403 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/AtsbDruidMid1:8101]
2017-09-04T02:15:00,405 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/AtsbDruidMid1:8101]
2017-09-04T02:15:00,406 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.query.lookup.LookupReferencesManager.stop()] on object[io.druid.query.lookup.LookupReferencesManager@4245bf68].
2017-09-04T02:15:00,406 INFO [main] io.druid.query.lookup.LookupReferencesManager - Stopping lookup factory references manager
2017-09-04T02:15:00,416 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@4c635edc{HTTP/1.1,[http/1.1]}{0.0.0.0:8101}
2017-09-04T02:15:00,417 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@47fb7ec9{/,null,UNAVAILABLE}
2017-09-04T02:15:00,424 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop() throws java.lang.Exception] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@6010d0d1].
2017-09-04T02:15:00,425 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@283ab206].
2017-09-04T02:15:00,425 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@741ac284].
2017-09-04T02:15:00,429 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@733e6df7].
2017-09-04T02:15:00,431 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@2d7637e6].
2017-09-04T02:15:00,431 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@69a024a0].
2017-09-04T02:15:00,466 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[io.druid.storage.hdfs.HdfsStorageAuthentication@2a4f8009].
2017-09-04T02:15:00,467 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@5fdfe8cf].
2017-09-04T02:15:00,476 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2017-09-04T02:15:00,477 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2017-09-04T02:15:00,484 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x354bcd7b507b357 closed
2017-09-04T02:15:00,484 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@697d6ded].
2017-09-04T02:15:00,484 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2017-09-04T02:15:00,484 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@65c5bf0e].
2017-09-04T02:15:00,484 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x354bcd7b507b357
2017-09-04 02:15:00,519 Thread-2 ERROR Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Not started
    at io.druid.common.config.Log4jShutdown.addShutdownCallback(Log4jShutdown.java:45)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:273)
    at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
    at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:145)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
    at org.apache.logging.log4j.LogManager.getContext(LogManager.java:182)
    at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
    at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
    at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
    at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:253)
    at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
    at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:273)
    at org.apache.hadoop.hdfs.LeaseRenewer.<clinit>(LeaseRenewer.java:72)
    at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:699)
    at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:859)
    at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:853)
    at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
    at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Solution:
First, we checked whether every affected task had run on the same MiddleManager; the ops team confirmed that all of the problem tasks ran on the newly added node.
Next, a careful read of the logs turned up these key lines:
2017-09-04T00:56:03,285 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-09-04T00:56:03,286 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-09-04T00:56:03,286 INFO [task-runner-0-priority-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
...(snip)...
2017-09-04T02:10:00,003 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2017-09-04T02:10:00,004 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2017-09-04T02:10:00,004 INFO [mobileTopic-1m-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
What is going on? The peon never received any data?
That turned out to be the root cause: the machines running our consumer had no host entry for the newly added MiddleManager. Tasks were created normally and finished normally, yet HDFS ended up with no data,
because the consumer had no way to reach the peon it was supposed to send data to.
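The failure mode is easy to reproduce from the consumer side: if the peon's hostname (the logs above announce `AtsbDruidMid1:8101`) does not resolve on the consumer machine, every send silently goes nowhere while the task itself keeps running. Below is a minimal diagnostic sketch, assuming only standard DNS/hosts-file resolution; the host and port are taken from the logs and would differ per deployment:

```python
import socket

def check_peon_reachable(host, port, timeout=3.0):
    """Return True if this machine can resolve `host` and open a TCP
    connection to the peon's HTTP port; False otherwise.

    Resolution fails (socket.gaierror) exactly when /etc/hosts or DNS
    lacks an entry for the new MiddleManager -- the bug in this post.
    """
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        return False  # consumer cannot even resolve the peon's hostname
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False  # resolvable but not connectable (firewall, peon down)

if __name__ == "__main__":
    # Hostname/port from the task logs; run this ON the consumer machine.
    print(check_peon_reachable("AtsbDruidMid1", 8101))
```

Running this on the consumer hosts before (or right after) adding a MiddleManager would have caught the missing hosts entry immediately, instead of discovering it through empty HDFS directories hours later.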
2. A common misconception
Our consumer uses Tranquility to send a task template to the Overlord. On receiving the template, the Overlord dispatches the task to some MiddleManager according to its task assignment strategy, and that MiddleManager spawns a Java process: the peon.
A peon can be thought of as an HTTP server. Crucially, the consumer then sends data directly to that HTTP server; the Overlord and MiddleManager are no longer in the data path. In effect, by embedding Tranquility the consumer
opens a connection to every peon.
The misconception goes like this: the consumer sends data to the Overlord, and the Overlord forwards it to the peons. That is not how it works.
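To make the direct data path concrete, here is a sketch of what Tranquility effectively does once task assignment is done. The endpoint shape follows the chat-handler service name visible in the logs above (`druid:firehose:mobileTopic-1m-01-0001-0000`); the exact path is an assumption based on Druid 0.10's EventReceiverFirehose chat handler, and in reality Tranquility discovers each peon's host:port through ZooKeeper rather than hard-coding it:

```python
import json
import urllib.request

def push_url(host, port, service):
    """URL of a peon's event-receiver endpoint (assumed path, modeled on
    Druid 0.10's chat handler). Note the host is the PEON's host -- not
    the Overlord's -- which is why the consumer must be able to resolve
    every MiddleManager's hostname."""
    return f"http://{host}:{port}/druid/worker/v1/chat/{service}/push-events"

def push_events(host, port, service, events):
    """POST a batch of events straight to the peon. There is no Overlord
    or MiddleManager hop here: if `host` does not resolve on the consumer
    machine, this raises URLError while the task keeps running (and later
    reports SUCCESS) on the peon."""
    req = urllib.request.Request(
        push_url(host, port, service),
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req, timeout=5).read()
```

The design choice this illustrates: the Overlord is a control-plane component only. Data flows consumer → peon, so every machine that embeds Tranquility needs network access to, and name resolution for, every current and future MiddleManager.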