
Writing YARN Distributed Applications on Hadoop 0.23 (Hadoop MapReduce Next Generation - Writing YARN Applications)

2014-05-13 18:00
Reposted from: http://blog.csdn.net/bertzhang/article/details/7102579

I had meant to repost this directly but could not find a way to do so, so I copied it over instead.

Original: http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Purpose
This document describes, at a high level, how to write a YARN application.

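Concepts and Flow

The central concept is the "Application Submission Client", which submits an "Application" to the YARN ResourceManager. The client talks to the ResourceManager over ClientRMProtocol: if needed, it first obtains a new ApplicationId via ClientRMProtocol::getNewApplication, and then submits the application to be run via ClientRMProtocol::submitApplication. As part of the ClientRMProtocol::submitApplication call, the client must give the ResourceManager enough information to launch the application's first container, that is, the ApplicationMaster. This includes the local files/jars the application needs at run time, the command to execute (with any necessary command-line arguments), Unix environment settings (optional), and so on. In effect, you describe the Unix process to be launched for your ApplicationMaster.

The YARN ResourceManager then launches the ApplicationMaster on an allocated container. The ApplicationMaster communicates with the ResourceManager over AMRMProtocol. It first registers itself with the ResourceManager. To carry out the work assigned to it, it then requests containers via AMRMProtocol::allocate. Once a container has been allocated, the ApplicationMaster contacts the NodeManager via ContainerManager::startContainer to launch a container for its task. As part of launching the container, the ApplicationMaster must specify a ContainerLaunchContext which, like the ApplicationSubmissionContext, carries the launch information such as the command line and environment variables. When the task is complete, the ApplicationMaster notifies the ResourceManager via AMRMProtocol::finishApplicationMaster.

Meanwhile, the client can monitor the application's status by querying the ResourceManager or, if the ApplicationMaster supports it, by querying the ApplicationMaster directly. If necessary, the client can kill the application via ClientRMProtocol::forceKillApplication.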

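Interfaces

The interfaces you are most likely to care about are:

ClientRMProtocol -- Client <--> ResourceManager

The protocol a client uses to talk to the ResourceManager in order to launch a new application (that is, its ApplicationMaster), check on the status of an application, or kill it. A job client, for example, would use this protocol.

AMRMProtocol -- ApplicationMaster <--> ResourceManager

The protocol an ApplicationMaster uses to register and unregister itself with the ResourceManager, and to request resources from the Scheduler to complete its tasks.

ContainerManager -- ApplicationMaster <--> NodeManager

The protocol an ApplicationMaster uses to talk to a NodeManager to start or stop containers and to get status updates on them.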

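Writing a Simple YARN Application

Writing a Simple Client

The first step is for the client to connect to the ResourceManager or, more precisely, to the ApplicationsManager (AsM) interface of the ResourceManager.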

// conf, rpc and LOG are assumed to be initialized elsewhere in the client.
ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
Configuration appsManagerServerConf = new Configuration(conf);
appsManagerServerConf.setClass(
    YarnConfiguration.YARN_SECURITY_INFO,
    ClientRMSecurityInfo.class, SecurityInfo.class);
applicationsManager = ((ClientRMProtocol) rpc.getProxy(
    ClientRMProtocol.class, rmAddress, appsManagerServerConf));

Once a handle to the AsM has been obtained, the client needs to request a new ApplicationId from the ResourceManager.

GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
    applicationsManager.getNewApplication(request);
LOG.info("Got new ApplicationId=" + response.getApplicationId());

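The response from the AsM also contains information about the cluster, such as its minimum and maximum resource capabilities. This information is needed to set the specifications of the container in which the ApplicationMaster will run so that it can actually be launched. See GetNewApplicationResponse for details.

The client's main task is to set up the ApplicationSubmissionContext, which defines everything the ResourceManager needs to launch the ApplicationMaster. The client needs to set the following in the context:

Application info: id and name.
Queue and priority info: the queue to which the application will be submitted, and the priority assigned to the application.
User: the user submitting the application.
ContainerLaunchContext: the information defining the container in which the ApplicationMaster will be launched and run. As described earlier, the ContainerLaunchContext defines everything needed to run the ApplicationMaster: local resources (binaries, jars, files, etc.), security tokens, environment settings (CLASSPATH, etc.), and the command to be executed.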

// Create a new ApplicationSubmissionContext
ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
// set the ApplicationId
appContext.setApplicationId(appId);
// set the application name
appContext.setApplicationName(appName);

// Create a new container launch context for the AM's container
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);

// Define the local resources required
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Let's assume the jar we need for our ApplicationMaster is available in
// HDFS at a certain known path to us and we want to make it available to
// the ApplicationMaster in the launched container
Path jarPath; // <- known path to jar file
FileStatus jarStatus = fs.getFileStatus(jarPath);
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
// Set the type of resource - file or archive
// archives are untarred at the destination by the framework
amJarRsrc.setType(LocalResourceType.FILE);
// Set visibility of the resource
// Setting to most private option i.e. this file will only
// be visible to this instance of the running application
amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
// Set the location of resource to be copied over into the
// working directory
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
// Set timestamp and length of file so that the framework
// can do basic sanity checks for the local resource
// after it has been copied over to ensure it is the same
// resource the client intended to use with the application
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
// The framework will create a symlink called AppMaster.jar in the
// working directory that will be linked back to the actual file.
// The ApplicationMaster, if it needs to reference the jar file, would
// need to use the symlink filename.
localResources.put("AppMaster.jar", amJarRsrc);
// Set the local resources into the launch context
amContainer.setLocalResources(localResources);

// Set up the environment needed for the launch context
Map<String, String> env = new HashMap<String, String>();
// For example, we could set up the classpath needed.
// Assuming our classes or jars are available as local resources in the
// working directory from which the command will be run, we need to append
// "." to the path.
// By default, all the hadoop specific classpaths will already be available
// in $CLASSPATH, so we should be careful not to overwrite it.
String classPathEnv = "$CLASSPATH:./*:";
env.put("CLASSPATH", classPathEnv);
amContainer.setEnvironment(env);

// Construct the command to be executed on the launched container
String command =
    "${JAVA_HOME}" + "/bin/java" +
    " MyAppMaster" +
    " arg1 arg2 arg3" +
    " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
    " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

List<String> commands = new ArrayList<String>();
commands.add(command);
// add additional commands if needed

// Set the command array into the container spec
amContainer.setCommands(commands);

// Define the resource requirements for the container
// For now, YARN only supports memory so we set the memory
// requirements.
// If the process takes more than its allocated memory, it will
// be killed by the framework.
// Memory being requested for should be less than max capability
// of the cluster and all asks should be a multiple of the min capability.
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);

// Set the container launch context into the ApplicationSubmissionContext
appContext.setAMContainerSpec(amContainer);

With the process information set up, the client is finally ready to submit the application to the AsM.

// Create the request to send to the ApplicationsManager
SubmitApplicationRequest appRequest =
    Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);

// Submit the application to the ApplicationsManager
// Ignore the response as either a valid response object is returned on
// success or an exception thrown to denote the failure
applicationsManager.submitApplication(appRequest);

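At this point, the ResourceManager has accepted the application and, in the background, will allocate a container with the required specifications and then set up and launch the ApplicationMaster on it.

There are several ways for the client to track the progress of the actual task.

The client can communicate with the ResourceManager and request a report of the application via ClientRMProtocol::getApplicationReport.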

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
    applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();

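The ApplicationReport received from the ResourceManager contains the following:

General application information: the ApplicationId, the queue to which the application was submitted, the user who submitted it, and the application's start time.
ApplicationMaster details: the host on which the ApplicationMaster is running, the RPC port on which it listens for client requests, and a token the client needs to communicate with the ApplicationMaster.
Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL, which the client obtains via ApplicationReport::getTrackingUrl and uses to monitor progress.
Application status: the state of the application as seen by the ResourceManager, available via ApplicationReport::getYarnApplicationState. Once the YarnApplicationState is FINISHED, the client should check ApplicationReport::getFinalApplicationStatus to determine whether the application succeeded or failed. On failure, ApplicationReport::getDiagnostics may shed some light on what went wrong.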
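
A client-side polling loop over these reports might look like the minimal sketch below. It deliberately avoids the real Hadoop types: AppState is a simplified stand-in for YarnApplicationState (only FINISHED is taken from the text above; the other values are illustrative), and the Supplier stands in for a getApplicationReport call followed by getYarnApplicationState. The names AppStatePoller and waitForCompletion, the poll limit, and the sleep interval are all assumptions made for this example.

```java
import java.util.function.Supplier;

/** Simplified stand-in for YarnApplicationState; not the Hadoop enum. */
enum AppState { NEW, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED }

final class AppStatePoller {
    private AppStatePoller() {}

    /** True once the application can no longer change state. */
    static boolean isTerminal(AppState state) {
        return state == AppState.FINISHED
            || state == AppState.FAILED
            || state == AppState.KILLED;
    }

    /**
     * Asks reportSource for the state until a terminal state is seen,
     * maxPolls queries have been made, or the thread is interrupted.
     * Returns the last state observed.
     */
    static AppState waitForCompletion(Supplier<AppState> reportSource,
                                      int maxPolls, long sleepMs) {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be >= 1");
        }
        AppState last = reportSource.get();
        for (int i = 1; i < maxPolls && !isTerminal(last); i++) {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop polling.
                Thread.currentThread().interrupt();
                return last;
            }
            last = reportSource.get();
        }
        return last;
    }
}
```

A caller would check the returned state and, if it is FINISHED, go on to inspect the final application status and diagnostics as described above. Bounding the number of polls keeps a stuck application from blocking the client forever.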

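If the ApplicationMaster supports it, the client can query the ApplicationMaster directly for progress updates at the host:rpcport obtained from the ApplicationReport. It can also use the tracking URL from the report, if one is available.

In some situations, for example if the application is taking too long, the client may wish to kill the application. ClientRMProtocol supports the forceKillApplication call, which lets a client send a kill signal to the application via the ResourceManager. An ApplicationMaster may also be designed to support an abort call over its own RPC layer that a client can use.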

KillApplicationRequest killRequest =
    Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

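Writing an ApplicationMaster

The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager on behalf of the client, which provides it with all the information and resources the job needs, and it is responsible for overseeing the job's tasks and seeing them through to completion.

Because the ApplicationMaster runs in a container that may share a physical host with other containers in a multi-tenant environment, it cannot make assumptions such as a pre-configured port being available to listen on.

When the ApplicationMaster starts up, several parameters are made available to it through the environment, including the ContainerId of the ApplicationMaster's container, the application submission time, and details about the NodeManager host running the ApplicationMaster. See ApplicationConstants for the parameter names.

All interactions with the ResourceManager require an ApplicationAttemptId (there can be multiple attempts per application in case of failures). The ApplicationAttemptId can be obtained from the ApplicationMaster's ContainerId, and helper APIs convert the string read from the environment into objects.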

Map<String, String> envs = System.getenv();
String containerIdString =
    envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
if (containerIdString == null) {
  // container id should always be set in the env by the framework
  throw new IllegalArgumentException(
      "ContainerId not set in the environment");
}
ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
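
To make the relationship between a container ID and its application attempt concrete, here is a minimal sketch that does not use the Hadoop classes. It assumes the environment string has the layout container_<clusterTimestamp>_<applicationId>_<attemptId>_<containerId>. That layout, the class name ContainerIdParts, and the attemptKey helper are assumptions for illustration only and should be checked against your Hadoop version. In a real ApplicationMaster, ConverterUtils.toContainerId is the call to use.

```java
/** Hypothetical holder for the fields encoded in a container ID string. */
final class ContainerIdParts {
    final long clusterTimestamp;
    final int applicationId;
    final int attemptId;
    final long containerId;

    private ContainerIdParts(long clusterTimestamp, int applicationId,
                             int attemptId, long containerId) {
        this.clusterTimestamp = clusterTimestamp;
        this.applicationId = applicationId;
        this.attemptId = attemptId;
        this.containerId = containerId;
    }

    /**
     * Parses the ASSUMED layout
     * container_<clusterTimestamp>_<applicationId>_<attemptId>_<containerId>.
     * Throws IllegalArgumentException for anything else.
     */
    static ContainerIdParts parse(String text) {
        String[] parts = text.split("_");
        if (parts.length != 5 || !"container".equals(parts[0])) {
            throw new IllegalArgumentException("Unexpected container id: " + text);
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so malformed numbers are reported the same way.
        return new ContainerIdParts(
            Long.parseLong(parts[1]),
            Integer.parseInt(parts[2]),
            Integer.parseInt(parts[3]),
            Long.parseLong(parts[4]));
    }

    /** Key shared by every container that belongs to the same attempt. */
    String attemptKey() {
        return clusterTimestamp + "_" + applicationId + "_" + attemptId;
    }
}
```

The point of the sketch is only that the attempt is fully determined by a prefix of the container ID, which is why the ApplicationMaster needs nothing beyond its own ContainerId to identify the attempt it belongs to.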

Once the ApplicationMaster has initialized itself, it registers with the ResourceManager via AMRMProtocol::registerApplicationMaster. The ApplicationMaster always communicates with the ResourceManager through its Scheduler interface.

// Connect to the Scheduler of the ResourceManager.
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
AMRMProtocol resourceManager =
    (AMRMProtocol) rpc.getProxy(AMRMProtocol.class, rmAddress, conf);

// Register the AM with the RM
// Set the required info into the registration request:
// ApplicationAttemptId,
// host on which the app master is running
// rpc port on which the app master accepts requests from the client
// tracking url for the client to track app master progress
RegisterApplicationMasterRequest appMasterRequest =
    Records.newRecord(RegisterApplicationMasterRequest.class);
appMasterRequest.setApplicationAttemptId(appAttemptID);
appMasterRequest.setHost(appMasterHostname);
appMasterRequest.setRpcPort(appMasterRpcPort);
appMasterRequest.setTrackingUrl(appMasterTrackingUrl);

// The registration response is useful as it provides information about the
// cluster.
// Similar to the GetNewApplicationResponse in the client, it provides
// information about the min/max resource capabilities of the cluster that
// would be needed by the ApplicationMaster when requesting containers.
RegisterApplicationMasterResponse response =
    resourceManager.registerApplicationMaster(appMasterRequest);

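The ApplicationMaster must emit heartbeats to the ResourceManager to let it know that the ApplicationMaster is alive and still running. The expiry interval on the ResourceManager side is defined by the configuration setting accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, with the default given by YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. Calls to AMRMProtocol::allocate on the ResourceManager count as heartbeats, and they also carry progress update information.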
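
One way to keep heartbeats flowing is to run the allocate call on a timer whose period is a fraction of the expiry interval. The sketch below is a generic illustration using only the JDK: the Runnable stands in for whatever code issues AMRMProtocol::allocate, and the class name AmHeartbeat, the beatsPerExpiry parameter, and the choice of a fixed-rate schedule are assumptions of this example, not something the original guide prescribes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class AmHeartbeat {
    private AmHeartbeat() {}

    /**
     * Period between heartbeats so that roughly beatsPerExpiry heartbeats
     * fit into one expiry interval. Never returns less than 1 ms.
     */
    static long periodMs(long expiryIntervalMs, int beatsPerExpiry) {
        if (expiryIntervalMs <= 0 || beatsPerExpiry <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return Math.max(1L, expiryIntervalMs / beatsPerExpiry);
    }

    /**
     * Runs allocateCall immediately and then at a fixed rate.
     * The caller must shut the returned scheduler down when the AM finishes.
     * Note: if allocateCall throws, the JDK cancels all further runs, so the
     * Runnable should catch and handle its own exceptions.
     */
    static ScheduledExecutorService start(Runnable allocateCall,
                                          long expiryIntervalMs,
                                          int beatsPerExpiry) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            allocateCall, 0L,
            periodMs(expiryIntervalMs, beatsPerExpiry),
            TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Sending several heartbeats per expiry interval leaves room for an occasional slow or failed call before the ResourceManager considers the ApplicationMaster dead. The expiry value itself should be read from the configuration keys named above rather than hard-coded.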

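Depending on the task's requirements, the ApplicationMaster can ask for a set of containers to run its tasks on. It uses the ResourceRequest class to specify the following container specifications:

Hostname: set this if containers must be hosted on a particular rack or a specific host. "*" is a special value meaning that any host will do.
Resource capability: currently, YARN only supports memory-based resource requirements, so the request only needs to define how much memory is needed. The value is in MB, must be less than the cluster's max capability, and must be an exact multiple of the min capability. Memory limits are enforced on the physical memory used by the task containers.
Priority: when asking for sets of containers, the ApplicationMaster may assign a different priority to each set. For example, a MapReduce ApplicationMaster may assign a higher priority to the containers needed for map tasks and a lower priority to the containers for reduce tasks.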
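
The memory rule above (a multiple of the min capability, bounded by the max capability) can be applied before building a request. The helper below is a minimal sketch in plain Java. The name MemoryRequest.normalize is hypothetical, the min and max values would come from the registration response, and treating the max capability as an inclusive upper bound is an assumption of this example.

```java
final class MemoryRequest {
    private MemoryRequest() {}

    /**
     * Rounds requestedMb up to the next multiple of minMb, then caps the
     * result at the largest multiple of minMb that does not exceed maxMb.
     */
    static int normalize(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException(
                "need 0 < minMb <= maxMb, got min=" + minMb + ", max=" + maxMb);
        }
        // Never ask for less than one unit of the minimum capability.
        int atLeastMin = Math.max(requestedMb, minMb);
        // Integer ceiling division, then scale back up to MB.
        int roundedUp = ((atLeastMin + minMb - 1) / minMb) * minMb;
        // Largest multiple of minMb that still fits under the cluster max.
        int largestAllowed = (maxMb / minMb) * minMb;
        return Math.min(roundedUp, largestAllowed);
    }
}
```

For instance, with a min capability of 512 MB and a max of 8192 MB, a request for 1000 MB would be rounded up to 1024 MB. The result would then be passed to capability.setMemory when building the ResourceRequest.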

// Resource Request
ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);

// setup requirements for hosts
// whether a particular rack/host is needed
// useful for applications that are sensitive
// to data locality
rsrcRequest.setHostName("*");

// set the priority for the request
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(requestPriority);
rsrcRequest.setPriority(pri);

// Set up resource type requirements
// For now, only memory is supported so we set memory requirements
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability);

// set no. of containers needed
// matching the specifications
rsrcRequest.setNumContainers(numContainers);

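After defining the container requirements, the ApplicationMaster has to construct an AllocateRequest to send to the ResourceManager. The AllocateRequest consists of:

Requested containers: the container specifications and the number of containers being requested by the ApplicationMaster from the ResourceManager.
Released containers: in some situations the ApplicationMaster may have requested more containers than it needs. It can release the unused ones back to the ResourceManager so that they can be allocated to other applications.
ResponseId: the response id that will be sent back in the response to the allocate call.
Progress update information: the ApplicationMaster can send its progress (a value between 0 and 1) to the ResourceManager.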

List<ResourceRequest> requestedContainers;
List<ContainerId> releasedContainers;
AllocateRequest req = Records.newRecord(AllocateRequest.class);

// The response id set in the request will be sent back in
// the response so that the ApplicationMaster can
// match it to its original ask and act appropriately.
req.setResponseId(rmRequestID);

// Set ApplicationAttemptId
req.setApplicationAttemptId(appAttemptID);

// Add the list of containers being asked for
req.addAllAsks(requestedContainers);

// If the ApplicationMaster has no need for certain
// containers due to over-allocation or for any other
// reason, it can release them back to the ResourceManager
req.addAllReleases(releasedContainers);

// Assuming the ApplicationMaster can track its progress
req.setProgress(currentProgress);

AllocateResponse allocateResponse = resourceManager.allocate(req);

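The AllocateResponse sent back by the ResourceManager provides the following information via the AMResponse object:

Reboot flag: for scenarios in which the ApplicationMaster has fallen out of sync with the ResourceManager and needs to reboot.
Allocated containers: the containers that have been allocated to the ApplicationMaster.
Headroom: headroom for resources in the cluster. Based on this information and its own resource needs, an ApplicationMaster can make intelligent decisions, such as re-prioritizing sub-tasks to take advantage of containers it already holds, or bailing out quickly if resources are not forthcoming.
Completed containers: once an ApplicationMaster has launched an allocated container, it receives an update from the ResourceManager when that container completes. The ApplicationMaster can inspect the status of the completed container and take appropriate action, such as retrying a particular sub-task after a failure.

Note that containers will not necessarily be allocated to the ApplicationMaster immediately. This does not mean that the ApplicationMaster should keep re-requesting the containers it is still waiting for. Once an allocate request has been sent, the ApplicationMaster will eventually be allocated the containers, subject to cluster capacity, priorities, and the scheduling policy in place. The ApplicationMaster should send a new request only if its original estimate changes and it needs additional containers.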
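
The "ask only for what is new" rule can be captured with two counters. The following is a minimal sketch, not code from the Hadoop distribution: ContainerAskTracker and its method names are hypothetical, and it follows the same bookkeeping idea as the numRequestedContainers counter that appears in this article's completed-container handling.

```java
/** Hypothetical bookkeeping for how many containers still need to be asked for. */
final class ContainerAskTracker {
    private int totalNeeded;     // current estimate of containers the job needs
    private int totalRequested;  // containers already asked for and not lost

    /** Updates the estimate, for example when the job discovers more work. */
    synchronized void setTotalNeeded(int needed) {
        if (needed < 0) {
            throw new IllegalArgumentException("needed must be >= 0");
        }
        totalNeeded = needed;
    }

    /**
     * Number of containers to put into the next allocate call. Returns 0 when
     * everything needed has already been requested, so a plain heartbeat does
     * not repeat earlier asks.
     */
    synchronized int nextAsk() {
        int delta = Math.max(0, totalNeeded - totalRequested);
        totalRequested += delta;
        return delta;
    }

    /** Records that a requested container was lost and must be asked for again. */
    synchronized void containerLost() {
        if (totalRequested > 0) {
            totalRequested--;
        }
    }
}
```

On each heartbeat the ApplicationMaster would call nextAsk() and only build a ResourceRequest when the result is greater than zero. Calling containerLost() corresponds to the case where a container was aborted and a replacement has to be requested in the next allocate call.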

// Get AMResponse from AllocateResponse
AMResponse amResp = allocateResponse.getAMResponse();

// Retrieve list of allocated containers from the response
// and on each allocated container, let's assume we are launching
// the same job.
List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
  LOG.info("Launching shell command on a new container."
      + ", containerId=" + allocatedContainer.getId()
      + ", containerNode=" + allocatedContainer.getNodeId().getHost()
      + ":" + allocatedContainer.getNodeId().getPort()
      + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
      + ", containerState=" + allocatedContainer.getState()
      + ", containerResourceMemory="
      + allocatedContainer.getResource().getMemory());

  // Launch and start the container on a separate thread to keep the main
  // thread unblocked as all containers may not be allocated at one go.
  LaunchContainerRunnable runnableLaunchContainer =
      new LaunchContainerRunnable(allocatedContainer);
  Thread launchThread = new Thread(runnableLaunchContainer);
  launchThreads.add(launchThread);
  launchThread.start();
}

// Check what the current available resources in the cluster are
Resource availableResources = amResp.getAvailableResources();
// Based on this information, an ApplicationMaster can make appropriate
// decisions

// Check the completed containers
// Let's assume we are keeping a count of total completed containers,
// containers that failed and ones that completed successfully.
List<ContainerStatus> completedContainers =
    amResp.getCompletedContainersStatuses();
for (ContainerStatus containerStatus : completedContainers) {
  LOG.info("Got container status for containerID= "
      + containerStatus.getContainerId()
      + ", state=" + containerStatus.getState()
      + ", exitStatus=" + containerStatus.getExitStatus()
      + ", diagnostics=" + containerStatus.getDiagnostics());

  int exitStatus = containerStatus.getExitStatus();
  if (0 != exitStatus) {
    // container failed
    // -100 is a special case where the container
    // was aborted/pre-empted for some reason
    if (-100 != exitStatus) {
      // application job on container returned a non-zero exit code
      // counts as completed
      numCompletedContainers.incrementAndGet();
      numFailedContainers.incrementAndGet();
    }
    else {
      // something else bad happened
      // app job did not complete for some reason
      // we should re-try as the container was lost for some reason
      // decrementing the requested count so that we ask for an
      // additional one in the next allocate call.
      numRequestedContainers.decrementAndGet();
      // we do not need to release the container as that has already
      // been done by the ResourceManager/NodeManager.
    }
  }
  else {
    // nothing to do
    // container completed successfully
    numCompletedContainers.incrementAndGet();
    numSuccessfulContainers.incrementAndGet();
  }
}

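Once a container has been allocated to the ApplicationMaster, it follows a process similar to the one the client followed: it sets up a ContainerLaunchContext for the task that will eventually run on the allocated container. Once the ContainerLaunchContext is defined, the ApplicationMaster can communicate with the ContainerManager to start its allocated container.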

// Assuming an allocated Container obtained from AMResponse
Container container;
// Connect to ContainerManager on the allocated container
String cmIpPortStr = container.getNodeId().getHost() + ":"
    + container.getNodeId().getPort();
InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);
ContainerManager cm =
    (ContainerManager) rpc.getProxy(ContainerManager.class, cmAddress, conf);

// Now we set up a ContainerLaunchContext
ContainerLaunchContext ctx =
    Records.newRecord(ContainerLaunchContext.class);

ctx.setContainerId(container.getId());
ctx.setResource(container.getResource());

try {
  ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
} catch (IOException e) {
  LOG.info(
      "Getting current user failed when trying to launch the container: "
      + e.getMessage());
}

// Set the environment
Map<String, String> unixEnv;
// Set up the required env.
// Please note that the launched container does not inherit
// the environment of the ApplicationMaster so all the
// necessary environment settings will need to be re-setup
// for this allocated container.
ctx.setEnvironment(unixEnv);

// Set the local resources
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Again, the local resources from the ApplicationMaster are not copied over
// by default to the allocated container. Thus, it is the responsibility
// of the ApplicationMaster to set up all the necessary local resources
// needed by the job that will be executed on the allocated container.

// Assume that we are executing a shell script on the allocated container
// and the shell script's location in the filesystem is known to us.
Path shellScriptPath;
LocalResource shellRsrc = Records.newRecord(LocalResource.class);
shellRsrc.setType(LocalResourceType.FILE);
shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
// shellScriptPath is a Path, so convert it with toUri() rather than new URI(...)
shellRsrc.setResource(
    ConverterUtils.getYarnUrlFromURI(shellScriptPath.toUri()));
shellRsrc.setTimestamp(shellScriptPathTimestamp);
shellRsrc.setSize(shellScriptPathLen);
localResources.put("MyExecShell.sh", shellRsrc);

ctx.setLocalResources(localResources);

// Set the necessary command to execute on the allocated container
String command = "/bin/sh ./MyExecShell.sh"
    + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
    + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

List<String> commands = new ArrayList<String>();
commands.add(command);
ctx.setCommands(commands);

// Send the start request to the ContainerManager
StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
startReq.setContainerLaunchContext(ctx);
cm.startContainer(startReq);

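As mentioned previously, the ApplicationMaster receives updates on completed containers in the return value of AMRMProtocol::allocate calls. It can also monitor its launched containers proactively by querying the ContainerManager for their status.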

GetContainerStatusRequest statusReq =
    Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(container.getId());
GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
LOG.info("Container Status"
    + ", id=" + container.getId()
    + ", status=" + statusResp.getStatus());

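FAQ

1. How can I distribute my application's jars to all of the nodes in the YARN cluster that need them?

You can use LocalResource to add resources to your application request. This causes YARN to distribute the resource to the ApplicationMaster node. If the resource is a tgz, zip, or jar, you can have YARN unpack it. Then all you need to do is add the unpacked folder to your classpath. For example, when creating your application request: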

// packageResource, resource, containerCtx, appCtx, request and user are
// assumed to have been created earlier (for example via Records.newRecord).
File packageFile = new File(packagePath);
packageResource.setResource(ConverterUtils.getYarnUrlFromPath(
    FileContext.getFileContext().makeQualified(new Path(packagePath))));
packageResource.setSize(packageFile.length());
packageResource.setTimestamp(packageFile.lastModified());
packageResource.setType(LocalResourceType.ARCHIVE);
packageResource.setVisibility(LocalResourceVisibility.APPLICATION);

resource.setMemory(memory);
containerCtx.setResource(resource);
containerCtx.setCommands(ImmutableList.of(
    "java -cp './package/*' some.class.to.Run "
    + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
    + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
containerCtx.setLocalResources(
    Collections.singletonMap("package", packageResource));
appCtx.setApplicationId(appId);
appCtx.setUser(user.getShortUserName());
appCtx.setAMContainerSpec(containerCtx);
request.setApplicationSubmissionContext(appCtx);
applicationsManager.submitApplication(request);

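As you can see, setLocalResources takes a map of names to resources. Each name becomes a symlink in your application's cwd, so you can refer to the artifacts inside it with ./package/*.

Note: Java's classpath (cp) argument is very sensitive. Make sure you get the syntax exactly right.

Once your package is distributed to your ApplicationMaster, you need to follow the same process whenever the ApplicationMaster starts a new container (assuming you want the resources sent to that container too). The code is the same. You just need to make sure you give your ApplicationMaster the package path (either HDFS or local) so that it can send the resource URL along with the container ctx.

2. How do I get the ApplicationMaster's ApplicationAttemptId?

The ApplicationAttemptId is passed to the ApplicationMaster via the environment, and the value from the environment can be converted into an ApplicationAttemptId object using a ConverterUtils helper function.

3. My container is being killed by the NodeManager

This is likely due to high memory usage exceeding your requested container memory size. A number of things can cause this. First, look at the process tree that the NodeManager dumps when it kills your container. The two things to look at are physical memory and virtual memory. If you have exceeded the physical memory limit, your application is using too much physical memory; if you are running a Java app, you can use -hprof to see what is taking up space in the heap. If you have exceeded the virtual memory limit, you need to increase the cluster-wide configuration variable yarn.nodemanager.vmem-pmem-rati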