Workflow mining: A survey of issues and approaches (10)

2007-08-06 10:34

8. How to mine workflow processes with duplicate tasks? An inductive approach

The approaches presented in the preceding sections assume that a task name is a unique identifier within a process, i.e., in the graphical models it is not possible to have multiple building blocks referring to the same task. For some processes this assumption does not hold: more than one task may share the same name. An example of such a process is the part release process for the development of passenger cars from [25], which is shown in Fig. 10. Although one may find unique names (e.g. "notifyEng-1", "notifyEng-2") even for this kind of process, requiring unique names for the workflow mining procedure would be a tough requirement (compare [16]). Providing unique task names requires that the structure of the process is known at least to some extent. This is unrealistic if the workflow mining procedure is to be used to discover the structure.

In [26] we present a solution for mining workflow models with non-unique task names. It consists of two steps: the induction and the transformation step.
In the induction step a Stochastic Task Graph (also referred to as Stochastic Activity Graph, SAG [26]) is induced from the workflow log. The induction algorithm can be described as a graph generation algorithm (InduceUniqueNodeSAG) that is embedded into a search procedure.
The search procedure borrows ideas from machine learning and grammatical inference [49]. It searches for a mapping from task instances in the workflow log to task nodes in the workflow model. The search space can be described as a lattice of such mappings, partially ordered by generality (more general than/more specific than). The lattice is bounded by a top, the most general mapping (every task instance with name X is mapped to one single task node with name X), and a bottom, the most specific mapping (a bijection between task instances in the log and task nodes of the workflow model). Our search algorithm looks for an optimal mapping top down, starting with the most general mapping. More specific mappings are created using a split operator. The split operator divides all task instances mapped to the same task node of the model into two groups, which are mapped to two different task nodes by renaming task nodes. In the example shown in Fig. 11 the task instances with names A and C of workflow log E1 are split into A, A', C and C' using two split operations.
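The lattice and the split operator can be illustrated with a minimal Python sketch. It is not the implementation from [26]: the names `most_general_mapping`, `most_specific_mapping`, `split` and `relabel` are hypothetical, the log is an invented toy example rather than E1 from Fig. 11, and the choice of which instances to move is passed in by hand because the text above does not say how the groups are chosen.

```python
from typing import Dict, List, Set, Tuple

Instance = Tuple[int, int]      # (trace index, position within the trace)
Mapping = Dict[Instance, str]   # task instance -> task node label


def most_general_mapping(log: List[List[str]]) -> Mapping:
    """Top of the lattice: every instance named X maps to the single node X."""
    return {(t, p): name for t, trace in enumerate(log) for p, name in enumerate(trace)}


def most_specific_mapping(log: List[List[str]]) -> Mapping:
    """Bottom of the lattice: every instance gets its own node (a bijection)."""
    return {(t, p): f"{name}#{t}.{p}" for t, trace in enumerate(log) for p, name in enumerate(trace)}


def split(mapping: Mapping, node: str, moved: Set[Instance], new_node: str) -> Mapping:
    """Return a more specific mapping in which `moved` is renamed to `new_node`.

    `moved` must be a non-empty proper subset of the instances mapped to `node`,
    so that both resulting groups are non-empty.
    """
    current = {inst for inst, label in mapping.items() if label == node}
    if not moved or not moved < current:
        raise ValueError("moved must be a non-empty proper subset of the node's instances")
    if new_node in mapping.values():
        raise ValueError("new_node must be a fresh label")
    return {inst: (new_node if inst in moved else label) for inst, label in mapping.items()}


def relabel(log: List[List[str]], mapping: Mapping) -> List[List[str]]:
    """Rewrite the log so that each instance carries its node label."""
    return [[mapping[(t, p)] for p in range(len(trace))] for t, trace in enumerate(log)]


# Toy log (an assumption for illustration): task A occurs twice in every trace.
log = [["A", "B", "C", "A"], ["A", "C", "B", "A"]]
top = most_general_mapping(log)
# One split operation: move the trailing A of each trace to a new node A'.
refined = split(top, "A", {(0, 3), (1, 3)}, "A'")
relabelled = relabel(log, refined)
```

Each call to `split` moves one step down the lattice, so repeated splits eventually reach the bijection returned by `most_specific_mapping`.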
InduceUniqueNodeSAG is called for a fixed mapping from task instances to task nodes and generates a stochastic task graph for this mapping, as indicated in Fig. 11. It is very similar to the approach presented in [11]. The main differences are a slightly different definition of the dependency relation and two additional steps: one for inserting copies of task nodes where required and one for clustering task nodes sharing common predecessors. A notable difference from the formal approach in Section 5 is the determination of dependencies. InduceUniqueNodeSAG considers every pair of task instances occurring in the same workflow instance, regardless of the number of task instances in between, for the determination of the dependency relation. The transitive reduction is then used to identify direct successors. Note that the formal approach presented in Section 5 considers only pairs of direct successors for determining the dependency relation.
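The combination of "all ordered pairs within an instance" and transitive reduction can be sketched as follows. This is only an illustration under stated assumptions: the exact dependency definition is given in [26], so the rule used here (keep a -> b only if b is never observed before a) is a stand-in in the spirit of [11], the function names are hypothetical, and the reduction assumes the resulting relation is acyclic.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def follows_pairs(log: List[List[str]]) -> Set[Edge]:
    """All ordered pairs (a, b) where a occurs before b in the same trace,
    no matter how many task instances lie in between."""
    pairs: Set[Edge] = set()
    for trace in log:
        for i, j in combinations(range(len(trace)), 2):
            if trace[i] != trace[j]:
                pairs.add((trace[i], trace[j]))
    return pairs


def dependency_relation(log: List[List[str]]) -> Set[Edge]:
    """Assumed rule, not taken from [26]: a -> b holds if a is seen before b
    and b is never seen before a. Tasks seen in both orders are independent."""
    pairs = follows_pairs(log)
    return {(a, b) for (a, b) in pairs if (b, a) not in pairs}


def transitive_reduction(edges: Set[Edge]) -> Set[Edge]:
    """Keep only direct successors: drop a -> b if b is still reachable from a
    without using that edge. Assumes the relation is acyclic."""
    succ: Dict[str, Set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def reachable(start: str, goal: str, skip: Edge) -> bool:
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(a, b) for (a, b) in edges if not reachable(a, b, skip=(a, b))}


# Toy log (an assumption): B and C occur in both orders between A and D.
log = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
direct = transitive_reduction(dependency_relation(log))
```

In this toy log the pair (A, D) is a dependency but not a direct one, because D is also reachable via B or C, so the reduction removes it.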
The search algorithm applies beam search. The search is guided by the log likelihood of the SAG per sample. The calculation of the log likelihood requires a stochastic sample. This means that the induction algorithm handles n workflow instances sharing exactly the same ordering of tasks as n different cases. For the formal approach (cf. Section 5) one instance for each ordering of tasks is enough. Using the frequency information one is able to calculate not only the likelihood of the SAG but also the probability of tasks and edges. This information is useful to distinguish common from rare behavior.
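A generic beam search with a per-sample log likelihood score might look like the sketch below. It is a simplified stand-in, not the algorithm of [26]: `beam_search`, `expand`, `score`, `beam_width` and `max_steps` are hypothetical names, and the model is reduced to a plain table of trace probabilities instead of a SAG. In the setting described above, the states would be mappings, `expand` would apply split operations, and `score` would be the log likelihood of the induced SAG per sample.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence, Tuple, TypeVar

S = TypeVar("S")
Trace = Tuple[str, ...]


def log_likelihood_per_sample(log: Sequence[Trace], trace_prob: Dict[Trace, float]) -> float:
    """Average log probability over all cases.

    n identical traces are counted as n separate cases, so frequent
    orderings weigh more than rare ones.
    """
    counts = Counter(log)
    total = sum(n * math.log(trace_prob[trace]) for trace, n in counts.items())
    return total / len(log)


def beam_search(
    start: S,
    expand: Callable[[S], Iterable[S]],
    score: Callable[[S], float],
    beam_width: int,
    max_steps: int,
) -> S:
    """Keep the `beam_width` best successors at each step and return the
    best-scoring state seen over the whole search."""
    best, best_score = start, score(start)
    beam = [start]
    for _ in range(max_steps):
        candidates = [c for state in beam for c in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        top_score = score(beam[0])
        if top_score > best_score:
            best, best_score = beam[0], top_score
    return best


# Toy sample (an assumption): three cases follow A, B and one follows A, C.
sample = [("A", "B")] * 3 + [("A", "C")]
model = {("A", "B"): 0.75, ("A", "C"): 0.25}
avg_ll = log_likelihood_per_sample(sample, model)

# Toy search problem (an assumption): reach the integer closest to 10.
found = beam_search(3, lambda n: (n + 1, n * 2), lambda n: -abs(n - 10), beam_width=2, max_steps=5)
```

Returning the best state seen, rather than the best state in the final beam, is a design choice of this sketch, since splitting further can lower the score again.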

In the transformation step the SAG is transformed into a block-structured workflow model in the ADONIS format. This step is needed because the stochastic task graph provided by the induction phase does not explicitly distinguish alternative and parallel routing. The transformation phase can be decomposed into three main steps: (1) the analysis of the synchronization structures of the workflow instances in the workflow log, (2) the generation of the synchronization structure of the workflow model, and (3) the generation of the model. Details of the transformation steps are given in [26].
The workflow mining tool InWoLvE (Inductive Workflow Learning via Examples) implements the described mining algorithm and two further induction algorithms, which are restricted to sequential workflow models. InWoLvE has an interface to the business process management system ADONIS [36] for interchanging workflow logs and process models. It has been successfully applied to workflow traces generated from real-life workflow models (such as the one shown in Fig. 10) and from a large number of artificial workflow models.
Further details and additional aspects such as a transformation from the SAG to a well-behaved Petri net, an additional split operator for dealing with noise, and the results of the experimental evaluation are described in [26].
