Workflow mining: A survey of issues and approaches (11)

2007-08-12 18:22

9. How to mine block-structured workflows? A data mining approach

The last approach discussed in this paper is tailored towards mining block-structured workflows. It differs from the approaches presented in the preceding four sections in two notable ways. First, only block-structured workflow patterns are considered. Second, the mining algorithm is based on rewriting techniques rather than graph-based techniques. In addition, the objective of this approach is to mine complete and minimal models: complete in the sense that all recorded cases are covered by the extracted model, minimal in the sense that only recorded cases are covered. To achieve this goal, the approach uses a stronger notion of completeness than, for example, the notion based on direct successors (cf. Section 5).
Before we can mine a workflow model from event-based data, it is necessary to determine what kind of model the output should be, i.e., which workflow language is used or which class of workflow models is considered. Different languages and classes of models have different meta-models. We distinguish two major groups of workflow meta-models: graph-oriented meta-models and block-oriented meta-models. This approach is based on a block-oriented meta-model. Models that conform to this meta-model (i.e., block-structured workflows) are always well-formed and sound.
Block-structured models are made up of nested blocks. These building blocks can be divided into operators and constants. Operators build the process flow, while constants are the tasks or sub-workflows that are embedded inside the process flow. We build a block-structured model in a top-down fashion by setting one operator as the starting point of the workflow and nesting other operators until we get the desired flow structure. At the bottom of this structure we embed constants into the operators, which terminates the nesting process. A block-structured workflow model is therefore a tree whose inner nodes are operators and whose leaves are always operands.
Besides the tree representation, block-structured models can be specified as a set of terms. Let S denote the sequence operator, P the parallel operator, and a, b, c three different tasks. The term S(a,P(b,c)), for example, represents a workflow in which task a is performed completely before tasks b and c are performed in parallel. Because of the model’s block structure, each term is always well-formed. Furthermore, we can specify an algebra that consists of axioms for commutativity, distributivity, associativity, etc. These axioms form the basis for the term rewriting systems we can use for mining workflows. A detailed description of the meta-model can be found in [54].
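To make the term notation concrete, the following is a minimal Python sketch of terms such as S(a,P(b,c)). The class `Op` and the helpers `S`, `P`, `show`, and `runs` are illustrative names that do not come from [54], and tasks are treated as atomic (no separate start and complete events). The function `runs` lists the task orderings a term allows, which gives a simple way to check that two terms related by a commutativity or associativity axiom describe the same behaviour.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Op:
    """An operator block: kind is "S" (sequence) or "P" (parallel)."""
    kind: str
    args: Tuple["Term", ...]


Term = Union[str, Op]  # a constant (task name) or an operator block


def S(*args):
    return Op("S", args)


def P(*args):
    return Op("P", args)


def show(term):
    """Render a term in the notation used in the text, e.g. S(a,P(b,c))."""
    if isinstance(term, str):
        return term
    return f"{term.kind}({','.join(show(a) for a in term.args)})"


def interleavings(xs, ys):
    """All order-preserving merges of two task tuples."""
    if not xs:
        return {ys}
    if not ys:
        return {xs}
    first = {(xs[0],) + rest for rest in interleavings(xs[1:], ys)}
    second = {(ys[0],) + rest for rest in interleavings(xs, ys[1:])}
    return first | second


def runs(term):
    """Set of task orderings allowed by a term (tasks assumed atomic)."""
    if isinstance(term, str):
        return {(term,)}
    result = runs(term.args[0])
    for arg in term.args[1:]:
        nxt = runs(arg)
        if term.kind == "S":
            result = {x + y for x in result for y in nxt}
        else:  # "P": any interleaving of the operands
            result = {m for x in result for y in nxt
                      for m in interleavings(x, y)}
    return result


model = S("a", P("b", "c"))
print(show(model))          # S(a,P(b,c))
print(sorted(runs(model)))  # [('a', 'b', 'c'), ('a', 'c', 'b')]
```

Under these assumptions, `runs(P("b", "c")) == runs(P("c", "b"))` reflects commutativity of the parallel operator, and `runs(S("a", S("b", "c"))) == runs(S(S("a", "b"), "c"))` reflects associativity of sequence.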
Based on the block-structured meta-model, a process mining procedure extracts workflow models from event-based data. The procedure consists of the following five steps, which are performed in sequential order.
First, the procedure reads event-based data that belongs to a certain process and builds a trace for each process instance from this data. A trace is a data structure that contains all start and complete events of a process instance in correct chronological order. Once the traces are built, they are condensed into trace groups on the basis of their sequence of start and complete events. Each trace group constitutes a path in the process schema.
Second, a time-forward algorithm constructs an initial process model from all trace groups. This model is in a special form called disjunctive normal form (DNF). A process model in this form starts with an alternative operator and enumerates inside this block all possible paths of execution as blocks that are built up without any alternative operator. For each trace group, the algorithm constructs such a block and adds it to the alternative operator that forms the root of the model.
The third step deals with relations between tasks that result from the random order in which tasks are performed when there is no real precedence relation between them. These pseudo precedence relations have to be identified and then removed from the model. In order to identify them, a term rewriting system transforms the model into a form that enumerates all sequences of tasks inside parallel operators embedded into the overall alternative. Then a search algorithm determines which of these sequences are pseudo precedence relations by finding the smallest subset of sequences that completely explains the corresponding blocks in the initial model. All sequences outside this subset are pseudo precedence relations and are therefore removed. At the end of this step, a term rewriting system reverses the initial transformation.
In the fourth step, because the process model was built in DNF, it is necessary to split the model’s overall alternative and to move the partial alternatives as close as possible to the point in time where a decision cannot be postponed any longer. This is done by a transformation step using another term rewriting system. It is based on distributivity axioms and merges blocks while shifting alternative operators towards later points in time. It also leads to a condensed form of the model.
The fifth and last step is an optional decision-mining step that is based on decision tree induction. In this step, an induction is performed for each decision point of the model. To perform this step, we need data about the workflow context of each trace. From these data, a tree induction algorithm builds decision trees. These trees are transformed into rules, which are then attached to the corresponding alternative operators.
After all steps have been performed, the output is a block-structured model that is complete and minimal. The process mining procedure is reported in more detail in [55, 56].
The approach to mining block-structured models is supported by a tool named Process Miner. This tool can read event-based workflow data from databases or from files in the XML format presented in Section 4. It then automatically performs the complete process mining procedure on this data. The decision-mining step is omitted if no context data are provided.
Process Miner comes with a graphical user interface (see Fig. 12). It displays the output model in a graphical editor in the form of a diagram and a tree. Additionally, it allows the user to edit a model and to export it for further use. It also contains a workflow simulation component. A description of Process Miner can be found in [57].
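The first step can be sketched in Python as follows. The record layout `(case, timestamp, task, event_type)` and the function names `build_traces` and `group_traces` are assumptions made for illustration; they are not the input format of the procedure described in [55, 56].

```python
from collections import defaultdict

# Hypothetical event records: (process instance, timestamp, task, event type)
events = [
    ("case1", 1, "a", "start"), ("case1", 2, "a", "complete"),
    ("case1", 3, "b", "start"), ("case1", 4, "b", "complete"),
    ("case2", 1, "a", "start"), ("case2", 2, "a", "complete"),
    ("case2", 3, "b", "start"), ("case2", 5, "b", "complete"),
    ("case3", 1, "a", "start"), ("case3", 2, "a", "complete"),
    ("case3", 3, "c", "start"), ("case3", 4, "c", "complete"),
]


def build_traces(events):
    """Build one chronologically ordered trace per process instance."""
    by_case = defaultdict(list)
    for case, timestamp, task, kind in events:
        by_case[case].append((timestamp, task, kind))
    return {
        case: tuple((task, kind)
                    for _, task, kind in sorted(evs, key=lambda e: e[0]))
        for case, evs in by_case.items()
    }


def group_traces(traces):
    """Group process instances whose event sequences are identical."""
    groups = defaultdict(list)
    for case, trace in traces.items():
        groups[trace].append(case)
    return dict(groups)


traces = build_traces(events)
groups = group_traces(traces)
for trace, cases in groups.items():
    print(cases, [f"{task}:{kind}" for task, kind in trace])
```

Here case1 and case2 end up in the same trace group because timestamps are dropped after ordering; only the sequence of start and complete events matters.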
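The shape of a model in DNF can be illustrated with a short Python sketch. It is a strong simplification and not the time-forward algorithm itself, whose details are given in [55, 56]: each trace group is reduced to the order of its complete events and turned into a plain sequence block, so overlapping tasks are not detected. Terms are written as tuples, with `"A"` for the alternative operator and `"S"` for sequence; `block_for` and `initial_model` are illustrative names.

```python
# Hypothetical trace groups: each is a tuple of (task, event type) pairs.
trace_groups = [
    (("a", "start"), ("a", "complete"), ("b", "start"),
     ("b", "complete"), ("c", "start"), ("c", "complete")),
    (("a", "start"), ("a", "complete"), ("c", "start"),
     ("c", "complete"), ("b", "start"), ("b", "complete")),
]


def block_for(trace):
    """Alternative-free block for one non-empty trace group.

    Simplifying assumption: tasks are ordered by their complete events
    and never overlap, so the block is a plain sequence.
    """
    tasks = [task for task, kind in trace if kind == "complete"]
    return tasks[0] if len(tasks) == 1 else ("S", *tasks)


def initial_model(trace_groups):
    """Root alternative operator with one block per trace group."""
    blocks = sorted({block_for(trace) for trace in trace_groups}, key=str)
    return ("A", *blocks)


model = initial_model(trace_groups)
print(model)  # ('A', ('S', 'a', 'b', 'c'), ('S', 'a', 'c', 'b'))
```

The result has exactly one alternative operator, at the root, and every operand below it is free of alternatives, which is what the text calls disjunctive normal form.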
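What a pseudo precedence relation looks like can be shown with a small Python sketch. It does not implement the search for the smallest explaining subset described above, which the text does not detail; instead it uses a simpler heuristic, named here for what it is: an ordering between two tasks is treated as a pseudo precedence if the opposite ordering is also observed. The function names are illustrative, and each task is assumed to occur exactly once in every sequence.

```python
from itertools import combinations


def precedences(seq):
    """All ordered pairs (x, y) such that x occurs before y in seq."""
    return set(combinations(seq, 2))


def split_precedences(sequences):
    """Separate orderings seen in every sequence from contradicted ones.

    Assumes a non-empty list of sequences over the same set of tasks.
    """
    all_pairs = [precedences(seq) for seq in sequences]
    observed = set().union(*all_pairs)
    stable = set.intersection(*all_pairs)
    pseudo = {(x, y) for (x, y) in observed if (y, x) in observed}
    return stable, pseudo


observed_sequences = [("a", "b", "c"), ("a", "c", "b")]
stable, pseudo = split_precedences(observed_sequences)
print(sorted(stable))  # [('a', 'b'), ('a', 'c')]
print(sorted(pseudo))  # [('b', 'c'), ('c', 'b')]
```

In this example b and c were seen in both orders, so neither ordering is a real precedence, which is consistent with the term S(a,P(b,c)). Unlike the procedure in the text, this heuristic makes no attempt to guarantee a minimal model when only some interleavings were recorded.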
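A single distributivity rewrite of this kind can be sketched in Python. The rule shown, A(S(x,y),S(x,z)) rewritten to S(x,A(y,z)), is one plausible instance of a distributivity axiom; the actual axiom set is defined in [54] and is not reproduced here. Terms are tuples with `"A"` for alternative and `"S"` for sequence, and `factor_prefix` is an illustrative name.

```python
def seq_items(block):
    """View a block as a list of sequence items (a lone item counts as one)."""
    if isinstance(block, tuple) and block[0] == "S":
        return list(block[1:])
    return [block]


def make(op, items):
    """Build an operator term, collapsing operators with a single operand."""
    return items[0] if len(items) == 1 else (op, *items)


def factor_prefix(model):
    """Move the common prefix of all branches in front of the alternative."""
    if not (isinstance(model, tuple) and model[0] == "A"):
        return model
    branches = [seq_items(b) for b in model[1:]]
    n = 0
    # Extend the prefix while every branch agrees on item n and still
    # keeps at least one item after it.
    while (all(len(b) > n + 1 for b in branches)
           and len({b[n] for b in branches}) == 1):
        n += 1
    if n == 0:
        return model
    rests = [make("S", b[n:]) for b in branches]
    unique = list(dict.fromkeys(rests))  # drop duplicate branches, keep order
    return make("S", branches[0][:n] + [make("A", unique)])


dnf = ("A", ("S", "a", "b", "c"), ("S", "a", "b", "d"))
print(factor_prefix(dnf))  # ('S', 'a', 'b', ('A', 'c', 'd'))
```

After the rewrite, the choice between c and d is made only once a and b have been performed, which corresponds to shifting the alternative operator towards a later point in time. The rewritten term is also shorter than the DNF input.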
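The decision-mining step can be illustrated with a compact tree induction in Python. The text does not say which induction algorithm is used, so the ID3-style entropy criterion below is an assumption, as are the context attributes `customer` and `amount` and the function names. Each row pairs the workflow context of one trace with the branch that was taken at a single decision point.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of branch labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())


def induce(rows, attrs):
    """Return a branch label or a node (attribute, {value: subtree})."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):
        # Weighted entropy left after splitting on attr (lower is better).
        counts = Counter(ctx[attr] for ctx, _ in rows)
        return sum(
            n / len(rows)
            * entropy([label for ctx, label in rows if ctx[attr] == value])
            for value, n in counts.items()
        )

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    values = sorted({ctx[best] for ctx, _ in rows})
    return (best, {v: induce([r for r in rows if r[0][best] == v], rest)
                   for v in values})


def rules(tree, conditions=()):
    """Flatten a tree into (conditions, branch) rules."""
    if not isinstance(tree, tuple):
        yield (conditions, tree)
        return
    attr, children = tree
    for value, subtree in children.items():
        yield from rules(subtree, conditions + ((attr, value),))


# Hypothetical context data for the decision point A(b, c)
rows = [
    ({"customer": "gold", "amount": "high"}, "b"),
    ({"customer": "gold", "amount": "low"}, "b"),
    ({"customer": "regular", "amount": "high"}, "c"),
    ({"customer": "regular", "amount": "low"}, "c"),
]
tree = induce(rows, ["amount", "customer"])
for conditions, branch in rules(tree):
    print(" and ".join(f"{a} = {v}" for a, v in conditions), "->", branch)
```

For this data the induction splits on `customer` only, and the two resulting rules are what would be attached to the alternative operator that chooses between b and c.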
