
Workflow Mining: A Survey of Issues and Approaches (9)

2007-07-29 11:31


7. How to measure the quality of a mined workflow model?––An experimental approach

As we already mentioned in Section 5, there are classes of Petri nets for which we can formally prove that the mined model is equivalent or has a behavior similar to the original Petri net. In this section we search for more general methods to measure the quality of mined workflow models.
An important criterion for the quality of a mined workflow model is the consistency between the mined model and the traces in the workflow log. Therefore, a standard check for a mined model is to try to execute all traces of the workflow log in the discovered model. If the trace of a case cannot be executed in the Petri net, there is a discrepancy between the log and the model. This is a simple first check. However, for each workflow log it is possible to define a trivial model that is able to generate all traces of the workflow log (and many more). Another problem is the execution of traces with noise (i.e., the error is not in the model but in the log).
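As a concrete illustration of this first check, here is a minimal sketch of token-game replay, assuming the mined model is a Petri net encoded as dictionaries of input and output places per transition; the encoding and the function name are illustrative assumptions, not a format from the paper:

```python
from collections import Counter

def replay_trace(trace, pre, post, m_init, m_final):
    """Token-game replay of one log trace on a Petri net.

    pre[t] / post[t] list the input / output places of transition t;
    m_init and m_final are the initial and final markings. Returns
    True iff every event is enabled when it occurs and the trace
    ends in the final marking.
    """
    marking = Counter(m_init)
    for t in trace:
        if t not in pre or any(marking[p] < 1 for p in pre[t]):
            return False                     # event not enabled: discrepancy
        for p in pre[t]:
            marking[p] -= 1                  # consume input tokens
        for p in post[t]:
            marking[p] += 1                  # produce output tokens
    return +marking == Counter(m_final)      # unary '+' drops zero counts

# Tiny sequential net: start -> A -> p -> B -> end
pre, post = {"A": ["start"], "B": ["p"]}, {"A": ["p"], "B": ["end"]}
print(replay_trace(["A", "B"], pre, post, ["start"], ["end"]))  # True
print(replay_trace(["B", "A"], pre, post, ["start"], ["end"]))  # False
```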
In our experimental setup, we assume that we know the workflow model that was used to generate the workflow log. In this subsection we concentrate on methods to measure the quality of a mined workflow model by comparing it with the original model (i.e., the workflow model used to generate the workflow log used for mining). We measure the quality of the mined model as the fraction of correctly detected basic relations, i.e., the correctness of the R-table described in the previous section.
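The measure itself is simple to compute. A minimal sketch, assuming the R-table is represented as a mapping from ordered task pairs to relation symbols (an illustrative encoding, not the paper's data structure):

```python
def relation_accuracy(r_original, r_mined):
    """Fraction of task pairs whose basic relation is detected correctly.

    Both R-tables map task pairs such as ("A", "B") to a relation
    symbol such as "->", "#" or "||".
    """
    pairs = set(r_original) | set(r_mined)
    hits = sum(r_original.get(p) == r_mined.get(p) for p in pairs)
    return hits / len(pairs) if pairs else 1.0
```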
The basic idea is to define a kind of test bed to measure the performance of different workflow mining methods. In order to generate testing material that resembles real workflow logs, we identify some of the elements that vary from workflow to workflow and subsequently affect the workflow log. They are (i) the total number of event types, (ii) the amount of available information in the workflow log, (iii) the amount of noise and (iv) the imbalance in OR-splits and AND-splits. Therefore, we used a data generation procedure in which these four elements vary in the following way:
1. The number of task types: we generate Petri nets with 12, 22, 32 and 42 event types.
2. The amount of information in the process log or log size: the amount of information is expressed by varying the number of cases. We consider logs with 200, 400, 600, 800 and 1000 cases.
3. The amount of noise: we generate noise by performing four different operations on the event sequences representing individual cases: (i) delete the head of an event sequence, (ii) delete the tail of a sequence, (iii) delete a part of the body and (iv) interchange two randomly chosen events. During the noise generation process, minimally one event and maximally one third of the sequence is deleted. We generate five levels of noise: 0% (the initial workflow log without noise), 5%, 10%, 20% and 50% (we select 5%, 10%, 20% and 50%, respectively, of the original event sequences and apply one of the four noise generation operations described above).
4. The imbalance of execution priorities: we assume that tasks can be executed with priorities between 0 and 2. In Fig. 9 there is a choice after executing event A (an OR-split). This choice may be balanced, i.e., task B and task F have equal probabilities, or not. For example, task B can have an execution priority of 0.8 and task F one of 1.5, causing F to happen almost twice as often as B. Both the noise operations of item 3 and this weighted choice are sketched in code after this list. The execution imbalance is produced on four levels:
• Level 0, no imbalance: all tasks have the execution priority 1;
• Level 1, small imbalance: each task can be executed with a priority randomly chosen between 0.9 and 1.1;
• Level 2, medium imbalance: each task can be executed with a priority randomly chosen between 0.5 and 1.5;
• Level 3, high imbalance: each task can be executed with a priority randomly chosen between 0.1 and 1.9.
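A minimal sketch of the noise operations of item 3 and the weighted choice of item 4, assuming traces are plain lists of event labels; the function names and the uniform choice among the four operations are assumptions made for illustration:

```python
import random

def add_noise(trace, rng):
    """Apply one of the four noise operations from item 3 to one trace.

    The three deleting operations remove at least one event and at most
    one third of the sequence; the fourth swaps two random events.
    """
    trace = list(trace)
    n = len(trace)
    k = rng.randint(1, max(1, n // 3))            # number of events to delete
    op = rng.choice(["head", "tail", "body", "swap"])
    if op == "head":
        return trace[k:]
    if op == "tail":
        return trace[:n - k]
    if op == "body":
        start = rng.randrange(n - k + 1)          # delete k events at a random position
        return trace[:start] + trace[start + k:]
    if n >= 2:                                    # interchange two events
        i, j = rng.sample(range(n), 2)
        trace[i], trace[j] = trace[j], trace[i]
    return trace

def make_noisy_log(log, level, seed=0):
    """Corrupt a fraction `level` (e.g. 0.05) of the traces in the log."""
    rng = random.Random(seed)
    log = [list(t) for t in log]
    for i in rng.sample(range(len(log)), round(level * len(log))):
        log[i] = add_noise(log[i], rng)
    return log

def resolve_or_split(branches, priority, rng):
    """Item 4: pick one branch of an OR-split, weighted by its priority.

    With priority = {"B": 0.8, "F": 1.5}, F fires almost twice as often
    as B; level 0 corresponds to all priorities being equal to 1.
    """
    return rng.choices(branches, weights=[priority[b] for b in branches])[0]
```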
The workflow logs produced with the proposed procedure allow the testing of different workflow mining methods, especially when one wants to assess a method's robustness against noise and incomplete data. We used the generated data for testing our heuristic approach discussed in the previous section.
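To make the setup concrete, a hypothetical driver along these lines could tie the sketches together; simulate_log, mine_relations, original_net and r_original are stand-ins for the paper's log generator, heuristic miner and reference model, not real APIs:

```python
# Hypothetical experiment loop over the log sizes and noise levels above.
for cases in (200, 400, 600, 800, 1000):
    log = simulate_log(original_net, cases, imbalance_level=2)
    for level in (0.0, 0.05, 0.10, 0.20, 0.50):
        noisy = make_noisy_log(log, level, seed=42)
        r_mined = mine_relations(noisy)          # heuristic miner stand-in
        print(cases, level, relation_accuracy(r_original, r_mined))
```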
The experiments show that our method is highly accurate when it comes to finding causal, exclusive and parallel relations. In fact we have been able to find almost all of them in the presence of incompleteness, imbalance and noise. Moreover, we gained the following insights:
• As expected, more noise, less balance and fewer cases each have a negative effect on the quality of the result. The causal relations (i.e., a →W b) can be predicted more accurately if there is less noise, more balance and more cases.
• There is no clear evidence that the number of event types has an influence on the performance of predicting causal relations. However, causal relations in a structurally complex Petri net (e.g., non-free-choice) can be more difficult to detect.
• Because the detection of the exclusive/parallel relations (a #W b and a ||W b) depends on the detection of the causal relation, it is difficult to formulate specific conclusions about their quality. It appears that noise affects the exclusive and parallel relations in much the same way as the causal relation, e.g., as the level of noise increases, the accuracy of finding parallelism decreases.
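The dependency mentioned in the last point can be seen in how such relations are commonly derived from the direct-succession relation a >W b. The sketch below uses the standard α-algorithm-style definitions, a simplification of the authors' noise-tolerant heuristic:

```python
def basic_relations(log):
    """Alpha-style basic relations from a log (a list of traces).

    a >W b holds when a is directly followed by b in some trace; then
    a ->W b iff a >W b but not b >W a, a #W b iff neither direction
    holds, and a ||W b iff both hold.
    """
    succ, tasks = set(), set()
    for trace in log:
        tasks.update(trace)
        succ.update(zip(trace, trace[1:]))       # observed a >W b pairs
    rel = {}
    for a in tasks:
        for b in tasks:
            if a == b:
                continue
            ab, ba = (a, b) in succ, (b, a) in succ
            rel[(a, b)] = ("||" if ab and ba else "->" if ab
                           else "<-" if ba else "#")
    return rel

log = [["A", "B", "C"], ["A", "C", "B"]]
print(basic_relations(log)[("B", "C")])          # '||': B and C interleave
```

Because #W and ||W are both read off the same succession observations, a single noisy swap of two events can turn a #W pair into a ||W pair, which is why errors propagate from the causal relation into the exclusive and parallel relations.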
When mining real workflow data, the above conclusions can serve as useful recommendations.
Usually it is difficult to know the level of noise and imbalance beforehand. However, during the mining process it is possible to collect more data about these metrics. This information can be used to motivate additional efforts to collect more data.
The software supporting this experimental approach is called ExperDiTo (Experimental Discovery Tool). The generated data are available to be downloaded as benchmarks from http://tmitwww.tm.tue.nl/staff/lmaruster/.