Description
Process-oriented data analysis techniques allow organizations to understand how their processes operate, where modifications are needed, and where enhancements are possible. A recurrent task in any process analysis technique is querying. Process data querying allows analysts to easily explore the data with the intent of gaining insights into the execution of business processes. Existing process querying techniques require end users to be knowledgeable about the query language and the database schema. However, a key success factor for process analysis is to make querying accessible to business experts who may be inexperienced in database querying. We address this challenge by proposing a natural language interface (NLI) for querying event data. The interface allows users to formulate their questions in natural language and automatically translates the questions into a structured query that can be executed over a database. We use graph-based storage techniques, namely labeled property graphs, which allow us to explicitly model event data relationships. As an executable query language, we use Cypher, which is widely used for querying property graphs. The approach has been implemented and evaluated using two publicly available event logs. All of the files and results mentioned in this document can be found in the Results repository.
Graph-based storage of event data
We propose a graph metamodel for storing event data based on labeled property graphs. Figure 1 illustrates a graphical representation of our proposed event graph metamodel. The nodes represent the types of information that are relevant to processes (i.e. activities, actors, business roles and artifacts). Nodes can be connected via six different relation types. The proposed graph model allows us to explicitly represent the relations between entities. We use Cypher, a declarative graph query language that allows for expressive data querying over a property graph.
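As an illustration, a labeled property graph can be sketched as nodes carrying a label and a property map, connected by typed relations. The following minimal in-memory Python sketch uses hypothetical labels (Event, Activity, Actor) and relation types chosen for this example; the actual metamodel of Figure 1 may differ:

```python
# Minimal in-memory sketch of a labeled property graph for event data.
# Node labels and relation types here are illustrative assumptions,
# not the exact metamodel of Figure 1.

nodes = {
    "e1": {"label": "Event", "props": {"timestamp": "2017-01-02T10:00"}},
    "a1": {"label": "Activity", "props": {"name": "Submit application"}},
    "r1": {"label": "Actor", "props": {"name": "User_42"}},
}

# Edges are (source, relation_type, target) triples.
edges = [
    ("e1", "CORRESPONDS_TO", "a1"),
    ("e1", "PERFORMED_BY", "r1"),
]

def neighbours(node_id, rel_type):
    """Return the target nodes reached from node_id via rel_type."""
    return [nodes[t] for s, r, t in edges if s == node_id and r == rel_type]

# Which activity does event e1 correspond to?
print(neighbours("e1", "CORRESPONDS_TO")[0]["props"]["name"])  # Submit application
```

In a graph database such as Neo4j, the same structure is stored natively and traversed with Cypher pattern matching rather than an explicit edge scan.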
NLI for querying process execution data
To easily query the stored process data, we propose an NLI system that allows the user to query the populated event graph model in natural language.
The pipeline we developed in our approach is illustrated in Figure 2. It takes as input a natural language user query, translates it to a Cypher query,
which can be executed over the process event data stored in an event property graph. The query's result is then returned to the user.
We propose a hybrid pipeline that takes advantage of both machine learning and rule-based approaches. The pipeline is made up of two main stages.
In the first stage, we apply two main natural language processing tasks, namely intent detection and entity extraction.
First, the detected intent assists the system in determining what type of information the user is seeking.
Second, we use direct mappings between natural language words and graph elements to extract entities.
In the second stage, a rule-based approach is proposed to build the corresponding graph query based on the intent and entities provided by the first stage.
An example of a natural language query with the corresponding detected intent and extracted entities, as well as the constructed Cypher query, is shown in Figure 3.
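The two stages can be sketched as follows. The word-to-graph-element mapping, the intent name, and the Cypher template below are hypothetical illustrations for this sketch, not the system's actual rules:

```python
# Hypothetical sketch of the two-stage pipeline. The mapping table and the
# Cypher template are made-up examples, not the system's actual rule set.

WORD_TO_ELEMENT = {
    "activities": ("node_label", "Activity"),
    "actors": ("node_label", "Actor"),
    "performed": ("relation", "PERFORMED_BY"),
}

def extract_entities(nl_query):
    """Stage 1 (partial): map NL words directly to graph elements."""
    tokens = nl_query.lower().replace("?", "").split()
    return [WORD_TO_ELEMENT[t] for t in tokens if t in WORD_TO_ELEMENT]

def build_cypher(intent, entities):
    """Stage 2: rule-based construction of a Cypher query string."""
    labels = [v for k, v in entities if k == "node_label"]
    if intent == "list_all" and labels:
        return f"MATCH (n:{labels[0]}) RETURN n"
    raise ValueError("unsupported intent/entity combination")

entities = extract_entities("Show me all activities")
print(build_cypher("list_all", entities))  # MATCH (n:Activity) RETURN n
```

The intent selects the query template; the extracted entities fill in its placeholders, which is why both stages are needed before a Cypher query can be emitted.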
Evaluation
To evaluate our approach, we used two publicly available event logs: BPIC 2017 and an order management log represented in the OCEL format. The BPIC'17 log contains data describing a loan application process, from filling out a loan application to decision-making (approving or declining).
The data set is provided in the BPI_Challenge_2017.xlsx file. After filtering the data using ProM,
we obtained 2105 unique event records for 26 different activities. There are 154 applications, 183 offers, and 240 workflows.
The activities were performed by 69 actors, and no business roles were included. The data is stored as a graph database in Neo4j, yielding 2157 nodes and 10692 relations.
In order to collect natural language queries related to the loan application process, a workshop was held with two different groups of Master students. The students were not familiar with the implementation and were unaware of the Cypher language or how to access the stored data using graph queries. At the end of the workshop, we filtered out the NL queries that are not yet supported by our system (i.e. complex queries that require, for example, sub-queries and negation, queries related to the performance category, etc.). As a result, we ended up with more than 300 content queries and more than 70 behavioural queries.
The collected queries used in the evaluation are presented in the files BPIC_Content_EvaluationQuestions and
BPIC_Behavioral_EvaluationQuestions.
The order management log includes data that keeps track of customer orders from the time they are accepted until they are delivered.
The data set is provided in the Order Management.xml file.
The log contains 22367
unique event records for 11 different activities and 5 different object types. There are 11522 object instances including: 17 customers, 20 products, 8159
items, 2000 orders, and 1326 packages. The data is stored in a graph database in Neo4j, which yields 33889 nodes and 750710 relations.
We used a paraphrasing tool to collect natural language queries about the order management process. The paraphrasing task produces syntactic and lexical variations of a natural language text without changing its original meaning. We used the tool to paraphrase a set of manually generated questions. In the end, 65 content queries and 85 behavioral queries were generated.
The collected queries used in the evaluation are presented in the files OCEL_Content_EvaluationQuestions and
OCEL_Behavioral_EvaluationQuestions.
The experiments were carried out with the help of a Python application that we developed with a conversational interface. The file Instructions.txt contains the instructions for launching the tool and repeating the experiments.
We conducted separate experiments on the two components of the pipeline to evaluate various aspects of the approach in a controlled environment. In the first experiment, we evaluated the performance of the machine learning model in detecting intent and extracting entities from a natural language query. Additionally, we compared the ML-based intent recognition model to a rule-based intent recognition model. In the second experiment, we assessed the importance of the intent detection step in the construction of the corresponding Cypher queries. Accordingly, we compared the query construction accuracy of our intent-based system to a baseline that does not include an intent detection step.
NLU component evaluation
We conducted two major experiments on the first component to justify the use of ML for intent detection and entity extraction. The main disadvantage of using machine learning, as discussed in the related work section, is that it requires a training dataset, which is not always available. In the first experiment, we therefore aim to evaluate the performance of a machine learning model for detecting intents and extracting entities using a small training dataset. In the second, we compared the results of the machine learning model for intent detection to those of a rule-based approach.
For each process domain, a machine learning model in Wit.ai is trained with a typically small set of utterances labeled with their corresponding intents and associated entities. The created utterances for the BPIC'17 log are presented in the
BPIC_Content_trainingQuestions and BPIC_Behavioral_trainingQuestions files,
and for the order management log are presented in OCEL_Content_trainingQuestions and OCEL_Behavioral_trainingQuestions files.
The evaluation results obtained using the BPIC'17 log for content and behavioral queries are presented in
BPIC_Content_NLPEvaluation
and BPIC_Behavioral_NLPEvaluation files respectively.
The evaluation results obtained using the order management log for content and behavioral queries are presented in
OCEL_Content_NLPEvaluation
and OCEL_Behavioral_NLPEvaluation files respectively.
In the second experiment, we compared the machine learning model in Wit.ai for intent detection to a baseline that uses a rule-based approach. The baseline was implemented separately; it attempts to detect the corresponding intent based on defined rules.
The rules are based solely on the extracted entities and some defined trigger words.
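Such a baseline can be sketched as a first-match scan over trigger words. The intents and trigger words below are hypothetical examples for this sketch, not the baseline's actual rule set:

```python
# Hedged sketch of a rule-based intent detector. The intents and trigger
# words are hypothetical illustrations, not the implemented rules.

TRIGGER_RULES = [
    ("who", "get_actor"),
    ("when", "get_timestamp"),
    ("how many", "count"),
]

def detect_intent(nl_query, default="list"):
    """Return the first intent whose trigger phrase appears in the query."""
    q = nl_query.lower()
    for trigger, intent in TRIGGER_RULES:
        if trigger in q:
            return intent
    return default

print(detect_intent("Who performed the validation step?"))  # get_actor
print(detect_intent("How many offers were created?"))       # count
```

Unlike the ML model, this baseline needs no training data, but it only recognizes queries whose wording matches one of the predefined triggers.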
The evaluation results obtained using the BPIC'17 log for content and behavioral queries are presented in
BPIC_Content Intent rule based
and BPIC_Behavioral Intent rule based files respectively.
The evaluation results obtained using the order management log for content and behavioral queries are presented in
OCEL_Content Intent rule based
and OCEL_Behavioral Intent rule based files respectively.
As evaluation metrics, we used accuracy for intent detection, and precision/recall/F-score for entity extraction. The accuracy is computed as the number of questions with a correctly detected intent divided by the total number of NL queries. For each NL query, the precision/recall/F-score of entity extraction is computed. The precision for a given NL query is calculated by dividing the number of correctly extracted entities by the total number of extracted entities. The recall is calculated by dividing the number of correctly extracted entities by the total number of entities expected to be extracted. The metrics are then averaged over all NL queries.
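The metrics above can be restated as a short sketch; the formulas follow the description in the text, while the example entity sets are made up for illustration:

```python
# Sketch of the evaluation metrics as described in the text.

def intent_accuracy(correct, total):
    """Share of NL queries whose intent was detected correctly."""
    return correct / total

def entity_prf(extracted, expected):
    """Precision/recall/F-score of entity extraction for one NL query."""
    extracted, expected = set(extracted), set(expected)
    tp = len(extracted & expected)  # correctly extracted entities
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(expected) if expected else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

def macro_average(per_query_scores):
    """Average precision/recall/F-score over all NL queries."""
    n = len(per_query_scores)
    return tuple(sum(s[i] for s in per_query_scores) / n for i in range(3))

# Made-up example: two entities extracted, one of them expected.
p, r, f = entity_prf({"Activity", "Actor"}, {"Activity"})
print(p, r)  # 0.5 1.0
```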
Query construction evaluation
The goal of this evaluation is to determine whether the query construction component is able to construct the right Cypher query from the detected intent and extracted entities. By design, our constructed queries are syntactically correct. Therefore, we evaluate whether they are semantically correct (i.e. whether they return the correct result as inquired by the user). We compare the intent-based approach (i.e. which takes the detected intent and extracted entities to construct the Cypher query) to a baseline that does not involve an intent detection step (i.e. it does not take the detected intent into account).
The NL queries are grouped into two categories. The first category (i.e. category 1) consists of the NL queries that include the information to be returned in the extracted entities. The second category (i.e. category 2) consists of the NL queries that inquire about a specific type of information that is not present in the extracted entities.
For each NL query, we examined whether the generated Cypher query returned
the expected answer. As evaluation metric, we computed the accuracy
by dividing the number of semantically correct Cypher queries by the total
number of queries.
The results of each evaluated query related to the BPIC'17 log are presented in the files BPIC_Content_QueryConstructionEvaluation and BPIC_Behavioral_QueryConstructionEvaluation.
The results of each evaluated query related to the order management log are presented in the files OCEL_Content_QueryConstructionEvaluation and OCEL_Behavioral_QueryConstructionEvaluation.
Contributors
- Meriana Kobeissi
- Nour Assy
- Walid Gaaloul
- Boualem Benatallah
- Bruno Defude
- Bassem Haidar
Contact
- Meriana Kobeissi
Telecom SudParis
Computer Science Department
email: meriana.kobeissi@telecom-sudparis.eu