Process Mining as One Base for the Generation of Synthetic Test Data

Today I read a Forbes article on 10 AI predictions for 2022. The seventh prediction in particular caught my attention: Multiple large cloud/data platforms will announce new synthetic data initiatives.
Rob Toews writes: “Getting the right data is the most important and the most challenging part of building AI products today. Synthetic data offers compelling advantages over the status-quo approach of collecting and labeling real-world datasets.”

This approach is interesting not only for products in the area of artificial intelligence. The generation of synthetic data is also very interesting for test automation.

Wikipedia defines Process Mining as “use event data to show what people, machines, and organizations are really doing.” In my opinion, this is one pillar for generating synthetic data.

UiPath offers a Process Mining solution and a Test Suite, and is working on AI solutions in the context of RPA. So I asked myself whether there are plans or approaches to bring these together for the generation of synthetic test data. In my opinion, this could be very profitable.

May I bring this to your attention? @Gernot_Brandl @ThomasStocker


@Ana_Bacioiu FYIP


@StefanSchnell thx for sharing your thoughts on this topic - indeed very interesting!
Data is the fuel of any business. RPA by nature mainly processes production data. Testing, in contrast, requires robust test data, preferably synthetic data, to comply with common data regulations like GDPR.
We are working on several fronts to make dealing with data as easy as possible for our customers:

  • Process Mining gathers real production data to extract valuable insights
  • Test Suite offers capabilities to generate typed synthetic data, so that test engineers can create their own data without depending on other teams in their organization

Besides that, there are new initiatives in the works in the same direction:

  • Semantic Automation: an intuitive way to process data without the need of labeling it
  • Test Prioritization based on Process Mining: instead of using anecdotal feedback to prioritize your tests, we would like to leverage production insights from Process Mining

Hello Thomas,

thank you very much for your feedback.

“Test Suite offers capabilities to generate typed synthetic data” That sounds very interesting. Do you have a link to a more detailed description?

Process Mining knows exactly how the data flows. It also knows which data types flow and, depending on the detail level of the log, the data itself. My idea is that this could be a (further) basis for the generation of test data in the Test Suite.
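
To illustrate the idea, here is a minimal Python sketch. It is not based on any UiPath API: the event log, its attribute names, and the helper functions `infer_schema` and `synthesize` are all hypothetical. It only shows how the data types observed in a log could drive the generation of synthetic records without copying any logged values.

```python
import random
from datetime import date, timedelta

# Hypothetical event log; attribute names and values are assumptions for illustration.
EVENT_LOG = [
    {"case_id": "C-1", "activity": "Create Invoice", "amount": 120.50, "posted": date(2022, 1, 3)},
    {"case_id": "C-2", "activity": "Approve Invoice", "amount": 80.00, "posted": date(2022, 1, 4)},
]

def infer_schema(log):
    """Map each attribute name to the Python type first observed for it in the log."""
    schema = {}
    for event in log:
        for name, value in event.items():
            schema.setdefault(name, type(value))
    return schema

def synthesize(schema, rng):
    """Create one synthetic record that matches the inferred types only."""
    generators = {
        str: lambda: "".join(rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for _ in range(8)),
        float: lambda: round(rng.uniform(0, 1000), 2),
        int: lambda: rng.randint(0, 1000),
        date: lambda: date(2000, 1, 1) + timedelta(days=rng.randint(0, 3650)),
    }
    return {name: generators[kind]() for name, kind in schema.items()}

schema = infer_schema(EVENT_LOG)
record = synthesize(schema, random.Random(42))  # seeded for reproducible test runs
```

The sketch deliberately uses only the types, not the logged values, so the generated record has no connection to the production data.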

Best regards

Hi @StefanSchnell ,

synthetic data is defined as data that was generated synthetically without connection to real production data to avoid conflicts with data regulation and privacy.
The challenge for synthetic data is that (1) it has to meet the required data types (String vs Data vs ID…) of your production data and (2) it has to meet the required relations between data points/data variations.
This is a highly complex topic. There are tools available that try to handle those issues, but in most cases they require a lot of hands-on implementation.
Our approach right now is a pragmatic one - we try to support you in two different areas:
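
The two requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Test Suite works: the order record, its field names, and the function `make_order` are hypothetical. Requirement (1) is met by generating each field with a fixed type, requirement (2) by deriving dependent fields from the ones they relate to.

```python
import random
from datetime import date, timedelta

def make_order(rng):
    """Build one synthetic order (hypothetical record layout)."""
    # (1) Typed values: int, float, and date fields.
    quantity = rng.randint(1, 20)
    unit_price = round(rng.uniform(1.0, 100.0), 2)
    ordered_on = date(2022, 1, 1) + timedelta(days=rng.randint(0, 364))
    return {
        "order_id": f"ORD-{rng.randint(0, 999999):06d}",
        "quantity": quantity,
        "unit_price": unit_price,
        # (2) Relations: the total follows from quantity and price,
        # and shipping never happens before the order date.
        "total": round(quantity * unit_price, 2),
        "ordered_on": ordered_on,
        "shipped_on": ordered_on + timedelta(days=rng.randint(0, 14)),
    }

# One seeded generator per record keeps the data reproducible between test runs.
orders = [make_order(random.Random(seed)) for seed in range(5)]
```

Generating the dependent fields from their sources, instead of drawing every field independently, is what keeps the relations intact.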
(1) data management/storage
Here we offer concepts like data-driven Testing via Excel sheets, Test Data Queues or our Data Service
(2) data generation
Here we offer a set of activities that allow you to generate typed data

Besides that, we are now working on capabilities to auto-generate test data based on your arguments - a first version of those capabilities should be expected in 22.4.
