Process Mining - Multiple Case file Input

Hi,

I tried uploading our source data in multiple chunks. Say I have 3 case_raw files with data, and I need to upload all three into a single custom process app so that their data is appended into a single case_raw file. When I tried uploading into the Dev data of the process app, the case file was overwritten with the new data each time. Is it possible, or is there any other way, to get the data of all 3 files into a single case_raw file (each file appended to the previous one rather than overwriting it)?

Hey @srijanani.kj,

Thanks for reaching out. Yes, this is possible. Here's a guide that describes how to do this: Process Mining - Merging event logs

This documentation helped us merge multiple event logs. My use case has multiple case raw files: each case file has 10000 records, and 3 such case files have to be merged into a single case raw file inside the Process Mining custom app.

The same approach as described for event logs can also be used for merging multiple case_raw files. Each file will have to get a different name, like Cases_raw1, Cases_raw2, Cases_raw3.

The SQL for the Cases_raw file would then look like this (the names inside ref() must match your own model file names):

-- The following code merges 3 input tables.
select * from {{ ref('Cases_input_1') }}
union all
select * from {{ ref('Cases_input_2') }}
union all
select * from {{ ref('Cases_input_3') }}
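
A side note on the merge above, as a hedged sketch: `union all` matches columns by position, so `select *` only works if all three inputs have the same columns in the same order. If that is not guaranteed, listing the columns explicitly is safer. Only "Case_ID" appears in this thread; "Case_status" and "Case_value" are hypothetical column names used for illustration, and the quoting style may differ on your database.

```sql
-- Sketch only: explicit column lists keep the three inputs aligned.
-- "Case_status" and "Case_value" are hypothetical columns.
select "Case_ID", "Case_status", "Case_value" from {{ ref('Cases_input_1') }}
union all
select "Case_ID", "Case_status", "Case_value" from {{ ref('Cases_input_2') }}
union all
select "Case_ID", "Case_status", "Case_value" from {{ ref('Cases_input_3') }}
```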

When we code it as above, with the 3 case input files referenced in the main case_log_input, compilation expects the first case file to be compiled already, which does not seem possible. We are facing the issue "main case file depends on node named case_raw1, which was not found".

It might be that the 3 source files are not referenced yet in sources.yml. Could that be the case?
See Process Mining - Adding source tables

I have already added those new case files in sources.yml, but still facing the same error.

Ok good. I expect there's a typo somewhere then. Could you maybe share the SQL code and the error log?

I have my case files as below:

File 1 : Cases_raw
File 2 : Cases_raw2
File 3 : Cases_raw3

The input SQL file for each case file reads from its own source table. The main case SQL file is named "Final_Case_raw.sql" and contains the code snippet below:

select * from {{ ref('Cases_raw') }}
union all
select * from {{ ref('Cases_raw2') }}
union all
select * from {{ ref('Cases_raw3') }}

Also, I have added the new case tables to the sources.yml file as below:

- name: Cases_raw2
  tests:
    - pm_utils.exists
  columns:
    - name: '"Case_ID"'
      tests:
        - pm_utils.exists
        - pm_utils.not_null
        - pm_utils.unique

Still facing the error "Final_case_raw depends on node Cases_raw, which was not found". Kindly check and let me know if I missed something.

Do you still have Cases_raw.sql under your models? And does it still contain the SQL code to read from the source_table and then model the table?

The error message indicates that something fundamental is missing from your changes, such as the SQL file, the model name, or the schema.

Can you please export the transformations you have and share them here? We can take a look at them to fix them.
[Screenshot: the "Transformations" folder structure, including "dbt_packages" and "macros", with a dropdown menu offering "Export transformations" and "Import transformations".]

As we have a signed SOW for this prospect, we might not be able to share the entire data here. So I explained the scenario with dummy data in the reply above.

Can you let me know the basic checks for the above error? I also checked that the SQL files are correct with respect to the source tables. The data model doesn't have cases_raw2, since it will not have a primary key. Is there anything more to investigate?

@srijanani.kj - We are not asking for data, but you need to provide some details for us to be able to help you.

You need to have at minimum three things done correctly:

  1. Have the raw files uploaded before you run transformations.
  2. Have the raw files defined in sources.yml.
  3. Model files defined for all the individual raw files.

If any of the above is missing, you will run into issues.

  1. Have the raw files uploaded before you run transformations.
  • Yes, already uploaded all case_raw files.
  2. Have the raw files defined in sources.yml.
  • Added the case_raw tables in sources.yml as mentioned in the code snippet above.
  3. Model files defined for all the individual raw files.
  • As in the attached screenshot, I'm unable to add the case_raw2 table since there is no primary key listed there.

Let me know if I'm missing anything.

Thanks for clarifying, and that's the source of the confusion. I think we should have been clearer when we said that the files must be in the model: we meant the model files under Transformations, not the data model.

The screenshot you are showing is part of the output data model, and the raw files need not be added there.

Do you have the four case files defined under Transformations > models > 1_input?

  • Cases_raw.sql
  • Cases_raw2.sql
  • Cases_raw3.sql
  • Final_case_raw.sql

Like in the following screenshot?
[Screenshot: the file directory structure under "Transformations" with nested folders and SQL files, highlighting "Event_log_input_3.sql" in the "1_input" folder.]

The Final_case_raw.sql will be the combination of all three files, as you had written earlier, but each individual Cases_rawX.sql needs to model the input from its own source file.
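
As a minimal sketch of what one of the individual input models could look like: assuming the source group in sources.yml is named `sources` (an assumption, so check the name at the top of your own sources.yml), Cases_raw2.sql would read from the uploaded table with source() rather than ref().

```sql
-- models/1_input/Cases_raw2.sql (sketch, not the exact app template)
-- source() points at a table declared in sources.yml;
-- 'sources' is an assumed source name.
select * from {{ source('sources', 'Cases_raw2') }}
```

Final_case_raw.sql can then use ref('Cases_raw2'), because ref() resolves to a model file with that name, not to an entry in sources.yml. A missing or misnamed model file is what typically produces a "depends on a node named ..., which was not found" error.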