I thought I was imagining this. But for the second time in a week, all the files under the \DocumentProcessing folder of my DU Workflow have been wiped out!
I am sure this time because I had backed up my project. Today I made extensive changes to the Form Extractor templates, successfully executed the project and extracted the data. However, a few seconds after the successful execution, the \DocumentProcessing folder was reset.
And with it the taxonomy, keyword JSONs, and the Form Extractor template ZIP files have been completely wiped out!
Before: The \DocumentProcessing folder from my backup
After: Output Excel File successfully generated as intended
After: The \DocumentProcessing folder in the Project has reset!
Hi @AndyMenon - Strange. I never faced this issue in my DU project, which I built in Oct 20. I even showed a demo to my management recently. All my files are still there.
The highlighted file is the final output from my workflow, which was generated on 12/23.
Yes, I did not expect this to happen. When it happened last Wednesday, I thought I made a mistake.
But this time I am sure of it. It’s good that I backed up some of my work yesterday, but everything that I did today has been wiped out.
I’m lucky that I have the output Excel file and the Logs to prove this fact.
I’ve scanned the Recycle Bin, but it looks like the deletion is permanent!
Let’s suppose I made a mistake. For that to happen, I would have to go into the \DocumentProcessing folder, shift+delete everything, and then put in a vanilla taxonomy.json file! There is no way I could keep going after making that many mistakes!
So it raises the question of why this is happening. The only thing I can say about this project is that most packages are Preview versions, as seen from the Package stack.
Ok, I spent a couple of hours rebuilding all the work I lost yesterday.
I’ve backed up all my work and am posting screenshots here with the timestamps for the record.
The key difference this time is that I have put in only the .JSON files in the \DocumentProcessing folder. All other sub-folders have been moved out into the parent project folder.
My theory is that putting anything other than these JSON files in the \DocumentProcessing folder causes it to be reset to its default state, with only an empty taxonomy.json left inside it! I believe that because this time the folder did not get reset after the Excel file was successfully generated.
DocumentProcessing Folder has only JSON files
All other folders moved out to project folder | Excel generated successfully
Log of the latest execution matching the timestamps above
If the folder resets again, we’ll know!
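One way to confirm a reset without relying on screenshots is to record the folder contents before a run and compare them afterwards. The following is only a minimal sketch: the function names `snapshot` and `diff_snapshots` are made up for illustration, and it uses nothing but the Python standard library (it is not part of UiPath).

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each file's relative path to a SHA-256 digest of its contents."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before, after):
    """Return (removed, added, changed) relative paths between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return removed, added, changed
```

Taking one snapshot of \DocumentProcessing right before running from Studio and another a few minutes after the run would show whether files were removed or taxonomy.json was overwritten.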
This is the weirdest issue!
I will definitely try this, though it should NOT happen…
How did you run this project? Did you run it from Studio, or did you start a job from Orchestrator?
Here is the approximate sequence of actions that led to the reset event. I hope this helps you in some way.
- I had one project that I had created several months ago (back in 2020). I revisited it late last Wednesday into early Thursday, and this project reset on me after I backed up the older taxonomy.json and modified it.
- At this point, I considered this to be my mistake, so I created an entirely new project on Saturday (screenshot of the two separate projects with their timestamps below).
- I followed the same steps, but this time I frequently backed up my project folder (Thank God for that!).
- After the final successful run, the folder reset again!
**Common Actions between both projects run directly from Studio:**
1. All Metrics in taxonomy are Table datatypes
2. Taxonomy.json was modified multiple times between each project run
3. All JSON files, documents and document templates were placed under the \DocumentProcessing folder
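Since frequent backups are what saved the second project, the backup step could be scripted and run before each execution. This is a minimal sketch, assuming plain Python outside of Studio; `backup_folder` is a hypothetical helper name, and the backup location should sit outside the project folder so a reset cannot touch it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source, backup_root):
    """Copy `source` into a new timestamped folder under `backup_root`."""
    source = Path(source)
    # The timestamp (with microseconds) keeps successive backups from colliding.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # creates missing parent folders as needed
    return target
```

Each call leaves the original folder untouched and returns the path of the new copy, so the timestamps in the folder names can serve as the record of when each backup was taken.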
Can you consistently reproduce this?
Any chance you could send us a workflow that reproduces this?
The only thing I can think of is that maybe you are triggering the process from Orchestrator, and somehow the DocumentProcessing folder gets reset to some initial state.
Or are you running the project from your Studio?
It’s been run from Studio all the time. I haven’t even got to deployment yet. This is a POC and I don’t know if I will need to deploy it to Orchestrator.
Yesterday, I made changes to the taxonomy once more. I made sure I hit the 2-page license limit of the CE Form Extractor, where we get that error message from the API.
Then I changed the taxonomy to remove metrics from one page and executed the project.
I did not see any resets of the folder.
The idea was to reproduce the behavior by making too many changes to the taxonomy to see if that is the reason.
Maybe I should put everything (documents, backups and templates) back into the \DocumentProcessing folder and try again.
Thank you @AndyMenon for keeping us posted!
Do let me know if you manage to reproduce this somehow - it seems like a pretty serious issue!
I sure will. But it hasn’t happened since I stopped doing this:
All JSON files, documents and document templates were placed under the \DocumentProcessing folder.
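To keep the folder in the state that has not triggered a reset so far, a quick check before each run could list anything in it that is not a JSON file. This is a sketch only: `non_json_entries` is a hypothetical name, and the idea that non-JSON content causes the reset is still just the theory from this thread, not confirmed behavior.

```python
from pathlib import Path


def non_json_entries(folder):
    """List names of sub-folders and non-.json files directly inside `folder`."""
    root = Path(folder)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() or p.suffix.lower() != ".json"
    )
```

An empty list means the folder holds only JSON files; anything returned is a candidate to move out into the parent project folder.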