I am trying to get this table data from a PDF into Excel using activities, not Document Understanding.
This is the sample data; there are many more files like it. Single FIle Sample.pdf (390.4 KB)
I am using:
1. Read PDF Text to get the text as a string.
2. String manipulation.
3. Generate Data Table to convert the text to a data table.
4. Write Range to write the data to Excel.
I am not able to get the values in the same structured format as in the PDF. Can anyone help me with that ASAP? Please share the expression, as that will help me; I am new to this domain.
@Sudharsan_Ka Thanks for your response. I am not from a technical / coding background; I moved over from the management domain, which is why I am looking for help here. I don't know regex yet, I am still in the learning phase.
If you could help me with that I can try that out.
Use the same approach you created:
1. Read PDF Text to get the string text. (Properties -> Preserve Formatting = True)
3. Generate Data Table to convert that to Data Table
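To make the idea behind these steps concrete, here is a minimal sketch in Python (not a UiPath expression) of how format-preserved text can be turned into rows and columns. It assumes the columns in the extracted text are separated by at least two spaces, which may not hold for every PDF. The function name `text_to_rows` and the sample text are made up for illustration.

```python
import re

def text_to_rows(text):
    """Split format-preserved text into rows of cells.

    Assumption: columns are separated by two or more whitespace
    characters. Single spaces inside a value are kept.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}", line))
    return rows

# Hypothetical sample text, not taken from the attached PDF.
sample = "ID   Name         Amount\n1    ABC_DEF      10.5\n\n2    GHI_JKL      20"
for row in text_to_rows(sample):
    print(row)
```

If the real text uses a different gap between columns, the separator pattern is the part that would need adjusting.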
If you have more files in the folder,
the steps will be like this:
1) For Each item in Directory.GetFiles("The path of the folder that contains your files", "*.pdf") (this gets all the PDF files and stores their paths as an array of strings) (Properties -> TypeArgument = String)
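For readers who want to see the folder loop outside of Studio, the following is a rough Python equivalent of iterating over `Directory.GetFiles(folder, "*.pdf")`. The function name `list_pdf_files` is hypothetical, and the current directory is used only as a placeholder folder.

```python
from pathlib import Path

def list_pdf_files(folder):
    """Return the paths of all .pdf files directly inside folder, sorted by name."""
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))

# "." is a placeholder; point this at the folder that holds the PDF files.
for pdf_path in list_pdf_files("."):
    print(pdf_path)  # each file would be read and converted here
```

Each path in the result is a string, which matches the reason the loop above is typed as String.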
Please avoid attaching only xaml files as responses to questions.
What is the approach you used? Show the person asking the question why the suggested approach is a good / reliable approach.
People who read this thread will benefit from a description of your approach. Xaml files alone are not really helpful. Let the readers review different approaches.
At a minimum, try to attach a screenshot of your workflow with some annotations.
You can also use the Search and Advanced Search to find similar topics and filter to solved. It is a powerful / quick way of navigating to solved queries in the forum.
This worked well with the sample I provided, as it was generated using Excel, but the actual file I have is an exported PDF report generated by a software tool. As the details are confidential, it will not be possible to share the actual PDF file with you, but all I can say is that the format looks the same in that PDF as well.
It seems that PDF does not retain the format: the first and second columns are merged into one and written to column 1, so the output is messy.
The first column contains only integers.
The second column ("Name") contains 17 characters, with an underscore included in between.
Similarly, the other columns vary in character count.
Is there a better way to do that?
In parallel, I'm looking into what was shared by @jeevith.
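One possible way to separate the merged first and second columns described above, sketched in Python rather than as a UiPath expression: since the first column is an integer and the Name is always 17 characters, the last 17 characters of the merged cell can be treated as the Name and the digits before them as the first column. The assumptions here are that the Name contains no spaces and that nothing else is merged into the cell; `split_merged_cell` and the sample values are made up for illustration.

```python
import re

NAME_LENGTH = 17  # from the thread: the Name column is 17 characters long

# An integer, optional whitespace, then exactly NAME_LENGTH non-space characters.
MERGED_CELL = re.compile(r"(\d+)\s*(\S{%d})" % NAME_LENGTH)

def split_merged_cell(cell):
    """Split a merged 'number + name' cell into (number, name).

    Returns None when the cell does not fit the assumed layout,
    so unexpected rows can be reviewed instead of silently mangled.
    """
    match = MERGED_CELL.fullmatch(cell.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

# Hypothetical values, not taken from the confidential report.
print(split_merged_cell("12ABCDEFGH_12345678"))
print(split_merged_cell("7 ABCDEFGH_12345678"))
```

Because the whole cell has to match, a Name that happens to start with a digit is still split correctly: the fixed length of 17 decides where the Name begins.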