I have multiple PDFs and want to extract the data dynamically from all of them.
I have attached a screenshot for reference. Based on the screenshot,
I want to extract the following fields:
1.Finding
2.Root cause
3.Risk Assessment
4.Recommendation
5.Management Response and target date
6.Responsible Party
Note: The points highlighted in red in the image are the separate points. There are more than 10 such points, and for each point the fields above need to be extracted dynamically.
I'm new to PDF extraction, so please suggest how to do it.
Use the Read PDF Text activity to read the PDF and store the result in a String variable.
Then use regex or string manipulation to extract the data from that text.
System.Text.RegularExpressions.Regex.Match(YourPdfText, "(?<=\n)[A-Z][A-Za-z ]+(?=:)").Value
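To illustrate what a lookbehind/lookahead heading pattern of this kind picks up, here is a minimal Python sketch. The sample text is hypothetical, and it assumes each subheading starts a new line and is followed by a colon; real PDF output may differ, and the pattern may need adjusting for the .NET regex engine.

```python
import re

# Hypothetical sample text; real PDF output may differ in spacing and line breaks.
sample_text = (
    "DETAILED AUDIT FINDINGS\n"
    "Finding: Access reviews are not performed.\n"
    "Root cause: No owner is assigned.\n"
    "Risk Assessment: Moderate\n"
)

# A heading starts right after a newline, begins with a capital letter,
# and ends just before a colon.
HEADING_PATTERN = re.compile(r"(?<=\n)[A-Z][A-Za-z ]+(?=:)")

headings = HEADING_PATTERN.findall(sample_text)
print(headings)
```

Note that Match returns only the first hit; to collect every heading you would need the equivalent of findall (Matches in .NET).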
I want to extract the data that appears under the headings listed above. Those listed items are the subheadings.
"Moderate…" and "2.Minor…" are the point headings,
and the main heading is "Detailed audit findings".
Only the main heading "DETAILED AUDIT FINDINGS" is constant.
The point headings (1…, 2…) are not constant,
but the subheadings inside points 1…, 2… are constant.
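The structure described here (one constant main heading, varying numbered point headings, constant subheadings inside each point) could be handled roughly as in the Python sketch below. It is only an illustration under assumptions: the sample text is made up, point headings are assumed to start a line with a number and a dot, and each subheading is assumed to start its own line. The function names split_points and extract_fields are hypothetical.

```python
import re

MAIN_HEADING = "DETAILED AUDIT FINDINGS"
FIELDS = [
    "Finding",
    "Root cause",
    "Risk Assessment",
    "Recommendation",
    "Management Response and target date",
    "Responsible Party",
]

def split_points(text):
    """Return (title, body) pairs for each numbered point after the main heading."""
    _, _, section = text.partition(MAIN_HEADING)  # empty if the heading is absent
    # Assumption: a point heading is a line that starts with a number and a dot.
    starts = list(re.finditer(r"^[ \t]*\d+\.[ \t]*(.+)$", section, flags=re.MULTILINE))
    points = []
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(section)
        points.append((m.group(1).strip(), section[m.end():end]))
    return points

def extract_fields(body):
    """Map each constant subheading to the text up to the next subheading."""
    alternatives = "|".join(re.escape(f) for f in FIELDS)
    pattern = re.compile(
        rf"^[ \t]*({alternatives})\b[ \t]*:?[ \t]*",
        re.MULTILINE | re.IGNORECASE,
    )
    hits = list(pattern.finditer(body))
    result = {}
    for i, m in enumerate(hits):
        end = hits[i + 1].start() if i + 1 < len(hits) else len(body)
        key = next(f for f in FIELDS if f.lower() == m.group(1).lower())
        # Collapse line breaks so multi-line values become one string.
        result[key] = " ".join(body[m.end():end].split())
    return result

# Hypothetical sample text standing in for the PDF content.
sample = """Summary page
DETAILED AUDIT FINDINGS
1. Moderate example point
Finding: Access reviews are missing.
Root cause: No owner assigned.
Risk Assessment: Moderate
Recommendation: Assign an owner
and review quarterly.
Management Response and target date: Agreed, next quarter.
Responsible Party: IT Manager
2. Minor example point
Finding: Logs are not archived.
Root cause: Storage limits.
Risk Assessment: Minor
Recommendation: Archive monthly.
Management Response and target date: Agreed.
Responsible Party: Operations Lead
"""

rows = [{"Point": title, **extract_fields(body)} for title, body in split_points(sample)]
for row in rows:
    print(row)
```

Because the split relies only on the heading text and not on position, the same approach should work for any number of points. If a field value itself contains a line starting with a number and a dot, the point-heading assumption would need to be tightened.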
Using this syntax, you can extract the following:
2.Root cause
3.Risk Assessment
4.Recommendation
5.Management Response and target date
6.Responsible Party
The headings highlighted in red are the common headings.
The ones highlighted in green are the numbered point subheadings.
The ones highlighted in yellow are the fields to extract dynamically.
Note: There are 10 or more green subheadings like this in each PDF.
The page number for this section is not constant. Each PDF has multiple pages and I am not sure which page this part will be on, so the code you shared is not working. I have attached an image as a sample.
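One way to avoid depending on the page number is to search the text of all pages at once for the constant main heading. The Python sketch below assumes the text of every page is already available as a list of strings; the function name section_after_heading and the sample pages are hypothetical.

```python
import re

def section_after_heading(page_texts, heading="DETAILED AUDIT FINDINGS"):
    """Join all pages and return the text that follows the heading, or '' if absent."""
    full_text = "\n".join(page_texts)
    # Allow any whitespace between the heading words, since PDF text
    # extraction may add extra spaces or break the heading across lines.
    pattern = r"\s+".join(re.escape(word) for word in heading.split())
    match = re.search(pattern, full_text, flags=re.IGNORECASE)
    return full_text[match.end():] if match else ""

# Hypothetical pages; the heading sits on the third page here,
# but the function does not care which page it is on.
pages = [
    "Cover page text",
    "Executive summary text",
    "DETAILED  AUDIT FINDINGS\n1. First point text",
]
print(section_after_heading(pages))
```

This takes the first occurrence of the heading, so if the same heading also appears earlier (for example in a table of contents), the search would need to skip that occurrence.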
We understand that the data is confidential and cannot be shared for testing on our end, so we suggest going through the tutorials below.
Understanding these tutorials will help you solve many more problems with regex or string manipulation, and choose the right approach for each.
Try your regex expressions against your data in the Matches activity (you can test the data there), or, if it is accessible, check your expressions on the website below.