I have a native PDF and am trying to extract data from it.
The content is in table format. I tried to use the anchor and text activities to extract the data, but every time I try to indicate the text, it selects the whole table.
My scenario is this: I have a portal where I need to enter these details. The portal has different text fields, such as Digital solution, Offering version, etc. If the portal has a field such as Offering version, then I need to go to this PDF and extract the offering version text, which in this case is 1.1.
Thanks for your reply. My solution is currently still in POC mode, so my manager won't approve the use of cloud vision activities for now.
Also, there are some restrictions on using third-party APIs in our organisation.
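Well, you can do this: use the Replace activity to replace runs of two or more whitespace characters with a special character, then use that character as the delimiter to split each line into columns. Single spaces inside a cell are left alone, so multi-word values stay together.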
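A minimal sketch of that replace-then-split idea, written in Python only to show the logic. It assumes the columns in the extracted PDF text are separated by at least two whitespace characters, and that the chosen delimiter (`|` here) never appears in the data. The function name and the sample line are hypothetical. In UiPath the same steps would be a regex replace followed by a split on the delimiter.

```python
import re

def split_columns(line, delimiter="|"):
    """Split one line of table text into cells.

    Assumption: cells are separated by two or more whitespace
    characters, and `delimiter` does not occur in the data.
    """
    # Replace every run of 2+ whitespace characters with the delimiter.
    collapsed = re.sub(r"\s{2,}", delimiter, line.strip())
    # Split on the delimiter and trim each cell.
    return [cell.strip() for cell in collapsed.split(delimiter)]

# Hypothetical line as it might come out of the PDF text.
sample_line = "Offering version      1.1"
print(split_columns(sample_line))  # ['Offering version', '1.1']
```

If the real PDF text separates columns with a single space or a tab instead, the pattern would need to change accordingly.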
Thanks a lot for the code, it worked. But I think in my case a dictionary would be a better option than a data table, as I need to put the extracted values into fields on a portal.
How should I modify my code accordingly so I can store the values in a dictionary?
Is the Offering version text always in the second row of the data table? If so, you can use the Read PDF Text activity and then split the text so that each line of the table is assigned to an index of an array. Then use a regex match or string manipulation to pull the version out.
Use <yourPdfTextVariable>.Split(Environment.NewLine.ToCharArray) in an Assign activity. On Windows line endings this leaves empty entries between the lines; pass StringSplitOptions.RemoveEmptyEntries as a second argument if you want to drop them.
The same technique could be used for assigning to a dictionary (key-value pairs) if you can find the delimiter between the cells of the table.
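Here is a rough sketch of the dictionary variant, again in Python just to illustrate the logic. It assumes each table row is one line of the form "field name, two or more spaces, value"; the function name and the sample text are made up for the example, and the real PDF text may be laid out differently.

```python
import re

def table_text_to_dict(pdf_text):
    """Build a {field name: value} dictionary from table text.

    Assumption: each row is one line, and the first two cells are
    separated by two or more whitespace characters.
    """
    fields = {}
    for line in pdf_text.splitlines():
        # Split the row into cells on runs of 2+ whitespace characters.
        cells = re.split(r"\s{2,}", line.strip())
        # Skip blank lines and rows without a value cell.
        if len(cells) >= 2 and cells[0]:
            fields[cells[0]] = cells[1]
    return fields

# Hypothetical text standing in for the Read PDF Text output.
sample_text = (
    "Digital solution      Example solution\r\n"
    "\r\n"
    "Offering version      1.1\r\n"
)
fields = table_text_to_dict(sample_text)
print(fields["Offering version"])  # 1.1
```

With the values keyed by field name, each portal field can be filled by looking up its label in the dictionary, and a missing key shows that the PDF had no such row.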
Thanks for your response. I have created an Excel sheet from the data table (code shared by @Pradeep_Shiv), but everything has been populated in one row in Excel. See below.
I have also displayed the data table in a message box FYR; please see below.