Could you please advise?
I need to extract some information (invoice number, delivery date, amount, etc.) from PDF attachments. The PDFs come from various customers, so each one can have a different layout.

  • I am able to split the PDF text into lines (with output.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries))

  • I can find, for example, the "Delivery date" string and get the delivery date, but my questions are:

  • "Delivery date" might appear three times in the PDF. How do I find the first occurrence and get the date next to it?

  • The position of the date relative to the "Delivery date" string differs from customer to customer: it can be right next to the string, right below it, or at the end of the row. How can I make sure that I always get the date? Do I need to write code for each variation?

Please share a sample input string and the required output string.



Please check the attachment:

BlankProcess13 (2).zip (18.0 KB)

I tried running OCR on it, but the OCR results were poor, so you may get different values.

If you are getting good OCR results, though, this flow will work as expected.
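
For the two questions above (first occurrence, and a date that may sit next to or below the label), one possible approach is a single regular expression. This is only a sketch and not necessarily what the attached workflow does: the function name FindFirstDeliveryDate, the 80-character search window, the accepted date formats, and the sample text are all assumptions for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DeliveryDateSketch
    ' Hypothetical helper: returns the first date found after the first
    ' "Delivery date" label, or Nothing when no label/date pair is found.
    Function FindFirstDeliveryDate(pdfText As String) As String
        ' Label, then up to 80 characters of anything (line breaks included,
        ' matched lazily), then a date such as 12.03.2021, 12/03/2021 or 2021-03-12.
        ' The window size and date formats are assumptions; adjust to your PDFs.
        Dim pattern As String =
            "Delivery\s+date[\s\S]{0,80}?(\d{1,2}[./-]\d{1,2}[./-]\d{2,4}|\d{4}-\d{2}-\d{2})"
        ' Regex.Match returns the first (leftmost) match, so later labels are ignored.
        Dim m As Match = Regex.Match(pdfText, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Made-up sample: the date is on the line below the first label.
        Dim sample As String = "Invoice no: 4711" & Environment.NewLine &
                               "Delivery date" & Environment.NewLine &
                               "12.03.2021" & Environment.NewLine &
                               "Delivery date: 15.03.2021"
        Console.WriteLine(FindFirstDeliveryDate(sample)) ' prints 12.03.2021
    End Sub
End Module
```

Because the pattern allows any characters (including line breaks) between the label and the date, the same expression covers "right next to", "right below", and "at the end of the row", as long as the date is the first date-like value after the label. If another date can appear in between, a per-customer pattern would likely still be needed.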

Thank you, ksrinu. I can use your proposal.
I also found this solution: 1. number_Start = myText.IndexOf("Nummer"), 2. Calculate = number_Start.Substring(number_Start.Length-3,3), 3. Start = Calculate.ToInt and 4. Write Line: "Nummer: " + myText.Substring(Start+7,7)
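
For anyone reading later, the IndexOf/Substring idea can be sketched roughly as follows. This is a simplified illustration rather than the exact steps listed above: the helper name TextAfterLabel, the sample text, and the length of 9 are assumptions made up for the example.

```vbnet
Imports System

Module NummerSketch
    ' Hypothetical helper: returns up to `length` characters that follow the
    ' first occurrence of `label`, or Nothing when the label is not present.
    Function TextAfterLabel(source As String, label As String, length As Integer) As String
        ' IndexOf returns the position of the first occurrence, or -1 if absent.
        Dim labelStart As Integer = source.IndexOf(label, StringComparison.OrdinalIgnoreCase)
        If labelStart < 0 Then Return Nothing
        Dim valueStart As Integer = labelStart + label.Length
        ' Clamp the length so Substring does not throw near the end of the text.
        Dim available As Integer = Math.Min(length, source.Length - valueStart)
        ' Strip the separator characters around the value.
        Return source.Substring(valueStart, available).Trim(" "c, ":"c)
    End Function

    Sub Main()
        ' Made-up sample text.
        Dim myText As String = "Rechnung Nummer: 1234567 vom 01.02.2021"
        Console.WriteLine("Nummer: " & TextAfterLabel(myText, "Nummer", 9)) ' prints Nummer: 1234567
    End Sub
End Module
```

The check for -1 and the clamped length are there because Substring throws an exception when the label is missing or the value sits at the very end of the text.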

Thank you for your help!

