Hey there everybody! I recently started an internship as an RPA developer, and I have run into an issue trying to scrape data from a PDF. I cannot share the details of the real PDF, so here is a made-up scenario:
Say I have a PDF of ordered foods. I have read the data into a variable, taken the Substring that contains the items, split it on Environment.NewLine, and been able to access each line using a For Each loop. My code looks something like this:
For Each CurrentItem In Split(PDFSubstringWithItems, Environment.NewLine)
    Log Message: CurrentItem
With this, I have been able to get at each line of data I need. The data looks something like this (let's say it is a PO for a grocery store):
1 FGHI876 590843 BG 10.00 $25.00 $250.00
Organic Green Apples
2 POJQ3498 78654 BG 5.00 $25.00 $125.00
Fresh Ripe Oranges
3 MNGET4321 09473 BG 4.00 $8.00 $32.00
Frozen Angus Beef Burgers Grass Fed
1 is the line number, FGHI876 is the item code, and everything after it on that line belongs to that same item. The next line is the item description. My question is this: is there a good regex that would get each item individually, with both the item code details and the item description? I have one idea, but I am not sure how efficient it is.
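To make the layout described above concrete, here is a rough sketch of the kind of pattern I have in mind, written in Python only because it is quick to test. The group names (ref, unit, qty, price, total) are my own guesses at what the columns mean, and the pattern assumes every item is exactly one detail line followed by one description line. I believe .NET regex (which UiPath uses) writes named groups as (?<name>...) instead of (?P<name>...), so it would need adjusting there.

```python
import re

# Sample text copied from the made-up purchase order above.
sample = "\n".join([
    "1 FGHI876 590843 BG 10.00 $25.00 $250.00",
    "Organic Green Apples",
    "2 POJQ3498 78654 BG 5.00 $25.00 $125.00",
    "Fresh Ripe Oranges",
    "3 MNGET4321 09473 BG 4.00 $8.00 $32.00",
    "Frozen Angus Beef Burgers Grass Fed",
])

# One match = one detail line plus the description line that follows it.
# Column names other than line_no and code are hypothetical labels.
ITEM_PATTERN = re.compile(
    r"^(?P<line_no>\d+)\s+"          # line number
    r"(?P<code>[A-Z0-9]+)\s+"        # item code, e.g. FGHI876
    r"(?P<ref>\d+)\s+"               # numeric column after the code
    r"(?P<unit>[A-Z]+)\s+"           # unit, e.g. BG
    r"(?P<qty>\d+\.\d{2})\s+"        # quantity
    r"\$(?P<price>[\d,]+\.\d{2})\s+" # unit price
    r"\$(?P<total>[\d,]+\.\d{2})"    # line total
    r"[ \t]*\r?\n"                   # end of the detail line
    r"(?P<description>[^\r\n]+)",    # the whole next line
    re.MULTILINE,
)

items = [m.groupdict() for m in ITEM_PATTERN.finditer(sample)]

for item in items:
    print(item["code"], "|", item["description"])
```

If a description ever wraps onto a second line, this pattern would only pick up the first of those lines.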
1) The idea is to use a loop: by checking for “BG” in the line, I am able to get each item code and its related details just fine, and I figure I can keep splitting that string into the individual values I need. However, this doesn’t get the item description, which is information that I also need.
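The loop idea above could be extended to pick up the description by remembering the last item line seen. Below is a minimal sketch in Python rather than the actual UiPath workflow. The function name pair_items and the dictionary keys are made up for illustration, and it assumes that any line without a standalone "BG" field is description text for the item before it.

```python
def pair_items(lines):
    """Attach each description line to the item line that precedes it."""
    items = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from the split
        fields = line.split()
        if "BG" in fields:
            # Detail line: start a new item and keep its split-out values.
            current = {"fields": fields, "description": ""}
            items.append(current)
        elif current is not None:
            # Anything else is treated as description text for the last item.
            current["description"] = (current["description"] + " " + line).strip()
    return items


sample_lines = [
    "1 FGHI876 590843 BG 10.00 $25.00 $250.00",
    "Organic Green Apples",
    "2 POJQ3498 78654 BG 5.00 $25.00 $125.00",
    "Fresh Ripe Oranges",
    "3 MNGET4321 09473 BG 4.00 $8.00 $32.00",
    "Frozen Angus Beef Burgers Grass Fed",
]

paired = pair_items(sample_lines)
for item in paired:
    print(item["fields"][1], "|", item["description"])
```

One weakness of this approach is that a description containing "BG" as its own word would be mistaken for a new item line, so a stricter check on the detail line may be safer.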
This is the first time I’ve posted on here, but I have lurked on here a bunch of times to help solve other issues I’ve had. I appreciate any and all feedback from more experienced developers! Thank you so much!