Please help me with PDF data extraction using regex.
In the attached PDF, the header spans 2 rows, and each line item also spans 2 rows.
I want to extract the data into Excel as per the example given below. Please help. PRODUCT ID.pdf (53.3 KB)
Yes, we can extract the required values using a regular expression, but I am not able to open your PDF; I get a message saying it is either corrupted or unstructured.
Also, the regex approach is appropriate only if the number of rows in your input (the Product ID file) is always limited. If you get a larger number of rows, it would be best to look at an option that converts the tabular data into an Excel file.
Thank you so much @Boopathi.M for your response. I will attach the PDF file once again. It would be really helpful if you could let me know whether the data can be extracted from it. And thank you for the suggestion; I will look into the link you have sent me. PRODUCT ID.pdf (53.3 KB)
Step 1 - Install the UiPath.PDF.Activities package, read the PDF as text with Read PDF Text, and save the output to a text file.
Step 2 - From that output text, remove the headers using this regex pattern: “(?<=Extended Cost)[\s\S]*”. This matches all the data after the header. It is greedy, i.e. it extracts everything after the header. If there is a constant value at the end of the rows, or some fixed pattern that identifies the end, we can make it non-greedy.
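For anyone who wants to try Step 2 outside UiPath, here is a minimal Python sketch. The sample text is made up (the real PDF output is not shown in this thread), `strip_header` is a hypothetical helper name, and the `Total` end marker is only an assumption used to show the non-greedy variant. The pattern itself is the one quoted in Step 2; Python's `re` module accepts it because the lookbehind has a fixed width.

```python
import re

HEADER_END = "Extended Cost"  # last header label, taken from the Step 2 pattern

def strip_header(text, end_marker=None):
    """Return the text after the header, or "" when nothing matches.

    Without end_marker the match is greedy and runs to the end of the text.
    With end_marker (an assumed constant such as "Total") the match is lazy
    and stops just before the first occurrence of that marker.
    """
    pattern = r"(?<=" + re.escape(HEADER_END) + r")[\s\S]*"
    if end_marker:
        pattern += r"?(?=" + re.escape(end_marker) + r")"
    match = re.search(pattern, text)
    return match.group(0).strip() if match else ""

# Hypothetical stand-in for the text produced by Step 1.
sample = "Product ID Description\nQty Extended Cost\nABC123 Widget 2 199.00\nTotal 199.00\n"

greedy_body = strip_header(sample)
lazy_body = strip_header(sample, end_marker="Total")
print(greedy_body)
print(lazy_body)
```

The greedy call keeps the trailing `Total` line, while the lazy call drops it, which is the difference Step 2 describes.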
Step 3 - On the remaining data, use regex to identify the values. I used the pattern below, with named groups, to identify the values; the matches are saved as a collection.
Step 4 - Use a For Each to read each group name and add the values to a data row, which is built before reading the data.
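Steps 3 and 4 can be sketched roughly as follows in Python. This is not the pattern from the post above; the row layout (ID, description, quantity, cost on one line), the sample values, and the group names are all assumptions for illustration. Note that Python writes named groups as `(?P<name>...)`, whereas UiPath's .NET regex uses `(?<name>...)`.

```python
import re

# Assumed row layout: <id> <description> <qty> <extended cost>
ROW = re.compile(
    r"^(?P<product_id>\S+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<qty>\d+)\s+"
    r"(?P<extended_cost>\d+\.\d{2})$",
    re.MULTILINE,
)

# Hypothetical text left over after the header has been removed.
body = "ABC123 Cordless Drill 2 199.00\nXYZ789 Bit Set 10 45.50"

# One dict per match, keyed by group name; this plays the role of the
# For Each loop that copies each named group into a data row.
rows = [match.groupdict() for match in ROW.finditer(body)]

for row in rows:
    print(row)
```

Each dictionary maps directly onto one output row, so the list can be written to Excel with whatever tool is at hand.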
I am trying your solution with my data, but I'm unable to create groups in the RegEx Builder, e.g. PRODUCT ID1 (this is my product ID, DCD797C3) and PRODUCT ID2 (1001311467). Can you please help me create the groups for PRODUCT ID?
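One possible way to get two separate groups is sketched below in Python, since the exact layout of the extracted text is not shown in this thread. It assumes, hypothetically, that the first product ID starts the first row of a line item and the second product ID starts the row directly beneath it; the descriptive text in the sample is invented. Group names cannot contain spaces, which may be why a name like `PRODUCT ID1` is rejected, so the sketch uses underscores. In UiPath (.NET regex) the same groups would be written `(?<PRODUCT_ID1>...)` and `(?<PRODUCT_ID2>...)` instead of Python's `(?P<...>...)` form.

```python
import re

# Row 1 of an item starts with the first ID; row 2 starts with the second ID.
TWO_ROW_ITEM = re.compile(
    r"^(?P<PRODUCT_ID1>[A-Z0-9]+)[^\n]*\n"  # first ID, then the rest of row 1
    r"(?P<PRODUCT_ID2>\d+)\b",              # numeric ID at the start of row 2
    re.MULTILINE,
)

# The two IDs come from the question; the other text is a made-up placeholder.
sample = "DCD797C3 Cordless Drill 2 199.00\n1001311467 20V kit\n"

pairs = [
    (match["PRODUCT_ID1"], match["PRODUCT_ID2"])
    for match in TWO_ROW_ITEM.finditer(sample)
]
print(pairs)
```

If the real rows are laid out differently, only the part between the two groups should need adjusting.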