Extracting specific data from a PDF: getting an error

Hello, I want to extract specific data from a PDF file, such as the “seller” information. I am using the syntax below, but I am unable to pull the data.

sampletext.Substring(sampletext.IndexOf("Seller:")+"Seller::".Length).Split(" ".ToCharArray)(1)

Here, sampletext is the String variable that stores the entire output of the “Read PDF With OCR” activity.

Thanks in advance for the help… :slight_smile:

Hi @NikhilRPA

Please share a screenshot or the file with the input data you are trying to extract, so that we can identify the root cause of the error.

Also, please show the error you are getting.

Try it as shown below :-

sampletext.Substring(sampletext.IndexOf("Seller:")+"Seller:".Length).Split(" "c)(1)
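For anyone who wants to see the same idea outside a UiPath Assign activity, here is a minimal sketch of it as a small console module. The helper name FirstTokenAfter and the sample text are made up for illustration and do not come from the attached PDF. The sketch also guards against the label being missing, because IndexOf returns -1 in that case and the one-line expression would then silently return the wrong part of the text.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SellerExtractionSketch
    ' Hypothetical helper: returns the first whitespace-delimited token
    ' that follows the label, or Nothing when the label is not found.
    Function FirstTokenAfter(text As String, label As String) As String
        Dim start As Integer = text.IndexOf(label)
        If start < 0 Then Return Nothing

        ' Everything after the label itself.
        Dim rest As String = text.Substring(start + label.Length)

        ' Split on spaces, tabs and line breaks; drop empty entries so that
        ' leading or repeated whitespace does not shift the token index.
        Dim separators As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Dim tokens As String() = rest.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        If tokens.Length = 0 Then Return Nothing
        Return tokens(0)
    End Function

    Sub Main()
        ' Made-up sample text standing in for the PDF content.
        Dim sampletext As String = "Invoice" & ControlChars.Lf &
                                   "Seller: Acme Ltd" & ControlChars.Lf &
                                   "Buyer: Someone"
        Console.WriteLine(FirstTokenAfter(sampletext, "Seller:")) ' prints: Acme
    End Sub
End Module
```

Because empty entries are removed, the wanted token sits at index 0 here, whereas the one-line expression uses index 1 to skip the empty string produced by the space after the colon.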

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Hello @Pratik_Wavhal, thanks for the response.

Please find the files attached: Test1.xaml (6.3 KB), ExcelCult Inventory Management_Orchestrator.pdf (66.4 KB)

Hi @NikhilRPA

Below is a working workflow for this :-
Challenge 08 _ Inventory Management Process.zip (1.0 MB)

You can also find working workflows for all the ExcelCult challenges on the GitHub profile below :-

Mark as solution and like it :slight_smile:

Hi @Pratik_Wavhal

Actually, I already have the solution for the challenge from GitHub. What I am trying to understand is how to extract specific information from a PDF, like the example I attached above.

Can you help me with the string manipulation syntax to get the “Seller” information from the PDF file?

Hi @NikhilRPA

In this post I have provided the solution by correcting your function :-

Have you tried that?

Hi @NikhilRPA

Below is the updated workflow :-
MainPratik.xaml (7.3 KB)

Function used :-

System.Text.RegularExpressions.Regex.Replace(readPDF.Substring(readPDF.IndexOf("Seller")+"Seller:".Length).Split(Convert.ToChar(vblf))(0),"\s+"," ").Split(" "c)(2)
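In case it helps to read that one-liner step by step, here is a rough sketch that unrolls it into named steps: keep the rest of the line after the label, collapse runs of whitespace into single spaces, then pick a token by index. The helper name TokenOnLabelLine and the sample text are assumptions for illustration only; the real PDF layout may differ, so the index you need may differ too.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SellerLineSketch
    ' Hypothetical helper: returns the token at tokenIndex on the line that
    ' contains the label, or Nothing when the label or the token is missing.
    Function TokenOnLabelLine(text As String, label As String, tokenIndex As Integer) As String
        Dim start As Integer = text.IndexOf(label)
        If start < 0 Then Return Nothing

        ' Keep only the remainder of the line on which the label appears.
        Dim restOfLine As String = text.Substring(start + label.Length).Split(ControlChars.Lf)(0)

        ' Collapse every run of whitespace into one space and trim the ends.
        Dim normalized As String = Regex.Replace(restOfLine, "\s+", " ").Trim()
        If normalized.Length = 0 Then Return Nothing

        Dim tokens As String() = normalized.Split(" "c)
        If tokenIndex < 0 OrElse tokenIndex >= tokens.Length Then Return Nothing
        Return tokens(tokenIndex)
    End Function

    Sub Main()
        ' Made-up sample text with irregular spacing, as OCR or PDF output often has.
        Dim readPDF As String = "Seller:    Jane   Doe" & ControlChars.Lf & "Buyer: John Roe"
        Console.WriteLine(TokenOnLabelLine(readPDF, "Seller:", 1)) ' prints: Doe
    End Sub
End Module
```

Note that the sketch trims the line before splitting, so its token indexes start at 0 for the first word. The one-liner does not trim, which is why it needs index 2 to reach the same word when the line starts with whitespace. The bounds check returns Nothing instead of throwing when the line has fewer tokens than expected.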

Mark as solution and like it :slight_smile:

Hi @Pratik_Wavhal

Thanks for the sample xaml file. I have attached mine after making the changes, but I am getting an empty value. I have another question: can we use the same syntax to retrieve any specific data from PDF files, such as invoice numbers or file IDs?

Thanks :slight_smile: Test1.xaml (6.4 KB)

Hi @NikhilRPA

You are reading the PDF with an OCR-based activity (Get OCR Text or similar), so the position of the text may vary. That is why the variable ends up with an empty value.

If you read the PDF with the Read PDF Text activity instead, you will definitely get the output.

The difference between Get OCR Text and Read PDF Text is :-

  1. Get OCR Text :- Used for screen scraping when the position of the text cannot be found.
  2. Read PDF Text :- Reads all the text within the PDF, with or without preserving the formatting.

Yes, you can use the same syntax if the data pattern/format within the PDF is the same as in the ExcelCult PDF you are working on now. Then it will definitely work.

But if the data pattern/format is different, you may have to change the syntax.
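As a rough illustration of adapting the approach to other fields, here is a sketch that uses a regular expression instead of fixed token positions, which tends to tolerate spacing differences better. The helper name ValueAfterLabel, the labels "Invoice No" and "File ID", and the sample values are all invented for this example; you would need to substitute the labels that actually appear in your PDF.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LabelRegexSketch
    ' Hypothetical helper: returns the first non-whitespace run after the
    ' label (and an optional colon), or Nothing when the label is not found.
    Function ValueAfterLabel(text As String, label As String) As String
        ' Regex.Escape keeps any special characters in the label literal.
        Dim pattern As String = Regex.Escape(label) & "\s*:?\s*(\S+)"
        Dim m As Match = Regex.Match(text, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Groups(1).Value
    End Function

    Sub Main()
        ' Made-up sample text; the labels and values are assumptions.
        Dim text As String = "Invoice No:   INV-1042" & Environment.NewLine & "File ID: F-77"
        Console.WriteLine(ValueAfterLabel(text, "Invoice No")) ' prints: INV-1042
        Console.WriteLine(ValueAfterLabel(text, "File ID"))    ' prints: F-77
    End Sub
End Module
```

This only captures a single token, so values that contain spaces (a full seller name, for example) would need a different capture group, such as everything up to the end of the line.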
