How to extract multiple pieces of data from a PDF

Hi All,

I have a PDF file with many pages. It contains data such as names, numbers, etc.

I want to extract only the numbers from all the pages. How can I do that?

Could anyone help me?

Thanks

Hey @coder

I think regex would be best

Hey @Jacqui_M

Yes, I read about that, but I did not understand how to do it.

Hello @coder,

You can use the graphical regex builder together with the string manipulation methods. For this to work you need to convert the PDF to text, so you need to use the activities for that.


First you need to download the PDF dependency from Packages.

Then you need to use this activity:


This activity can be used only on PDFs that were generated from applications; for scanned documents you need to use this activity:

That activity will turn the PDF into string format, and then you can use Is Match or Matches, whichever best suits your need.
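
Once the PDF has been turned into a string, pulling out only the numbers comes down to one regex call. Here is a minimal sketch in Python rather than in a UiPath workflow. `pdf_text` is a made-up stand-in for the extracted text, `extract_numbers` is a hypothetical helper name, and "number" is assumed to mean any run of digits.

```python
import re

# Hypothetical text, standing in for the output of the PDF-to-text step.
pdf_text = """Invoice 1042 for John Smith
Phone: 555-0199, amount 250"""


def extract_numbers(text):
    """Return every run of digits in the text, in order of appearance."""
    return re.findall(r"\d+", text)


numbers = extract_numbers(pdf_text)
print(numbers)
```

With this pattern a phone number such as 555-0199 comes back as two separate items. If it has to stay whole, the pattern needs to be more specific.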

Cheers,
Dino

Thank you @dfilipovic

So I do not need to use a For Each or an If condition?

Only regex?

@coder,
In this case no, because you won't pass through the text one row at a time; this will check the whole string and return all values that match your regex.
This regex builder is excellent because you can paste in the text extracted from the PDF and then play with regex combinations until you get the result you need.

This is a stupid example, but you can test it out. :slight_smile:
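
To illustrate why no loop is needed, here is a small sketch in Python (not the original example from this post). The text, the `PHONE` pattern and the helper name `find_all_phones` are assumptions made for the illustration. One call over the whole string gives the same result as going through it line by line.

```python
import re

# Hypothetical multi-page text, already joined into one string.
text = "Page 1\nCall 988-555-6112 today\nPage 2\nOr 623-555-3006 tomorrow\n"

# Assumed format: three digits, three digits, four digits, separated by dashes.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def find_all_phones(s):
    """Return every phone-like match found anywhere in the string."""
    return PHONE.findall(s)


# One call over the whole string...
all_at_once = find_all_phones(text)

# ...finds the same values as an explicit loop over the lines.
line_by_line = [m for line in text.splitlines() for m in PHONE.findall(line)]

print(all_at_once)
print(all_at_once == line_by_line)
```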

Thank you @dfilipovic, I will try that :slight_smile:

If that helps, can you please mark it as a solution so I get some points. :slight_smile:

Sure, I will do that :innocent:

@dfilipovic it is not showing for me

Hi @coder

Can you provide the input and the exact output you expect? Then giving you the exact regex will be easy. If you can also provide the PDF itself, that would be best.

Also, until then, have a look at the mega post below by @Steven_McKeering:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Hi @Pratik_Wavhal

Sure,

Laura Jefferson

890 Main St., Pythonville LA 29947

Maria Johnson
884 High St., Braavos ME 43597
Michael Arnold
249 Elm St., Quahog OR 90938

Mary Patterson / 988-555-6112 / 956 Park St., Valyria CT 81541 / marypatterson@bogusemail.com
Jane Stuart / 623-555-3006 / 983 Oak St., Old-town RI 15445 / janestuart@bogusemail.com

I am trying to do this for the names and addresses.

Hi @coder

May I know the output you expect from this?

I mean, which are the names and addresses in your text that you want to extract?

Also, may I know whether there is anything that always stays constant?

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Hi @Pratik_Wavhal

There are many names and addresses, so I want to extract all of them. They are in the same format as the example I wrote.

Is it clear?

Hi @coder

Have a look at the screenshot below and let me know whether these are the names you are trying to extract:

image
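
For reference, a pattern for the names could look like the sketch below. It is written in Python rather than in the UiPath regex builder, and it is an assumption based only on the sample text above, not the pattern from the screenshot. It assumes every name is two capitalised words at the start of a line, followed either by the end of the line or by a slash. `SAMPLE`, `NAME` and `extract_names` are hypothetical names.

```python
import re

# Sample text copied from the thread.
SAMPLE = """Laura Jefferson

890 Main St., Pythonville LA 29947

Maria Johnson
884 High St., Braavos ME 43597
Michael Arnold
249 Elm St., Quahog OR 90938

Mary Patterson / 988-555-6112 / 956 Park St., Valyria CT 81541 / marypatterson@bogusemail.com
Jane Stuart / 623-555-3006 / 983 Oak St., Old-town RI 15445 / janestuart@bogusemail.com
"""

# Two capitalised words at the start of a line, then end of line or a slash.
NAME = re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+)[ \t]*(?:/|$)", re.MULTILINE)


def extract_names(text):
    """Return the names captured by the assumed NAME pattern."""
    return NAME.findall(text)


print(extract_names(SAMPLE))
```

Names with middle initials, hyphens or more than two words would need a looser pattern.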

Also, by address do you mean the email ID, or the street address?

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Hey @Pratik_Wavhal

Yes, thank you, that is what I am trying to do. As for the address, no, I mean addresses like these:

890 Main St., Pythonville LA 29947

884 High St., Braavos ME 43597

Hi @coder

So above I provided you the regex for names.

Now below is the regex for addresses:

image
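
As a rough idea of what an address pattern for this sample can look like, here is a sketch in Python. It is not the pattern from the screenshot. It assumes every address has the shape "house number, street, comma, town, two-letter state, five-digit ZIP", which is true of the sample lines but may not hold for other PDFs. `SAMPLE`, `ADDRESS` and `extract_addresses` are hypothetical names.

```python
import re

# Sample text copied from the thread.
SAMPLE = """Laura Jefferson

890 Main St., Pythonville LA 29947

Maria Johnson
884 High St., Braavos ME 43597
Michael Arnold
249 Elm St., Quahog OR 90938

Mary Patterson / 988-555-6112 / 956 Park St., Valyria CT 81541 / marypatterson@bogusemail.com
Jane Stuart / 623-555-3006 / 983 Oak St., Old-town RI 15445 / janestuart@bogusemail.com
"""

# House number, street (no comma, slash or newline), comma,
# town (letters and hyphens), two-letter state, five-digit ZIP.
ADDRESS = re.compile(r"\d+ [^,/\n]+, [A-Za-z-]+ [A-Z]{2} \d{5}")


def extract_addresses(text):
    """Return every substring that fits the assumed address shape."""
    return ADDRESS.findall(text)


for address in extract_addresses(SAMPLE):
    print(address)
```

The phone numbers are skipped because the pattern requires a space directly after the first run of digits.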

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Hi @coder

The combined regex for both is below:

image
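
One possible way to combine both into a single pattern is sketched below in Python, using named groups so that each match gives a name together with its address. This is an assumption based on the sample text, not the pattern from the screenshot. It assumes the name and address are separated either by line breaks or by " / phone / ". `SAMPLE`, `RECORD` and `extract_records` are hypothetical names.

```python
import re

# Sample text copied from the thread.
SAMPLE = """Laura Jefferson

890 Main St., Pythonville LA 29947

Maria Johnson
884 High St., Braavos ME 43597
Michael Arnold
249 Elm St., Quahog OR 90938

Mary Patterson / 988-555-6112 / 956 Park St., Valyria CT 81541 / marypatterson@bogusemail.com
Jane Stuart / 623-555-3006 / 983 Oak St., Old-town RI 15445 / janestuart@bogusemail.com
"""

RECORD = re.compile(
    r"^(?P<name>[A-Z][a-z]+ [A-Z][a-z]+)"      # name at the start of a line
    r"(?:\s*\n\s*| / [\d-]+ / )"               # line break(s) or " / phone / "
    r"(?P<address>\d+ [^,/\n]+, [A-Za-z-]+ [A-Z]{2} \d{5})",
    re.MULTILINE,
)


def extract_records(text):
    """Return (name, address) pairs found by the assumed RECORD pattern."""
    return [(m.group("name"), m.group("address")) for m in RECORD.finditer(text)]


for name, address in extract_records(SAMPLE):
    print(name, "->", address)
```

Keeping the two parts as named groups means the name and the address can be read separately from each match instead of being split afterwards.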

Mark as solution and like it if this helps you :slight_smile:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Thank you so much @Pratik_Wavhal

Yes, it helped me.

I can't see the option to mark it as a solution. What is the problem?

Hi @coder

When you mark it as a solution, it will look like this to me:

image

Actually, I haven't created any query/issue post so far, so I have never marked anyone's post as a solution yet :sweat_smile:

It might be there at the end, like this: image

Or the option may be available somewhere on my post here, as I showed below:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer: