Extracting Data from PDFs

Hello everyone,

I’m having a bit of trouble extracting specific data from a PDF file. As the screenshots (and the attached Notepad files) show, I need to extract items such as “Hoog Laag Bed”, “Matras” and “Zitkussen”: everything that comes after “Op te halen hulpmiddelen” and before “www.Blankout.nl”. I was thinking of splitting from “Op te halen hulpmiddelen” to “www.” (because there are two formats with two different email addresses). But after that I have no clue. Does anyone have an idea how I can extract this data?

There can be up to 6 items to extract. In picture 1 there are 3 items, in picture 2 there are 2, and so on.

Any help is welcome! :slight_smile:

Nummer 3.txt (450 Bytes)
Nummer 2.txt (447 Bytes)
Nummer 4.txt (447 Bytes)
Nummer 1.txt (443 Bytes)

Hi @s.altindag ,
Have you tried separating with regex after reading the data with OCR?
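
As a rough illustration of the regex idea, here is a minimal Python sketch (the thread itself is about UiPath, so treat this as the logic only). The sample text is hypothetical, built from the description in the question, and `extract_block` is an illustrative name. The pattern lazily captures everything between “Op te halen hulpmiddelen” and the first “www.”.

```python
import re

# Hypothetical sample text, modelled on the description in the question;
# the real OCR output will look different.
sample_text = (
    "Some header text\n"
    "Op te halen hulpmiddelen\n"
    "Hoog Laag Bed\n"
    "Matras\n"
    "Zitkussen\n"
    "www.Blankout.nl\n"
)

# DOTALL lets "." match newlines, so the lazy group can span several lines.
pattern = re.compile(r"Op te halen hulpmiddelen(.*?)www\.", re.DOTALL)

def extract_block(text):
    """Return the text between the two markers, or "" if they are not found."""
    match = pattern.search(text)
    return match.group(1).strip() if match else ""

print(extract_block(sample_text))
```

Stopping at “www.” rather than at the full address means the same pattern should cover both document formats mentioned in the question.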

Hi @s.altindag

Could you please try the regex below?

Thanks

Hi @s.altindag

Can you please try the split expression below on your PDF text variable?

Split(Split(txtString,"Op te halen hulpmiddelen")(1).ToString,"www")(0).ToString.Trim

Thank you
Debakanta
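
To show what this nested split does step by step, here is a minimal Python sketch of the same logic. The function name `between_markers` and the sample strings are hypothetical. One difference from the one-line expression: indexing `(1)` on the split result fails when the start marker is missing, so the sketch returns an empty string in that case instead.

```python
def between_markers(text, start="Op te halen hulpmiddelen", end="www"):
    """Return the trimmed text after `start` and before the next `end`."""
    parts = text.split(start, 1)
    if len(parts) < 2:
        # Start marker not found: nothing to extract.
        return ""
    # Keep only what comes before the end marker, then trim whitespace.
    return parts[1].split(end, 1)[0].strip()

# Hypothetical input, modelled on the description in the question.
text = "Header\nOp te halen hulpmiddelen\nHoog Laag Bed\nMatras\nwww.Blankout.nl"
print(between_markers(text))
```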

Hi Debakanta,

Can I see your workflow?

Kind regards
Sefa

Hi Boopathi.M

Thanks, it works. But how do I make sure that, in this case:
StrHulpmiddel1 = Hoog laag bed
StrHulpmiddel2 = Matras
StrHulpmiddel3 = Zitkussen
StrHulpmiddel4 = nothing
StrHulpmiddel5 = nothing
StrHulpmiddel6 = nothing

This is the part I am struggling with. I also need to make sure that StrHulpmiddel4/5/6 do not give me an error when there is nothing in them.

Take care,
Sefa

And any idea how I can make sure that, in this case:
StrHulpmiddel1 = Hoog laag bed
StrHulpmiddel2 = Matras
StrHulpmiddel3 = Zitkussen
StrHulpmiddel4 = nothing
StrHulpmiddel5 = nothing
StrHulpmiddel6 = nothing

This is the part I am struggling with. I also need to make sure that StrHulpmiddel4/5/6 do not give me an error when there is nothing in them (as in this example).

Hi,

Check this workflow:
SplitString.xaml (4.6 KB)

thank you
debakanta

Hi,

Any idea how I can handle what I described above?

Kind regards,
Sefa

Hi,

You can split the output text on Environment.NewLine and remove the empty lines or Nothing entries.

Thank you,
Debakanta
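
The suggestion above could be sketched like this in Python, as an illustration of the logic rather than a UiPath workflow. The helper name `to_slots` is hypothetical, and the sketch assumes the block between the two markers has already been extracted. It splits on line breaks, drops empty lines, and pads the result to six entries so that the fourth to sixth values always exist. Using empty strings for the unused slots, rather than a null value, is an assumption made here so that later string operations do not raise an error.

```python
def to_slots(block, slots=6):
    """Split a text block into non-empty lines and pad to a fixed length."""
    # splitlines() handles both "\n" and "\r\n" line endings.
    items = [line.strip() for line in block.splitlines() if line.strip()]
    items = items[:slots]  # ignore anything beyond the expected maximum
    return items + [""] * (slots - len(items))

# Hypothetical extracted block with three items and a stray empty line.
values = to_slots("Hoog Laag Bed\r\n\r\nMatras\r\nZitkussen\r\n")
for number, value in enumerate(values, start=1):
    print(f"StrHulpmiddel{number} = {value!r}")
```

Keeping the values in one list and reading them by index avoids having to check each of the six variables separately.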
