Read range of lines in PDF

Hi community, I have a small problem. Can you please help me? :slight_smile:
I am trying to read data from a PDF, but I only want to read a range of lines, for example from line 9 to line 27, and not the whole document. Any suggestions, please? Thank you :slight_smile:

Hi,

Scraping does not work. Use a Get Text activity instead; it returns a String. You then need to rebuild the table yourself (using Split / Substring).
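A minimal sketch of that idea, written in Python rather than as a UiPath workflow: once the PDF text is available as one string, split it on line breaks and keep only the wanted range. The sample text and the helper name `line_range` are made up for illustration, and lines are counted from 1.

```python
def line_range(text, first, last):
    """Return lines first..last (1-based, inclusive) of text as a list."""
    lines = text.splitlines()
    return lines[first - 1:last]

# Hypothetical text standing in for the string returned by a PDF text reader.
sample = "\n".join(f"line {n}" for n in range(1, 31))

wanted = line_range(sample, 9, 27)
print(wanted[0], "...", wanted[-1])
```

This only works if the extracted text keeps the PDF's line breaks, which is not guaranteed for every document.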

It's not magic… maybe a solution exists here: https://gallery.uipath.com/packages?q=pdf

Hi,

You can use the Read PDF Text activity and then apply a regular expression to pull out the text you want.

Hey, thank you bro @anil5. But, for example, I want to extract data that starts with the word “ABCD” and ends with the word “EFGH”. How can I ignore everything else and pick out just that part of the text? Thank you so much for your help! :slight_smile:

We can do that with regular expressions, using anchors to get the required text. If possible, please provide the text.

But I’m extracting all the data with the PDF reader and saving it to a text file. After that, I want to delete everything else and keep just a specific range!

After extracting the data with the PDF reader, you get the output as text. You then need to find anchors (constant words) just before and just after the range you want, and extract the range based on them.

Okay bro, let’s suppose that the constant at the beginning of the data is “Table2” and the constant at the end is “Source”. How can I define a regex that keeps just the data between the two constants? Thank you so much for your help, brother :slight_smile:

Hi,

(?<=Table2)(your pattern goes here)(?=Source)

(?<=Table2) - a lookbehind: match only what comes after “Table2”, without including the constant itself
(?=Source) - a lookahead: match only what comes before “Source”, without including the constant itself
(your pattern goes here) - here you have to specify what your data contains: letters, words, special symbols, line breaks, spaces, tabs
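To make the three parts concrete, here is a small Python sketch (not UiPath itself). The sample text is invented, and the middle pattern `(.*?)` with the `re.DOTALL` flag is one possible choice that accepts any characters, including line breaks; in .NET the comparable option is `RegexOptions.Singleline`.

```python
import re

# Invented sample text: the data sits between "Table2" and "Source".
text = "intro Table2\nrow A 1\nrow B 2\nSource: report"

# (?<=Table2) : the match must be preceded by "Table2" (not included)
# (.*?)       : as few characters as possible, of any kind
# (?=Source)  : the match must be followed by "Source" (not included)
pattern = re.compile(r"(?<=Table2)(.*?)(?=Source)", re.DOTALL)

match = pattern.search(text)
between = match.group(1).strip() if match else ""
print(between)
```

The lazy `.*?` stops at the first “Source” after “Table2”; a greedy `.*` would run to the last one instead.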

I used the Matches activity with the following regex: "(?<=Table2)(.)(?=Source)", but it doesn’t work… I don’t know what to do now @anil5
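One likely reason this pattern finds nothing: `(.)` matches exactly one character, so it can only succeed when a single character separates “Table2” and “Source”. The Python sketch below, with invented sample text, compares it with a variable-length middle part.

```python
import re

# Invented sample text with several characters between the two constants.
text = "Table2 some rows Source"

# (.) requires exactly ONE character between the two constants.
one_char = re.search(r"(?<=Table2)(.)(?=Source)", text)

# (.*?) accepts any number of characters (re.DOTALL also allows line breaks).
any_len = re.search(r"(?<=Table2)(.*?)(?=Source)", text, re.DOTALL)

print(one_char)                 # None: no single character sits between them
print(repr(any_len.group(1)))
```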

Hi,

Use the System.Text.RegularExpressions.Regex.Match method in an Assign activity.

If possible, provide the text so I can help you with that.


Please look in to the above link

Okay, here is the text:
" 14_February 2019
FREIGHT
SULPHUR –FREIGHT_INDICATIONS – /T //// Route Cargo_size_(t) Latest_rate Jubail–WC_India 35,000 13-14 Middle_East–EC_India 30,000 17-18 Middle_East–China 35,000 17-18 Jubail–Morocco 35,000 15-17 Vancouver–China 50-60,000 18-20 //// NB: All_rates_indicated_are_based_on_averages. Exact_rates_will_depend_on_port_loading_and_discharge_rates EXCHANGE_RATES EXCHANGE_RATES –LOCAL_CURRENCY:US
14/02/19 07/02/19 14/02/18"

I want to extract the data between the “////” markers (the “////” do not exist in the real data; I only added them to show you what I want to extract). Thank you, brother! :slight_smile:

Hi,

Use the regex shown in the screenshot below

image

in the below website

Paste the text into the “Test string” box.

Hi @anil5, when I tried to write out the result of the Matches activity, it gave me System.Linq.Enumerable+d__95`1[System.Text.RegularExpressions.Match]

Any suggestions?
Thanks :slight_smile:

Import the namespace System.Text.RegularExpressions.

Then use an Assign activity with

Regex.Match(input string, pattern)
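The output reported earlier is a type name, which is what typically appears when a whole collection of matches is converted to text at once; the matched text has to be read from each match individually (in .NET, through the `Value` property of a `Match`). The Python sketch below shows the same situation with invented sample text; it is an analogy, not UiPath code.

```python
import re

# Invented sample text containing a few rate ranges.
text = "rates: 13-14, 17-18, 15-17"

matches = re.finditer(r"\d+-\d+", text)
print(matches)  # prints an iterator object, not the matched text

# Read the matched text out of each match object.
values = [m.group() for m in matches]
first = values[0] if values else ""
print(values)
```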

Hi brother @anil5, there is a misunderstanding between us: the text I sent you doesn’t actually contain the “////”. I only used it to show you the data I want to extract, so the regex is not working. Can you please help me with the right regex? Thank you so much, bro :slight_smile: you are the best

Pick a word that is always present at each position where you put “////”, and use that word in the regex in place of “////”.

If you could send me the original text, I could help you

Okay bro here is the original text:
" 14_February 2019
FREIGHT
SULPHUR –FREIGHT_INDICATIONS – /T Route Cargo_size_(t) Latest_rate Jubail–WC_India 35,000 13-14 Middle_East–EC_India 30,000 17-18 Middle_East–China 35,000 17-18 Jubail–Morocco 35,000 15-17 Vancouver–China 50-60,000 18-20 NB: All_rates_indicated_are_based_on_averages. Exact_rates_will_depend_on_port_loading_and_discharge_rates EXCHANGE_RATES EXCHANGE_RATES –LOCAL_CURRENCY:US
14/02/19 07/02/19 14/02/18
€ Euro 0.88446 0.87825 0.81086
£ Pound_Sterling 0.77599 0.77209 0.72078
Turkish_Lira 5.26062 5.21117 3.80117
Rupee_India 70.6558 71.5121 64.2084
Real_Brazil 3.72660 3.68432 3.29192
China_RMB 6.75851 6.74310 6.33495
SULPHUR –RECENT_AND_ACTIVE_TENDERS
Country_Holder_Type ’000 t_Tender_close_Shipments_Remarks
Qatar_Muntajat_Granular_(sales) 35_Mid_Sep_Sep/Oct_$168 fob
Libya_NOC_Granular_(sales) 8 20_Sep_End_Sep_Low-mid_$130s"
and I want to extract the following part from that text:
“Route Cargo_size_(t) Latest_rate
Jubail–WC_India 35,000 13-14
Middle_East–EC_India 30,000 17-18
Middle_East–China 35,000 17-18
Jubail–Morocco 35,000 15-17
Vancouver–China 50-60,000 18-20”

So in this text, from which line to which line do you want to extract?
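One possible way to get that block without counting lines, sketched in Python rather than UiPath: treat “Route” as the start anchor and “NB:” as the end anchor, then regroup the tokens into rows of three. Both anchors are assumptions read off the sample text above and would need checking against other PDFs; the sample below is a shortened copy of that text.

```python
import re

# Shortened copy of the extracted text from the post above.
raw = (
    "SULPHUR –FREIGHT_INDICATIONS – /T Route Cargo_size_(t) Latest_rate "
    "Jubail–WC_India 35,000 13-14 Middle_East–EC_India 30,000 17-18 "
    "Middle_East–China 35,000 17-18 Jubail–Morocco 35,000 15-17 "
    "Vancouver–China 50-60,000 18-20 NB: All_rates_indicated_are_based_on_averages."
)

# Assumed anchors: keep everything from "Route" up to (not including) "NB:".
match = re.search(r"Route.*?(?=\s*NB:)", raw, re.DOTALL)
block = match.group(0) if match else ""

# Each row has three space-separated fields, so regroup the tokens in threes.
tokens = block.split()
rows = [tokens[i:i + 3] for i in range(0, len(tokens), 3)]

for row in rows:
    print(" ".join(row))
```

Grouping in threes relies on every field being free of spaces, which holds here because the extraction joined words with underscores.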