Read range of lines in PDF

Mehdi_El_Aissi · February 22, 2019, 8:19am

Hey Community, I have a small problem. Can you please help me?
I'm trying to read data from a PDF, but I only want to read a range of lines, for example from line 9 to line 27, and not the whole document. Any suggestions, please? Thank you.

decalajoraire · February 22, 2019, 9:15am

Hi,

Scraping does not work here. Use a Get Text activity; it returns a String. You then need to rebuild the table yourself (using Split / Substring).

It's not magic … maybe a solution exists here: RPA Listings - Collections, Integration Packs | UiPath Marketplace
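As a rough illustration of the Split idea for the original "line 9 to line 27" question, here is a minimal Python sketch. It assumes the PDF text has already been extracted into one string; `line_range` and `sample` are hypothetical names used only for this example, and the same slicing logic would have to be rewritten with Split in a UiPath workflow.

```python
def line_range(text: str, first: int, last: int) -> str:
    """Return lines first..last of text (1-based, inclusive)."""
    lines = text.splitlines()
    # Python slices are 0-based and exclude the end index.
    return "\n".join(lines[first - 1:last])


# Stand-in for the extracted PDF text: 30 numbered lines.
sample = "\n".join(f"line {n}" for n in range(1, 31))

selected = line_range(sample, 9, 27)
print(selected.splitlines()[0])   # line 9
print(selected.splitlines()[-1])  # line 27
```

This only works when the wanted block always sits on the same line numbers; if the layout shifts between documents, text anchors are more reliable.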

anil5 · February 22, 2019, 9:21am

Hi,

You can use the Read PDF Text activity and then apply a regular expression to the text you want to extract.

Mehdi_El_Aissi · February 22, 2019, 9:52am

Hey, thank you bro @anil5. But, for example, I want to extract data that starts with the word “ABCD” and ends with the word “EFGH”. How can I ignore all the rest and pick just this part of the text? Thank you so much for your help!

anil5 · February 22, 2019, 9:58am

We can do that using regular expressions. We have to use anchors to get the required text. If possible, please provide the text.

Mehdi_El_Aissi · February 22, 2019, 10:11am

But I’m extracting all the data with the PDF reader and saving it into a text file. After that, I want to delete everything else and keep just a specific range!

anil5 · February 22, 2019, 10:14am

After extracting all the data with the PDF reader, you get the output as text. You then have to find some anchors, or constants, before and after the range you want to extract; based on those, you can get the required range.

Mehdi_El_Aissi · February 22, 2019, 10:19am

Okay bro, let’s suppose that the constant at the beginning of the data is “Table2” and the constant at the end of the data is “Source”. How can I define the regexp that keeps just the data between the two constants? And thank you so much, brother, for your help.

anil5 · February 22, 2019, 10:30am

Hi,

(?<=Table2)(And here comes your pattern)(?=Source)

(?<=Table2) - A lookbehind: the match must come right after this constant, and the constant itself is not included.
(?=Source) - A lookahead: the match must come right before this constant, and the constant itself is not included.
(And here comes your pattern) - Here you have to specify what your data contains: letters, words, special symbols, line breaks, spaces, tabs.
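To make the lookbehind/lookahead idea concrete, here is a small Python sketch. The sample text is invented for illustration, and the middle pattern `.*?` is just one possible choice. Python's `re.DOTALL` plays the same role as `RegexOptions.Singleline` in .NET: it lets `.` match line breaks too.

```python
import re

# Invented sample text with the two constants from the discussion.
text = "header\nTable2\nrow one\nrow two\nSource: report"

# (?<=Table2) : must be preceded by "Table2" (not part of the match)
# .*?         : as few characters as possible; DOTALL lets "." cross line breaks
# (?=Source)  : must be followed by "Source" (not part of the match)
pattern = re.compile(r"(?<=Table2).*?(?=Source)", re.DOTALL)

match = pattern.search(text)
between = match.group(0).strip() if match else ""
print(between)
```

The `strip()` call only removes the line breaks that sit directly next to the two constants.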

Mehdi_El_Aissi · February 22, 2019, 10:44am

I used the Matches activity and gave it the following regexp: " (?<=Table2)(.)(?=Source)", but it doesn’t work… I don’t know what to do now, @anil5.
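A likely reason this pattern finds nothing: `(.)` matches exactly one character, so it can only succeed when a single character sits between "Table2" and "Source". The Python sketch below, on invented sample text, compares it with `[\s\S]*?`, which matches any run of characters including line breaks and should behave the same way in the .NET regex engine.

```python
import re

text = "Table2\nrow one\nrow two\nSource"

# "(.)" is a single character, and by default "." does not match a line break.
single = re.search(r"(?<=Table2)(.)(?=Source)", text)

# "[\s\S]*?" matches any characters, line breaks included, as few as possible.
multi = re.search(r"(?<=Table2)([\s\S]*?)(?=Source)", text)

print(single)                 # None
print(repr(multi.group(1)))   # '\nrow one\nrow two\n'
```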

anil5 · February 22, 2019, 10:46am

Hi,

Use the System.Text.RegularExpressions.Regex.Match method in an Assign activity.

If possible, provide the text so I can help you with that.

Itachi · February 22, 2019, 10:51am

Please look in to the above link

Mehdi_El_Aissi · February 22, 2019, 10:53am

Okay, here is the text:
" 14_February 2019
FREIGHT
SULPHUR –FREIGHT_INDICATIONS – $/T
////
Route Cargo_size_(t) Latest_rate
Jubail–WC_India 35,000 13-14
Middle_East–EC_India 30,000 17-18
Middle_East–China 35,000 17-18
Jubail–Morocco 35,000 15-17
Vancouver–China 50-60,000 18-20
////
NB: All_rates_indicated_are_based_on_averages. Exact_rates_will_depend_on_port_loading_and_discharge_rates
EXCHANGE_RATES
EXCHANGE_RATES –LOCAL_CURRENCY:US$
14/02/19 07/02/19 14/02/18"

I want to extract the data between the "////" markers (the "////" don’t exist in the data; I use them just to show you what I want to extract). Thank you, brother!

anil5 · February 22, 2019, 10:59am

Hi,

Use the regex shown in the screenshot below, on the website below.

Paste the text into the Test String field.

Mehdi_El_Aissi · February 22, 2019, 12:42pm

Hi @anil5, when I tried to write the result of the Matches activity, it gave me System.Linq.Enumerable+d__95`1[System.Text.RegularExpressions.Match]

Any suggestions?
Thanks

anil5 · February 22, 2019, 1:13pm

Import the namespace System.Text.RegularExpressions.

Then, in an Assign activity, use:

Regex.Match(inputString, pattern)
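The type name reported in the previous post is what appears when the whole collection of matches is converted to text, instead of the value of an individual match. In .NET the matched text is in the `Value` property of each `Match`. The Python sketch below shows the same pitfall by analogy only; the sample text and pattern are made up for the example.

```python
import re

text = "Jubail 35,000\nVancouver 50-60,000"
matches = re.finditer(r"\d[\d,-]*", text)

# Printing the iterator itself describes the object, not the matched text.
print(matches)

# Reading each match's value gives the text that was actually found.
values = [m.group(0) for m in matches]
print(values)  # ['35,000', '50-60,000']
```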

Mehdi_El_Aissi · February 22, 2019, 1:20pm

Hi brother @anil5, there is a misunderstanding between us: the text that I sent you doesn’t contain the “////”. I just used it to show you the data that I want to extract, so the regexp is not working. Can you please help me with the right regexp? Thank you so much, bro, you are the best.

anil5 · February 22, 2019, 1:26pm

Pick a word that is mostly constant at the position of //// and use it in the regex in place of ////.

If you could send me the original text, I could help you.

Mehdi_El_Aissi · February 22, 2019, 1:30pm

Okay bro, here is the original text:
" 14_February 2019
FREIGHT
SULPHUR –FREIGHT_INDICATIONS – $/T
Route Cargo_size_(t) Latest_rate
Jubail–WC_India 35,000 13-14
Middle_East–EC_India 30,000 17-18
Middle_East–China 35,000 17-18
Jubail–Morocco 35,000 15-17
Vancouver–China 50-60,000 18-20
NB: All_rates_indicated_are_based_on_averages. Exact_rates_will_depend_on_port_loading_and_discharge_rates
EXCHANGE_RATES
EXCHANGE_RATES –LOCAL_CURRENCY:US$
14/02/19 07/02/19 14/02/18
€ Euro 0.88446 0.87825 0.81086
£ Pound_Sterling 0.77599 0.77209 0.72078
Turkish_Lira 5.26062 5.21117 3.80117
Rupee_India 70.6558 71.5121 64.2084
Real_Brazil 3.72660 3.68432 3.29192
China_RMB 6.75851 6.74310 6.33495
SULPHUR –RECENT_AND_ACTIVE_TENDERS
Country_Holder_Type ’000 t_Tender_close_Shipments_Remarks
Qatar_Muntajat_Granular_(sales) 35_Mid_Sep_Sep/Oct_$168 fob
Libya_NOC_Granular_(sales) 8 20_Sep_End_Sep_Low-mid_$130s"
And I want to extract the following part from that text:
“Route Cargo_size_(t) Latest_rate
Jubail–WC_India 35,000 13-14
Middle_East–EC_India 30,000 17-18
Middle_East–China 35,000 17-18
Jubail–Morocco 35,000 15-17
Vancouver–China 50-60,000 18-20”

anil5 · February 22, 2019, 1:32pm

So, in this text, from which line to which line do you want to extract?
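For the text posted above, one possible approach (a sketch, not a confirmed solution from this thread) is to anchor on the line starting with "Route" and stop just before the line starting with "NB:". Both anchors are assumptions taken from the sample; if the report layout changes, they need adjusting. The Python below uses a shortened copy of the sample text. In .NET, `RegexOptions.Multiline` corresponds to `re.MULTILINE`, and text extracted from a PDF may use "\r\n" line endings.

```python
import re

# Shortened copy of the sample text from the post above.
text = """SULPHUR –FREIGHT_INDICATIONS – $/T
Route Cargo_size_(t) Latest_rate
Jubail–WC_India 35,000 13-14
Middle_East–EC_India 30,000 17-18
Middle_East–China 35,000 17-18
Jubail–Morocco 35,000 15-17
Vancouver–China 50-60,000 18-20
NB: All_rates_indicated_are_based_on_averages.
EXCHANGE_RATES"""

# Start at the line beginning with "Route" and stop just before the line
# beginning with "NB:". MULTILINE makes "^" match at the start of every line.
pattern = re.compile(r"^Route\b[\s\S]*?(?=^NB:)", re.MULTILINE)

match = pattern.search(text)
table = match.group(0).rstrip() if match else ""

# Split each line on whitespace to rebuild the table rows.
rows = [line.split() for line in table.splitlines()]
print(rows[0])    # ['Route', 'Cargo_size_(t)', 'Latest_rate']
print(len(rows))  # 6
```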
