ANSHUL
(MANTRI)
February 14, 2019, 5:32am
1
Hi All
I'm stuck extracting text from a PDF. I need to extract the table from Annex B; sometimes this heading appears as ANNEX B and sometimes as Annex B.
I tried Element Exists, Retry Scope, and Google OCR, but they all fail.
I've attached a sample PDF.
I'm looking for the table on the Annex B page, as below:
Any suggestions, please? sample_3.pdf (32.8 KB)
anil5
(Anil Kumar Bandam)
February 14, 2019, 7:19am
2
Hi,
Use the Read PDF Text activity, paste its output into the regex builder, and apply the regex below.
Use this regex: (?<=ANNEX B |Annex B)\s+(\w{3}\s\w{4}\s\w{3}\s\w{4}\s\w{3}\s\w{4}\s\w{3}\s\w{4}\s\w{3}\s\w{4}\r\n){1,}
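For anyone who wants to try the same idea outside UiPath, here is a rough Python sketch. It is not the regex above: Python's `re` module rejects lookbehinds whose alternatives differ in length, so this version matches the heading case-insensitively on its own line and then collects the non-empty lines that follow. The sample text, the `extract_annex_rows` name, and the assumption that the table ends at the first blank line are all hypothetical, since the real layout depends on how the PDF text comes out.

```python
import re

# Hypothetical text standing in for the output of a PDF text extraction step.
SAMPLE_TEXT = (
    "Intro mentioning Annex B in passing\r\n"
    "ANNEX B\r\n"
    "abc defg hij klmn\r\n"
    "opq rstu vwx yzab\r\n"
    "\r\n"
    "Footer text\r\n"
)

# Heading alone on a line (any letter case), followed by one or more
# non-empty lines. The capture stops at the first blank line (assumption).
ANNEX_RE = re.compile(
    r"^[ \t]*annex[ \t]+b[ \t]*\r?\n((?:[^\r\n]+(?:\r?\n|\Z))+)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_annex_rows(text):
    """Return the lines under the first Annex B heading, split on whitespace."""
    match = ANNEX_RE.search(text)
    if match is None:
        return []
    block = match.group(1)
    return [line.split() for line in block.splitlines() if line.strip()]


if __name__ == "__main__":
    for row in extract_annex_rows(SAMPLE_TEXT):
        print(row)
```

Splitting each row on whitespace only works if no cell contains a space; a real table may need column positions instead.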
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 9:59am
3
Hello @ANSHUL, kindly confirm whether you have Adobe Acrobat installed on your machine,
because I have a solution that will work for you.
ANSHUL
(MANTRI)
February 14, 2019, 10:10am
4
Yes, I do have it installed. I'd appreciate any help.
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 12:38pm
5
Hello @ANSHUL,
kindly test the attached Excel file, which contains a macro to convert your PDF file to an Excel file. Split-Merge_pdf.zip (25.3 KB)
ANSHUL
(MANTRI)
February 14, 2019, 1:32pm
6
Thanks, but do I have to put my PDF inside the Split-Merge_pdf folder?
I get this when I copy the file into that folder and run the macro:
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 2:09pm
7
Yes @ANSHUL, please create the input folder as specified in the Excel file,
then put the PDF file in the input folder and run the macro.
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 2:29pm
9
Let me check on my side.
Update: it works fine for me with the PDF file that you attached.
ANSHUL
(MANTRI)
February 14, 2019, 2:37pm
10
So far I've found these steps helpful:
Start Process: open the PDF
Send Hotkey: Ctrl+0
Send Hotkey: Ctrl+Home
Send Hotkey: Ctrl+F
Type Into activity: Annex B[k(enter)]
Click the search box
Send Hotkey: End
Send Hotkey: Enter
This takes me to the page with the table to extract (the last occurrence of Annex B).
Now I'm trying to extract the table from that page using some relation.
If you know a faster way to do the next steps, please share your thoughts.
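One possible shortcut for the "last occurrence" part, if the PDF text is already available page by page, is to search the text instead of driving the viewer with hotkeys. Below is a minimal Python sketch of that idea. The `last_page_with_heading` function and the `pages` list (one string per page) are assumptions for illustration; how the per-page text is obtained is left open.

```python
import re

# Matches "Annex B" / "ANNEX B" in any letter case; \b keeps it from
# matching longer words such as "Annex Bravo".
HEADING_RE = re.compile(r"annex\s+b\b", re.IGNORECASE)


def last_page_with_heading(pages):
    """Return the 1-based number of the last page mentioning the heading, or None."""
    # Walk backwards so the first hit is the last occurrence in the document.
    for number in range(len(pages), 0, -1):
        if HEADING_RE.search(pages[number - 1]):
            return number
    return None


if __name__ == "__main__":
    # Hypothetical per-page text, one string per page.
    pages = [
        "Contents: Annex A, Annex B",
        "Body text",
        "ANNEX B\nrow 1\nrow 2",
        "Annex C",
    ]
    print(last_page_with_heading(pages))
```

Searching from the end skips earlier mentions such as a table of contents, which is what the End and Enter hotkeys in the search box achieve in the viewer.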
Regards
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 2:55pm
11
Sadly, I don't have any other option that can extract your table from the PDF with 100% accuracy.
If you or anyone on your team knows VBA, I would recommend tweaking the given Excel macro file to make it work.
ANSHUL
(MANTRI)
February 14, 2019, 3:07pm
12
OK, please share whatever VBA you have fixed.
I'm working on a dynamic selector to scrape the table.
AkshaySandhu
(AkshaySingh Sandhu)
February 14, 2019, 3:20pm
13
I didn't fix anything in that Excel file. It writes the output as is for me.