Extracting all headers in a pdf

Funky_Monks · November 16, 2021, 4:56am

Hi, I want to extract all the headers that appear in a PDF file and store the headers and their values in an Excel file. I am new to this. Can anyone help me with this?

THIRU_NANI · November 16, 2021, 4:58am

Hi!

If you're using a native PDF, it is easy to get the values from it.

https://epsilonai.com/how-to-extract-table-from-pdf-in-uipath

Regards,
NaNi

Funky_Monks · November 16, 2021, 5:16am

@THIRU_NANI Hi Nani, I checked it, and I don’t have any tables in my PDF file. I want to extract the headers in the PDF file. For example, if headers such as name, date, and loan issued are present, I want to extract those headers along with their values into an Excel sheet.

THIRU_NANI · November 16, 2021, 5:19am

If the PDF is a native PDF, use the Read PDF Text activity.
If the PDF is a scanned PDF, use the Read PDF With OCR activity with the Tesseract OCR engine.

Regards,
NaNi

Funky_Monks · November 16, 2021, 5:21am

[image]
In the image above, the headers are rebrand, poster series, total, tax, and subtotal. I want to extract all those headers along with their values into an Excel file.

Funky_Monks · November 16, 2021, 5:23am

I got that, Nani. My problem is that I only know how to get a single value from a PDF, not a bunch of values at a time.

THIRU_NANI · November 16, 2021, 5:29am

In that case you can use the Get Text activity!

Regards,
NaNi

Boopathi.M · November 16, 2021, 5:33am

Hi @Funky_Monks

Is it possible to attach the PDF file? Let me try with regex.

Thanks,
Boopathi

Funky_Monks · November 16, 2021, 5:42am

201311_cfpb_kbyo_closing-disclosure.pdf (61.0 KB)
This is the PDF file. I need to extract all the headers across its 5 pages and store them in Excel. I know how to get a single value, but I have never extracted a bunch of headers at a time.

Boopathi.M · November 16, 2021, 6:42am

Hi @Funky_Monks

Regex extraction would not be appropriate for this, as the PDF contains a lot of information spread over multiple pages. Also, do the field headers remain constant in every PDF, or do they change?

Or please check if this activity helps you.

Thanks,
Boopathi
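
For reference, if the field headers do turn out to stay constant, the regex idea can be sketched roughly as below. This is only an illustration in Python, not a UiPath workflow: the sample text, the `HEADERS` list, and the helper names `extract_fields` and `to_csv` are all assumptions made up for the example, and the values are invented. It looks for each known header at the start of a line, takes the rest of that line as the value, and writes header/value rows as CSV, which Excel can open.

```python
import csv
import io
import re

# Hypothetical sample standing in for the text read from the PDF.
# The labels come from the thread; the amounts are invented.
SAMPLE_TEXT = """\
Rebrand            $500.00
Poster series      $250.00
Subtotal           $750.00
Tax                $75.00
Total              $825.00
"""

# Assumed fixed list of field headers; adjust to the real document.
HEADERS = ["Rebrand", "Poster series", "Subtotal", "Tax", "Total"]


def extract_fields(text, headers):
    """Return one (header, value) pair per header; value is "" if not found."""
    rows = []
    for header in headers:
        # Header at the start of a line, an optional colon, then the
        # rest of that same line as the value.
        pattern = re.compile(
            r"^[ \t]*" + re.escape(header) + r"[ \t]*:?[ \t]+(.+?)[ \t]*$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(text)
        rows.append((header, match.group(1) if match else ""))
    return rows


def to_csv(rows):
    """Serialize the pairs as CSV text with a Header,Value title row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Header", "Value"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_fields(SAMPLE_TEXT, HEADERS)))
```

Because each header is anchored to the start of a line, "Total" does not accidentally match the "Subtotal" line. If the headers change from one PDF to another, this approach breaks down, which is the concern raised above.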
