How to read highlighted or ticked data from a PDF

divya.17290 · November 30, 2022, 7:23am

Hello all

I need to extract the highlighted or ticked data from a specified PDF.

Can anybody please help me do it?

alan.prakash · November 30, 2022, 7:25am

Use the Read PDF Text activity, write the output to a text file, and then extract the data using regex.
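
As a rough illustration of the regex step, the sketch below (written in Python rather than as a UiPath workflow) pulls ICD-10-style codes out of text that has already been read from the PDF. The pattern, the function name, and the sample text are assumptions for illustration only; real documents may need a stricter or looser pattern, and plain text extraction by itself cannot tell which rows are highlighted.

```python
import re

# Assumed shape of an ICD-10-style code: a letter, a digit, one more
# alphanumeric character, then an optional ".xxxx" extension (e.g. L85.1, E10.42).
ICD10_PATTERN = re.compile(r"\b[A-TV-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def find_icd10_codes(text: str) -> list[str]:
    """Return every ICD-10-like token in the text, in order of appearance."""
    return ICD10_PATTERN.findall(text)


# Hypothetical text, standing in for the output of Read PDF Text.
sample_text = "Description one L85.1\nDescription two E10.42\nDescription three M20.22"
codes = find_icd10_codes(sample_text)
print(codes)
```

The same pattern could be tried in a regex-based matching step inside the workflow, since it uses only common regex syntax.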

divya.17290 · November 30, 2022, 7:28am

I need to take the highlighted Description and ICD10 cells from the second image, for example:
Acq Keratosis palmaris et dystrophy L85.1
Mycotic nails, multiple B3.5 and so on…

It may vary in other documents.

alan.prakash · November 30, 2022, 7:32am

Can you provide the sample PDF file?

divya.17290 · November 30, 2022, 7:33am

I have pasted the sample file in the question.

divya.17290 · November 30, 2022, 7:43am

Can anybody please help me with how to do it?

alan.prakash · November 30, 2022, 8:46am

If possible, can you post the sample PDF file? The file posted above is an image file.

supermanPunch · November 30, 2022, 9:07am

Hi @divya.17290 ,

When working with data extraction from documents, we first need to understand whether the document is Digital or Scanned. In your case, we need to know whether it is a Digital PDF or a Scanned PDF. Based on this, we can move forward with the appropriate suggestions.

Also, we would need to understand what the second image you have provided is. Is it from the application?

If your document is scanned and you need to extract only the ticked data, then you would need to use Document Understanding, where you would also need to train the DU model by labelling these datasets.

Or you could try alternative ways to get the data in the form of a Digital document, perhaps by understanding how the document is generated. Then we should be able to extract the needed data from the digital document.
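
One rough way to tell the two cases apart is to look at what plain text extraction returns: a digital PDF usually yields real text for each page, while a scanned one yields little or nothing. The Python sketch below assumes the per-page text has already been extracted by some other step (for example Read PDF Text); the function name and the 20-character threshold are hypothetical choices, not part of any UiPath API.

```python
def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page 'digital' or 'scanned' from its extracted text.

    A page counts as 'digital' when text extraction returned at least
    min_chars non-whitespace characters (an assumed threshold).
    """
    labels = []
    for text in page_texts:
        visible = "".join(text.split())  # drop all whitespace
        labels.append("digital" if len(visible) >= min_chars else "scanned")
    return labels


# Hypothetical extraction results for a two-page document.
pages = ["Account 1001  CPT 99213  Modifier 25  ICD10 L85.1", "  \n "]
labels = classify_pages(pages)
print(labels)
```

This is only a heuristic: a scanned page with an embedded OCR text layer would also be labelled digital, so a manual check of one sample file is still worthwhile.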

Let us know your thoughts on the suggestions provided.

divya.17290 · November 30, 2022, 9:22am

It will be a PDF file, the second one

divya.17290 · November 30, 2022, 9:24am

A document contains multiple pages, and each page belongs to a different account number. The data then has to be entered per account number in one application.

From the page, I should extract the highlighted cell (right to left).

supermanPunch · November 30, 2022, 9:25am

@divya.17290 ,

It does seem to be a digital PDF, but we cannot confirm from the image. Could you confirm this and also let us know if there are checkboxes in the PDF?

Secondly, what is the purpose of the first image? Are we supposed to check the first image and extract the ticked data from the second PDF? Is that the normal process, or is there a condition being applied to extract data from the PDF in the second image?

Do provide us more details on the normal process steps so that we can understand better.

divya.17290 · November 30, 2022, 9:44am

The first image is what is currently used in the manual process, and the second image is the new one for which I got the development work.

I have attached a sample PDF file. There are 3 tables on one page.
First table –> I need to extract, right to left, the data beside the highlighted cell (marked like a circle).
Ex: Array={L85.1 , L6B35.1M20.22,E10.42,L85.3}

Second table –> I need to extract the highlighted CPT cell and the highlighted Modifier cell.
Ex : CPT = 99213 , M2 =25

Third table –> I need to extract the highlighted cell data, left to right.
Ex: CPT = 10060 M1=LT
CPT= 11056 M3=50…

TEMP.pdf (115.9 KB)
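
For a digital PDF, one possible way to approach the "value beside the highlighted cell" requirement is geometric: find the box of each shaded marker, then pick the nearest word on the same row to its left or right. The Python sketch below assumes some PDF library has already supplied the bounding boxes of the markers and of the words; `Box`, `neighbour_text`, and all coordinates are hypothetical and only illustrate the matching step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Axis-aligned box in page coordinates, with optional text."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""

    @property
    def y_mid(self) -> float:
        return (self.y0 + self.y1) / 2


def neighbour_text(marker: Box, words: list[Box], side: str = "right") -> Optional[str]:
    """Return the text of the nearest word on the marker's row, or None.

    side="right" considers words starting at or after the marker's right edge;
    any other value considers words ending at or before its left edge.
    A word is on the same row when its vertical midpoint lies inside the marker.
    """
    same_row = [w for w in words if marker.y0 <= w.y_mid <= marker.y1]
    if side == "right":
        candidates = [w for w in same_row if w.x0 >= marker.x1]
        nearest = min(candidates, key=lambda w: w.x0 - marker.x1, default=None)
    else:
        candidates = [w for w in same_row if w.x1 <= marker.x0]
        nearest = min(candidates, key=lambda w: marker.x0 - w.x1, default=None)
    return nearest.text if nearest is not None else None


# Hypothetical layout: two rows, only the first has a shaded marker.
words = [
    Box(10, 100, 120, 112, "Description A"),
    Box(160, 100, 200, 112, "L85.1"),
    Box(10, 120, 120, 132, "Description B"),
    Box(160, 120, 200, 132, "E10.42"),
]
marker = Box(135, 99, 150, 113)  # shaded circle between the two columns
right_value = neighbour_text(marker, words, side="right")
left_value = neighbour_text(marker, words, side="left")
print(right_value, left_value)
```

Running the function once per marker, with the side chosen per table, would give one extracted value per highlighted row.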

divya.17290 · November 30, 2022, 12:04pm

Can anybody suggest how to do it?

divya.17290 · December 1, 2022, 5:20am

I have attached the PDF. Can you suggest a solution?

alan.prakash · December 1, 2022, 5:41am

You have to use Document Understanding. Let me check this; if I find a solution, I will post it here.

divya.17290 · December 1, 2022, 12:53pm

Yes thank you.

divya.17290 · December 1, 2022, 12:55pm

I have tried the Document Understanding approach, but I got an Object reference error while using the Digitize Document activity. I have raised this in another post: How to fix Object reference error in Digitize document

divya.17290 · December 1, 2022, 5:32am

Hi all

I need to extract the shaded circle data from a digital PDF. Can anybody suggest how to do it?

sangeethaneelavannan1 · December 1, 2022, 5:50am

Hi @divya.17290

The Read PDF With OCR activity can be used.

Rahul_Unnikrishnan · December 1, 2022, 6:12am

Hello @divya.17290

For a digital PDF, you have to use the Read PDF With OCR activity. You can try changing the OCR engine and compare the accuracy.

If it is still not working, share a sample file here.

Thanks
