OCR TO EXCEL

@Srini84
Hi Bro, I would like to extract the entries (e.g. TOYOTA COROLLA 2012-ID:125) from the PDF file into an Excel file. By matching them against the details already in the Vehicle sheet, I need to separate out the vehicle ID (e.g. 125 from TOYOTA COROLLA 2012-ID:125), then look that ID up in the Codes sheet and bring the price into the Vehicle sheet. Hope you can help me.
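
For anyone reading along, the matching step described here can be sketched outside the workflow tool. This is only a rough Python illustration, not the workflow itself: the sample entries, the `codes` mapping and the prices are made up, since the real sheet layouts are not shown in the thread.

```python
# Hypothetical sample data; the real Vehicle and Codes sheets are not shown here.
vehicles = ["TOYOTA COROLLA 2012-ID:125", "HONDA CIVIC 2015-ID:210"]
codes = {"125": 8500, "210": 11200}  # vehicle ID -> price (made-up values)


def vehicle_id(entry):
    """Return the text after the last 'ID:' marker, or None if there is no marker."""
    _, found, tail = entry.rpartition("ID:")
    return tail.strip() if found else None


def price_rows(entries, price_by_id):
    """Build (entry, vehicle ID, price) rows; price is None when the ID is unknown."""
    rows = []
    for entry in entries:
        vid = vehicle_id(entry)
        rows.append((entry, vid, price_by_id.get(vid)))
    return rows


for row in price_rows(vehicles, codes):
    print(row)
```

The same idea should carry over to a workflow: get the ID out of each entry first, then use it as the lookup key against the Codes sheet.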

Auto Sales.pdf (205.0 KB) RPA.xlsx (17.0 KB)

Hi @naveenkumar.sathiyamoorth,
Have you tried anything so far? I guess people could help you, but we need some engagement from you as well :wink:

Hi Pablito,

I am trying hard to get the output, but I am unable to. Maybe I am not following the correct procedure.

I pulled the text from the PDF, created a Build Data Table and added Assign activities.
I know I am missing the step that adds the PDF output to the data table, but I don't know how to do it. When I try, I get an error because my PDF output is a String and the data table variable is a DataTable. I tried Add Data Row as well and hit the same issue.

I am trying to move the PDF output to Excel.

Please advise if you can help. My file is attached.

Naveen_OCR_V.08.xaml (16.0 KB)
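
The type mismatch described above (one string versus a table) usually goes away once the text is broken into rows. A minimal Python sketch of that idea follows; it is not the attached workflow, and the helper names `text_to_rows` and `rows_to_csv` are invented for illustration.

```python
import csv
import io


def text_to_rows(text):
    """Turn one block of text into a table: one single-cell row per non-empty line."""
    return [[line.strip()] for line in text.splitlines() if line.strip()]


def rows_to_csv(rows):
    """Serialise the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for the PDF output string.
pdf_text = "TOYOTA COROLLA 2012-ID:125\n\nHONDA CIVIC 2015-ID:210\n"
print(rows_to_csv(text_to_rows(pdf_text)))
```

In workflow terms this is roughly "split the string on new lines, then add each line as a row", rather than passing the whole string where a table is expected.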

Let’s start from the basics :slight_smile: Instead of trying to find a direct solution, first try to build anything that works.
You said you are able to get the text from the PDF and assign it to a variable. This is a great starting point. Instead of trying to build a datatable, use Write Range and push it to Excel. If you are able to do this, then you can grab the whole Excel table using Read Range and you already have a datatable :slight_smile:

Actually I already tried Write Range, and I am getting the same issue there.
The output from the PDF is a string, but Write Range expects a data table, so an error is shown. I posted this question yesterday.


Excuse me for my mistake. I wanted to say “Write Cell” activity.

Thanks much,

Now the data is available as below, but I need each name on its own row, and the value after “ID:” must be moved to the next column. Is there any function I can use?
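
One way to think about this split, sketched in Python rather than as workflow activities: break the text into lines, then cut each line at the "ID:" marker. The sample text is made up, and trimming the trailing hyphen is an assumption based on the TOYOTA COROLLA 2012-ID:125 example.

```python
def split_entry(line, marker="ID:"):
    """Split one line into [name, id] at the last occurrence of the marker."""
    name, found, vid = line.rpartition(marker)
    if not found:
        # No marker on this line: keep the text and leave the ID column empty.
        return [line.strip(), ""]
    # Drop the spaces and the hyphen that sit between the name and the marker.
    return [name.strip(" -"), vid.strip()]


def split_text(text):
    """One [name, id] row per non-empty line of the input text."""
    return [split_entry(line) for line in text.splitlines() if line.strip()]


sample = "TOYOTA COROLLA 2012-ID:125\nHONDA CIVIC 2015-ID:210"
for row in split_text(sample):
    print(row)
```

Each resulting row has the name in the first column and the ID in the second, which is the layout asked for above.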

Sorry to bother you so much, but this is the only step where I am facing issues and need help.

I tried the following: I got the output from the PDF → used Write Text File to save the PDF output to a text file → read the same text file back → tried to move the data to Excel using Read Range and Write Cell. It ran successfully, but I can't see any output in Excel.

Am I doing this the right way? Please advise.

Naveen_OCR_V.09.xaml (13.4 KB)

If you are able to get this text from the PDF, then I suggest using Regex to extract the text between ID and Date, which leaves you with just the names.
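
As a rough illustration of the "text between two markers" idea, here is a Python sketch. The layout of the real PDF text is not shown in this thread, so the sample string and the literal markers "ID" and "Date" are assumptions, and `between` is a made-up helper name.

```python
import re


def between(text, start="ID", end="Date"):
    """Return every piece of text found between a start marker and the next end marker."""
    # Non-greedy capture so each match stops at the first end marker it meets.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [piece.strip() for piece in re.findall(pattern, text, flags=re.DOTALL)]


# Assumed layout, purely for illustration.
sample = (
    "ID TOYOTA COROLLA 2012 Date 01/02/2020\n"
    "ID HONDA CIVIC 2015 Date 03/04/2021"
)
print(between(sample))
```

The pattern would need adjusting to the actual text, for example if the markers carry a colon or if "ID" can also appear inside a name.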

Hello Navin,
In this video, I cover 17 use cases for extracting tables from PDF and writing the data to Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words in the last column
27:00 File 5 PDF with a column with multiple words in an inside column (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces that need to be corrected
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection for empty cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF remove spaces from headers also remove space from Data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal

Code:

Thanks,
Cristian Negulescu

Same here, I don't know how to split it.