I am unable to read and extract data from a PDF file

Vikas.Jain · April 25, 2017, 10:54am

Firstly, welcome to the UiPath community.

As for your current use case: if you are using OCR, you might not get 100% accuracy. The results vary, and this is due to the limitations of OCR.

1) The file is not being read properly row by row, and after reading the PDF, some letters are converted into special characters.

You can read the entire file into a string, split it on System.Environment.NewLine, store the result in an array, and then read the array line by line. Here again, because of document quality and OCR limitations, the execution might not give 100% accurate results, so you need to experiment with the scale setting and different OCR engines, and make sure the PDF being read is of decent (good) quality.
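To illustrate the split-then-iterate step, here is a minimal Python sketch. In UiPath itself this would be a VB.NET expression applied to the string returned by the PDF or OCR activity; the function name `split_into_lines` and the sample text are hypothetical, and dropping blank lines is an assumption (OCR output often contains empty lines), not something the steps above require.

```python
def split_into_lines(text: str) -> list[str]:
    """Split extracted PDF text into trimmed, non-empty lines."""
    # splitlines() handles "\r\n" (what Environment.NewLine is on Windows)
    # as well as plain "\n", so mixed line endings are covered.
    # Assumption: blank lines carry no data and are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample standing in for the text returned by the PDF/OCR step.
    extracted = "Invoice Number\r\nINV-1001\r\n\r\nDate\r\n25/04/2017\r\n"
    # Read the resulting array line by line.
    for number, line in enumerate(split_into_lines(extracted), start=1):
        print(number, line)
```

Splitting on all common line endings rather than one fixed separator is a deliberate choice: text extracted from PDFs does not always use the platform's newline consistently.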

2) In a few fields, the data does not come out sequentially in the text file after reading the PDF.
Data may not come out sequentially, but there will be some pattern that you can identify and use to extract the data. For instance, if you want to extract the Invoice Number, but its value is on the second line and is followed by “Date”, you first need to find the index of “Invoice Number” and then extract the data between “Invoice Number” and “Date”.
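The index-based approach for the Invoice Number example can be sketched in Python as follows. The helper `extract_between` and the sample text are hypothetical; the sketch assumes both labels appear literally in the extracted text (OCR errors in the labels themselves would break the lookup), and it searches for "Date" only after "Invoice Number" so that an earlier occurrence is ignored.

```python
from typing import Optional


def extract_between(text: str, start_label: str, end_label: str) -> Optional[str]:
    """Return the trimmed text between start_label and the next end_label.

    Returns None if either label cannot be found.
    """
    # Find the index of the start label, e.g. "Invoice Number".
    start = text.find(start_label)
    if start == -1:
        return None
    # The value begins right after the start label itself.
    value_start = start + len(start_label)
    # Look for the end label (e.g. "Date") only after the start label.
    end = text.find(end_label, value_start)
    if end == -1:
        return None
    # Trim separators such as spaces, colons and line breaks around the value.
    return text[value_start:end].strip(" :\r\n\t")


if __name__ == "__main__":
    # Hypothetical OCR output where the value sits on the line after its label.
    sample = "ACME Ltd\nInvoice Number\nINV-1001\nDate\n25/04/2017\n"
    print(extract_between(sample, "Invoice Number", "Date"))  # prints INV-1001
```

Because the value is located relative to its surrounding labels rather than by line number, the same call works whether the value follows the label on the same line ("Invoice Number: INV-1001 Date: ...") or on the next one.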

3) While extracting data from the text file, we are running into problems: the data is not extracted properly.
Are you using the Read Text File activity, or is it via OCR?
If Read Text File, please give us a sample file and we will test whether there are any issues.
If via OCR, then point 1 applies here as well.

Happy Designing!

Regards,
V
