Extract information from a PDF

cristian_urrego · May 9, 2019, 2:32pm

Hi guys

I am working on a project with several PDF files that are scanned images, and I have to extract certain data that does not appear in a fixed order. I managed to build a loop that reads each file with OCR and writes the result to a Notepad text file. What I cannot do is search that text file for a word and copy what follows it. For example:

NIT 123
I search for the word NIT and want it to return 123. The number will always change; the word will not.

Please help me with how to do this. Thanks.

DanielMitchell · May 9, 2019, 2:55pm

You can loop through all of the lines in the text file with a For Each activity.
The TypeArgument should be String and the Values should be File.ReadAllLines(filepath)

Then inside the loop you can do an If statement and check whether the current line contains “NIT”. If it does you can split the line by spaces and grab the last item, which should be your number.
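
Outside of a workflow, the same logic can be sketched in plain VB.NET. This is only an illustration, not the exact activity setup: the function name FindValues and the sample lines are made up, and the array stands in for File.ReadAllLines(filepath).

```vb
Imports System
Imports System.Collections.Generic

Module NitLookup
    ' Hypothetical helper: returns the last space-separated piece
    ' of every line that contains the keyword.
    Function FindValues(lines As IEnumerable(Of String), keyword As String) As List(Of String)
        Dim values As New List(Of String)
        For Each item As String In lines
            If item.Contains(keyword) Then
                ' Trim first so a trailing space does not produce an empty last piece.
                Dim parts As String() = item.Trim().Split(" "c)
                values.Add(parts(parts.Length - 1))
            End If
        Next
        Return values
    End Function

    Sub Main()
        ' Stand-in for File.ReadAllLines(filepath); the sample lines are made up.
        Dim lines As String() = {"Some header text", "NIT 123", "Another line"}
        For Each value As String In FindValues(lines, "NIT")
            Console.WriteLine(value) ' prints 123
        Next
    End Sub
End Module
```

In UiPath itself, only the expressions would be used: File.ReadAllLines(filepath) in the For Each, item.Contains("NIT") in the If condition, and the split in an Assign.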

cristian_urrego · May 9, 2019, 3:17pm

Hi @DanielMitchell, thank you for your prompt response. I did what you suggested and checked the lines of the file, but it cannot find the word NIT.

I did it this way

DanielMitchell · May 9, 2019, 3:20pm

Let’s say item is equal to “NIT 123”.

item = "NIT" fails because item isn't equal to "NIT".
Instead, use item.Contains("NIT")
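
A small console sketch of the difference, assuming the line is exactly "NIT 123". The last two lines go a bit beyond the question: Contains is case-sensitive, so if the OCR text might say "Nit", IndexOf with StringComparison.OrdinalIgnoreCase is one possible alternative.

```vb
Imports System

Module ContainsCheck
    Sub Main()
        Dim item As String = "NIT 123"

        ' Equality compares the whole line, so this prints False.
        Console.WriteLine(item = "NIT")

        ' Contains looks for the text anywhere in the line, so this prints True.
        Console.WriteLine(item.Contains("NIT"))

        ' Contains is case-sensitive: "Nit 123" does not match "NIT" (prints False).
        Dim mixedCase As String = "Nit 123"
        Console.WriteLine(mixedCase.Contains("NIT"))

        ' One way to ignore case (prints True).
        Console.WriteLine(mixedCase.IndexOf("NIT", StringComparison.OrdinalIgnoreCase) >= 0)
    End Sub
End Module
```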

cristian_urrego · May 9, 2019, 3:27pm

@DanielMitchell

That works, it finds the word. Sorry, I'm a little new to UiPath. Could you show me how to split the line by spaces so that I can take the last element?

lakshman · May 9, 2019, 3:34pm

@cristian_urrego

Str = "Robotics Process Automation"

Str.Split(" ".ToCharArray)(0) returns "Robotics"

Str.Split(" ".ToCharArray)(1) returns "Process"

Str.Split(" ".ToCharArray)(2) returns "Automation"

DanielMitchell · May 9, 2019, 3:41pm

@cristian_urrego you can refer to @lakshman’s answer. The String Split method splits a single string up into an array of strings. You can then loop through them or process them however you want.

For your specific case, if the line is "NIT 123", then you can use
item.Split(" ".ToCharArray)(1) to split the line into "NIT" and "123" and grab the second item. (Indexes start at 0, so index 1 is the second piece.)
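
One caveat worth checking, shown as a sketch: if the text has two spaces between the word and the number (an assumption about what the OCR output might look like, not something seen in this thread), a plain split leaves an empty piece at index 1. Passing StringSplitOptions.RemoveEmptyEntries drops those empty pieces.

```vb
Imports System

Module SplitCheck
    Sub Main()
        ' Assumed input: two spaces between the word and the number.
        Dim item As String = "NIT  123"

        ' A plain split keeps the empty piece between the two spaces,
        ' so index 1 is an empty string here, not "123".
        Dim plain As String() = item.Split(" ".ToCharArray())
        Console.WriteLine("[" & plain(1) & "]") ' prints []

        ' RemoveEmptyEntries drops the empty pieces, so index 1 is "123".
        Dim cleaned As String() = item.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
        Console.WriteLine("[" & cleaned(1) & "]") ' prints [123]
    End Sub
End Module
```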

cristian_urrego · May 9, 2019, 3:43pm

@DanielMitchell It works perfectly. Thank you very much for your help, you are a real ace.

cristian_urrego · May 9, 2019, 3:43pm

Hi @lakshman

Thank you very much, it works perfectly for me.

system · May 12, 2019, 3:44pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
