How to Extract Data from a Word File

I have an image file in PDF format, and I converted it to a Word file using OCR software. Now I want to extract some field data such as Order Number, Invoice Date, Address, Firm Name and a table. Please help; I have attached the Word file.

We can use the Word package to read the file, which gives us a string value as output.
Then, using string manipulation with Regex or the Split method, we can get the values we want.
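A rough sketch of the read-then-parse idea, written in Python rather than with the Word package. It assumes the Word text has already been read into a string that looks like "Label: value" on each line; the sample text and the helper names `field_value_regex` and `field_value_split` are hypothetical, and real OCR output will likely need different patterns.

```python
import re
from typing import Optional

# Hypothetical text standing in for what reading the Word file returns.
text = """Order Number: 4512
Invoice Date: 12/03/2019
Firm Name: ABC Traders"""

def field_value_regex(text: str, label: str) -> Optional[str]:
    # Regex approach: capture everything after "<label>:" up to the end of that line.
    match = re.search(re.escape(label) + r"[ \t]*:[ \t]*(.*)", text)
    return match.group(1).strip() if match else None

def field_value_split(text: str, label: str) -> Optional[str]:
    # Split approach: break the text into lines, then split each line once on ":".
    for line in text.splitlines():
        parts = line.split(":", 1)
        if len(parts) == 2 and parts[0].strip() == label:
            return parts[1].strip()
    return None

print(field_value_regex(text, "Order Number"))
print(field_value_split(text, "Invoice Date"))
```

Both helpers return None when the label is missing, which is worth checking for because OCR often garbles labels.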

Cheers @sachin_sharma

Hi, I followed the steps as you advised, but my output is coming out in the wrong format. Please take a look.

Put this in a text file and use that text to write the regex pattern!

Get that output into a .txt file and then apply the regex to it. @sachin_sharma
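A minimal Python sketch of the "dump to .txt, then apply regex" step. It assumes the extracted text has one "Label: value" pair per line; the sample text and the file name `output.txt` are hypothetical, and a temporary folder is used so the example cleans up after itself.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical text read from the Word file, one "Label: value" pair per line.
word_text = "Firm Name: ABC Traders\nAddress: 45 Park Road, Delhi\n"

with tempfile.TemporaryDirectory() as folder:
    # Step 1: write the extracted text to a .txt file so it can be inspected in Notepad.
    txt_path = Path(folder) / "output.txt"
    txt_path.write_text(word_text, encoding="utf-8")

    # Step 2: read the file back and collect every "Label: value" line into a dict.
    content = txt_path.read_text(encoding="utf-8")
    pairs = re.findall(r"^([^:\n]+):[ \t]*(.*)$", content, re.MULTILINE)
    fields = {label.strip(): value.strip() for label, value in pairs}

print(fields)
```

Looking at the .txt file first shows whether each label really sits on its own line; if not, the pattern has to change.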

But how do I apply regex for Firm Name and Address? Please look at the first image.
Thanks for the reply.

Paste that into Notepad and post it here, and we will take a look! @sachin_sharma

Please see this PNG file.

After reading text from a PDF or an image, the content may change or differ from the original.
That's the reason we are asking you to show a sample!
Anyway, you can extract the firm name like this: (?<=Firm Name).*(?=Read Office:)
and the address like this: (?<=Address:).*
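To show what the two lookaround patterns above actually capture, here is a Python sketch run against a hypothetical OCR sample. It assumes the firm name and "Read Office:" sit on the same line, because `.` does not match a newline by default; the sample text is invented, and the trimming step is an addition since the patterns themselves keep the colon and spaces after the label.

```python
import re

# Hypothetical OCR text: the firm name and "Read Office:" share one line.
text = "Firm Name: ABC Traders Read Office: 12 Main Street\nAddress: 45 Park Road, Delhi"

# Lookbehind and lookahead keep the labels themselves out of the match.
firm = re.search(r"(?<=Firm Name).*(?=Read Office:)", text)
address = re.search(r"(?<=Address:).*", text)

# The raw firm match is ": ABC Traders ", so strip spaces and colons from both ends.
firm_name = firm.group().strip(" :") if firm else None
address_value = address.group().strip() if address else None

print(firm_name)
print(address_value)
```

If the OCR output puts the firm name on a different line from "Read Office:", the first pattern finds nothing and `firm_name` stays None.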

Hi, the first image I attached shows the file in Word format. I attached it as an image because there is no option to attach a Word file in the forum.

Convert it to a string and use the above regex!

Have we tried the method suggested here?

Cheers @sachin_sharma

Hi @sachin_sharma,

Use (?<=Address)\n.* to get the Name value and put it into a variable.
Then use (?<=" + variable.Trim + ")\n.* and you will get the office name.
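The two-step idea above (capture the line after "Address", then reuse the trimmed value as the next lookbehind) can be sketched in Python as follows. The sample layout, with the label, the name and the office each on their own line, is an assumption. `re.escape` is added so that characters such as "." or "(" in a name are treated literally; the escaped name is a fixed-length literal, which Python's lookbehind requires.

```python
import re

# Hypothetical layout: the label, the name and the office each on their own line.
text = "Address\nABC Traders\n12 Main Street, Delhi"

# Step 1: the line right after "Address" (the match starts with the newline).
first = re.search(r"(?<=Address)\n.*", text)
name = first.group().strip() if first else ""

# Step 2: build a second lookbehind from the trimmed name to grab the following line.
office = None
if name:
    second = re.search(r"(?<=" + re.escape(name) + r")\n.*", text)
    office = second.group().strip() if second else None

print(name)
print(office)
```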

Hi, thanks for the reply, but first look at the output format of the Word document.

This is the output I get from the Word file. Now please suggest what I can do.

Hi @mitesh_parmar,
please see the output file I attached.

No buddy.
Actually, this method converts the Word document to Excel, and from Excel we can access the row value we want.
Have we tried that method?

Cheers @sachin_sharma

Hi @Palaniyappan,
I tried using Send Hot Key with ctrl+g and ctrl+c, but how do I get the output of the ctrl+c command?

Fine. Use a START PROCESS activity and pass the file path of the Word document to its FileName property.

Now use a SEND HOT KEY activity with the key set to ctrl+a,
then another SEND HOT KEY activity with the key set to ctrl+c.

After copying the contents of the Word document, use another START PROCESS activity and pass the file path of the new Excel file you have created to its FileName property.
This opens the Excel file; now use a SEND HOT KEY activity with the key set to ctrl+v.
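Pasting into Excel roughly turns each line of the copied text into a row and, where the text contains tabs (as Word tables typically do when copied as plain text), splits that line into cells. The Python sketch below only simulates that row structure on a hypothetical clipboard string to show how a row lookup works afterwards; it does not drive Word or Excel, and `find_row` is a made-up helper.

```python
from typing import List, Optional

# Hypothetical plain text as it might sit on the clipboard after ctrl+a, ctrl+c in Word.
clipboard = "Order Number\t4512\nInvoice Date\t12/03/2019\nItem\tQty\tRate\nCement\t10\t350"

# Approximate what pasting into a sheet produces: one row per line, one cell per tab.
rows: List[List[str]] = [line.split("\t") for line in clipboard.splitlines()]

def find_row(rows: List[List[str]], first_cell: str) -> Optional[List[str]]:
    # Return the first row whose first cell matches the label, like a row lookup in Excel.
    for row in rows:
        if row and row[0].strip() == first_cell:
            return row
    return None

print(find_row(rows, "Order Number"))
print(find_row(rows, "Cement"))
```

In the real workflow the pasted sheet would be read back with the tool's Excel activities, and the same "find the row by its first cell" lookup applies.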

Cheers @sachin_sharma

There is no output file attached. @sachin_sharma, please attach it again.

Hi @mitesh_parmar, please see the message box image I attached.