How to get required text from (.txt, .doc, etc.) files


#1

How can I get only the essential text from whole (.txt, .doc, etc.) files and save it to a CSV file?


#2

Hello Pratheesh,
Sure, your idea will work. First, create a sample workflow and upload it here.


#3

A workflow is not necessary here… I am asking for an idea of how to get only the required text from the whole file.


#4

Use the OCR method; it's the most reliable one…


#5

Hi @PratheeshKG,

To extract only the essential (partial) text, you need to do string manipulation on the output read from the text file.

For example, you can read the contents line by line and extract the parts you need based on their index, etc.
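As a rough illustration of the line-by-line, index-based idea, here is a minimal sketch in Python (not a UiPath workflow). The sample text, the labels, and the helper name `extract_after_label` are all made up for the example; the same approach carries over to the string functions available in .NET.

```python
# Hypothetical sample content; in practice this would be the text read from the file.
SAMPLE_TEXT = """Invoice No: 12345
Date: 2017-05-01
Total: 250.00
"""


def extract_after_label(text, label):
    """Return the text that follows `label` on the first line containing it, or None."""
    for line in text.splitlines():
        idx = line.find(label)  # index of the label within this line, -1 if absent
        if idx != -1:
            # Slice from the end of the label to the end of the line.
            return line[idx + len(label):].strip()
    return None


invoice_no = extract_after_label(SAMPLE_TEXT, "Invoice No:")
total = extract_after_label(SAMPLE_TEXT, "Total:")
print(invoice_no, total)
```

This only works when the required value sits on the same line as a known label; for values spread over several lines, a regular expression is usually easier.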

Regards,
V


#6

How to do string manipulation?


#7

Hi,

Please find the links below, which will help you get started:

https://msdn.microsoft.com/en-us/library/aa903372(v=vs.71).aspx

https://msdn.microsoft.com/en-us/library/dd789093.aspx

https://www.tutorialspoint.com/vb.net/vb.net_strings.htm

Regards,
V


#8

NF2261765128405.Invoice.pdf (138.5 KB)
I am trying to scrape this PDF file and I need the whole text from it, but I am not able to scrape the entire text. I think the text in this PDF file is embedded as images.


#9

In that case, you need to use the “Read PDF with OCR” activity to extract the PDF file's text into a string.

Regards,
V


#10

Yes, I tried that… but I am not able to extract all the data from that PDF file.


#11


#12

@Vikas.Jain
Did you try it with that PDF?


#13

Hi @PratheeshKG,

Attached is a XAML file for your reference. Refine the output as per your needs.
PDF Extraction.zip (100.6 KB)

Regards,
V


#14

@Vikas.Jain
I am getting an empty output file. No data was extracted from that PDF file.


#15

Get the content of the text file in a string.
Use string manipulation or a regular expression to extract the required data.
Try this regex: (START(.*?)(?=END))
Replace START and END with the keywords from the content of the text file in order to extract the data between them.
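The idea of pulling out the data between two keywords and saving it as CSV could look roughly like this in Python. This is only a sketch: the sample text, the keywords, the `value` column header, and the helper names `extract_between` and `to_csv_text` are assumptions for illustration. The same lazy-match-plus-lookahead pattern should also work with the .NET Regex class.

```python
import csv
import io
import re

# Hypothetical text; in practice this is the string read from the file.
SAMPLE_TEXT = "Invoice No: 12345 Date: 2017-05-01 Invoice No: 67890 Date: 2017-06-02"


def extract_between(text, start_kw, end_kw):
    """Return every piece of text found between start_kw and end_kw."""
    # re.escape keeps characters such as ':' or '.' in the keywords literal.
    pattern = re.escape(start_kw) + r"(.*?)(?=" + re.escape(end_kw) + r")"
    # DOTALL lets a match span line breaks; the lazy .*? stops at the first end_kw.
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


def to_csv_text(values, header="value"):
    """Build CSV content with one header row and one row per extracted value."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([header])
    for v in values:
        writer.writerow([v])
    return buf.getvalue()


values = extract_between(SAMPLE_TEXT, "Invoice No:", "Date:")
print(to_csv_text(values))
```

To write an actual file instead of a string, the same `csv.writer` calls can be pointed at a file opened with `open(path, "w", newline="")`.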


#16

Use the Start Process activity with the path of the file.
Send Ctrl+A and Ctrl+C once the file opens.
Use Get From Clipboard to store the text in a variable, then extract the required text.
If the file has text within images, try using OCR … or use the OCR of the Acrobat application.