How to scrape data from a Word file

Hi, I have a Word file and I would like to scrape some specific fields from it. Do you know how I should do that?

Thanks in advance

Hi @tharuler

Use Word Application Scope, and use the Get Full Text or Get Attribute activity.

Thanks
Ashwin S


Hi
I would go with @AshwinS2's suggestion, but in addition to that:
– Go to the Design tab in Studio, click Manage Packages, search for uipath.word in the Official tab, and install it.
– Once the Word package is installed, search for "word" in the Activities panel and use Word Application Scope, passing it the file path of the Word document. Inside the scope, use the Read Text activity and store the output in a String variable named str_input.
– Now, using Regex or a string manipulation method such as Split, we can get the exact value we want.
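
To illustrate the Regex route, here is a minimal VB.NET sketch. It assumes the field sits on one line in the form "label: value"; the helper name ExtractField and the sample text are hypothetical and only stand in for whatever str_input really contains.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexFieldDemo
    ' Hypothetical helper: returns the text that follows the label on the
    ' same line, or Nothing when the text is empty or the label is absent.
    Function ExtractField(text As String, label As String) As String
        If String.IsNullOrEmpty(text) Then Return Nothing
        ' Regex.Escape keeps the label literal; [ \t]* skips spaces and tabs
        ' but not line breaks, so an empty field does not swallow the next line.
        Dim pattern As String = Regex.Escape(label) & "[ \t]*(?<value>[^\r\n]*)"
        Dim m As Match = Regex.Match(text, pattern)
        If Not m.Success Then Return Nothing
        Return m.Groups("value").Value.Trim()
    End Function

    Sub Main()
        ' Assumed sample content, not taken from the real document.
        Dim str_input As String = "Order: A-17" & Environment.NewLine &
                                  "Nb de colis: 12" & Environment.NewLine &
                                  "Weight: 4 kg"
        Console.WriteLine(ExtractField(str_input, "Nb de colis:")) ' prints 12
    End Sub
End Module
```

In Studio, the body of ExtractField could probably be condensed into a single Assign expression; it is kept as a function here so the Nothing cases stay visible.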

Cheers @tharuler


By using Read Text I'm getting all the text in the file, and I only want some fields.
Get Attribute also selects the whole file.

Hi, thanks, I'll try string manipulation. Just one question: I can't use an OCR activity on a Word file, right?

Hi @tharuler

Based on the lines in the text, you can split it, for example with StringOutput.Split("".ToCharArray)
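
A line-based version of that idea might look like the sketch below. It assumes each field is on its own line and starts with its label; ValueOnLine and the sample text are hypothetical names used only for illustration.

```vb
Imports System
Imports Microsoft.VisualBasic

Module SplitLinesDemo
    ' Hypothetical helper: splits the text into lines and returns what
    ' follows the label on the first line that starts with it.
    Function ValueOnLine(text As String, label As String) As String
        If text Is Nothing Then Return Nothing
        ' Split on CR and LF and drop the empty pieces left between them.
        Dim separators As Char() = {ControlChars.Cr, ControlChars.Lf}
        Dim lines As String() = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        For Each currentLine As String In lines
            Dim trimmed As String = currentLine.Trim()
            If trimmed.StartsWith(label, StringComparison.OrdinalIgnoreCase) Then
                Return trimmed.Substring(label.Length).Trim()
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Assumed sample content standing in for StringOutput.
        Dim StringOutput As String = "Ref: X1" & ControlChars.CrLf & "Nb de colis: 12"
        Console.WriteLine(ValueOnLine(StringOutput, "Nb de colis:")) ' prints 12
    End Sub
End Module
```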

Thanks
Ashwin S


Yes, we can use OCR, but we would need to open the Word document in the foreground to fetch the data,
so compared to that, string manipulation would be easier.
Cheers @tharuler

I'm getting this now

Hi @tharuler

Is the document file empty? If not, try to use Get Attribute or Get Full Text.

Thanks
Ashwin S

Fine.
Are we using any variable in this sequence that doesn't have a value in it?
Check all the variables that are used within this scope.
Cheers @tharuler

No, the document is not empty.
I can read the file, and use a message box to display all the text.

But when I try something like wordText.Substring(wordText.IndexOf("Nb de colis: ")+"Nb de colis: ".Length).Split(Environment.NewLine.ToCharArray)(0) to retrieve a field in the file, I get the error "Object reference not set to an instance of an object".


I only have two variables:
wordText (holds the text of the document)
and subText, a variable where I try string manipulation to retrieve only one field ( wordText.Substring(wordText.IndexOf("Nb: ")+"Nb: ".Length).Split(Environment.NewLine.ToCharArray)(0) )

This is the issue: wordText has not received its value yet at the point where the expression is evaluated, because the expression is set as a default value in the Variables panel, right?
So remove that default value from the variable in the Variables panel, add an Assign activity right after the scope, and assign this expression to that variable there:
wordText.Substring(wordText.IndexOf("Nb: ")+"Nb: ".Length).Split(Environment.NewLine.ToCharArray)(0)
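
For reference, a guarded version of the same expression could be sketched as follows. It assumes the label may be missing and that wordText may still be Nothing; FieldAfterLabel and the sample text are hypothetical.

```vb
Imports System
Imports Microsoft.VisualBasic

Module GuardedSubstringDemo
    ' Hypothetical helper: returns the rest of the line after the label,
    ' or Nothing when there is no text or no label.
    Function FieldAfterLabel(wordText As String, label As String) As String
        ' Calling a method on a String that is still Nothing raises
        ' "Object reference not set to an instance of an object".
        If wordText Is Nothing Then Return Nothing

        ' IndexOf returns -1 when the label is not present; without this
        ' check Substring would start at the wrong position.
        Dim start As Integer = wordText.IndexOf(label, StringComparison.Ordinal)
        If start < 0 Then Return Nothing

        Dim rest As String = wordText.Substring(start + label.Length)
        ' Split on both CR and LF so either line-ending style works.
        Dim separators As Char() = {ControlChars.Cr, ControlChars.Lf}
        Return rest.Split(separators)(0).Trim()
    End Function

    Sub Main()
        ' Assumed sample content, not the real document.
        Dim wordText As String = "Ref: X1" & ControlChars.CrLf & "Nb: 3" & ControlChars.CrLf & "Total: 40"
        Console.WriteLine(FieldAfterLabel(wordText, "Nb: "))           ' prints 3
        Console.WriteLine(FieldAfterLabel(Nothing, "Nb: ") Is Nothing) ' prints True
    End Sub
End Module
```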

Cheers @tharuler


Thanks @Palaniyappan, it works.

Now it's just that string manipulation isn't that great for retrieving the exact value needed.
