I have some web URLs where a large amount of data is stored, which I need to extract into a Word document. I am doing this by using the “Get Text” activity to store the text in a variable and then appending the value to the Word document with the “Append Text” activity. After the sequence completes, however, I find that a lot of the data on the web page is missing from my Word document. I tried the “Get Full Text” activity as well, but the issue still occurs. For reference, here is one of the URLs: “Résumé des caractéristiques du produit - ABACAVIR ARROW 300 mg, comprimé pelliculé sécable - Base de données publique des médicaments”. I need the information stored in points 1, 2, 3, 4 … 12.
Thanks in advance.
Can you please share how you developed the workflow? For example, which selector you used, etc…
Hi @Dipshankar_Dasgupta
I am able to extract the whole data from the web page…
Can you please explain your exact requirement? Do you want to extract only points 1 to 12 and leave out the rest of the data?
Thanks,
Shaik Najeer.
I want all of the data from point 1 to point 12. Can you please share your sequence? Thank you.
I used the Get Text activity with your selector and I was able to extract all data. Can you please explain which parts are not being extracted when you run your workflow?
The data is extracted only up to page 11, in the middle of point 5. This is the issue I am facing right now.
Thank you for helping.
This seems to be a limitation of the Append Text activity…? I wonder if the string you’re writing to the Word file is too long.
A possible workaround could be to split your string into smaller parts (using either Split or Substring) and append them in a loop.
Here’s an example (just for testing purpose!!) that will write the entire content:
Where:
1- This is a new list of strings
2- This is set to textExtractedFromWebPage.split("."C).ToList, where textExtractedFromWebPage is the text you obtain from the Get Text activity, using the selector you already have. Note that I used the “.” symbol as input for the Split function just for the sake of testing!!! If you use this approach you’d need to find a better character.
3- This is a loop to go through all the smaller substrings of your full text
4- This is the Append activity, that will be done in a loop. Note that I’ve deselected the option to add a new line before the text is added.
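For anyone who can't see the screenshot, here is a minimal VB.NET sketch of what steps 2 to 4 do, written as a plain console program rather than as UiPath activities. The sample text is made up, and Console.Write stands in for the Append Text step. One thing to be aware of: Split removes the separator itself, so the sketch puts the "." back on every part except the last.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module SplitDemo
    Sub Main()
        ' Hypothetical stand-in for the text returned by Get Text.
        Dim textExtractedFromWebPage As String = "First sentence. Second sentence. Third"

        ' Same expression as in step 2: split on the "." character.
        Dim parts As List(Of String) = textExtractedFromWebPage.Split("."c).ToList()

        ' Split drops the separator, so re-add it to every part except the last.
        For i As Integer = 0 To parts.Count - 1
            Dim piece As String = parts(i)
            If i < parts.Count - 1 Then piece &= "."
            ' In the workflow this would be the Append Text step (no new line).
            Console.Write(piece)
        Next
    End Sub
End Module
```

With the separator restored, the concatenated pieces match the original text exactly.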
However, note that with your approach (which uses the GetText activity on the entire page) you’ll lose structured content like this:
Thanks a lot. Let me try this, and I will get back to you if I face any further issues. Thank you.
The sequence you showed works well, and I was able to run it in my case. But there is another issue: it takes quite a long time to complete a single document, and I have more than 10k records to process. Can you suggest any other way to perform the same task in less time? Thanks again for the previous help.
As I stated, the sequence was just an example to show a different approach. Instead of splitting on a specific char (as I did), you could split your text into chunks of a fixed number of chars (for example, you can use a very large number, so that you need fewer “Append” operations), and then loop through the list (same as in my example).
Can I do that by passing a large number inside the first bracket of textExtractedFromWebPage.split("."C).ToList, like this → textExtractedFromWebPage.split(2500).ToList
Also, can you please tell me whether this can be done using the recording feature of UiPath? For example, I would first select the data from point 1 to 12 and then paste it into a document. Can I do this using web recording, and would it be faster than what we are doing now?
Thank you for helping
No, the split function does not accept a fixed length as an argument, you’ll have to find a different way to achieve the same result (most likely a for loop with some assign operations). Also, I don’t think you’ll be able to do this by recording the steps, as it’s a manipulation done in the backend, so unfortunately your only option is to code the solution yourself.
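As a rough sketch of that loop, assuming plain VB.NET (for example inside an Invoke Code activity, or rebuilt with Assign activities), something like the following cuts the text into fixed-length pieces with Substring. The helper name SplitByLength and the chunk size of 2500 are only illustrative; I don't know what length Append Text can safely handle.

```vb
Imports System
Imports System.Collections.Generic

Module ChunkDemo
    ' Hypothetical helper: cuts fullText into consecutive pieces
    ' of at most chunkSize characters each.
    Function SplitByLength(fullText As String, chunkSize As Integer) As List(Of String)
        If chunkSize <= 0 Then Throw New ArgumentOutOfRangeException("chunkSize")
        Dim chunks As New List(Of String)
        For start As Integer = 0 To fullText.Length - 1 Step chunkSize
            ' The last piece may be shorter than chunkSize.
            Dim length As Integer = Math.Min(chunkSize, fullText.Length - start)
            chunks.Add(fullText.Substring(start, length))
        Next
        Return chunks
    End Function

    Sub Main()
        ' 6000 characters of sample text, standing in for the Get Text output.
        Dim sample As String = New String("a"c, 6000)
        For Each chunk As String In SplitByLength(sample, 2500)
            Console.WriteLine(chunk.Length) ' prints 2500, 2500, 1000
        Next
    End Sub
End Module
```

Unlike Split, this keeps every character, so nothing has to be re-added when the pieces are appended one after another.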
It would be interesting to know the max length of text accepted by the Append activity for Word, but I couldn’t find any official info in the UiPath documentation.
Also, can you please help me with one more thing? I want to extract the tables into the document in the same format as on the page. Can you suggest a solution for that? Thank you.
You cannot just do it with Get Text.
Is there any reason why you need that info in Word? Can’t you print the doc to PDF instead?
Yes, I can do that, but in that case the PDF looks just like the web page, because I print the web page directly. I only need the point-wise data, not the extra content.
Then you can retrieve all the text, save it as a .txt file, then open it with Word and save it as .docx.
However you won’t be able to extract the tables the way you want them to be extracted.
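For the .txt step, a minimal sketch in plain VB.NET could look like this. The file name and temp folder are placeholders, and the sample string stands in for the Get Text output. I pass UTF-8 explicitly because the page is in French and contains accented characters.

```vb
Imports System
Imports System.IO
Imports System.Text

Module SaveTextDemo
    Sub Main()
        ' Hypothetical stand-in for the text returned by Get Text.
        Dim textExtractedFromWebPage As String = "Résumé des caractéristiques du produit"

        ' Placeholder location; use whatever folder your workflow writes to.
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "extracted.txt")

        ' Write the whole string in one call, keeping accented characters intact.
        File.WriteAllText(outputPath, textExtractedFromWebPage, Encoding.UTF8)

        ' Read it back to confirm nothing was lost.
        Console.WriteLine(File.ReadAllText(outputPath, Encoding.UTF8) = textExtractedFromWebPage)
    End Sub
End Module
```

Opening the file in Word and saving it as .docx is a separate step that this sketch does not cover.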
Otherwise, try to find a way to select the text you want in the UI (you’ll have to play with clicks and the Shift key), then copy it to the clipboard, open a Word file and paste the content from the clipboard. Not sure if it will work though.