Problem with data scraping from PDF file

uiautomation
studio

#1

Hello!
I am using the “Get Text” activity to get the text of a specific element in a PDF file, but the element is not detected correctly.
I need to get only the numbers, without the letters.


How can I do this?


#2

Hi 🙂

If the element you are trying to get contains both letters and numbers, an easy workaround is the following.

If the element always starts with three letters followed by a space, you can take a substring:
yourVariable = "NHH 7717027908"
yourSubstring = yourVariable.ToString.Substring(4, 10)

If there are always letters, but not always the same number of them, you can get the index of the first space and take the substring from there.
Likewise, if the number of digits is not always the same, you can get the total length of the string and use it to work out where the substring ends.
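A minimal sketch of this approach, written in Python rather than the VB.NET used in UiPath expressions, and assuming the letters and the digits are always separated by a space. `digits_after_first_space` is a hypothetical helper name; `str.index` raises a `ValueError` if there is no space at all.

```python
def digits_after_first_space(text: str) -> str:
    """Return everything after the first space, e.g. the number in 'NHH 7717027908'.

    Assumes (hypothetically) that the prefix letters and the digits are separated by a space.
    """
    space_index = text.index(" ")        # like yourVariable.IndexOf(" ") in VB.NET
    start = space_index + 1              # skip the space itself
    length = len(text) - start           # total length minus everything before the digits
    return text[start:start + length]    # same as Substring(start, length) in VB.NET


print(digits_after_first_space("NHH 7717027908"))  # 7717027908
print(digits_after_first_space("AB 12345"))        # 12345
```

In VB.NET the same result can be had without computing the length, because `Substring` with only a start index takes the rest of the string: `yourVariable.Substring(yourVariable.IndexOf(" ") + 1)`.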


#3

Thanks!
And what if this element is dynamic (the number of characters can be more or less than 10)?
What should I do then?


#4

Hi @Olegik_Super,

yourVariable = "NHH 7717027908"
yourSubstring = yourVariable.ToString.Replace("NHH", "").Trim
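Removing only "NHH" leaves the space in front of the number, so the result should also be trimmed. A small Python sketch of the same idea, assuming the prefix is known in advance (`strip_prefix` is a hypothetical name):

```python
def strip_prefix(text: str, prefix: str) -> str:
    """Remove a known prefix such as 'NHH' and the surrounding whitespace."""
    # str.replace removes every occurrence of prefix, like String.Replace in VB.NET
    without_prefix = text.replace(prefix, "")
    # Without strip() the result would still start with a space: " 7717027908"
    return without_prefix.strip()


print(strip_prefix("NHH 7717027908", "NHH"))  # 7717027908
```

This only works when the letters are always the same; for a varying prefix, the regex approach is more robust.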

Alternatively, you can use a regular expression to keep only the digits:

System.Text.RegularExpressions.Regex.Replace(yourVariable, "[^0-9]", String.Empty)
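The same regex idea as a Python sketch (`only_digits` is a hypothetical name): the pattern `[^0-9]` matches any character that is not a digit, and every match is replaced with an empty string. Unlike the substring approaches, it does not depend on how many letters come first or where the space is.

```python
import re


def only_digits(text: str) -> str:
    """Remove every character that is not a digit 0-9."""
    return re.sub(r"[^0-9]", "", text)


print(only_digits("NHH 7717027908"))  # 7717027908
print(only_digits("AB 12-34"))        # 1234
```

One thing to keep in mind: if the element ever contains more than one number, this concatenates them into a single string of digits.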

Regards,
Arivu


#5

I see you got a good answer with regex, but in case you want to use Substring later on, I'll answer your question anyway.
If it's dynamic, start the substring right after the first space, i.e. at yourVariable.IndexOf(" ") + 1. Then you need the length of the whole string, which is yourVariable.Length. The length of the substring is the total length minus the index of the first space, minus 1 for the space itself. For "NHH 7717027908" that is 14 - 3 - 1 = 10.