I’m trying to extract data from a PDF file using the Get Text activity and store it in a String or GenericValue variable. When the data contains dashes (this one: “-”), they automatically disappear.
“Todays date is 2020-05-27.” becomes “Todays date is 20200527.”
Any takes on this? I think it’s related to Adobe Acrobat.
Please use the Read PDF Text activity, and then use a regex to extract the data from its output.
@gustav.wendefors, hi and welcome to the community.
Is there a reason you chose the Get Text activity instead of the Read PDF Text activity?
If there isn’t, I suggest you use that instead, along with the links posted above. If you need help with the regex to get the specific text, feel free to let us know.
Ah, Read PDF Text is better overall, thanks. But the issue remains with that activity too: it removes the dashes in my string variable…
My text looks like this when I use Read PDF:
"Skicka in ditt ärendeNamn
Gustav Wendefors E-post
Centerpartiet Välj ditt ärende
Avsägelse Ärende gällande
**Detta är en text. Med datum, 20200630.**
**En ny hård rad.**
Ett mellanrum mellan raderna. "
I want to extract the information in bold into separate variables. How do I do that? The bold information is different every time, but the other information (not in bold) is the same every time.
I tried this out, but I couldn’t: my Studio is set up for English text, so when I copy your text into Studio it doesn’t recognise it as a valid string. And I think translating the text to English would defeat the purpose, right?
I can translate it; that’s okay.
"Send your issueName
Gustav Wendefors E-mail
email@example.com Party affiliation
Centerpartiet Choose your issue
Avsägelse Issue regarding
This is a text. With date, 20200630.
A new hard line break.
An empty row between the lines."
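A minimal sketch of the regex approach suggested earlier in the thread, written in Python only because it is easy to run outside Studio; the same pattern should carry over to a .NET regex in a workflow, but treat that as untested. It assumes, as described in the question, that the two variable lines (the bold ones) always sit directly after the fixed label line ending in “Issue regarding” (“Ärende gällande” in the original PDF). `extract_issue_lines` is a hypothetical helper name.

```python
import re

# Assumption: the two lines that change from PDF to PDF come directly after
# the fixed label line. The alternation accepts the English label from the
# translated sample as well as the Swedish label from the real PDF.
LABEL_PATTERN = re.compile(r"(?:Issue regarding|Ärende gällande)[ \t]*\r?\n(.+)\r?\n(.+)")


def extract_issue_lines(pdf_text):
    """Return the two lines after the label as a tuple, or None if not found."""
    match = LABEL_PATTERN.search(pdf_text)
    if match is None:
        return None
    # "." does not match "\n", so each group captures exactly one line;
    # strip() removes trailing spaces and any "\r" left by Windows line endings.
    return match.group(1).strip(), match.group(2).strip()


# Translated Read PDF Text output from this thread.
sample = (
    "Send your issueName\n"
    "Gustav Wendefors E-mail\n"
    "email@example.com Party affiliation\n"
    "Centerpartiet Choose your issue\n"
    "Avsägelse Issue regarding\n"
    "This is a text. With date, 20200630.\n"
    "A new hard line break.\n"
    "An empty row between the lines."
)

first_line, second_line = extract_issue_lines(sample)
print(first_line)   # This is a text. With date, 20200630.
print(second_line)  # A new hard line break.
```

Anchoring on the fixed label rather than on the content itself is what lets this work when the bold text changes every time; if the layout ever shifts so that more or fewer lines follow the label, the pattern would need adjusting.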
I am replying a bit late, but I hope this can help someone.
Have you checked whether your dashes are “real” dashes/hyphens and not soft hyphens?
In my case, the hyphens in my PDF file were soft hyphens. I fixed this by replacing them with real hyphens.
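To check what the “dashes” actually are and then do the replacement described above, a small Python sketch could look like this. It assumes the problem character is the soft hyphen (U+00AD), which is normally displayed only at a line break and is otherwise invisible, so it can look as though the dash has vanished. `describe_dashes` and `replace_soft_hyphens` are hypothetical helper names.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def describe_dashes(text):
    """List every dash-like character with its code point and Unicode name."""
    found = []
    for ch in text:
        # The soft hyphen is category "Cf" (format); ordinary hyphens and
        # dashes are category "Pd" (dash punctuation).
        if ch == SOFT_HYPHEN or unicodedata.category(ch) == "Pd":
            found.append(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
    return found


def replace_soft_hyphens(text):
    """Swap every soft hyphen for an ordinary ASCII hyphen-minus."""
    return text.replace(SOFT_HYPHEN, "-")


raw = "Todays date is 2020\u00ad05\u00ad27."
print(describe_dashes(raw))       # ['U+00AD SOFT HYPHEN', 'U+00AD SOFT HYPHEN']
print(replace_soft_hyphens(raw))  # Todays date is 2020-05-27.
```

In a UiPath workflow, the equivalent replacement could be done in an Assign activity with the VB.NET expression `text.Replace(ChrW(173), "-")`, where 173 is the decimal value of U+00AD and `text` stands for whatever variable holds the extracted text.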