I read a PDF and then want to keep only a particular part of it instead of everything. However, the text I want to filter out is not the same every time. It looks like this:
PDF 1:
Street: Streetname1
City: City1
X Y
1 2
PDF2:
Street: Streetname2
City: City2
X Y
3 4
So I only want the table under X and Y (returning 1,2 and 3,4). How can I manipulate the string if I don’t know the street name and city in advance (they are unknown, so should I use wildcards)?
Try @vvaidya’s idea of using Regex. You will need something like System.Text.RegularExpressions.Regex.Matches(text, pattern), where pattern is a string, to get the collection of matches. Then you can use String.Join() to combine them back together if you want.
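To illustrate the regex idea, here is a minimal sketch in Python rather than .NET, so it can be run outside a workflow. The sample text and the pattern are assumptions based on the example in the question; the pattern captures whatever line follows an "X Y" header, so the street and city never need to be known.

```python
import re

# Sample text modelled on the question; the real text would come from the PDF.
sample_text = (
    "PDF 1:\nStreet: Streetname1\nCity: City1\nX Y\n1 2\n"
    "PDF2:\nStreet: Streetname2\nCity: City2\nX Y\n3 4"
)

# Assumed pattern: the header "X Y", a line break, then capture the next line.
header_pattern = r"X Y\r?\n([^\r\n]+)"

# findall returns only the captured group, i.e. the row under each header.
regex_rows = re.findall(header_pattern, sample_text)

# Join the rows back into one string, similar to String.Join().
regex_joined = " & ".join(regex_rows)
print(regex_joined)
```

The same pattern idea should carry over to Regex.Matches, where the captured row would be read from each match's first group.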
You can also try string manipulation like this:
Assuming “X Y” is consistent, you can split by that and the newline.
For example,
text in the above example is the text variable from the PDF.
Basically, it creates an array by first splitting by “X Y” then using .Select to pull only the first line from each item. Then, using String.Join() to combine the string back together.
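As a rough illustration of that split-and-select logic (not the original snippet), here is a small Python sketch. The function name rows_after_header and the sample text are made up for this example, and it assumes "X Y" does not also appear inside a street or city name.

```python
def rows_after_header(text, header="X Y"):
    """Return the first non-empty line that follows each header occurrence."""
    # Everything before the first header is address data, so skip chunk 0.
    chunks = text.split(header)[1:]
    rows = []
    for chunk in chunks:
        # Drop blank lines left over from the line break after the header.
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        if lines:
            rows.append(lines[0])
    return rows


# Sample text modelled on the question; the real text would come from the PDF.
pdf_text = (
    "PDF 1:\nStreet: Streetname1\nCity: City1\nX Y\n1 2\n"
    "PDF2:\nStreet: Streetname2\nCity: City2\nX Y\n3 4"
)

split_rows = rows_after_header(pdf_text)
split_joined = ", ".join(split_rows)  # comparable to String.Join(", ", ...)
print(split_joined)
```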
The Matches activity and Regex.Matches are the same thing.
If you take the Pattern from your Matches activity and use it in the Regex, you should get the Regex method working.
So, I had to adjust your initial Assign activity to include Newline characters. I may not have set that up exactly as your real data looks; I just went on how I understood it.
Secondly, I added a LINQ expression to select each match and remove “X Y” from it.
You can use either Matches or the Regex; both are practically identical.
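The match-then-clean step described above could look roughly like this in Python. This is only a sketch under assumptions: the pattern and the sample text are guesses based on the question, not the contents of the attached workflow.

```python
import re

# Sample text modelled on the question, with explicit newline characters.
workflow_text = (
    "PDF 1:\nStreet: Streetname1\nCity: City1\nX Y\n1 2\n"
    "PDF2:\nStreet: Streetname2\nCity: City2\nX Y\n3 4"
)

# Assumed pattern: match the header together with the line after it.
raw_matches = re.findall(r"X Y\r?\n[^\r\n]+", workflow_text)

# Equivalent of the select-and-replace step: strip the header from each match.
cleaned_rows = [m.replace("X Y", "").strip() for m in raw_matches]
print(cleaned_rows)
```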
Attached is your Example after I made a few adjustments.