Filter string using wildcards from a read PDF

pdf
studio
variable
string

#1

Dear all,

I use a Read PDF activity and then want only a particular part of that PDF text instead of everything. However, the text I want to filter out is not the same every time. It looks like this:

PDF 1:
Street: Streetname1
City: City1

X Y
1 2

PDF2:
Street: Streetname2
City: City2

X Y
3 4

So I only want the table under X & Y (returning 1,2 and 3,4). How can I manipulate the string when I don’t know the street name and city in advance (they are unknown, hence the idea of wildcards)?

Sincerely,
Robert


#2

Try this regex in the Matches activity.

X Y[\r\n]+([^\r\n]+)


#3

Hello,

Try @vvaidya’s idea of using Regex. You will need to do something like System.Text.RegularExpressions.Regex.Matches(text, pattern), where pattern is a string. That returns a collection of matches (a MatchCollection, not a plain array); you can then use String.Join() on the matched values to combine them back into one string if you want.
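
As a rough illustration of the Regex.Matches idea, here is a minimal console sketch rather than a UiPath workflow. The sample text is an assumption modelled on the two PDFs in the question, and the module and variable names are made up for the example. The pattern is the one from post #2: the literal "X Y", then one or more line-break characters, then a capture group that takes everything up to the next line break.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexSketch
    Sub Main()
        ' Assumed sample text that mirrors the two PDFs in the question.
        Dim text As String = String.Join(Environment.NewLine, {
            "Street: Streetname1", "City: City1", "", "X Y", "1 2", "",
            "Street: Streetname2", "City: City2", "", "X Y", "3 4"})

        ' "X Y", one or more CR/LF characters, then capture the next line.
        Dim pattern As String = "X Y[\r\n]+([^\r\n]+)"

        ' Regex.Matches returns a MatchCollection; Groups(1) holds the captured row.
        Dim rows As String() = Regex.Matches(text, pattern).
            Cast(Of Match)().
            Select(Function(m) m.Groups(1).Value).
            ToArray()

        ' Prints "1 2" and "3 4" on separate lines.
        Console.WriteLine(String.Join(Environment.NewLine, rows))
    End Sub
End Module
```

In Studio the same expression can go into an Assign, with text being the output variable of the PDF activity. Because the street and city lines are never part of the pattern, they do not need to be known in advance.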


You can also try string manipulation like this:
Assuming “X Y” is consistent, you can split by that and the newline.
For example,

String.Join(System.Environment.NewLine, text.Split({"X Y"}, System.StringSplitOptions.RemoveEmptyEntries).Skip(1).Select(Function(x) x.Trim().Split(System.Environment.NewLine.ToCharArray())(0)).ToArray())

text in the above example is the text variable from the PDF.
Basically, it creates an array by first splitting by “X Y” and skipping the part before the first “X Y”. Each remaining piece starts with the line break that followed “X Y”, so .Trim() removes it before .Select pulls only the first line from each item. Then String.Join() combines the lines back into one string.

Regards.


#4

If you want to convert the X Y result (once a table, now a string) back to a table, one way with existing activities is:

  • Replace space by comma in result (String.Replace)
  • Write to Text File
  • Read CSV (above textfile) with output as DataTable.

I wonder why Read CSV cannot take a CSV string?
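
If the file round trip feels heavy, a different route is to build the DataTable in code. This is only a sketch of that alternative, not the Write Text File / Read CSV steps above. It assumes the header line and the extracted rows are already available as space-separated strings; the lines array and the module name are made up for the example.

```vbnet
Imports System
Imports System.Data

Module TableSketch
    Sub Main()
        ' Assumed input: the header line plus the rows extracted from the PDF text.
        Dim lines As String() = {"X Y", "1 2", "3 4"}

        Dim table As New DataTable()

        ' The first line supplies the column names.
        For Each columnName As String In lines(0).Split(" "c)
            table.Columns.Add(columnName)
        Next

        ' The remaining lines become rows; values stay strings in this sketch.
        For i As Integer = 1 To lines.Length - 1
            Dim values As Object() = lines(i).Split(" "c)
            table.Rows.Add(values)
        Next

        ' Prints 2, the number of data rows.
        Console.WriteLine(table.Rows.Count)
    End Sub
End Module
```

This only works as written when no value contains a space; real invoice data would likely need a more careful split.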


#6

Thanks! Both of you! However, I’m still finding out how to get this right. How do I get the values? And how does this regex thing work?

Sincerely,
Robert

Example1.xaml (8.2 KB)


#7

Hi Robert,

The Matches activity and Regex.Matches are the same thing.
If you take the Pattern from your Matches activity and use it in the Regex, you should get the Regex method working.


Example1.xaml (8.7 KB)

First, I had to adjust your initial Assign activity so that the sample text includes newline characters. You may already have had that in mind, but I just went by how I understood your data.

Secondly, I added a LINQ expression that selects each match and replaces the “X Y” part, so only the values remain.
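
For readers without the attachment, the LINQ step could look roughly like the sketch below. It is a guess at the idea, not a copy of the attached workflow: the sample text, the module name, and the use of Trim() to drop the leftover line break are all assumptions.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module LinqSketch
    Sub Main()
        ' Assumed sample text with real line breaks between the lines.
        Dim text As String = String.Join(Environment.NewLine, {
            "Street: Streetname1", "City: City1", "", "X Y", "1 2"})

        ' Each full match is "X Y", the line break, and the row that follows.
        Dim found As MatchCollection = Regex.Matches(text, "X Y[\r\n]+([^\r\n]+)")

        ' Remove the "X Y" header from each match and trim the leftover line break.
        Dim values As String() = found.
            Cast(Of Match)().
            Select(Function(m) m.Value.Replace("X Y", "").Trim()).
            ToArray()

        ' Prints "1 2".
        Console.WriteLine(String.Join(Environment.NewLine, values))
    End Sub
End Module
```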

You can use either Matches or the Regex; both are practically identical.

Attached is your Example after I made a few adjustments.

Regards.


#8

Example1.xaml (7.7 KB)