I have extracted some data from PDFs, and because of the way the PDFs are structured, the extractor cannot recognize the two separate data points. How do I split this column into two separate ones? Example data:
10,570.00Installment
3,241.00 Final Premium Audit
556.00Refund: Final Audit
5,118.00Installment
5,596.00Transfer
5,756.00Installment
For the numbers, try this (the text part is whatever remains of the string):
System.Text.RegularExpressions.Regex.Matches(str,"[\d,.]+")
Use a For Each loop over the result, set its TypeArgument to System.Text.RegularExpressions.Match, and use item.Value to get each number.
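For readers outside UiPath, here is a rough Python translation of the Matches loop above, using the same `[\d,.]+` character class. The function name `extract_numbers` and the sample string are illustrative only; `match.group()` plays the role of `item.Value`.

```python
import re

# Same character class as the .NET pattern: runs of digits, commas and periods.
NUMBER_PATTERN = re.compile(r"[\d,.]+")


def extract_numbers(s):
    """Collect every number run, like For Each item in Regex.Matches(str, ...)."""
    numbers = []
    for match in NUMBER_PATTERN.finditer(s):
        numbers.append(match.group())  # equivalent of item.Value
    return numbers


text = "10,570.00Installment\n3,241.00 Final Premium Audit\n556.00Refund: Final Audit"
print(extract_numbers(text))
```

Because the character class excludes newlines and letters, each line's amount comes out as a separate match.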
To get the second values (the text part):
System.Text.RegularExpressions.Regex.Replace(str,"[\d,.]+","").Split({Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)
This gives you an array of the remaining text values.
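A minimal Python sketch of the same Replace-then-Split idea, assuming the whole column arrives as one string with one record per line. `extract_texts` is a hypothetical name, and the `.strip()` call is an addition not present in the .NET expression: it removes the leading space left behind in values like " Final Premium Audit".

```python
import re


def extract_texts(s):
    # Equivalent of Regex.Replace(str, "[\d,.]+", ""): delete every number run.
    without_numbers = re.sub(r"[\d,.]+", "", s)
    # Equivalent of .Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries):
    # splitlines() handles both \n and \r\n, and the filter drops empty entries.
    # strip() is an extra step that trims leftover spaces around each text value.
    return [line.strip() for line in without_numbers.splitlines() if line.strip()]


text = "10,570.00Installment\r\n3,241.00 Final Premium Audit\r\n556.00Refund: Final Audit"
print(extract_texts(text))
```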
cheers
Hi
You can split it with this expression:
Regex.Split("10,570.00Installment","(?<=\d)(?= ?[A-Za-z])")
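The split pattern uses only zero-width lookarounds: it cuts at a position that follows a digit and precedes an optional space plus a letter, so nothing is consumed. Python's `re.split` accepts the same pattern (Python 3.7+ allows splitting on empty matches); `split_value` below is an illustrative name, not part of the thread.

```python
import re

# Split after a digit when an optional space and a letter follow.
SPLIT_PATTERN = r"(?<=\d)(?= ?[A-Za-z])"


def split_value(s):
    return re.split(SPLIT_PATTERN, s)


for value in ["10,570.00Installment", "3,241.00 Final Premium Audit", "556.00Refund: Final Audit"]:
    print(split_value(value))
```

Note that the optional space stays at the start of the second part (" Final Premium Audit"), so a trim may still be wanted afterwards.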
Regards,
Konrad
A typical work recipe looks like this:
Use a regex as described by @Anil_G
Use the pattern within a Regex.Replace and insert a delimiter character after the match, e.g.:
Then feed this marked string to the Generate Data Table activity and configure the parsing (or use a LINQ query against a prepared DataTable).
Outcome: a DataTable with 2 columns
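The recipe above can be sketched in Python as: mark each line with a delimiter after the leading number, then parse the marked text into two columns, with the `csv` module standing in for the Generate Data Table activity. The exact pattern and delimiter from the original reply are not shown in the thread; the `|` delimiter and the anchored pattern below are assumptions. A comma would be a poor choice because the amounts contain commas.

```python
import csv
import io
import re

DELIMITER = "|"  # assumed delimiter; the amounts already contain commas


def to_two_columns(text):
    # Mark each line: keep the leading number (group 1), drop an optional space,
    # and insert the delimiter. In .NET this would be a Regex.Replace with a "$1|"-style substitution.
    marked = re.sub(r"^([\d,.]+) ?", r"\1" + DELIMITER, text, flags=re.MULTILINE)
    # Parse the marked text like a delimited table: one row per line, two fields per row.
    reader = csv.reader(io.StringIO(marked), delimiter=DELIMITER)
    return [tuple(row) for row in reader if row]


text = "10,570.00Installment\n3,241.00 Final Premium Audit\n556.00Refund: Final Audit"
for amount, description in to_two_columns(text):
    print(amount, "->", description)
```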
@ppr @Anil_G these fields are in a table that has other columns as well. Could you please attach a workflow so I can visualize this solution and mark it as solved?
If that is the case, add a new column for the numbers to the DataTable (called "Numbervalues columnname" below).
Then use a For Each Row in Data Table activity,
and inside it use two Assign activities:
currentrow("Numbervalues columnname") = System.Text.RegularExpressions.Regex.Match(currentrow("CombinedColumnName").ToString,"[\d,.]+").Value
currentrow("CombinedColumnName") = System.Text.RegularExpressions.Regex.Replace(currentrow("CombinedColumnName").ToString,"[\d,.]+","").Trim
The first Assign puts the number into the new column; the second replaces the combined value with just the text part.
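A minimal Python sketch of the same row-by-row logic, assuming the table is a list of dicts and other columns are left untouched. The column names mirror the placeholders used above; `PolicyNo` is a hypothetical extra column for illustration. Like `Regex.Match(...).Value` in .NET, an empty string is stored when a row has no number.

```python
import re

NUMBER = re.compile(r"[\d,.]+")


def split_combined_column(rows, combined="CombinedColumnName", number_col="Numbervalues columnname"):
    for row in rows:  # For Each Row in Data Table
        value = str(row[combined])
        # First Assign: the first number run, or "" when there is none.
        match = NUMBER.search(value)
        row[number_col] = match.group() if match else ""
        # Second Assign: remove the number runs and trim what is left.
        row[combined] = NUMBER.sub("", value).strip()
    return rows


table = [
    {"PolicyNo": "A-1", "CombinedColumnName": "3,241.00 Final Premium Audit"},
    {"PolicyNo": "A-2", "CombinedColumnName": "5,596.00Transfer"},
]
for row in split_combined_column(table):
    print(row)
```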
Hope this helps
Cheers
@Anil_G can you attach a sample workflow? I am having trouble. Thanks!