I want to extract only the parts I want from the data


An example of the PDF-to-text conversion is shown above.
The same layout repeats throughout the text, and I only want to extract the text from REF to REMARK in each repetition. Is there an activity I can use for this?

I am not using the Generate Data Table activity.
It's even more complicated if I take this as an example.
I attach the example text.
PDF.txt (68.9 KB)

@sssim4567

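If you only need the text from REF to REMARK, just use Split or a Regex.

You can split on REF and REMARK, or the Regex would be (?<=REF)[\d\D]*(?=REMARK)

str.Split({"REF","REMARK"},StringSplitOptions.None)(1)

This assumes REF and REMARK each occur only once. Note that both Split and the Regex are case-sensitive, so the markers must be written exactly as they appear in the text.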
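
For reference, here is a minimal sketch of the Regex option as a small console program. The sample string is made up (it is not taken from PDF.txt), and the markers are assumed to be the uppercase `REF` and `REMARK`. The quantifier is written as the lazy `*?` rather than the greedy `*`, so each match stops at the nearest `REMARK` even if the pair occurs more than once.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexDemo
    Sub Main()
        ' Hypothetical sample text; the real PDF.txt is not reproduced here.
        Dim str As String = "header REF a1 REMARK filler REF b2 REMARK footer"

        ' (?<=REF)    : the match must be preceded by REF (REF itself is not included)
        ' [\d\D]*?    : any characters, including line breaks, as few as possible
        ' (?=REMARK)  : the match must be followed by REMARK (not included)
        For Each m As Match In Regex.Matches(str, "(?<=REF)[\d\D]*?(?=REMARK)")
            Console.WriteLine(m.Value.Trim())
        Next
    End Sub
End Module
```

With this sample, the loop writes one line per REF/REMARK pair (`a1`, then `b2`).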

cheers

Thank you for the response, but the problem is not resolved yet.

The results are as follows

What I want is to remove the part marked in yellow, as shown in the picture below.

The yellow-marked part keeps repeating, so if I do it the way you told me, only one block is left.



The same layout keeps repeating, as shown above.

@sssim4567

Then please use this. It will give you all the required values as an array, with each element containing one set of the required text.

str.Split({"REF","REMARK"},StringSplitOptions.None).Where(function(x,i) not (i mod 2) = 0).ToArray

The output is a String array.
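
To make the odd-index filter easier to follow, here is a small sketch on a made-up string (not the real PDF.txt). Splitting on both markers produces alternating pieces: even indexes are the text outside a REF/REMARK pair, and odd indexes are the text inside one. This only holds if REF and REMARK strictly alternate and do not occur inside other words.

```vbnet
Imports System
Imports System.Linq

Module SplitDemo
    Sub Main()
        ' Hypothetical sample text standing in for the converted PDF.
        Dim pdfText As String = "header REF a1 REMARK filler REF b2 REMARK footer"

        ' Pieces after the split:
        ' 0 = "header ", 1 = " a1 ", 2 = " filler ", 3 = " b2 ", 4 = " footer"
        Dim parts As String() = pdfText.Split({"REF", "REMARK"}, StringSplitOptions.None)

        ' Keep the odd indexes, i.e. the pieces between a REF and the next REMARK.
        ' (i Mod 2 = 1) is equivalent to Not (i Mod 2) = 0 in the expression above.
        Dim blocks As String() = parts.Where(Function(x, i) i Mod 2 = 1).ToArray()

        For Each block As String In blocks
            Console.WriteLine(block.Trim())
        Next
    End Sub
End Module
```

With this sample, `blocks` has two elements, and the loop writes `a1` and `b2`.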

cheers

I did as you told me.
I was prompted to create the variable as Object type. I changed it from Object to String and used Write Line, but the output is just System.String.

@sssim4567

The variable should be of type Array(Of String); please change it accordingly.

Also, if you need to write the result to a text file again, use the expression below:

String.Join(Environment.NewLine,abc)
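
A short sketch of why the join is needed, using a hypothetical array named `abc` as in the expression above. Converting a String array straight to text gives its type name rather than its contents, which is probably what the Write Line output earlier in the thread was showing. `String.Join` turns the elements into a single string.

```vbnet
Imports System

Module JoinDemo
    Sub Main()
        ' Hypothetical values standing in for the extracted blocks.
        Dim abc As String() = {"first block", "second block"}

        ' An array's ToString() returns the type name, not the elements.
        Console.WriteLine(abc.ToString())

        ' Join the elements with a line break between them.
        Dim joined As String = String.Join(Environment.NewLine, abc)
        Console.WriteLine(joined)
    End Sub
End Module
```

The first call writes `System.String[]`. The second writes each element on its own line.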

cheers

It says the value cannot be assigned to an Array type, so only the Object type can be selected.


@sssim4567

Can you please delete the Assign activity and add it again? I am using the same expression without any error.

cheers

It has been resolved. Thank you very much.
Have a nice day


Thank you for your reply and I have one more question.

As you told me, I am using the expression below to extract only the parts I want:
PDF.Split({"REF","REMARK"},StringSplitOptions.None).Where(function(x,i) not (i mod 2) = 0).ToArray

What you taught me keeps only the data in the middle.
In addition to that, how do I get rid of the unnecessary data that sits between the necessary data?

I think it should be possible by applying what you told me, but I am finding it hard, so I'd appreciate it if you could tell me how.

The unnecessary data is the part marked in yellow below.

It runs from INVOICE NO. to TON.
Here too, only the details change while the same layout repeats.

@sssim4567

You can split on DESCRIPTION, I guess, and use:

str.Split({"DESCRIPTION"},StringSplitOptions.None).Where(function(x,i) not (i mod 2) = 0).ToArray

Here str will be the String.Join… output.

cheers

It doesn't work. Only the header was removed.

I just want to get rid of the part marked in yellow.

In what you told me to do, I think this part

str.Split({"DESCRIPTION"}

should instead be

str.Split({"INVOICE NO.",".TON"}

I also think the expression below needs to be changed.
Do you have any ideas?

Where(function(x,i) not (i mod 2) = 0).ToArray
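
The idea in the last post can be sketched as follows. This is an illustration on a made-up string, not a confirmed solution from the thread. It assumes the unwanted part starts at `INVOICE NO.` and ends at `TON.`; the exact spelling of the end marker is unclear from the posts and should be checked against the real text. Splitting on both markers again gives alternating pieces, but this time the pieces to keep are the ones outside each pair, so the filter keeps the even indexes instead of the odd ones.

```vbnet
Imports System
Imports System.Linq

Module RemoveBetweenMarkersDemo
    Sub Main()
        ' Hypothetical sample; the marker spellings "INVOICE NO." and "TON." are assumptions.
        Dim str As String = "item A INVOICE NO. 123 10 TON. item B INVOICE NO. 456 20 TON. item C"

        ' Pieces after the split:
        ' 0 = "item A ", 1 = " 123 10 ", 2 = " item B ", 3 = " 456 20 ", 4 = " item C"
        Dim parts As String() = str.Split({"INVOICE NO.", "TON."}, StringSplitOptions.None)

        ' Even indexes lie outside an INVOICE NO. ... TON. pair, so keeping them
        ' drops both the markers and everything between them.
        Dim kept As String() = parts.Where(Function(x, i) i Mod 2 = 0).ToArray()

        Console.WriteLine(String.Join("", kept))
    End Sub
End Module
```

With this sample, only the `item A`, `item B`, and `item C` pieces remain. The approach breaks if either marker also appears elsewhere in the text (for example, `TON.` inside a kept block), because the even/odd alternation is then thrown off.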

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.