Trouble reading a PDF

At the beginning of my process I read an Excel file and store as a DT. One of the fields stored is the company number. For this example, I’ll use company 416. When I read the PDF I am getting the Total for each company in the PD.

Totals

I only need the total for the current company I’m dealing with (in this case the total for company 416 is 4,550.00.

I’ve been trying for hours to find a way to filter that but I haven’t been able to. Do you know what I’m missing? The section of the process is the sequence named “PAD.”
Oper_FS No Commission V2.xaml (145.2 KB)

Hi @rgardner5564

It was pretty difficult to understand the workflow, I add a flag to determine if the company number has been founded or not, also, if the Pad_Amount has been set (Is not null or empty) it exits the Do While activity

I hope that it works

Regards,Oper_FS No Commission V2.xaml (146.3 KB)

Andres

Hi @AndresTarazona,

Unfortunately that did not work. I was trying to solve this in a previous post: Find "Total" in PDF - #10 by rgardner5564

It may be easier to see what the process was supposed to do with these files and that post.
FindTotalInPdf_RG.xaml (12.8 KB)
ACH Control Docs_20200409_0106RLET8SMIL46.pdf (23.0 KB)

Here is the output for my process. It is finding the total for each company (which I need) but I need to figure out how to only return the total for the company I am working with. So, in this case, I want the total of $6,166.66 but I need to know it’s coming from company 414.

@msan was originally helping me out here so maybe he has an idea as well?

04/21/2020 08:58:24 => [Debug] Test started for activity: PAD
04/21/2020 08:58:26 => [Info] TA Recon execution started
04/21/2020 08:59:09 => [Info] Line= ACH/PAD/ACH Fee Paid Control Doc Print Date 04-21-2020
04/21/2020 08:59:09 => [Info] Line= Control Report Feeds Recon
04/21/2020 08:59:10 => [Info] Line= Co Fund PAD Sameday ACH ACH Fee Paid Sameday Late Sameday ACH Exception Sameday ACH ACH Fee Paid
04/21/2020 08:59:10 => [Info] Line= 405 031 500.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:11 => [Info] Line= TOTAL 500.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:11 => [Info] TOTAL: 500.00
04/21/2020 08:59:12 => [Info] Line= 414 001 500.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:12 => [Info] Line= 414 002 1,350.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:13 => [Info] Line= 414 003 200.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:13 => [Info] Line= 414 004 2,916.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:14 => [Info] Line= 414 005 1,200.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:15 => [Info] Line= TOTAL 6,166.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04/21/2020 08:59:15 => [Info] TOTAL: 6,166.66

@rgardner5564

Sorry, I was not able look at it this week-end. Here is an example that should do the work.

Another approach would be to parse the PDF file and store all (Co, PAD) into a DataTable then work with this DataTable for your lookups.

Oper_FS No Commission V2 (1).xaml (160.5 KB)

Thank you @msan. I was able to get it to work in your workflow. However, when I add it to mine it didn’t find any value:
Output

I think I am not using this portion correctly:
Forum Picture

I am using this Excel file to get my company number variables:
BOT Master Directory_Test.xlsx (8.8 KB)

So I would have expected it to find the total for companies 420 and 440 since they are in the PDF.
ACH Control Docs_20200409_0106RLET8SMIL46.pdf (23.0 KB)

Sorry, for all the questions on this. This is the most complex process I’ve ever built and this one piece is the only part I cannot get figured out!

@rgardner5564

No problem, don’t be sorry

  • You can let Company_Number variable alone: it is used to keep track of the previous’ line customerNumber
  • You must set Searched_Company_Number to the value you’re looking for.

I made a workflow from this specific section (I didn’t test it, beware). Please try it with InvokeWorkflowFile Activity. Pass as argument:

  • in_CompanyNumber (Company Number you’re looking for)
  • in_PadText (The text version of your PDF)

You’ll get your amount (as String) from out_Amount.

FlowchartGetPadAmount.xaml (18.8 KB)

1 Like

@msan I seriously can’t thank you enough for all of your help with this. I don’t know where you live but I wish you were close by so I could take you out for a beer and thank you in person. If you ever find yourself in Kansas City, Missouri you’ve got yourself a friend, a place to stay and as many beers as you can drink!

1 Like

@rgardner5564

Thanks, I appreciate it :beer: :beers: We have some good beverage in France too :wink: . Feel free to reach me whenever you want

I don’t know why I didn’t think about a simple regex. regex101: build, test, and debug regex

First import System.Text.RegularExpressions to avoid überlong expressions.

Assuming you have PdfAsText, you can either search for the amount for a specific CompanyNumber or you can scrap all CompanyNumber/Amount.

Common

  • Assign (RegexOptions)
    Multine = RegexOptions.Multiline

  • Assign (RegexOptions)
    Singleline = RegexOptions.Singleline

  • Assign (String)
    Template = "^\s*(?<company>{0})\b.+?TOTAL\s+(?<amount>[\d,\.]+)"

If looking for a CompanyNumber:

  • Assign (String)
    CompanyNumber = "420"

  • Assign (String)
    Pattern = String.Format(Template, CompanyNumber)

  • Assign (String)
    Amount = Regex.Match(PdfAsText, Pattern, Multiline Or Singleline)

If scrapping all CompanyNumber/Amount:

  • Assign (String)
    CompanyNumber = "\d+"

  • Assign (String)
    Pattern = String.Format(Template, CompanyNumber)

  • Assign (MatchCollection)
    Matches = Regex.Matches(PdfAsText, Pattern, Multiline Or Singleline)

  • ForEach Match in Matches (TypeArgument = System.Text.RegularExpressions.Match)

    • Assign (String)
      CompanyNumber = Match.Groups("company").ToString

    • Assign (String)
      Amount = Match.Groups("amount").ToString

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.