PDF reading in binary

I am attempting to read a .pdf file using the Read PDF Text activity from UiPath.PDF.Activities. It pulls in the entire PDF, but I can only find some of the contents when I search the result. Here is the raw text I get from the PDF.

" PETERBILT MOTORS COMPANY\r\n\r\nSOLD TO: TLG PETERBILT-LOUISVILLE DEALER # P615 DATE: 5-11-2020\r\n 4415 HAMBURG PIKE\r\n JEFFERSON IN 47130\r\n\r\n GLASSCOCK TRANSPORT INC PRE ID:GLASSCOCK SLEEPER 2020\r\n\r\nMODEL: ONE (1) PETERBILT 0005671\r\n\r\nVEHICLE IDENT NO.: 1XPCDP9X2MD745565 ENGINE NO.: Y231808\r\n\r\n \0\0PRICE\0EFFECTIVE\0DATE:\005-11-20\r\n\r\n \0LIST\0PRICE\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0228,121.00\r\n\r\n \0ADJUSTED\0LIST\0PRICE\0FOB\0FACTORY\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0228,121.00\r\n \0\0\0\0STANDARD\0DEALER\0DISCOUNT\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\052,468.00\r\n\r\n \0\0DEALER\0NET\0PRICE\0FOB\0FACTORY\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0175,653.00\r\n \0\0\0\0\0\0\0\0COMPETITIVE\0ALLOWANCE\0\0\001491605\0@\038.18\0%\0\0\0\0\0\067,064.00-\r\n \0MARKETING\0PROGRAMS,\0PROMOTIONS\0AND\0SERVICE\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0775.00\r\n \0TOTAL\0SURCHARGE/OPTIONS\0NOT\0SUBJECT\0TO\0DISC\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0600.00-\r\n\r\n \0\0ADJUSTED\0DEALER\0NET\0PRICE\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$\0\0108,764.00\r\n \0\0\0\0\0\0\0\0PREPAID\0FREIGHT\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\02,475.00\r\n\r\n \0TOTAL\0\0DEALER\0NET\0PRICE\0FOB\0FACTORY\0(US\0$)\0\0\0\0\0\0\0\0\0$\0\0111,239.00\r\n\r\n \0TOTAL\0INVOICE\0AMOUNT\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0(US\0$)\0\0\0\0\0\0\0\0$\0\0111,239.00\r\n\r\n \0TERMS:\0NET\0\015\0DAYS\0FROM\0INVOICE\0DATE.\r\n\r\n \0\0\0\0\0\0\0\0\0\0\0PLEASE\0FORWARD\0YOUR\0PAYMENT\0TO:\0PACCAR\0INC\r\n \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0P.O.\0BOX\01281\r\n \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0BELLEVUE\0\0\0\0\0\0\0\0\0\0\0\0\0\0WA\0\098009-1281\r\n\r\n \0\0\0\0\0\0THIS\0VEHICLE\0IS\0FINANCED\0BY\0AND\0SUBJECT\0TO\0A\0PURCHASE\r\n 
\0\0\0\0\0\0MONEY\0SECURITY\0INTEREST\0IN\0FAVOR\0OF\0PACCAR\0FINANCIAL\0CORP.\r\n\r\n \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0FLOORING\0REQUEST\0DATE:\005/26/20\r\n\r\n In connection with a potential retail purchase of this unit by PACCAR Financial Corp., notice is hereby given that the rights, but not the obligations, of \r\n PACCAR Financial Corp. to purchase the Asset have been assigned to PFC Exchange, LLC pursuant to an agreement between PACCAR Financial Corp. and INVOICE\r\n PFC Exchange, LLC."

The first part, like the VIN, I was able to get with
Trim(Mid(PDFOutput, PDFOutput.IndexOf("VEHICLE IDENT NO.: ") + 20, PDFOutput.IndexOf("ENGINE NO.:") - (PDFOutput.IndexOf("VEHICLE IDENT NO.: ") + 20)))

but when I try to search for anything past this point, the Mid function doesn’t work. How do I turn this into a plain text string instead of binary output?

Hello,

You have a legitimate string that, sadly, contains some NULL characters. You can clean it with some regex. Please first import System.Text.RegularExpressions. I’m not at my UiPath computer, so I didn’t check the following before posting:

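Remove all \0 characters at the start of a line (use a single backslash in the pattern; in a VB.NET expression the doubled "\\0" would match a literal backslash followed by 0 rather than a NULL character)
myText = Regex.Replace(myText, "^\0+", "", RegexOptions.Multiline)

Reduce all remaining \0 sequences to a single space
myText = Regex.Replace(myText, "\0+", " ", RegexOptions.Multiline)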
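
As a rough illustration of why the cleanup helps, here is a minimal VB.NET sketch that can be run outside UiPath as a console program. It rebuilds two lines of the raw output with real NULL characters, collapses them to spaces, and then pulls values out by label. The names CleanNulls and Between are hypothetical helpers made up for this example, not part of UiPath or .NET, and the sample is only a shortened copy of the text posted above.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NullCleanupSketch
    ' Collapse every run of NULL characters into a single space.
    Function CleanNulls(raw As String) As String
        Return Regex.Replace(raw, "\0+", " ")
    End Function

    ' Return the trimmed text between two labels, or Nothing if either label is missing.
    Function Between(source As String, startLabel As String, endLabel As String) As String
        Dim startIdx As Integer = source.IndexOf(startLabel, StringComparison.Ordinal)
        If startIdx < 0 Then Return Nothing
        startIdx += startLabel.Length
        Dim endIdx As Integer = source.IndexOf(endLabel, startIdx, StringComparison.Ordinal)
        If endIdx < 0 Then Return Nothing
        Return source.Substring(startIdx, endIdx - startIdx).Trim()
    End Function

    Sub Main()
        ' Two lines from the output above. The text "\0" is swapped for a real
        ' NULL character to rebuild what Read PDF Text actually returned.
        Dim nul As String = Convert.ToChar(0).ToString()
        Dim raw As String = ("VEHICLE IDENT NO.: 1XPCDP9X2MD745565 ENGINE NO.: Y231808" & vbCrLf &
                             " \0TERMS:\0NET\0\015\0DAYS\0FROM\0INVOICE\0DATE." & vbCrLf).Replace("\0", nul)

        Dim cleaned As String = CleanNulls(raw)
        Console.WriteLine(Between(cleaned, "VEHICLE IDENT NO.:", "ENGINE NO.:"))  ' 1XPCDP9X2MD745565
        Console.WriteLine(Between(cleaned, "TERMS:", "FROM INVOICE DATE."))       ' NET 15 DAYS
    End Sub
End Module
```

Searching by label and taking the label's Length avoids hard-coded offsets such as the + 20 in the original expression, and once the NULL characters are gone a label like "FROM INVOICE DATE." can be typed with ordinary spaces. Inside a workflow, the same two expressions would presumably go into Assign activities rather than a Module.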


Thank you so much for the response. Amazingly, I JUST figured this out moments before you responded, using “Replace” with the RegEx builder.

Thank you so much. Your solution may be faster than mine.

