Text to Excel: Data manipulation using Regex (VB code)

Hi,

I have extracted data from a PDF to text and am now trying to get the data into Excel using only VB code with regex.

I am attaching the extracted text file and the expected output file.

8-103406 INVOICE_OmniPage_.txt (1.2 KB) Book1.xlsx (9.3 KB)

@kalyanDev @Cristian_Negulescu

@indrajit.shah - did you try Document Understanding to extract the details from your invoice?

Please share the PDF as well, or set Preserve Format to true while reading the PDF and share the updated text file.

Hi @prasath17, DU is out of scope; I have to achieve this through regex.

I have attached the text extracted from the PDF.

Thank you in advance.

Hello @indrajit.shah,
So I think you only need the table from the middle. You know my video:

In your case, just pass your data after the Regex step directly to the Invoke Code activity from my structure (as an input argument); the rest remains the same.

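    Dim strtmp As String
    'Take the text after the "TOTAL(USD)" header and before the last "Total"
    strtmp = strin.Substring(strin.IndexOf("TOTAL(USD)") + 11, strin.LastIndexOf("Total") - strin.IndexOf("TOTAL(USD)") - 11).Trim
    'Glue " INPP" to the preceding value so its space is not turned into a column separator
    strtmp = strtmp.Replace(" INPP", "INPP")
    'Every remaining space becomes a column separator
    strtmp = strtmp.Replace(" ", "|")
    'Prepend a header row with nine generic column names
    strout = "col1|col2|col3|col4|col5|col6|col7|col8|col9" + Environment.NewLine + strtmp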
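If one of the markers is missing from the text, the Substring call above can throw or return the wrong slice. A minimal sketch of a guarded version follows; the helper name `TextBetween` and the sample string are made up for illustration and are not part of the original structure.

```vbnet
Imports System

Module TextBetweenSketch
    ' Hypothetical helper: returns the trimmed text between the first startMarker
    ' and the last endMarker, or an empty string if either marker is missing
    ' or the end marker does not come after the start marker.
    Function TextBetween(source As String, startMarker As String, endMarker As String) As String
        Dim startIdx As Integer = source.IndexOf(startMarker, StringComparison.Ordinal)
        If startIdx < 0 Then Return String.Empty
        startIdx += startMarker.Length
        Dim endIdx As Integer = source.LastIndexOf(endMarker, StringComparison.Ordinal)
        If endIdx < startIdx Then Return String.Empty
        Return source.Substring(startIdx, endIdx - startIdx).Trim()
    End Function

    Sub Main()
        ' Made-up sample text, not taken from the attached invoice.
        Dim sample As String = "HEADER TOTAL(USD) row one" & Environment.NewLine & "Total 10"
        Console.WriteLine(TextBetween(sample, "TOTAL(USD)", "Total"))
    End Sub
End Module
```

In an Invoke Code activity only the body of the function would be needed; the Module and Main wrapper are there so the sketch can be compiled on its own.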

Thanks,
Cristian Negulescu


@indrajit.shah - okay, thanks. I see we will have to write a lot of regex patterns to achieve the output you are looking for.
For example, the regex pattern below covers three of your values, but you would still have to exclude the others.


Yes, I have to write loads of regex, and that's why I need help, buddy.

Thank you @Cristian_Negulescu, I am trying to write the whole code in VB first just to replicate it, and will then do it in UiPath.


Hi @Cristian_Negulescu

Can you help me with the extraction below?

INVOICE NO. / INVOICE DATE 8000103406 / 20.01.2021

LC NUMBER /CONT# 2700016007
FOB VALUE (USD) : 1,532.10

ADJUSTMENT - I (ADD FREIGHT + INS 0.00
ADJUSTIMER -2 (ADJ AGAINST CREDIT 0.00
TOTAL INVOICE VALUE (USD) FOB: 1,532.10

@indrajit.shah, from my point of view this is not a table, so you can just use Substring and Split like this:

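    'Declare strtmp once; declaring it twice in the same scope does not compile
    Dim strtmp As String

    'FOB VALUE (USD)
    strtmp = strin.Substring(strin.IndexOf("(USD) :") + 7, strin.LastIndexOf("ADJUSTMENT") - strin.IndexOf("(USD) :") - 8).Trim
    'strtmp now holds the FOB value; copy it to your own variable before it is reused below

    'INVOICE NO. / INVOICE DATE
    strtmp = strin.Substring(strin.IndexOf("INVOICE DATE") + 12, strin.LastIndexOf("LC") - strin.IndexOf("INVOICE DATE") - 13).Trim
    'innumber and inDate are assumed to be declared elsewhere, e.g. as Invoke Code output arguments
    innumber = strtmp.Split("/"c)(0).Trim
    inDate = strtmp.Split("/"c)(1).Trim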
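Since the original question asked for regex, the same header fields could also be pulled out with `System.Text.RegularExpressions`. This is only a sketch: the patterns are written against the sample lines quoted in this thread and may need adjusting for other invoices, and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HeaderRegexSketch
    ' Returns {invoice number, invoice date}, or Nothing if the line is not found.
    Function ExtractInvoice(text As String) As String()
        Dim m As Match = Regex.Match(text, "INVOICE DATE\s+(\d+)\s*/\s*(\d{2}\.\d{2}\.\d{4})")
        If Not m.Success Then Return Nothing
        Return New String() {m.Groups(1).Value, m.Groups(2).Value}
    End Function

    ' Returns the FOB value as text, or an empty string if the line is not found.
    Function ExtractFob(text As String) As String
        Dim m As Match = Regex.Match(text, "FOB VALUE \(USD\)\s*:\s*([\d,]+\.\d{2})")
        If m.Success Then Return m.Groups(1).Value
        Return String.Empty
    End Function

    Sub Main()
        ' Sample lines copied from the post above.
        Dim strin As String =
            "INVOICE NO. / INVOICE DATE 8000103406 / 20.01.2021" & Environment.NewLine &
            "LC NUMBER /CONT# 2700016007" & Environment.NewLine &
            "FOB VALUE (USD) : 1,532.10"

        Dim inv As String() = ExtractInvoice(strin)
        If inv IsNot Nothing Then
            Console.WriteLine(inv(0)) ' invoice number
            Console.WriteLine(inv(1)) ' invoice date
        End If
        Console.WriteLine(ExtractFob(strin))
    End Sub
End Module
```

Unlike the Substring approach, a failed match here simply reports no result instead of throwing, which can make it easier to spot invoices whose layout differs.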