Extract the text before a delimiter word

Hello, I'm having trouble extracting the text that comes before a delimiter word, for example:
“Tributaria”

I have a large block of text produced by a Read PDF activity (.pdf to .txt), stored in the variable “Read_PDF_To_txt”. I converted this text to a List of Strings, but the following syntax, for example, does not work for me:

Variable_name(#).Split(".".ToCharArray)(#) ← this doesn't always work because the text shifts out of position, so we decided to use the following expression instead:

Read_PDF_To_txt.Substring(Read_PDF_To_txt.IndexOf("Tributaria"))

The problem with this expression is that it returns everything from the word “Tributaria” onward. I need what is on that text row just before the word “Tributaria”.

Text extracted from PDF (example):

IDENTIFICACIÓN DEL BENEFICIARIO
Nombre Contribuyente País de Origen No. de Identificación
LESSER SUPPLY CHAIN PTE SINGAPORE Tributaria Datos de Inscripción
202001197K
Calle, Avenida, Carretera
Apartado Zona Postal Teléfono Ciudad Departamento o País
Singapore Estado Singapur

Goal:
- Extract this text:
LESSER SUPPLY CHAIN PTE SINGAPORE

(without using regex or .Split(...ToCharArray))

Hi,

How about the following?

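' Keep everything that precedes the keyword
strBefore = strData.Substring(0, strData.IndexOf("SINGAPORE"))
' Keep only the last line of that text; the +1 skips the line feed and also covers a keyword on the first line
strBefore = strBefore.Substring(strBefore.LastIndexOf(vbLf) + 1).TrimStart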

Then

strBefore+"SINGAPORE"

If there is a possibility that the target keyword does not exist, check in advance that strData.IndexOf("SINGAPORE") is zero or positive (it returns -1 when the keyword is not found).
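
As a rough sketch of that check, the two Substring steps and the IndexOf guard could be wrapped together as below. The module name, the helper name TextBeforeKeyword and the shortened sample text are assumptions made for illustration, not part of the original workflow. It uses ControlChars.Lf and an ordinal comparison so the search does not depend on culture settings.

```vb
Imports System
Imports Microsoft.VisualBasic

Module TextBeforeKeywordDemo
    ' Hypothetical helper: returns the text that precedes the keyword on the keyword's own line,
    ' or an empty string when the keyword is not present.
    Function TextBeforeKeyword(strData As String, keyword As String) As String
        Dim keywordIndex As Integer = strData.IndexOf(keyword, StringComparison.Ordinal)
        If keywordIndex < 0 Then
            Return String.Empty ' keyword not found, nothing to extract
        End If

        ' Everything before the keyword
        Dim strBefore As String = strData.Substring(0, keywordIndex)
        ' Start of the keyword's line; LastIndexOf returns -1 on the first line, so +1 gives 0
        Dim lineStart As Integer = strBefore.LastIndexOf(ControlChars.Lf) + 1
        Return strBefore.Substring(lineStart).Trim()
    End Function

    Sub Main()
        ' Shortened sample based on the PDF text in the question
        Dim sample As String =
            "Nombre Contribuyente País de Origen No. de Identificación" & vbLf &
            "LESSER SUPPLY CHAIN PTE SINGAPORE Tributaria Datos de Inscripción" & vbLf &
            "202001197K"

        Console.WriteLine(TextBeforeKeyword(sample, "Tributaria")) ' prints: LESSER SUPPLY CHAIN PTE SINGAPORE
        Console.WriteLine(TextBeforeKeyword(sample, "missing").Length) ' prints: 0
    End Sub
End Module
```

In a workflow, the same logic could be split into an If activity for the guard followed by two Assign activities.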

FYI, regex:

System.Text.RegularExpressions.Regex.Match(strData,".*SINGAPORE").Value
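
If the keyword itself should not be part of the result, one possible variation is a lookahead, so the match stops just before the keyword. This is only a sketch: the module name, the keyword variable and the shortened sample text are assumptions for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexBeforeKeywordDemo
    Sub Main()
        ' Shortened sample based on the PDF text in the question
        Dim strData As String =
            "Nombre Contribuyente País de Origen No. de Identificación" & vbLf &
            "LESSER SUPPLY CHAIN PTE SINGAPORE Tributaria Datos de Inscripción" & vbLf &
            "202001197K"
        Dim keyword As String = "Tributaria"

        ' ".*" does not cross a line feed, so the match stays on the keyword's line;
        ' the lookahead (?=...) requires the keyword to follow without including it in the match.
        Dim m As Match = Regex.Match(strData, ".*(?=" & Regex.Escape(keyword) & ")")

        If m.Success Then
            Console.WriteLine(m.Value.Trim()) ' prints: LESSER SUPPLY CHAIN PTE SINGAPORE
        Else
            Console.WriteLine("Keyword not found")
        End If
    End Sub
End Module
```

Regex.Escape is used so that a keyword containing characters such as "." or "(" is matched literally.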

Regards,

Wow, brother, you just saved me after almost 4 days of researching how to solve this; even ChatGPT couldn't, hahaha. The only option I had was to convert the text to a List and pick the row by index, but that didn't work because the text can shift up or down. You are a genius, thank you very much. Greetings from Panama (Central America).

In the end, I changed “SINGAPORE” to “Tributaria”, since that is the word that repeats in the PDF format; what changes is the company name that comes before “Tributaria”. Using the example you gave me worked wonders.

Solution :smiley:
