Create a dictionary with the code as the key and, as the value, a list of dates (or the string "not found" or "NA").
Split the PDF text into an array of strings, pdfstr, splitting on space " " and newline "\n".
Iterate over the length of pdfstr:
  Check whether the string matches the regex for a code.
  If it does, add the code as a key in the dictionary.
  Iterate from the next string onward to get the values for that key:
    If the current string is a date:
      Add the date to the list of dates.
    Else, if it is "not found" or "NA":
      Add that string to the dictionary values for the current key, i.e. the code.
    Else, if the string is a code:
      Update the counter.
      Break.
Keep finding codes until the code value is "ZZZZ".
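The splitting step can be sketched as below. One thing to watch: splitting on whitespace breaks "not found" into two strings, so this sketch joins them back into one. The function name `tokenize` and the sample text are made up for illustration; the real text would come from whatever is used to read the PDF.

```python
def tokenize(pdf_text):
    """Split text on spaces/newlines and rejoin 'not found' into one string."""
    raw = pdf_text.split()  # splits on any run of whitespace, drops empty strings
    tokens = []
    k = 0
    while k < len(raw):
        # "not found" arrives as two strings after the split; merge them.
        is_not_found = (
            raw[k].lower() == "not"
            and k + 1 < len(raw)
            and raw[k + 1].lower() == "found"
        )
        if is_not_found:
            tokens.append("not found")
            k += 2
        else:
            tokens.append(raw[k])
            k += 1
    return tokens


# Hypothetical sample text, not taken from the attached PDF.
pdfstr = tokenize("AB12 01/02/2020\n03/04/2020 CD34 not found ZZZZ")
print(pdfstr)
```

Calling `split()` with no argument handles spaces and newlines together, so there is no need to split twice.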
Below is the pseudocode, similar to Python:
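import re
CODE_RE = re.compile("code pattern")  # placeholder: put the real code regex here
DATE_RE = re.compile("date pattern")  # placeholder: put the real date regex here
dict1 = {}
i = 0
while i < len(pdfstr):  # while loop, because assigning i inside a for loop has no effect
    if pdfstr[i] == "ZZZZ":  # end code reached: stop searching
        break
    if CODE_RE.fullmatch(pdfstr[i]):
        list1 = []
        dict1[pdfstr[i]] = list1  # the code is the key, the list holds its values
        j = i + 1
        while j < len(pdfstr):
            if DATE_RE.fullmatch(pdfstr[j]):
                list1.append(pdfstr[j])
            elif pdfstr[j] in ("not found", "NA"):  # assumes "not found" is one string
                list1.append(pdfstr[j])
            elif CODE_RE.fullmatch(pdfstr[j]) or pdfstr[j] == "ZZZZ":
                break  # next code found: stop collecting for this key
            j += 1
        i = j  # continue the outer loop from the next code
    else:
        i += 1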
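For comparison, here is a single-pass variant of the same idea as a runnable function. The two regexes are guesses (a code as four uppercase letters or digits, a date as dd/mm/yyyy), since the real formats are only visible in the attached PDF, so they would need to be swapped for the real patterns. The names `collect_dates` and `MISSING` are made up, and the input is assumed to already have "not found" as a single string.

```python
import re

# Assumed formats, for illustration only; replace with the real patterns.
CODE_RE = re.compile(r"[A-Z0-9]{4}")
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
MISSING = {"not found", "na"}


def collect_dates(pdfstr, stop_code="ZZZZ"):
    """Map each code to the dates (or 'not found'/'NA' strings) that follow it."""
    result = {}
    current = None  # the code whose values are currently being collected
    for s in pdfstr:
        if s == stop_code:
            break  # end code reached: stop searching
        if DATE_RE.fullmatch(s) or s.lower() in MISSING:
            if current is not None:  # ignore values that appear before any code
                result[current].append(s)
        elif CODE_RE.fullmatch(s):
            current = s
            result.setdefault(current, [])
        # any other string is ignored
    return result


# Hypothetical input, not taken from the attached PDF.
sample = ["AB12", "01/02/2020", "03/04/2020", "CD34", "not found", "ZZZZ", "EF56"]
print(collect_dates(sample))
```

Tracking the current code in a variable removes the inner loop and the manual counter update, at the cost of checking every string against both patterns.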
Attached are a sample PDF file and a sample of the expected Excel file:
Sample PDF.PDF (46.7 KB)
Sample.xlsx (9.2 KB)