Extract particular field data from a PDF

Input:
Detailed audit report

  1. INTERESTING FACTS ABOUT FLOWERS

Findings
The world’s smallest flower is the watermeal, which measures just 0.1mm across.
. ABC
. Xyz

Root cause
The world’s largest flower is the Rafflesia arnoldii, which can grow up to 3 feet across.

Recommendation
The most expensive flower is the Juliet rose, which sells for $15,800 per stem.

  1. The Benefits of Beautiful Flowers

Findings
Flowers not only serve to delight us with their beauty and provide numerous health benefits, but they’re also essential to the environment and our ecosystem.

Recommendation
For a bouquet of beautiful flowers that’s already done for you, check out our Fresh Celebrations FruitFlowers® Bouquets.
. ABC
. XYZ

Root cause
They help improve air quality, reduce erosion, and maintain hearty soil.

Output:

Detailed audit report

  1. INTERESTING FACTS ABOUT FLOWERS

Findings = The world’s smallest flower is the watermeal, which measures just 0.1mm across.
. ABC
. Xyz

Root cause = The world’s largest flower is the Rafflesia arnoldii, which can grow up to 3 feet across.

Recommendation = The most expensive flower is the Juliet rose, which sells for $15,800 per stem.

And so on for points 2, 3, 4, … for as many numbered points as exist.
Note: "Detailed audit report", the point numbers 1, 2, …,
and the fields Findings, Root cause, and Recommendation are common to every PDF.

I have multiple PDFs and want to extract the data from each one point by point, as shown in the output.
Please help me do so…
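
For illustration, here is a minimal Python sketch of the point-by-point split described above. It assumes the PDF text has already been extracted to a plain string, that each point starts with a line like "1. TITLE", and that the field labels appear alone on their own line exactly as in the sample. The names parse_report and format_report are hypothetical helpers, not part of any library, and a body line that happens to start with "3. " would be mistaken for a new point.

```python
import re

# Field labels taken from the sample above; adjust if your PDFs differ.
FIELDS = ("Findings", "Root cause", "Recommendation")


def parse_report(text):
    """Split extracted PDF text into numbered points, each with its fields."""
    points = []
    current = None  # the point currently being filled
    field = None    # the field label currently collecting lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = re.match(r"^\d+\.\s+(.*)$", line)
        if heading:
            # A numbered heading starts a new point.
            current = {"title": heading.group(1), "fields": {}}
            points.append(current)
            field = None
        elif current is not None and line in FIELDS:
            # A field label on its own line starts a new field.
            field = line
            current["fields"][field] = []
        elif current is not None and field is not None:
            # Any other line belongs to the field being collected.
            current["fields"][field].append(line)
    return points


def format_report(points):
    """Render the points in the 'Field = value' layout shown in the output."""
    out = ["Detailed audit report"]
    for number, point in enumerate(points, start=1):
        out.append(f"{number}. {point['title']}")
        for name, lines in point["fields"].items():
            out.append(f"{name} = " + "\n".join(lines))
    return "\n".join(out)
```

Lines before the first numbered heading (such as the report title) are skipped, and fields are kept in the order they appear in each point, so a point that lists Recommendation before Root cause keeps that order.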

Duplicate

The code is working, but some unwanted data is being populated… and it includes the header and footer values as well.

@yashashwini2322

Are the headers and footers constant?

If so, you can try using the Replace function to remove them.

If not, is there a sample available?

cheers

The header and footer are not constant…

The highlighted parts are the header and footer;
only "Confidential Human + Intuitive + One Bank" is constant.

@yashashwini2322

If it is always three lines, we can use a regex to remove the constant line you provided together with the line above and the line below it. Can you try the expression below after extraction and check?

System.Text.RegularExpressions.Regex.Replace(str,".*\nConfidential Human \+ Intuitive.*\n.*","")

str is the string variable that holds the extracted text from which the header and footer need to be removed.

Example implementation:
System.Text.RegularExpressions.Regex.Replace(currentText.Split({"Finding:","Root Cause:","Risk Assessment:"},StringSplitOptions.None)(1),".*\nConfidential Human \+ Intuitive.*\n.*","")
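
To see what that pattern removes, here is a small Python sketch of the same idea, using the same regex shape: one line above the constant text, the constant line itself, and one line below. It assumes the constant text starts its line and that the header/footer block is always exactly three lines; MARKER and strip_header_footer are hypothetical names used only for this illustration.

```python
import re

# Constant part of the header/footer, as quoted in this thread.
MARKER = "Confidential Human + Intuitive"

# One line above the marker, the marker line, and one line below.
# "." does not match a newline here, so each ".*" stays within one line.
PATTERN = re.compile(r".*\n" + re.escape(MARKER) + r".*\n.*")


def strip_header_footer(text):
    """Remove every three-line block centred on the constant marker line."""
    return PATTERN.sub("", text)
```

Note that the line breaks before and after the removed block stay in place, so an empty line is left where the header/footer was. If the block is sometimes two or four lines long, this pattern would remove too little or too much.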

cheers
