I’m trying to extract data from this PDF of a French frequency list (the list begins on page 18). It gives the top 5000 French words by frequency of use. The words are listed like this: 1 je…2 tu…etc. I want to take all these words and put them in an Excel spreadsheet. However, I don’t know what function to use to extract this information. Please help me.
Welcome to the Community
You can use the following methods to extract data from your PDF:
- Try the Document Understanding Framework.
If you want to go further, you can use an AI Fabric ML model.
Otherwise, the older method is to read the document using the PDF activities and identify the entries using Regex.
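To illustrate the Regex idea outside of UiPath, here is a minimal Python sketch. It assumes the PDF text has already been read into a string and that every entry looks like a number followed by a word; the sample text, the pattern, and the function names are hypothetical and not taken from the actual document, so the pattern will probably need adjusting to the real layout.

```python
import csv
import io
import re

# Hypothetical sample of what the text might look like after reading the PDF.
sample_text = "1 je 2 tu 3 il\n4 être 5 avoir"

# Assumed pattern: a rank (digits), whitespace, then a word that starts with
# a letter and may contain letters, apostrophes, or hyphens.
ENTRY = re.compile(r"(\d+)\s+([^\W\d_][\w'’-]*)")

def extract_entries(text):
    """Return every (rank, word) pair found in the text, not just the first."""
    return [(int(rank), word) for rank, word in ENTRY.findall(text)]

def to_csv(entries):
    """Render the pairs as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["rank", "word"])
    writer.writerows(entries)
    return buf.getvalue()

entries = extract_entries(sample_text)
csv_text = to_csv(entries)
print(csv_text)
```

The important detail is collecting all matches (`findall` here) rather than stopping at the first one. In UiPath, the equivalent would be to loop over every match the Regex returns instead of reading a single value.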
Thank you, Document Understanding Framework with the Regex extractor was exactly what I needed. However, I have a new problem, which is that for each field in my document type, I am only extracting a single word.
Can you provide a sample and the output, and tell us about the pattern you are using and what you are trying to match?