Help with project I'm working on

I’m trying to extract data from this PDF French frequency list (it begins on page 18). It lists the top 5000 French words by frequency of use. The words are listed like this: 1 je…2 tu…etc. I want to take all of these words and put them in an Excel spreadsheet. However, I don’t know what function to use to extract this information. Please help me.

Hi @Trevor_Guo

Welcome to the Community

You can use the following methods to extract data from your PDF:

  1. Try the Document Understanding Framework.
     1. If you want to go further, you can use an AI Fabric ML model.

  2. Otherwise, the older method is to read the document with the PDF activities and identify the words using Regex.
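
To illustrate the Regex route, here is a rough sketch in Python rather than in UiPath itself. It assumes the PDF text has already been read into a string and that each entry looks like a rank number, whitespace, then the word (for example "1 je"); the real layout of the list may differ, so the pattern would need adjusting. The names `ENTRY`, `extract_words`, `to_csv` and the sample string are made up for this example. The point is that one pattern applied to the whole text returns every match, and the results can be written as CSV, which Excel opens directly.

```python
import csv
import io
import re

# Assumed entry shape: a rank number, whitespace, then the word, e.g. "1 je".
# [^\W\d_] matches any Unicode letter, so accented French letters are covered.
ENTRY = re.compile(r"(\d+)\s+([^\W\d_]+(?:['’-][^\W\d_]+)*)")


def extract_words(text):
    """Return (rank, word) pairs whose ranks count up 1, 2, 3, ..."""
    entries = []
    for rank, word in ENTRY.findall(text):
        # Keep a match only if it carries the next expected rank; this skips
        # stray numbers such as page numbers or frequency counts.
        if int(rank) == len(entries) + 1:
            entries.append((int(rank), word))
    return entries


def to_csv(entries):
    """Serialize the pairs as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["rank", "word"])
    writer.writerows(entries)
    return buffer.getvalue()


# Hypothetical sample standing in for text already read from the PDF.
sample = "1 je pronoun 2 tu pronoun 18 page 3 être verb"
entries = extract_words(sample)
print(to_csv(entries))
```

The sequential-rank check is a design choice for this sketch, not something the PDF guarantees: it only works if the ranks really appear in order and no earlier stray number happens to equal the next expected rank.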

Thank you, Document Understanding Framework with the Regex extractor was exactly what I needed. However, I have a new problem, which is that for each field in my document type, I am only extracting a single word.

I need to extract 5000 words. I hope that I do not need to create 5000 fields. Please advise ASAP.

Hi Trevor

Can you provide a sample and the output you are getting, and tell us about the pattern you are trying to match?