I want to transfer text from a PDF file to Excel

gorby · November 12, 2023, 10:43am

Good evening. I am a beginner who plans to start using the latest version of UiPath Studio Enterprise edition one week from now.
UiPath is not installed on my PC yet.

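I plan to develop a workflow that transfers text from a PDF file to Excel.
Since this will be my first time working with PDF files in UiPath, please let me ask a few basic questions.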

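1. Is UiPath.PDF.Activities the right package to install?
2. Is Read PDF Text the right activity for reading the text of a PDF file into a variable?
3. Does this activity read all of the text in the PDF into the variable? If the answer is yes, is there a way to get only the text at a specific location in an opened PDF file by pointing at it with a selector? The PDF shows printed text, so can't I get it with the Get Text activity?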

Nguyen_Van_Luong1 · November 12, 2023, 12:29pm

Good evening, @gorby,

You can open Manage Packages and find the packages there. Some OCR packages may also be useful for Japanese text; I have used the OCR activities for Chinese, Korean, and Japanese, and they work well.
Yes, reading a PDF file directly or reading it with OCR returns a String variable.
You can limit the read to a specific page of the file, or extract the part you need using regex.

I am using Google Translate, so I hope this helps.
Regards,

Anil_G · November 12, 2023, 5:22pm

@gorby

1. You can use the PDF activities without any issues.
2. Read PDF Text can be used to read the file, and you can then use regex to get the values where possible. This is the preferred approach.
3. If 2 is not an option, you can try Document Understanding (DU) with AI Center to get the data you need.
4. Selectors can also be used, but the PDF should not be a scanned one, and the accessibility option should be enabled for the PDF.
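
As a rough illustration of the "read the text, then apply regex" approach, here is a minimal Python sketch. It is not UiPath workflow code. The sample text, the field labels, and the `extract_fields` helper are all hypothetical and stand in for whatever string the PDF read returns for a real document.

```python
import re

# Hypothetical text, standing in for the String returned by reading the PDF.
SAMPLE_TEXT = """Invoice No: INV-2023-0042
Issue Date: 2023-11-12
Total: 12,800 JPY
"""

# One pattern per field; the labels are assumptions about the PDF layout.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "issue_date": re.compile(r"Issue Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d,]+)"),
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None if not found)."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(SAMPLE_TEXT)
```

Inside a UiPath workflow the same patterns would typically go through .NET regular expressions (for example `System.Text.RegularExpressions.Regex.Match`), so the pattern strings may carry over with little change.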

Cheers

Yoichi · November 12, 2023, 11:28pm

Hello,

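There are broadly three approaches to processing PDF files in UiPath.
The first is to open the PDF file in an application (Adobe Reader, Chrome, Edge, etc.) and work with it through selectors in that application. In this case the package you need is UiPath.UIAutomation.Activities.
The second is to use the UiPath.PDF.Activities package. With this approach you can perform operations such as extracting pages from a PDF file, but text can only be retrieved page by page. To pull out the text you need, use regular expressions or a similar technique.
The third is to use the UiPath.IntelligentOCR.Activities package, that is, the Document Understanding framework. The extractors here mostly rely on cloud AI engines, so you need to keep cost and related factors in mind.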
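
For the second approach, where text arrives page by page and regular expressions pick out the values, the path to Excel can be sketched as follows. This is a plain Python illustration under stated assumptions, not UiPath code: the `PAGES` list, the `Customer:` and `Amount:` labels, and both helper functions are hypothetical, and CSV is used only as a simple format that Excel can open.

```python
import csv
import io
import re

# Hypothetical per-page text, standing in for reading a PDF one page at a time.
PAGES = [
    "Customer: Alpha Trading\nAmount: 1,200",
    "Customer: Beta Foods\nAmount: 3,450",
]

CUSTOMER = re.compile(r"Customer:\s*(.+)")
AMOUNT = re.compile(r"Amount:\s*([\d,]+)")


def pages_to_rows(pages):
    """Build one row (page number, customer, amount) per page of text."""
    rows = []
    for page_no, text in enumerate(pages, start=1):
        customer = CUSTOMER.search(text)
        amount = AMOUNT.search(text)
        rows.append([
            page_no,
            customer.group(1).strip() if customer else "",
            # Drop thousands separators so Excel treats the value as a number.
            amount.group(1).replace(",", "") if amount else "",
        ])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "customer", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = pages_to_rows(PAGES)
csv_text = rows_to_csv(rows)
```

Pages where a pattern does not match produce an empty cell rather than an error, which makes missing values easy to spot in the resulting sheet.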

With that in mind:

1. Is UiPath.PDF.Activities the right package to install?

It is one of the options.

2. Is Read PDF Text the right activity for reading the text of a PDF file into a variable?
3. Does this activity read all of the text in the PDF into the variable?

If the text is embedded as characters rather than as an image, it can be retrieved page by page.
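
One common way to tell the two cases apart is to check whether the text read from a page is essentially empty. The sketch below is a heuristic in plain Python, not a UiPath feature; the function name `looks_like_scanned` and the `min_chars` threshold are assumptions chosen for illustration.

```python
def looks_like_scanned(page_text, min_chars=10):
    """Heuristic: very little extractable text suggests an image-only page."""
    visible = "".join(page_text.split())  # drop all whitespace
    return len(visible) < min_chars
```

A page flagged this way would be a candidate for an OCR-based read instead of a direct text read.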

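If the answer is yes, is there a way to get only the text at a specific location in an opened PDF file by pointing at it with a selector? The PDF shows printed text, so can't I get it with the Get Text activity?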

If you want to use Get Text, that is the UiPath.UIAutomation.Activities approach described above, so the UiPath.PDF.Activities package is not needed. (It cannot be done with the PDF package.)

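Alternatively, the UiPath.IntelligentOCR.Activities package also returns coordinate information, so it is possible to retrieve the text at a specified position. However, if you use the Form Extractor, you will need Automation Cloud AI Units (generally a paid item for Enterprise).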
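
The idea of picking text by position can be sketched in plain Python as follows. This does not use the actual data model of the IntelligentOCR package; the `Word` class, its fields, and the `words_in_region` helper are hypothetical, and they only assume that each word comes with a bounding box.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge
    y: float  # top edge
    width: float
    height: float


def words_in_region(words, left, top, right, bottom):
    """Return the text of words whose boxes lie fully inside the rectangle."""
    inside = [
        w for w in words
        if w.x >= left and w.y >= top
        and w.x + w.width <= right and w.y + w.height <= bottom
    ]
    # Reading order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


# Hypothetical words with page coordinates.
WORDS = [
    Word("Total", 50, 700, 40, 12),
    Word("12,800", 100, 700, 45, 12),
    Word("Invoice", 50, 40, 55, 12),
]
region_text = words_in_region(WORDS, left=40, top=690, right=200, bottom=720)
```

Requiring the whole box to sit inside the rectangle is a deliberately strict choice; a looser overlap test may suit layouts where values shift slightly between documents.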
