Digitize Document: Unsupported content type: audio/mpeg3

I am facing the following error: Digitize Document: Unsupported content type: audio/mpeg3.

The error occurs in the Digitize step for some of the invoices; other invoices are digitized without this error.

OCR engines tried: OmniPage and UiPath Document OCR.

@subha.subramanian,

Welcome to the UiPath Community!

The error message “Unsupported content type: audio/mpeg3” indicates that the OCR (Optical Character Recognition) engine used by UiPath’s “Digitize Document” activity does not support the content type “audio/mpeg3”. This suggests that the document being processed contains audio content in the form of an MP3 file, which cannot be processed by the OCR engine.

To resolve this issue, you’ll need to ensure that the documents being processed are in a format that can be handled by the OCR engine. OCR engines typically work with image-based documents such as scanned images or PDF files containing scanned images. They are not designed to process audio content.

Here are some steps you can take to address the issue:

  1. Verify Document Format: Check the format of the documents causing the error. Ensure that they are in a supported format for OCR processing, such as scanned images (e.g., JPEG, PNG) or PDF files containing images.
  2. Preprocess Documents: If the documents do contain audio content, you may need to preprocess them to remove the audio, or convert them to a suitable image format so that only image-based content is passed to the OCR engine.
  3. Check OCR Engine Settings: Review the settings of the OCR engine being used (Omnipage or UiPath Document OCR). Ensure that the settings are configured appropriately for the type of documents you are processing and that any relevant options for handling audio content are enabled or disabled as needed.
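For step 1, the real format of a file can be checked independently of its extension by reading its first bytes ("magic numbers"). Below is a minimal Python sketch, assuming the files are on a local disk; the signature table is a small illustrative subset, `sniff_type`/`sniff_file` are hypothetical helper names, and this is not necessarily how UiPath's digitizer decides on a content type.

```python
from pathlib import Path

# Illustrative subset of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"ID3", "audio/mpeg"),      # MP3 that starts with an ID3v2 tag
]


def sniff_type(header: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its extension."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    # An MP3 without an ID3 tag starts with an 11-bit frame sync (all ones).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with path.open("rb") as f:
        return sniff_type(f.read(16))


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(f"{arg}: {sniff_file(Path(arg))}")
```

A file that reports `application/pdf` here passes this basic check; anything else deserves a closer look even if its extension is `.pdf`.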

[Generated by LLM, reviewed by me]

Thanks,
Ashok :slight_smile:

Hi ashokkarle,

Thanks for the detailed info!

The document is in PDF format, and I don’t see any audio-related items in it.

Friendly reminder: if you use an LLM to generate a response to a question, the forum guidelines say you should indicate that the text is AI-generated.
Yours certainly looks like it came straight out of ChatGPT.

@subha.subramanian,

Check whether the folder you are iterating over contains any non-PDF files, and whether each PDF file can actually be opened.

What is the file extension of the file you are trying to use?
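A minimal Python sketch of that folder check, assuming Python 3.9+ and local access to the folder; `find_suspect_files` is a hypothetical helper name. It flags files whose extension is not `.pdf`, files that cannot be opened, and `.pdf` files whose `%PDF-` header is missing or does not start at byte 0. Acrobat tolerates a header that appears a little way into the file, so a file can open fine in Acrobat and still look unusual to a stricter reader.

```python
import sys
from pathlib import Path


def find_suspect_files(folder: Path) -> list[tuple[str, str]]:
    """Return (file name, reason) pairs for files that may not be usable PDFs."""
    suspects = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() != ".pdf":
            suspects.append((path.name, f"extension is {path.suffix or '(none)'}"))
            continue
        try:
            with path.open("rb") as f:
                head = f.read(1024)
        except OSError as exc:
            suspects.append((path.name, f"cannot be opened: {exc}"))
            continue
        offset = head.find(b"%PDF-")
        if offset == -1:
            suspects.append((path.name, "no %PDF- header in the first 1024 bytes"))
        elif offset > 0:
            suspects.append((path.name, f"%PDF- header starts at byte {offset}, not 0"))
    return suspects


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, reason in find_suspect_files(target):
        print(f"{name}: {reason}")
```

Run it as `python check_folder.py <folder>` (the script name is arbitrary); no output means every file passed these basic checks.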

@Jon_Smith,

Thanks for pointing that out. I’ve updated my answer accordingly :robot:

Thanks,
Ashok :slight_smile:

The folder has 10 PDF files, and all of them have the .pdf extension. I can open every file manually in Adobe Acrobat, but when digitizing them through UiPath, 4 of the 10 files fail with this error.

I don’t see any audio-related items in the PDFs.
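Since 4 of the 10 files fail and the other 6 work, one low-effort comparison is to dump the first bytes of each file side by side. This is a hypothesis to test, not a confirmed cause: if the failing files start with something other than `%PDF-`, that could explain a content-type misdetection. A minimal Python sketch; `hex_preview` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def hex_preview(data: bytes, length: int = 32) -> str:
    """Show the first `length` bytes as hex, followed by a printable-ASCII column."""
    chunk = data[:length]
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{hex_part}  |{ascii_part}|"


if __name__ == "__main__":
    # Pass both failing and working invoices on the command line to compare them.
    for arg in sys.argv[1:]:
        with open(arg, "rb") as f:
            print(f"{Path(arg).name:<40} {hex_preview(f.read(32))}")
```

A normal PDF line starts with `25 50 44 46 2d` (`%PDF-`); any failing file whose first bytes differ from the working ones is worth mentioning in a support ticket.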

@subha.subramanian,

Would it be possible to share the file that is being reported as audio? I’ll try to recreate the scenario.

Can you show us a screenshot of the files, with their extensions? That way we can validate them.


I couldn’t share the file as it’s an invoice; I’ve attached the file type and extensions for reference.

Below are the error details:

System.ArgumentException: Unsupported content type: audio/mpeg3 at UiPath.IntelligentOCR.Digitization.IntelligentOcrDigitizer.Digitize(Content content, IOcrEngine ocrEngine, ApplyOcrOnPdf applyOcrOnPdf, Boolean detectCheckboxes, IDigitizationScheduler scheduler, IDigitizerTelemetryService telemetryService, CancellationToken token)
at UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument.ExecuteAsync(NativeActivityContext context, CancellationToken cancellationToken)
at UiPath.Shared.Activities.AsyncTaskNativeImplementation.BookmarkResumptionCallback(NativeActivityContext context, Object value)
at UiPath.Shared.Activities.AsyncTaskNativeActivity.BookmarkResumptionCallback(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkCallbackWrapper.Invoke(NativeActivityContext context, Bookmark bookmark, Object value)
at System.Activities.Runtime.BookmarkWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

I’ve attached the PDF used in Digitize Document and its properties for additional information.

It’s a confidential document, so I won’t be able to share it. Instead, I’ve attached the extensions and file details for reference below.

Additionally, here are the package versions I am currently using:

Studio version: 2023.4.4
DU.ML activities: 1.24.0
Intelligent OCR: 6.14.1

I am also able to read the PDFs with the PDF activities without any issues: I used the Read PDF and Read PDF With OCR activities, and both work fine.

Weird… the documents do indeed have the right extensions.

So I asked ChatGPT whether PDFs can have an MP3 embedded, and apparently they can.
Could these documents actually have an MP3 in them, and that’s what is messing this up…?

Maybe the error is legit?
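One way to check that question without Acrobat is to scan the raw PDF bytes for the PDF names used for embedded files and multimedia (`/EmbeddedFile`, `/FileAttachment`, `/Sound`, `/Movie`, `/Screen`, `/Rendition`, `/RichMedia`). This is a crude heuristic sketch in Python, not a real PDF parser: it only inflates zlib (FlateDecode) streams, cannot see inside encrypted files, and a name match does not prove that MP3 data is present. The marker list and helper names are my own assumptions.

```python
import re
import sys
import zlib
from pathlib import Path

# PDF names associated with embedded files and multimedia content.
MEDIA_MARKERS = [b"/EmbeddedFile", b"/FileAttachment", b"/RichMedia",
                 b"/Sound", b"/Movie", b"/Screen", b"/Rendition"]

# Stream bodies; the lookbehind stops the "stream" inside "endstream" from matching.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def decoded_chunks(data: bytes):
    """Yield the raw file, then every stream body that inflates with zlib."""
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib/Flate data (e.g. a JPEG image stream); skip it
        yield inflated


def find_media_markers(data: bytes) -> set[str]:
    """Return the media-related PDF names found in raw or inflated content."""
    found = set()
    for chunk in decoded_chunks(data):
        for marker in MEDIA_MARKERS:
            if marker in chunk:
                found.add(marker.decode())
    return found


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        markers = find_media_markers(Path(arg).read_bytes())
        print(f"{arg}: {sorted(markers) or 'no media markers found'}")
```

Comparing the output for a failing invoice with a working one shows quickly whether only the failing files carry multimedia or attachment objects.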

I don’t see any embedded MP3. I also don’t see any differences between the invoices the bot can read and the ones that throw errors, and I am able to read both kinds of PDF using the Read PDF activity.

Do you think any package versions need to be changed?

Nope. Everything looks correct, so at this point I’d open a support ticket with UiPath.
