Document Understanding - XML Extraction?

kellingraywilliamson · November 29, 2021, 12:31pm

Hi all,

With the Document Understanding activities, when extracting data (in the Data Extraction Scope), there is a ‘Form Extractor’, a ‘Regex Based Extractor’, etc. Is there an equivalent activity for XML-based extraction that can be installed?

ppr · November 29, 2021, 12:41pm

Currently, the documentation does not list an XML extractor.
Let us know your details. Maybe we can help set up a custom XML extraction approach that can be integrated into the flow.

kellingraywilliamson · November 29, 2021, 1:33pm

Is it something that we could do in a custom code block (or similar)?

Currently, we have a process which extracts fields from an XML document by supplying two elements of an XML path (a top-level ‘parent’ field and a lower-level ‘child’ field; for example, “Customer” and then “FirstName”).
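For illustration only, the parent/child lookup described above could be sketched as follows. This is plain Python using the standard library, not a UiPath activity; the function name `extract_field` and the sample document are hypothetical, and it assumes the child is a direct child of the parent and that the first match is the one wanted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
SAMPLE_XML = """
<Order>
  <Customer>
    <FirstName>Ada</FirstName>
    <LastName>Lovelace</LastName>
  </Customer>
</Order>
"""

def extract_field(xml_text, parent, child):
    """Return the text of the first <child> directly under a <parent>, or None."""
    root = ET.fromstring(xml_text)
    # iter() walks the root itself and all descendants with the given tag.
    for parent_el in root.iter(parent):
        child_el = parent_el.find(child)  # direct children only
        if child_el is not None:
            return (child_el.text or "").strip()
    return None

first_name = extract_field(SAMPLE_XML, "Customer", "FirstName")
```

A missing parent or child yields `None` rather than an exception, which makes it easier to flag the field for manual review.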

dokumentor · November 29, 2021, 3:00pm

Do your XML files have a consistent structure, or are they completely different from one another? If they have a common structure, you can use the XML activities that come with the UiPath.WebAPI.Activities package.

Otherwise, if they don’t have a common structure, you can treat them as text and try to extract the data using RegEx or by searching for keywords in the string.

Hope it helps!

kellingraywilliamson · November 29, 2021, 3:01pm

They do have a consistent structure (for the most part). However, what we’re trying to do is process different instruction documents from different clients. Some are XML documents (where we need to extract XML data), some are PDF files (which we use regex for)…

dokumentor · November 29, 2021, 3:04pm

So the XML activities should be useful in your case.

ppr · November 29, 2021, 5:04pm

We prefer to process XML with XML tools / APIs.

If only 2 elements need to be retrieved and the XML element names are unique within the document, a regex approach can be considered.
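A rough sketch of such a regex approach, shown in Python for illustration. It assumes each element name occurs only once and contains plain text without nested elements; the helper name `regex_extract` and the sample string are hypothetical. XML entities such as `&amp;` are not decoded here, which is one reason a real XML parser is usually preferable.

```python
import re

# Hypothetical sample, made up for this sketch.
SAMPLE_TEXT = '<Customer><FirstName lang="en">Ada</FirstName><LastName>Lovelace</LastName></Customer>'

def regex_extract(text, element_name):
    """Return the text between <element_name ...> and </element_name>, or None."""
    name = re.escape(element_name)
    # Allow optional attributes after the tag name; capture lazily up to the closing tag.
    pattern = rf"<{name}(?:\s[^>]*)?>(.*?)</{name}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

first_name = regex_extract(SAMPLE_TEXT, "FirstName")
last_name = regex_extract(SAMPLE_TEXT, "LastName")
```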

Otherwise, we would implement a custom XML Extractor step (I have the feeling that it is not just a few lines within an Invoke Code) by:

define the document model as usual
define the XML Extractor config (e.g. which field maps to which XPath)
extract the values, driven by the Document Type and its fields, using the Extractor config
manipulate / modify the ExtractionResult
Document Understanding

just to shortlist some essential building blocks
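To make the config-driven idea a bit more concrete, here is a minimal sketch in Python. It is not the UiPath API: the config layout, the `extract_fields` function, the document type "Instruction", and the field names are all assumptions for illustration, and the result is a plain dictionary standing in for whatever would later be written into the ExtractionResult.

```python
import xml.etree.ElementTree as ET

# Hypothetical extractor config: document type -> field name -> XPath
# (paths are relative to the root element; ElementTree supports only a subset of XPath).
EXTRACTOR_CONFIG = {
    "Instruction": {
        "CustomerFirstName": "./Customer/FirstName",
        "CustomerLastName": "./Customer/LastName",
        "Amount": "./Payment/Amount",
    },
}

# Hypothetical sample document, made up for this sketch.
SAMPLE_XML = """
<Instruction>
  <Customer>
    <FirstName>Ada</FirstName>
    <LastName>Lovelace</LastName>
  </Customer>
</Instruction>
"""

def extract_fields(xml_text, document_type, config=EXTRACTOR_CONFIG):
    """Evaluate every configured XPath for the document type; missing fields map to None."""
    root = ET.fromstring(xml_text)
    results = {}
    for field, xpath in config[document_type].items():
        value = root.findtext(xpath)  # None when the path matches nothing
        results[field] = value.strip() if value is not None else None
    return results

fields = extract_fields(SAMPLE_XML, "Instruction")
```

Keeping the XPaths in a config means a new client format only needs a new config entry, not new code.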
