Need to extract particular table from large XML

Hi,
We have a large XML file that contains several tables, and we need to extract only one of them: “III.1. Bruttorückstellung für noch nicht abgewickelte Versicherungsfälle”.
An excerpt of the file is shown below (only part of the table is included).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="ba-jahresabschluss-v1.xsl" type="text/xsl"?><BA-Jahresabschluss xmlns:ebp="http://www.ebundesanzeiger.de/publikation/layout"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    BANr="220412025475"
                    Version="1"
                    xsi:noNamespaceSchemaLocation="ba-jahresabschluss.xsd"
                    pub_pua_publikationsart="135_JahresabschlussFinanzbericht"
                    PDF-Orientierung="quer"
                    ID="b73d7ba0-5b53-4e63-8211-8085e6a1029f"
                    pub_rubrik="Jahresabschluss"
                    pub_eingangsdatum=""
                    pub_eingangsart=""
                    pub_veroeffentlichungstyp="Jahresabschluss"
                    pub_ktitel="Jahresabschluss zum Geschäftsjahr vom 01.01.2021 bis zum 31.12.2021"
                    pub_terminvorgabe="schnellstmoeglich"
                    pub_termin="2022-06-02"
                    pub_termin2="0001-01-01"
                    pub_termin3="0001-01-01"
                    pub_pubdat_redaktion="2022-04-27"
                    pub_pubdat_redaktion2="0001-01-01"
                    pub_pubdat_redaktion3="0001-01-01"
                    pub_pubdat="2022-04-27"
                    pub_pubdat1="2022-04-27"
                    pub_print="nein"
                    IsEsef="nein"
                    pub_berichtigung="nein">
  <Unternehmens-Kopf>
      <pub_un_name>AXA Versicherung Aktiengesellschaft</pub_un_name>
      <pub_un_sitz>Köln</pub_un_sitz>
      <pub_un_strasse>Colonia-Allee 10-20</pub_un_strasse>
.
.
.

      </TR>
            <TR>
               <TD>Gesamt</TD>
               <TD>883.016</TD>
               <TD>818.553</TD>
            </TR>
            <TR>
               <TD>Gesamtes Versicherungsgeschäft:</TD>
               <TD>10.200.630</TD>
               <TD>9.159.794</TD>
            </TR>
         </TBODY>
      </TABLE>
      <A>
</TABLE>
      <A>
         <b>B. III.1. Bruttorückstellung für noch nicht abgewickelte Versicherungsfälle</b>
      </A>
      <TABLE border="0" width="900">
         <COLGROUP align="right" valign="top">
            <COL align="left" valign="top"/>
            <COL valign="top" width="15%"/>
            <COL valign="top" width="15%"/>
         </COLGROUP>
         <THEAD>
            <TR valign="bottom">
               <TD valign="bottom">in Tsd. Euro</TD>
               <TD valign="bottom">2021</TD>
               <TD valign="bottom">2020</TD>
            </TR>
         </THEAD>
         <TBODY>
            <TR>
               <TD>selbst abgeschlossenes Versicherungsgeschäft</TD>
               <TD/>
               <TD/>
            </TR>
            <TR>
               <TD>Unfallversicherung</TD>
               <TD>615.688</TD>
               <TD>576.537</TD>
            </TR>
            <TR>
               <TD>Haftpflichtversicherung</TD>
               <TD>2.655.998</TD>
               <TD>2.632.426</TD>
            </TR>
.
.
.
.
 </TR>
         </TBODY>
      </TABLE>
.
.
 <A>
         <Fn FnID="1">
            <FnZ>1</FnZ>
            <FnA> Der Bericht über die Solvabilität und Finanzlage ist nicht Bestandteil des Lageberichts und damit nicht prüfungspflichtig.</FnA>
         </Fn>
      </A>
  </Bek-Text>
</BA-Jahresabschluss>

The XML contains other tables and other data as well. Can you suggest how we can extract this specific table from this large XML file?

Thanks.

Your XML looks close to HTML.

Data Scraping option:

  • Open the XML file as HTML in a browser and use a dynamic selector for the table
  • For the table, you can calculate e.g. the table index by checking for the preceding III.1 A element

XML Option 1

  • Deserialize the XML file
  • Calculate the child index/position of the preceding anchor (A) element and take the next TABLE that follows it
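
To make the idea behind XML Option 1 concrete, here is a minimal sketch. It is written in Python with the standard library only, not in UiPath/VB.NET, so treat it as an illustration of the logic rather than something to paste into a workflow. The `SAMPLE` string is a tiny stand-in modelled on the excerpt in the question (the "B. II. Andere Tabelle" heading is made up), and `table_after_heading` is a hypothetical helper name. It assumes the heading sits in an `A` element and that the wanted `TABLE` is a later sibling of that `A`.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document modelled on the excerpt above (not the real file).
SAMPLE = """<Bek-Text>
  <A><b>B. II. Andere Tabelle</b></A>
  <TABLE><TBODY><TR><TD>Gesamt</TD><TD>883.016</TD></TR></TBODY></TABLE>
  <A><b>B. III.1. Bruttorückstellung für noch nicht abgewickelte Versicherungsfälle</b></A>
  <TABLE><TBODY><TR><TD>Unfallversicherung</TD><TD>615.688</TD></TR></TBODY></TABLE>
</Bek-Text>"""


def table_after_heading(root, heading_prefix):
    """Return the first TABLE sibling following the A element whose text
    starts with heading_prefix, or None if there is no such table."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag != "A":
                continue
            # Join the text of A and its descendants (e.g. the nested <b>).
            text = "".join(child.itertext()).strip()
            if not text.startswith(heading_prefix):
                continue
            # The child index of the anchor is known; take the next TABLE.
            for sibling in children[index + 1:]:
                if sibling.tag == "TABLE":
                    return sibling
    return None


root = ET.fromstring(SAMPLE)
table = table_after_heading(root, "B. III.1.")
rows = [[(td.text or "").strip() for td in tr.iter("TD")]
        for tr in table.iter("TR")]
print(rows)
```

The helper returns `None` when no matching heading or no following table exists, so the caller can decide how to handle files that lack the table.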

XML Option 2

  • Deserialize the XML file
  • Select the table with an XPath expression

With a more complete XML sample we can help you more with detailed implementation options and approaches.

Thanks for your reply. Yes, the XML is quite complex. Could you provide an example (code or XAML) for the XML Option 1 and/or XML Option 2 approach?
If you have any other approaches, please let me know as well.

I have added some more of the XML file to the question. As you can see, it is a complex structure with a lot of tables.

Thanks.

We would recommend that you do some self-training:

XML Namespace handling:

XML Examples

It looks like the posted sample is a merge of different samples and is not reliable enough to use for prototyping. But we are sure that once you have read through the provided resources you can implement it yourself. We are also here for further help.

No, it is not a merge of different samples. I mean it is a single file with a lot of tables, and I need to extract one specific table. Let me check the links that you have shared.

Can you check:

xDoc.Root.Descendants("b").Where(Function(x) x.Value.StartsWith("B. III.1. Bruttorückstellung")).First().Ancestors("A").First().ElementsAfterSelf("TABLE").First()

Thanks for your help, it worked.
Now we have a further requirement. The file I shared is just one XML file; we have several others, and not all of them use the same tags.
We have another XML file where I need to search for a different heading, “C. II. 1. Bruttorückstellung für noch nicht abgewickelte Versicherungsfalle”. The tag in that file is different: in the previous file the ancestor tag is A, whereas in this file it is Z-Title.

In that case, is it possible to build a dynamic parsing query that fetches the table based on the heading value (supplied from an Excel file or a list of strings)?

@ppr is it possible to make a dynamic query ?
Please let me know.
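
One possible way to approach the dynamic lookup, as a sketch only: ignore the tag that wraps the heading and simply take the first `TABLE` that follows the heading text in document order. The example below is Python with the standard library, not UiPath/VB.NET, and the names `normalize` and `tables_by_heading` are hypothetical. The second file was not posted, so the `Z-Title` layout in `FILE_B` and its cell value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse runs of whitespace so headings compare reliably."""
    return " ".join((text or "").split())


def tables_by_heading(root, headings):
    """Map each heading to the first TABLE that follows it in document order.

    The wrapping tag is ignored, so <A><b>..</b></A> and <Z-Title>..</Z-Title>
    are treated the same way.
    """
    wanted = [normalize(h) for h in headings if normalize(h)]
    found = {}
    pending = None  # heading already seen, still waiting for its table
    for element in root.iter():
        if element.tag == "TABLE":
            if pending is not None:
                found[pending] = element
                pending = None
            continue
        own_text = normalize(element.text)
        for heading in wanted:
            if heading not in found and own_text.startswith(heading):
                pending = heading
                break
    return found


# Headings as they might come from an Excel column or a list of strings.
HEADINGS = ["B. III.1. Bruttorückstellung", "C. II. 1. Bruttorückstellung"]

# Stand-in files: FILE_A follows the posted excerpt, FILE_B is an assumed layout.
FILE_A = ("<Doc><A><b>B. III.1. Bruttorückstellung</b></A>"
          "<TABLE><TR><TD>615.688</TD></TR></TABLE></Doc>")
FILE_B = ("<Doc><Z-Title>C. II. 1. Bruttorückstellung</Z-Title>"
          "<TABLE><TR><TD>placeholder</TD></TR></TABLE></Doc>")

result_a = tables_by_heading(ET.fromstring(FILE_A), HEADINGS)
result_b = tables_by_heading(ET.fromstring(FILE_B), HEADINGS)
print(list(result_a), list(result_b))
```

A known limitation of this sketch: if a heading is not followed by its own table, the next table in the file is still assigned to it, so real files may need an extra check (for example, stopping at the next heading).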
