
I want to group an XML document by a trigger or flag value. I can only use XSLT 1.0. Original XML:

<?xml version = "1.0" encoding = "utf-8"?>
<root>
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_2_1.tif"/>
        <field   level = "document" name = "groupId" value = "1"/>
        <field   level = "document" name = "scanDokumentPos" value = "1 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_2_2.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "1 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_2_3.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "1 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_3_1.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "2 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_3_2.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "2 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_7_1.tif"/>
        <field   level = "document" name = "groupId" value = "1"/>
        <field   level = "document" name = "scanDokumentPos" value = "6 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_7_2.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "6 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_8_1.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "7 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_8_2.tif"/>
        <field   level = "document" name = "groupId" value = "0"/>
        <field   level = "document" name = "scanDokumentPos" value = "7 "/>
    </image>
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_9_1.tif"/>
        <field   level = "document" name = "groupId" value = "1"/>
        <field   level = "document" name = "scanDokumentPos" value = "8 "/>
    </image> 
    <image> 
        <field   level = "system" name = "Image Filename" value = "Batch121_10_1.tif"/>
        <field   level = "document" name = "groupId" value = "1"/>
        <field   level = "document" name = "scanDokumentPos" value = "9 "/>
    </image> 
</root>

The result should be:

<document>
    <childdocuments>
        <document GroupID=""> <!-- grouped because of groupId=1 -->
            <childdocuments>
                <document GroupID="1"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=1 -->
                    <pages>
                        <page path="Batch121_2_1.tif"/> <!-- Image Filename -->
                        <page path="Batch121_2_2.tif"/> <!-- Image Filename -->
                        <page path="Batch121_2_3.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
                <document GroupID="2"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=2 -->
                    <pages>
                        <page path="Batch121_3_1.tif"/> <!-- Image Filename -->
                        <page path="Batch121_3_2.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
            </childdocuments>
        </document>
        <document GroupID=""> <!-- start new document because of groupId=1 -->
            <childdocuments>
                <document GroupID="6"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=6 -->
                    <pages>
                        <page path="Batch121_7_1.tif"/> <!-- Image Filename -->
                        <page path="Batch121_7_2.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
                <document GroupID="7"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=7 -->
                    <pages>
                        <page path="Batch121_8_1.tif"/> <!-- Image Filename -->
                        <page path="Batch121_8_2.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
            </childdocuments>
        </document>
        <document GroupID=""> <!-- start new document because of groupId=1 -->
            <childdocuments>
                <document GroupID="8"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=8 -->
                    <pages>
                        <page path="Batch121_9_1.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
            </childdocuments>
        </document>
        <document GroupID=""> <!-- start new document because of groupId=1 -->
            <childdocuments>
                <document GroupID="9"> <!-- scanDokumentPos --> <!-- grouped because of groupId=0 and scanDokumentPos=9 -->
                    <pages>
                        <page path="Batch121_10_1.tif"/> <!-- Image Filename -->
                    </pages>
                </document>
            </childdocuments>
        </document>
    </childdocuments>
</document>

The first grouping key is groupId. A value of 1 starts a new document on level 1, so every image from a groupId=1 image up to the next groupId=1 image (or the end of the file) belongs to one group. The second grouping key is scanDokumentPos: within each group, all pages with the same scanDokumentPos go into one document.

Testing: http://xsltransform.net/3N7GxDx/3

  • Where is your attempted XSLT? The testing link only shows the default. Please research the problem and come back with a specific question based on an earnest attempt. This kind of grouping in XSLT 1.0 is a lot to ask of us volunteering our free time. Commented Sep 2, 2020 at 15:19
  • Grouping in XSLT 1.0 is best done using the Muenchian method: jenitennison.com/xslt/grouping/muenchian.html. For an example of grouping and subgrouping, see: stackoverflow.com/a/58525214/3016153 Commented Sep 2, 2020 at 15:57

1 Answer

As mentioned in the comments, grouping in XSLT 1.0 is best done using Muenchian grouping.

The example given (https://stackoverflow.com/a/58525214/3016153) is a good one, but your question differs in that the first grouping is more like a "group-starting-with" than a plain grouping on a value.

What I would do is create two keys.

The first matches image elements that don't have a groupId of "1" and indexes them by the generated id of the nearest preceding sibling that has a groupId of "1".

The second matches the same image elements and indexes them by a combination of that generated id and the "scanDokumentPos" value.

Here's an example. It only shows the grouping; I didn't handle the processing of the image elements, which should be trivial.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  
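  <!-- Images that do not start a group, indexed by the generated id of the
       nearest preceding image with groupId='1' (the image that starts the group). -->
  <xsl:key name="doc_group" match="image[not(field[@name='groupId']/@value='1')]" 
    use="generate-id(preceding-sibling::image[field[@name='groupId']/@value='1'][1])"/>
  
  <!-- The same images, indexed by that id combined with their scanDokumentPos value. -->
  <xsl:key name="doc_pos_group" match="image[not(field[@name='groupId']/@value='1')]" 
    use="concat(generate-id(preceding-sibling::image[field[@name='groupId']/@value='1'][1]),'|',
    field[@name='scanDokumentPos']/@value)"/>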
  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="/*">
    <document>
      <childdocuments>
        <xsl:for-each select="image[field[@name='groupId']/@value='1']">
          <xsl:variable name="curr_image" select="."/>
          <xsl:variable name="curr_id" select="generate-id()"/>
          <document GroupID="">
            <childdocuments>
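              <!-- Output the starting image on its own only when no following image in this group shares its scanDokumentPos. -->
              <xsl:if test="not(key('doc_pos_group',concat($curr_id,'|',field[@name='scanDokumentPos']/@value)))">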
                <document GroupID="{normalize-space(field[@name='scanDokumentPos']/@value)}">
                  <pages>
                    <xsl:apply-templates select="$curr_image"/>
                  </pages>
                </document>
              </xsl:if>
              <xsl:for-each select="key('doc_group',$curr_id)[count(.|key('doc_pos_group', 
                concat($curr_id,'|',field[@name='scanDokumentPos']/@value))[1])=1]">
                <xsl:variable name="doc_pos" select="field[@name='scanDokumentPos']/@value"/>
                <document GroupID="{normalize-space($doc_pos)}">
                  <pages>
                    <xsl:apply-templates 
                      select="$curr_image[field[@name='scanDokumentPos']/@value=$doc_pos]|
                      key('doc_pos_group',concat($curr_id,'|',$doc_pos))"/>
                  </pages>
                </document>
              </xsl:for-each>
            </childdocuments>
          </document>
        </xsl:for-each>
      </childdocuments>
    </document>
  </xsl:template>
  
</xsl:stylesheet>
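
The stylesheet above leaves each image to the identity template, so the pages elements end up containing copies of the original image elements. As a minimal sketch, assuming the page format shown in the question's expected result (this template is an illustration and is not part of the linked fiddle), a template like the following could be added inside the xsl:stylesheet element:

```xml
<!-- Hypothetical addition: turn each image into a page element whose
     path attribute is the value of its "Image Filename" field. -->
<xsl:template match="image">
  <page path="{field[@name='Image Filename']/@value}"/>
</xsl:template>
```

Because match="image" has a higher default priority than the identity template's match="@*|node()", the existing xsl:apply-templates calls would pick it up without any other change.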

Fiddle: http://xsltfiddle.liberty-development.net/a9HjZV/4


6 Comments

Thanks for your solution. I noticed that my example was not so clear. It is also possible for several consecutive images to have a groupId of 1. I have adjusted the question.
@knobli - Please see my update. I added an if statement that checks the size of the group. There's probably a better way to account for the single images, but unfortunately I don't have time to spend on this today. Hopefully the change helps!
I tried to apply this procedure to a second case, but somehow I get 'document' as a node in the template instead of the fields. I think getting only 1 document in the first childDocuments is the same mistake: xsltfiddle.liberty-development.net/a9HjZV/2
@knobli - I think the test in the "if" needs an adjustment. Does this look better? xsltfiddle.liberty-development.net/a9HjZV/3 If so I'll update my answer.
Went ahead and updated my answer since nothing changes in the output of the fiddle.