How to remove duplicate entry - XSLT

Question

I am trying to remove duplicate entries after the § entity. If an entry contains commas, then after tokenizing it, every token that starts with a round bracket ( should be prefixed with the reference from the first token, e.g. 17200(b)(2), (4)–(6) should become <p>17200(b)(2)</p><p>17200(b)(4)–(6)</p>.
Input XML

<root>
    <p>CC &#x00a7;1(a), (b), (c)</p>
    <p>Civil Code &#x00a7;1(a), (b)</p>
    <p>CC &#x00a7;&#x00a7;2(a)</p>
    <p>Civil Code &#x00a7;3(a)</p>
    <p>CC &#x00a7;1(c)</p>
    <p>Civil Code &#x00a7;1(a), (b), (c)</p>
    <p>Civil Code &#x00a7;17200(b)(2), (4)–(6), (8), (12), (16), (20), and (21)</p>
</root>

Expected Output

<root>
   <sec specific-use="CC">
      <title content-type="Sta_Head3">CIVIL CODE</title>
      <p>1(a)</p>
      <p>1(b)</p>
      <p>1(c)</p>
      <p>2(a)</p>
      <p>3(a)</p>
      <p>17200(b)(2)</p>
      <p>17200(b)(4)–(6)</p>
      <p>17200(b)(8)</p>
      <p>17200(b)(12)</p>
      <p>17200(b)(16)</p>
      <p>17200(b)(20)</p>
      <p>17200(b)(21)</p>
   </sec>
</root>

XSLT Code

<xsl:template match="root">
    <xsl:copy>
        <xsl:for-each-group select="p[(starts-with(., 'CC ') or starts-with(., 'Civil Code'))]" group-by="replace(substring-before(., ' &#x00a7;'), 'Civil Code', 'CC')">
            <xsl:text>&#x0A;</xsl:text>
            <sec specific-use="{current-grouping-key()}">
                <xsl:text>&#x0A;</xsl:text>
                <title content-type="Sta_Head3">CIVIL CODE</title>
                <xsl:for-each-group select="current-group()" group-by="replace(substring-after(., '&#x00a7;'), '&#x00a7;', '')">
                    <xsl:sort select="replace(current-grouping-key(), '[^0-9.].*$', '')" data-type="number" order="ascending"/>
                    <xsl:for-each 
                        select="distinct-values(
                        current-grouping-key() ! 
                        (let $tokens := tokenize(current-grouping-key(), ', and |, | and ') 
                        return (head($tokens), tail($tokens) ! (substring-before(head($tokens), '(') || .)))
                        )" expand-text="yes">
                        <p>{.}</p>
                    </xsl:for-each>
                </xsl:for-each-group>
            </sec>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>
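A minimal sketch of how the template above could be fixed, assuming an XSLT 3.0 processor (for example Saxon 9.8 or later) and the input shown in the question. It keeps the question's outer grouping and tokenizing regex but changes two things. First, the prefix for tokens that start with ( is the first token minus its *last* parenthesized part, so 17200(b)(2) gives 17200(b) while 1(a) gives 1. Second, duplicates are removed across all <p> elements of a group by grouping the expanded citations themselves, instead of grouping by the raw text after §. The variable name `entries` is illustrative, and the sort key assumes every citation starts with digits.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="root">
    <xsl:copy>
      <xsl:for-each-group select="p[starts-with(., 'CC ') or starts-with(., 'Civil Code')]"
                          group-by="replace(substring-before(., ' &#x00a7;'), 'Civil Code', 'CC')">
        <sec specific-use="{current-grouping-key()}">
          <title content-type="Sta_Head3">CIVIL CODE</title>
          <!-- Step 1: expand every <p> of this group into single citations.
               The prefix is the first token minus its last (...):
               1(a) -> 1, 17200(b)(2) -> 17200(b). Only tokens starting with "(" get it. -->
          <xsl:variable name="entries" as="xs:string*" select="
              current-group() ! (
                let $tokens := tokenize(replace(substring-after(., '&#x00a7;'), '&#x00a7;', ''), ', and |, | and '),
                    $prefix := replace(head($tokens), '\([^()]*\)$', '')
                return (head($tokens),
                        tail($tokens) ! (if (starts-with(., '(')) then $prefix || . else .)))"/>
          <!-- Step 2: one group per distinct citation (duplicates collapse),
               sorted by the leading number (assumes each citation starts with digits) -->
          <xsl:for-each-group select="$entries" group-by=".">
            <xsl:sort select="xs:integer(replace(., '^(\d+).*$', '$1'))"/>
            <p>{.}</p>
          </xsl:for-each-group>
        </sec>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:sort is stable by default, citations with the same leading number stay in their order of first appearance, which is what keeps 1(a), 1(b), 1(c) together ahead of 2(a).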

Sebastien · Accepted Answer · 2021-01-25 17:02:34Z


You could do it in a two-step approach: first expand every <p> into a temporary list of single-citation <p> elements, then use xsl:for-each-group to remove the duplicates.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml" indent="yes"/>

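  <xsl:template match="/">
    <!-- Step 1: expand every <p> into one <p> per citation -->
    <xsl:variable name="listP">
        <xsl:apply-templates select="root/p"/>
    </xsl:variable>
    
    <!-- Step 2: $listP is a single document node; group-by="p" gives it one key
         per <p> child, so each distinct citation becomes exactly one group -->
    <xsl:for-each-group select="$listP" group-by="p">
        <p><xsl:value-of select="current-grouping-key()"/></p>
    </xsl:for-each-group>
  </xsl:template>
  
  <xsl:template match="p">
    <!-- text after the section sign(s), e.g. "1(a), (b), (c)" -->
    <xsl:variable name="input" select="replace(substring-after(.,'&#x00a7;'),'&#x00a7;','')"/>
    <!-- leading number before the first "(", e.g. "1" -->
    <xsl:variable name="chapter" select="substring-before($input,'(')"/>
    <!-- prefix each comma-separated part with that number, dropping spaces and "and" -->
    <xsl:for-each select="tokenize(substring-after($input, $chapter),',')">
        <p><xsl:value-of select="concat($chapter,replace(replace(.,' ',''),'and',''))"/></p>    
    </xsl:for-each>
  </xsl:template>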
  
</xsl:stylesheet>

See it working here: https://xsltfiddle.liberty-development.net/gVrvcxQ


4 Comments

Sam Over a year ago

Please check: the (b) is missing in the last lines, e.g. it should be 17200(b)(4)–(6). Also, if you could give me a solution based on my existing code, I would appreciate it.

Sebastien Over a year ago

@Sam What is the rule that determines that 17200(b) keeps its first parenthesized part in front of the other subsections, while the other cases don't? For example, 1(a), (b) gives 1(b), not 1(a)(b).

Sam Over a year ago

I think that after tokenizing, you can match the end with '(.*)(.*?)$' and then replace it with group 1.

Sebastien Over a year ago

@Sam OK, then you have the solution. I think my code goes far enough; you can modify it to suit your specific requirements.
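Following the regex idea in the comments, the answer's p template could be adapted as in the sketch below (assuming XSLT 3.0 and the question's input; not checked against other citation formats). Taken literally, '(.*)(.*?)$' can match an empty string, which fn:replace rejects in XPath 3.1 (error FORX0003), and its unescaped brackets are capture groups rather than literal parentheses. The sketch therefore escapes them: '^(.*)\([^()]*\)$' keeps everything before the last (...) of the first token. For readability the distinct citations are wrapped in a root element, without the sec/title wrapper from the expected output.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Step 1: expand every <p> into one <p> per citation -->
    <xsl:variable name="listP">
      <xsl:apply-templates select="root/p"/>
    </xsl:variable>
    <!-- Step 2: one group per distinct citation, in order of first appearance -->
    <root>
      <xsl:for-each-group select="$listP/p" group-by=".">
        <p><xsl:value-of select="current-grouping-key()"/></p>
      </xsl:for-each-group>
    </root>
  </xsl:template>

  <xsl:template match="p">
    <!-- text after the section sign(s), e.g. "17200(b)(2), (4)–(6), ..., and (21)" -->
    <xsl:variable name="input" select="replace(substring-after(., '&#x00a7;'), '&#x00a7;', '')"/>
    <!-- split on commas, also swallowing a following "and" -->
    <xsl:variable name="tokens" select="tokenize($input, ',\s*(and\s+)?')"/>
    <!-- drop the LAST (...) of the first token: 1(a) -> 1, 17200(b)(2) -> 17200(b) -->
    <xsl:variable name="prefix" select="replace($tokens[1], '^(.*)\([^()]*\)$', '$1')"/>
    <p><xsl:value-of select="$tokens[1]"/></p>
    <xsl:for-each select="tail($tokens)">
      <p><xsl:value-of select="concat($prefix, .)"/></p>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the question's input, the first appearances already come in ascending order, so no xsl:sort is needed here; for other inputs one could be added inside the xsl:for-each-group.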
