1

I need to remove everything thing starting with "[@" and ending with "]" in the given XML. The text that falls in-between these sqaure brackers should also be removed. I tried the replace function but it turned the XML into a String and did not remove the full string including [@ and ]. Please see below sample input and expected output;

Sample input. See the [@ inside the < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session[@id='1034215']/properties[@id='p0228B52620A948E480D38667394112C5']/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Expected output. Notice how every text that starts with "[@" has been removed under < differences> tag:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
    <data id="FF2AD08D621">
        <StateFlag>0</StateFlag>
        <DCC>B</DCC>
        <LastName>MMNGorthy</LastName>
        <Name>Scott</Name>
        <Structure>NonProfit</Structure>
        <BusinessType>Engineering</BusinessType>
        <meams>AI</meams>
        <Count>0</Count>
        <CTier>000</CTier>
        <CTCond>111</CTCond>
        <CTPl>222</CTPl>
        <DO>1</DO>
        <AuthObject>0</AuthObject>
    </data>
    <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
        <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce" />
    </differences>
</ns0:trans>

Here is the code that I tried:

%dw 2.0
import * from dw::core::Strings
output text/plain
---
(write(payload, "application/json") replace "[@" with "") replace "]" with ""

Output of my code has made everything Null under < differences> tag:

{
  "trans": {
    "data": {
      "StateFlag": "0",
      "DCC": "B",
      "LastName": "MMNGorthy",
      "Name": "Scott",
      "Structure": "NonProfit",
      "BusinessType": "Engineering",
      "meams": "AI",
      "Count": "0",
      "CTier": "000",
      "CTCond": "111",
      "CTPl": "222",
      "DO": "1",
      "AuthObject": "0"
    },
    "differences": {
      "difference": null,
      "difference": null,
      "difference": null
    }
  }
}
5
  • Please share what you tried and the result you got. Commented Mar 20 at 21:34
  • @aled. Thanks for getting back to me. I've updated the question with the code I've tried and the output I got. I used the replace function but it made everything null under < differences> tag. It also converted the XML into String which I am trying to avoid. Any help from you will be greatly appreciated. Thanks! Commented Mar 21 at 3:02
  • The information in your comment should go into the question too, not in a comment. Commented Mar 21 at 3:29
  • Why is the script writing a JSON if the expected output is in XML? Commented Mar 21 at 11:27
  • Does it really have to replace the regex everywhere? or is it only going to be at trans.differences.*difference.@fullPath ? Both are possible, but if you do not need it then it is not worth it to make the code complex just to make it generic Commented Mar 21 at 16:01

1 Answer 1

2

Replacing over the entire XML as a string causes the script to lose the structure provided by the XML object, which would make the transformation more fragile and prone to errors. What you really should do is to use a recursive function to transverse the input object, using pattern-matching to identify the current type and transform the values inside object attributes only. I recommend to get familiar with the basic structure of such functions because they are commonly needed.

fun recursiveFunction(x)=
    x match {
        case o is Object -> o mapObject ((value, key) -> (key): recursiveFunction(value)
        case a is Array -> a map recursiveFunction($)
        case s is String -> recursiveFunction(s)
        case n is Number -> recursiveFunction(n)
        ...
        else -> x 
    }

Note that in XML there is no concept of arrays, like in JSON, however I added a check for them anyway in case the input changes. JSON in the other hand does not has a concept of attributes so it may not be useful for you. You can remove or comment it.

I also added a check for the value of string XML elements. If you only want to transform attributes you can remove it.

To make it more generic we can pass the actual regular expression as a parameter. This is the recursive function

fun removeRegex(x, r: Regex)=
    x match {
        case o is Object -> o mapObject ((value, key) -> (key) @((removeInAttributes(log(key.@), r))): removeRegex(value, r) )
        case s is String -> removeInStrings(s, r) // if needed for element values too
        case a is Array -> a map removeRegex($, r) // XML doesn't has arrays but I'm leaving this branch in case the input format changes
        else -> x
    }

About the actual replacement of the string value, the script in the question only replaces the delimiters. You want to remove also the content between those delimiters. For that we can use a regular expression. We need to be careful to stop at the first ending delimiter or it could consume the text between two separate square brackets sections.

/\[@[^\]]+]/

I moved the actual removal to a separate function for clarity and because I reused it for Strings. In case the input object doesn't has attributes I used a second definition of the function to catch it without having to check for nulls.

fun removeInAttributes(o: Object, r: Regex) = o mapObject ((value, key) -> (key): removeInStrings(value, r) )

fun removeInAttributes(n: Null, r: Regex) = null // if there are no attributes do nothing

fun removeInStrings(s: String, r: Regex) = s replace r with ("")

Finally you just call the main function with the input XML and the regular expression:

---
removeRegex(payload, /\[@[^\]]+]/)

Output:

<?xml version='1.0' encoding='UTF-8'?>
<ns0:trans xmlns:ns0="http://Test.Sample.data/1.0" id="1495144" TPId="4aec" change="0" count="1" dateStamp="2024-08-02T03:07:48" effectiveDate="2024-08-01" transId="12447055">
  <data id="FF2AD08D621">
    <StateFlag>0</StateFlag>
    <DCC>B</DCC>
    <LastName>MMNGorthy</LastName>
    <Name>Scott</Name>
    <Structure>NonProfit</Structure>
    <BusinessType>Engineering</BusinessType>
    <meams>AI</meams>
    <Count>0</Count>
    <CTier>000</CTier>
    <CTCond>111</CTCond>
    <CTPl>222</CTPl>
    <DO>1</DO>
    <AuthObject>0</AuthObject>
  </data>
  <differences ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce">
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@dateModified" fullPath="/session/properties/@dateModified" oldValue="2023-09-29" newValue="2023-11-05" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@manuscript" fullPath="/session/properties/@manuscript" oldValue="PGR_CommercialAuto_Pages_US_7_8_0_0" newValue="PGR_TransACTMaster2_10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
    <difference type="2" domain="p0228B52620A948E480D38667394112C5" path="@caption" fullPath="/session/properties/@caption" oldValue="PGR CommercialAuto Pages US 7.8.0.0" newValue="PGR TransACT Master 2.10" ODSId="94e1f77c-d13a-4aec-9f6c-0d60c90175ce"/>
  </differences>
</ns0:trans>
Sign up to request clarification or add additional context in comments.

1 Comment

Your solution worked! Thanks a lot! @aled

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.