Description
To recreate, do this:
import maec
maec.parse_xml_instance("maec_4.1_examples/package_manual_analysis_example.xml")["api"].to_xml()
The problem is that to_xml() uses StringIO and mixes unicode and str values, and some of the str values are UTF-8 encoded. When getvalue() joins a mix like that, Python implicitly converts every str to unicode using the ascii codec, and raises a UnicodeDecodeError on the first non-ASCII byte. See the docstring of the getvalue() method in StringIO.py:
"""
Retrieve the entire contents of the "file" at any time before
the StringIO object's close() method is called.
The StringIO object can accept either Unicode or 8-bit strings,
but mixing the two may take some care. If both are used, 8-bit
strings that cannot be interpreted as 7-bit ASCII (that use the
8th bit) will cause a UnicodeError to be raised when getvalue()
is called.
"""
This doesn't look like an isolated problem: the to_xml_file() method doesn't use codecs.open(), and it expects that everything it writes is ASCII.
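For reference, here is a rough sketch of what writing through codecs.open() could look like. The helper name write_xml_text, the sample string, and the temporary path are all made up for illustration; this is not the maec implementation, just the general pattern of handing a unicode string to a file object that does the encoding.

```python
import codecs
import os
import tempfile


def write_xml_text(path, text):
    """Hypothetical helper: write a unicode string to disk as UTF-8."""
    # codecs.open() returns a wrapper that encodes on write, so the caller
    # only ever passes unicode text and never pre-encoded bytes.
    with codecs.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Made-up sample data containing one non-ASCII character.
xml_text = u"<name>caf\u00e9</name>"
path = os.path.join(tempfile.mkdtemp(), "out.xml")
write_xml_text(path, xml_text)

# Read the file back through the same codec to check the round trip.
with codecs.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()

# Look at the raw bytes that actually landed on disk.
with open(path, "rb") as handle:
    raw_bytes = handle.read()
```

The point of this design is that the encoding decision lives in exactly one place, the file object, instead of being spread across whatever code builds the strings.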
I'm not sure whether you intend the output to be all ASCII or all Unicode strings. You have to pick one or the other, or you're going to get into trouble. You can do surgery on the StringIO object before you call getvalue(), but that'll still leave you with problems in to_xml_file(), I'm guessing.
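If you go the all-Unicode route, one possible shape is to decode every 8-bit string before it reaches the buffer. The sketch below is only an illustration under assumptions: to_text and join_as_text are hypothetical names, the sample parts are invented, and it assumes every 8-bit string is valid UTF-8 (plain ASCII qualifies). It first shows that decoding the UTF-8 bytes as ascii fails, which is what the implicit conversion amounts to, and then the normalized version.

```python
import io


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return unicode text, decoding 8-bit strings."""
    # On Python 2, bytes is an alias for str, so this catches 8-bit strings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_as_text(parts):
    """Hypothetical helper: buffer only unicode, so getvalue() is safe."""
    buf = io.StringIO()
    for part in parts:
        buf.write(to_text(part))
    return buf.getvalue()


# Invented sample: unicode pieces mixed with one UTF-8 encoded 8-bit string.
utf8_piece = u"caf\u00e9".encode("utf-8")
parts = [u"<name>", utf8_piece, u"</name>"]

# Decoding the UTF-8 bytes as ascii fails, like the implicit conversion does.
try:
    utf8_piece.decode("ascii")
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# Normalizing to unicode first avoids the error.
result = join_as_text(parts)
```

This keeps everything unicode inside the program and leaves encoding to the point where the text is written out, so to_xml() and to_xml_file() would agree on what they are handling.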