Closed
57 changes: 31 additions & 26 deletions Doc/library/pickle.rst
@@ -154,7 +154,9 @@ an unpickler, then you call the unpickler's :meth:`load` method. The
 
    Be sure to always open pickle files created with protocols >= 1 in binary mode.
    For the old ASCII-based pickle protocol 0 you can use either text mode or binary
-   mode as long as you stay consistent.
+   mode as long as you stay consistent. It is preferable to always use binary
+   mode, because a pickle file written in text mode on Windows cannot always
+   be correctly unpickled on other systems or in Python 3.
 
    A pickle file written with protocol 0 in binary mode will contain lone linefeeds
    as line terminators and therefore will look "funny" when viewed in Notepad or
@@ -176,31 +178,34 @@ process more convenient:
    .. versionchanged:: 2.3
       Introduced the *protocol* parameter.
 
-   *file* must have a :meth:`write` method that accepts a single string argument.
-   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
-   any other custom object that meets this interface.
+   *file* must have a :meth:`write` method that accepts a single byte string
+   argument. It can thus be an on-disk file opened for binary writing,
+   a :class:`io.BytesIO` or :class:`StringIO.StringIO` object, or any other
+   custom object that meets this interface.
 
 
 .. function:: load(file)
 
-   Read a string from the open file object *file* and interpret it as a pickle data
+   Read a pickled object representation from the open file object *file* and
+   interpret it as a pickle data
    stream, reconstructing and returning the original object hierarchy. This is
    equivalent to ``Unpickler(file).load()``.
 
    *file* must have two methods, a :meth:`read` method that takes an integer
    argument, and a :meth:`readline` method that requires no arguments. Both
-   methods should return a string. Thus *file* can be a file object opened for
-   reading, a :mod:`StringIO` object, or any other custom object that meets this
-   interface.
+   methods should return a byte string. Thus *file* can be an on-disk file
+   opened for binary reading, a :class:`io.BytesIO` or
+   :class:`StringIO.StringIO` object, or any other custom object that meets
+   this interface.
 
-   This function automatically determines whether the data stream was written in
-   binary mode or not.
+   The protocol version of the pickle is detected automatically, so no
+   protocol argument is needed.
 
 
 .. function:: dumps(obj[, protocol])
 
-   Return the pickled representation of the object as a string, instead of writing
-   it to a file.
+   Return the pickled representation of the object as a byte string, instead
+   of writing it to a file.
 
    If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
    specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
@@ -210,10 +215,10 @@ process more convenient:
       The *protocol* parameter was added.
 
 
-.. function:: loads(string)
+.. function:: loads(data)
 
-   Read a pickled object hierarchy from a string. Characters in the string past
-   the pickled object's representation are ignored.
+   Read a pickled object hierarchy from a byte string. Bytes past the pickled
+   object's representation are ignored.
 
 The :mod:`pickle` module also defines three exceptions:
 
@@ -252,8 +257,9 @@ The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
    .. versionchanged:: 2.3
       Introduced the *protocol* parameter.
 
-   *file* must have a :meth:`write` method that accepts a single string argument.
-   It can thus be an open file object, a :mod:`StringIO` object, or any other
+   *file* must have a :meth:`write` method that accepts a single byte string
+   argument. It can thus be an on-disk file opened for binary writing,
+   a :class:`io.BytesIO` or :class:`StringIO.StringIO` object, or any other
    custom object that meets this interface.
 
    :class:`Pickler` objects define one (or two) public methods:
@@ -297,15 +303,15 @@ instance. If the same object is pickled by multiple :meth:`dump` calls, the
 .. class:: Unpickler(file)
 
    This takes a file-like object from which it will read a pickle data stream.
-   This class automatically determines whether the data stream was written in
-   binary mode or not, so it does not need a flag as in the :class:`Pickler`
-   factory.
+   This class automatically determines the protocol version of the pickle, so
+   it does not need a protocol argument as in the :class:`Pickler` factory.
 
    *file* must have two methods, a :meth:`read` method that takes an integer
    argument, and a :meth:`readline` method that requires no arguments. Both
-   methods should return a string. Thus *file* can be a file object opened for
-   reading, a :mod:`StringIO` object, or any other custom object that meets this
-   interface.
+   methods should return a byte string. Thus *file* can be an on-disk file
+   opened for binary reading, a :class:`io.BytesIO` or
+   :class:`StringIO.StringIO` object, or any other custom object that meets
+   this interface.
 
    :class:`Unpickler` objects have one (or two) public methods:
 
@@ -316,8 +322,7 @@ instance. If the same object is pickled by multiple :meth:`dump` calls, the
       the constructor, and return the reconstituted object hierarchy specified
       therein.
 
-      This method automatically determines whether the data stream was written
-      in binary mode or not.
+      The protocol version of the pickle is detected automatically.
 
 
    .. method:: noload()
@@ -688,7 +693,7 @@ performing any necessary imports, and it may raise an error to prevent
 instances of the class from being unpickled.
 
 The moral of the story is that you should be really careful about the source of
-the strings your application unpickles.
+the data your application unpickles.
 
 
 .. _pickle-example:
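The revised `load()`/`dump()` wording asks only for byte-oriented `write()`, `read()` and `readline()` methods, and says the protocol is detected when loading. Below is a minimal sketch of that contract. It is written for Python 3 rather than 2.7 (an assumption, so that it runs on a current interpreter) and uses `io.BytesIO` as the in-memory binary stream. `roundtrip` is a hypothetical helper, not part of `pickle`.

```python
import io
import pickle


def roundtrip(obj, protocol):
    """Hypothetical helper: dump obj to an in-memory binary stream, then load it back.

    Dumping only calls write(); loading only calls read()/readline(), which is
    the interface the documentation asks of *file*.
    """
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol)
    buf.seek(0)
    # No protocol argument here: the unpickler detects the version from the data.
    return pickle.load(buf)


if __name__ == "__main__":
    data = {"text": "a\x1ab\r\n", "numbers": [1, 2, 3]}
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        assert roundtrip(data, proto) == data

    # loads() takes a byte string and ignores bytes past the pickled object.
    blob = pickle.dumps(42, protocol=0) + b"trailing junk"
    print(pickle.loads(blob))
```

Using an in-memory stream keeps the sketch free of filesystem and newline-translation effects. It therefore shows the interface only, not the text-mode hazards this PR addresses.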
14 changes: 14 additions & 0 deletions Lib/pickle.py
@@ -30,6 +30,7 @@
 from copy_reg import dispatch_table
 from copy_reg import _extension_registry, _inverted_registry, _extension_cache
 import marshal
+import os
 import sys
 import struct
 import re
@@ -502,7 +503,10 @@ def save_unicode(self, obj, pack=struct.pack):
             self.write(BINUNICODE + pack("<i", n) + encoding)
         else:
             obj = obj.replace("\\", "\\u005c")
+            obj = obj.replace("\0", "\\u0000")
             obj = obj.replace("\n", "\\u000a")
+            obj = obj.replace("\r", "\\u000d")
+            obj = obj.replace("\x1a", "\\u001a") # EOF on DOS
             self.write(UNICODE + obj.encode('raw-unicode-escape') + '\n')
         self.memoize(obj)
     dispatch[UnicodeType] = save_unicode
@@ -527,7 +531,10 @@ def save_string(self, obj, pack=struct.pack):
             else:
                 if unicode:
                     obj = obj.replace("\\", "\\u005c")
+                    obj = obj.replace("\0", "\\u0000")
                     obj = obj.replace("\n", "\\u000a")
+                    obj = obj.replace("\r", "\\u000d")
+                    obj = obj.replace("\x1a", "\\u001a") # EOF on DOS
                     obj = obj.encode('raw-unicode-escape')
                     self.write(UNICODE + obj + '\n')
                 else:
@@ -1373,6 +1380,13 @@ def decode_long(data):
     from StringIO import StringIO
 
 def dump(obj, file, protocol=None):
+    if type(file) is FileType and 'b' not in file.mode:
+        if protocol and os.linesep != '\n':
+            raise ValueError('File must be opened in binary mode')
+        if protocol or sys.py3kwarning:
+            import warnings
+            warnings.warn('File must be opened in binary mode',
+                          DeprecationWarning, stacklevel=2)
     Pickler(file, protocol).dump(obj)
 
 def dumps(obj, protocol=None):
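The `save_unicode()` and `save_string()` hunks escape five characters before applying the `raw-unicode-escape` encoding. Below is a minimal Python 3 sketch of that replace chain. `escape_text_unicode` is a hypothetical name. The per-character comments paraphrase the NEWS entry and the `# EOF on DOS` comment; they are not an authoritative specification. The sketch checks that none of the problematic bytes survive and that decoding restores the original string.

```python
def escape_text_unicode(s):
    """Hypothetical helper mirroring the protocol-0 escaping in save_unicode().

    The backslash is replaced first so that the backslashes introduced by the
    later replacements are not escaped a second time.
    """
    s = s.replace("\\", "\\u005c")
    s = s.replace("\0", "\\u0000")    # NUL (escaped by the patch as well)
    s = s.replace("\n", "\\u000a")    # LF would end the UNICODE opcode's argument line
    s = s.replace("\r", "\\u000d")    # CR can be lost to text-mode newline translation
    s = s.replace("\x1a", "\\u001a")  # Ctrl-Z is treated as EOF by DOS text mode
    # Characters >= U+0100 become \uXXXX (or \UXXXXXXXX); the rest are Latin-1 bytes.
    return s.encode("raw-unicode-escape")


if __name__ == "__main__":
    original = "back\\slash nul\0 lf\n cr\r eof\x1a euro\u20ac"
    line = escape_text_unicode(original)
    # None of the bytes that text-mode I/O can alter or truncate remain.
    for bad in (b"\n", b"\r", b"\0", b"\x1a"):
        assert bad not in line
    # Decoding with the same codec restores the original string.
    assert line.decode("raw-unicode-escape") == original
    print(line)
```

The ordering matters. If the backslash were replaced last, the escapes added for NUL, LF, CR and Ctrl-Z would themselves be rewritten, and decoding would no longer round-trip.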
43 changes: 37 additions & 6 deletions Lib/test/pickletester.py
@@ -2,6 +2,7 @@
 import unittest
 import pickle
 import cPickle
+import os
 import StringIO
 import cStringIO
 import pickletools
@@ -1697,22 +1698,20 @@ def __getattr__(self, key):
 class AbstractPickleModuleTests(unittest.TestCase):
 
     def test_dump_closed_file(self):
-        import os
-        f = open(TESTFN, "w")
+        f = open(TESTFN, "wb")
         try:
             f.close()
             self.assertRaises(ValueError, self.module.dump, 123, f)
         finally:
-            os.remove(TESTFN)
+            support.unlink(TESTFN)
 
     def test_load_closed_file(self):
-        import os
-        f = open(TESTFN, "w")
+        f = open(TESTFN, "wb")
         try:
             f.close()
             self.assertRaises(ValueError, self.module.dump, 123, f)
         finally:
-            os.remove(TESTFN)
+            support.unlink(TESTFN)
 
     def test_load_from_and_dump_to_file(self):
         stream = cStringIO.StringIO()
@@ -1736,6 +1735,38 @@ def test_callapi(self):
         self.module.Pickler(f, -1)
         self.module.Pickler(f, protocol=-1)
 
+    def test_dump_text_file(self):
+        f = open(TESTFN, "w")
+        try:
+            with support.check_py3k_warnings():
+                self.module.dump(123, f)
+            if os.linesep != '\n':
+                self.assertRaises(ValueError, self.module.dump, 123, f, 1)
+                self.assertRaises(ValueError, self.module.dump, 123, f, 2)
+            else:
+                with support.check_warnings(('', DeprecationWarning)):
+                    self.module.dump(123, f, 1)
+                with support.check_warnings(('', DeprecationWarning)):
+                    self.module.dump(123, f, 2)
+        finally:
+            f.close()
+            support.unlink(TESTFN)
+
+    def test_end_of_text_file(self):
+        try:
+            with open(TESTFN, "w") as f:
+                with support.check_py3k_warnings():
+                    self.module.dump(u'a\x1ab', f)
+            with open(TESTFN) as f:
+                self.assertEqual(self.module.load(f), u'a\x1ab')
+
+            with open(TESTFN, "wb") as f:
+                self.module.dump(u'a\x1ab', f)
+            with open(TESTFN) as f:
+                self.assertEqual(self.module.load(f), u'a\x1ab')
+        finally:
+            support.unlink(TESTFN)
+
     def test_incomplete_input(self):
         s = StringIO.StringIO("X''.")
         self.assertRaises(EOFError, self.module.load, s)
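`test_end_of_text_file` exercises Python 2 text-mode files, and Python 3's pickle cannot write to those at all. The sketch below is a hedged Python 3 counterpart of the test's binary-mode half. It dumps a string containing Ctrl-Z (`\x1a`) with every protocol to a temporary file and loads it back. The test class and the `run_tests` helper are hypothetical, and `tempfile.mkstemp` stands in for `support.TESTFN`.

```python
import os
import pickle
import tempfile
import unittest


class BinaryFileRoundTripTest(unittest.TestCase):
    """Hypothetical test: a string containing Ctrl-Z survives a file round trip."""

    def test_end_of_file_character(self):
        value = "a\x1ab"
        # mkstemp() stands in for support.TESTFN; only the path is needed,
        # so the OS-level handle is closed right away.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            for proto in range(pickle.HIGHEST_PROTOCOL + 1):
                with self.subTest(protocol=proto):
                    with open(path, "wb") as f:  # binary mode, as the docs now require
                        pickle.dump(value, f, proto)
                    with open(path, "rb") as f:
                        self.assertEqual(pickle.load(f), value)
        finally:
            os.unlink(path)


def run_tests():
    """Run the hypothetical test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BinaryFileRoundTripTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Cleaning up in `finally`, as the patched tests do with `support.unlink`, keeps a failed assertion from leaving the temporary file behind.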
4 changes: 2 additions & 2 deletions Lib/test/test_signal.py
@@ -152,8 +152,8 @@ def test_main(self):
         # re-raises information about any exceptions the child
         # raises. The real work happens in self.run_test().
         os_done_r, os_done_w = os.pipe()
-        with closing(os.fdopen(os_done_r)) as done_r, \
-             closing(os.fdopen(os_done_w, 'w')) as done_w:
+        with closing(os.fdopen(os_done_r, 'rb')) as done_r, \
+             closing(os.fdopen(os_done_w, 'wb')) as done_w:
             child = os.fork()
             if child == 0:
                 # In the child process; run the test and report results
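The `test_signal.py` change opens both pipe ends in binary mode, since the pipe carries the child's pickled results. Below is a minimal Python 3 sketch of the same idea without `fork()`. It writes the payload and closes the write end before reading, so it assumes the pickled payload fits in the OS pipe buffer. `send_through_pipe` and `text_stream_rejected` are hypothetical helpers. The second one shows that, in Python 3, a text stream rejects pickle output outright.

```python
import io
import os
import pickle


def send_through_pipe(payload, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: pickle *payload* through an OS pipe opened in binary mode.

    Everything is written before anything is read, so this assumes the pickled
    payload fits in the pipe buffer (there is no fork() or reader thread here).
    """
    read_fd, write_fd = os.pipe()
    # 'rb' and 'wb' mirror the modes the patched test passes to os.fdopen().
    with os.fdopen(read_fd, "rb") as done_r:
        with os.fdopen(write_fd, "wb") as done_w:
            pickle.dump(payload, done_w, protocol)
        # The write end is closed here, so the reader sees EOF after the data.
        return pickle.load(done_r)


def text_stream_rejected():
    """Hypothetical check: Python 3's pickle emits bytes, which a text stream refuses."""
    try:
        pickle.dump({"ok": True}, io.StringIO())
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(send_through_pipe({"ok": True, "text": "a\r\nb\x1a"}))
    print(text_stream_rejected())
```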
@@ -0,0 +1,6 @@
+Text (protocol 0) pickles dumped to files opened in binary mode can now be
+loaded from files opened in text mode without loss. Dumping a binary pickle to
+a file opened in text mode on Windows will now produce a :exc:`ValueError`,
+since it leads to data corruption. Dumping a binary pickle on other systems, or
+a text pickle on any system in Py3k mode, to a file opened in text mode will
+now produce a :exc:`DeprecationWarning`.
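The NEWS entry describes three possible outcomes of `dump()` on a file opened in text mode. Below is a hypothetical pure-function summary of that policy. It assumes the checks added to `pickle.dump()` and to `cPickle`'s `cpm_dump()` in this PR are the whole story. `on_windows` stands in for `os.linesep != '\n'` in pickle.py and for the `MS_WINDOWS` guard in cPickle.c. `py3k_warnings` stands in for `sys.py3kwarning` / `Py_Py3kWarningFlag`.

```python
def text_mode_dump_policy(protocol, on_windows, py3k_warnings):
    """Hypothetical summary of the new dump() check for a file opened in text mode.

    protocol      -- None or 0 selects the text protocol; 1, 2 or a negative
                     value ("highest") selects a binary protocol.
    on_windows    -- stands in for os.linesep != '\\n' (pickle.py) and the
                     MS_WINDOWS guard (cPickle.c).
    py3k_warnings -- stands in for sys.py3kwarning / Py_Py3kWarningFlag (-3).
    """
    binary_protocol = bool(protocol)
    if binary_protocol and on_windows:
        return "ValueError"           # a binary pickle would be corrupted
    if binary_protocol or py3k_warnings:
        return "DeprecationWarning"   # still written, but warned about
    return "ok"                       # text pickle without -3: unchanged behaviour


if __name__ == "__main__":
    for protocol in (None, 0, 1, 2, -1):
        for on_windows in (False, True):
            for py3k in (False, True):
                outcome = text_mode_dump_policy(protocol, on_windows, py3k)
                print(protocol, on_windows, py3k, outcome)
```

The error check comes before the warning check, matching the order in both the Python and C hunks. A binary pickle on Windows therefore raises without emitting a warning first.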
21 changes: 20 additions & 1 deletion Modules/cPickle.c
@@ -1390,7 +1390,9 @@ modified_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
         }
 #endif
         /* Map 16-bit characters to '\uxxxx' */
-        if (ch >= 256 || ch == '\\' || ch == '\n') {
+        if (ch >= 256 ||
+            ch == '\\' || ch == 0 || ch == '\n' || ch == '\r' || ch == 0x1a)
+        {
             *p++ = '\\';
             *p++ = 'u';
             *p++ = hexdigit[(ch >> 12) & 0xf];
@@ -5697,6 +5699,23 @@ cpm_dump(PyObject *self, PyObject *args, PyObject *kwds)
                &ob, &file, &proto)))
         goto finally;
 
+    if (Py_TYPE(file) == &PyFile_Type && !((PyFileObject *)file)->f_binary) {
+#ifdef MS_WINDOWS
+        if (proto) { /* binary protocol */
+            PyErr_SetString(PyExc_ValueError,
+                            "File must be opened in binary mode");
+            goto finally;
+        }
+#endif
+        if (proto || Py_Py3kWarningFlag) {
+            if (PyErr_WarnEx(PyExc_DeprecationWarning,
+                             "File must be opened in binary mode", 1) < 0)
+            {
+                goto finally;
+            }
+        }
+    }
+
     if (!( pickler = newPicklerobject(file, proto)))
         goto finally;
 
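`cPickle` reaches the same result as the pickle.py replace chain, but it works one character at a time inside `modified_EncodeRawUnicodeEscape()`. Below is a Python 3 sketch of that loop. It assumes a wide (UCS-4) build, so characters at or above U+10000 take the `\Uxxxxxxxx` branch. The function names are hypothetical. Round-tripping through Python's own `raw-unicode-escape` decoder checks that the extra `\uxxxx` escapes remain decodable.

```python
# Backslash and LF (escaped before the patch) plus NUL, CR and Ctrl-Z (added by it).
SPECIAL = frozenset("\\\0\n\r\x1a")


def encode_char(ch):
    """Hypothetical per-character mirror of modified_EncodeRawUnicodeEscape()."""
    code = ord(ch)
    if code >= 0x10000:
        # Wide-build branch: astral characters become \Uxxxxxxxx.
        return ("\\U%08x" % code).encode("ascii")
    if code >= 256 or ch in SPECIAL:
        # Patched condition: 16-bit and special characters become \uxxxx.
        return ("\\u%04x" % code).encode("ascii")
    # Everything else is copied through as a single Latin-1 byte.
    return bytes([code])


def modified_raw_unicode_escape(s):
    """Encode a whole string by concatenating the per-character results."""
    return b"".join(encode_char(ch) for ch in s)


if __name__ == "__main__":
    sample = "a\\b\0c\r\nd\x1ae\u20ac\U0001f600"
    encoded = modified_raw_unicode_escape(sample)
    assert encoded.decode("raw-unicode-escape") == sample
    print(encoded)
```

Because every backslash in the output starts an escape, the decoder never sees a run of several backslashes. That run is the case in which `raw-unicode-escape` would leave a `\u` sequence undecoded.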