Member

@thomasballinger thomasballinger commented Nov 18, 2016

Boto is doing something pretty weird: in Python 3, it makes it possible to end up with bytestring docstrings. We fix this here by always assuming UTF-8 in this case. Previously we assumed ASCII, and did so implicitly by letting string.split(u'\n') turn it into unicode, which was no good.

elif isinstance(docstring, str if py3 else unicode):
    pass
else:
    return []
Contributor

@sebastinas sebastinas Nov 18, 2016

Is the elif and else really necessary? Or in other words: does the elif really cover all valid cases?

Member Author

@thomasballinger thomasballinger Nov 18, 2016

The cases to cover:

Py2 bytes -> decode
Py2 unicode -> nop
Py2 something else (integer etc) -> abort

Py3 bytes -> shouldn't happen, but decode
Py3 str -> nop
Py3 something else -> abort

Might be nicer to:

if unicode:
    pass
else:
    try:
        docstring = docstring.decode
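
A minimal sketch of the case table above, written for Python 3 only (where `str` is the text type and `bytes` the bytestring type). The function name `normalize_docstring` and its `encoding` parameter are hypothetical and not taken from the bpython source; the `[]` return for the abort case mirrors the `return []` in the diff.

```python
def normalize_docstring(docstring, encoding="utf-8"):
    """Return the docstring as a list of text lines, or [] if unusable."""
    if isinstance(docstring, bytes):
        # Should not happen on Python 3, but decode if it does.
        docstring = docstring.decode(encoding)
    elif not isinstance(docstring, str):
        # None, integers, and anything else: abort.
        return []
    # Already text at this point, so splitting cannot trigger an
    # implicit ASCII decode.
    return docstring.split("\n")
```

On Python 2 the same shape would test for `unicode` instead of `str`, as the diff does with `str if py3 else unicode`.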

Member Author

To answer your question: docstrings should always be unicode in Python 3, and in Python 2 they should always be bytestrings, since we're getting them from pydoc.getdoc, which does this normalization. If we somehow got a unicode string in Python 2 that would be OK, but I don't know how that would happen. If we got a bytestring in Python 3, which shouldn't happen, we would try to decode it. So this does cover all valid cases, but it covers some extra ones too.

Now that I see where docstring comes from (pydoc.getdoc) I agree that the else isn't necessary.

The correct thing to do here is to find out the encoding of the source file the docstring comes from, since it doesn't have to be utf8, or at least catch errors here so a bad docstring doesn't crash bpython.
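
A rough sketch of both ideas, assuming Python 3: look up the encoding of the defining source file with the standard library's `tokenize.detect_encoding`, and fall back to a lossy UTF-8 decode so a bad docstring can never raise. The helper names `source_encoding` and `decode_docstring` are hypothetical and this is not what the merged commit does.

```python
import inspect
import tokenize


def source_encoding(obj, default="utf-8"):
    """Best-effort guess at the encoding of the file that defines obj."""
    try:
        path = inspect.getsourcefile(obj)
    except TypeError:
        # Builtins and C extensions have no Python source file.
        return default
    if path is None:
        return default
    try:
        with open(path, "rb") as f:
            # Honors a BOM or a PEP 263 coding cookie, else reports utf-8.
            encoding, _ = tokenize.detect_encoding(f.readline)
    except (OSError, SyntaxError):
        return default
    return encoding


def decode_docstring(raw, obj):
    """Decode a bytestring docstring without ever raising."""
    if not isinstance(raw, bytes):
        return raw
    try:
        return raw.decode(source_encoding(obj))
    except UnicodeDecodeError:
        # Last resort: keep what is readable, replace the rest.
        return raw.decode("utf-8", errors="replace")
```

The fallback trades accuracy for robustness: undecodable bytes show up as replacement characters instead of crashing the caller.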

@sebastinas sebastinas merged commit f4f05b2 into master Nov 19, 2016