This repository was archived by the owner on Mar 9, 2023. It is now read-only.

Add repr for Morpheme/MorphemeList #166

Closed
polm wants to merge 1 commit into WorksApplications:develop from polm:feature/repr

Conversation

@polm
Contributor

@polm polm commented Sep 30, 2021

This makes it easier to check values when developing interactively. Probably should have been included with #124.

@kazuma-t
Collaborator

kazuma-t commented Sep 30, 2021

For object.__repr__(), the Python language reference has the following description.

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).

(from 3.3.1. Basic customization)

This implementation does not seem to follow that purpose. Even if it can't, I think it is undesirable for it to be the same as __str__().
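To illustrate the convention quoted above, here is a minimal sketch using a hypothetical Point class (not part of SudachiPy). It assumes the object can be rebuilt from its constructor arguments, which is the case the language reference describes; __str__() is kept separate and human-oriented.

```python
class Point:
    """Hypothetical class, used only to illustrate the __repr__ convention."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like a valid Python expression that recreates the object.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # Informal, human-readable form; deliberately different from __repr__.
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(repr(p))  # Point(x=1, y=2)
print(str(p))   # (1, 2)
```

When the constructor is not public, this round trip is not available, and the reference's fallback is a string of the form <...some useful description...>.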

@polm
Contributor Author

polm commented Sep 30, 2021

The current output looks like this:

<sudachipy.morphemelist.MorphemeList at 0x7f9afedf7940>
<sudachipy.morpheme.Morpheme at 0x7f9afdc2c850>

That doesn't follow the purpose in the language reference and also isn't useful for anything as far as I can tell.

@kazuma-t
Collaborator

kazuma-t commented Sep 30, 2021

At a minimum, begin, end, dictionary_id, word_id, and is_oov are needed to identify a morpheme.

@eiennohito
Collaborator

It is impossible to instantiate Morpheme/MorphemeList directly, so I think we need to decide whether their __repr__ should be more useful to developers of SudachiPy or to users of SudachiPy.

@kazuma-t
Collaborator

kazuma-t commented Oct 1, 2021

How about a format like this?

<Morpheme (猫, 0:3, 0, 571365)>
<MorphemeList [(猫, 0:3, 0, 571365), (が, 3:6, 0, 45393), (ぴらる, 6:15, -1, -1)]>

Each entry is ({surface}, {begin}:{end}, {dict_id}, {word_id}); dict_id and word_id are -1 for OOV.
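A minimal sketch of how the format proposed above could be produced. FakeMorpheme and FakeMorphemeList are hypothetical stand-ins written for this example, not SudachiPy's actual classes, and the field names are assumptions taken from the format description in this comment.

```python
class FakeMorpheme:
    """Hypothetical stand-in for a morpheme; not SudachiPy's Morpheme."""

    def __init__(self, surface, begin, end, dict_id, word_id):
        self.surface = surface
        self.begin = begin
        self.end = end
        self.dict_id = dict_id  # -1 for OOV, per the proposal
        self.word_id = word_id  # -1 for OOV, per the proposal

    def fields(self):
        # ({surface}, {begin}:{end}, {dict_id}, {word_id})
        return f"({self.surface}, {self.begin}:{self.end}, {self.dict_id}, {self.word_id})"

    def __repr__(self):
        return f"<Morpheme {self.fields()}>"


class FakeMorphemeList:
    """Hypothetical stand-in for a morpheme list; not SudachiPy's MorphemeList."""

    def __init__(self, morphemes):
        self._morphemes = list(morphemes)

    def __repr__(self):
        inner = ", ".join(m.fields() for m in self._morphemes)
        return f"<MorphemeList [{inner}]>"


# Values copied from the example in this comment.
neko = FakeMorpheme("猫", 0, 3, 0, 571365)
ga = FakeMorpheme("が", 3, 6, 0, 45393)
oov = FakeMorpheme("ぴらる", 6, 15, -1, -1)
print(repr(neko))
print(repr(FakeMorphemeList([neko, ga, oov])))
```

Sharing one fields() helper keeps the single-morpheme and list output consistent, so a change to the entry format only has to be made in one place.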

@eiennohito
Collaborator

The reasoning for this format: being able to tell whether a word comes from a user dictionary, the system dictionary, or is OOV helps when debugging and resolving problems with user dictionaries.
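As a rough sketch of that debugging use, a hypothetical helper could classify a word by the dict_id shown in the repr. The thread only states that dict_id is -1 for OOV; treating 0 as the system dictionary and positive values as user dictionaries is an assumption made for this example, and describe_origin is not a SudachiPy function.

```python
def describe_origin(dict_id: int) -> str:
    """Hypothetical helper: map a dict_id from the repr to a word origin.

    Assumptions: -1 means OOV (stated in the thread); 0 means the system
    dictionary and positive values mean user dictionaries (assumed here).
    """
    if dict_id < 0:
        return "oov"
    if dict_id == 0:
        return "system"
    return "user"


for dict_id in (0, 1, -1):
    print(dict_id, describe_origin(dict_id))
```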

