Different segmentation with spaCy and when using pySBD directly

Firstly, thank you for this project. I was lucky to find it, and it is really useful.

I seem to have found a case where segmentation behaves differently when pySBD runs inside a spaCy pipeline than when it is used directly. I stumbled on it with my own text, where a sentence that followed a quoted sentence was being lumped together with it. I looked through the [Golden Rules](https://s3.amazonaws.com/tm-town-nlp-resources/golden_rules.txt) and found this wasn't expected, and then noticed that even the text from one of your tests is segmented differently in spaCy.

To reproduce, run these two snippets:

````python
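import spacy
from pysbd.utils import PySBDFactory

# Blank English pipeline with pySBD as the only sentence-boundary component
nlp = spacy.blank('en')
nlp.add_pipe(PySBDFactory(nlp))
doc = nlp("She turned to him, \"This is great.\" She held the book out to show him.")
# Print each sentence spaCy reports, separated by a blank line
for sent in doc.sents:
    print(str(sent).strip() + '\n')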
````
> She turned to him, "This is great." She held the book out to show him.

````python
import pysbd

text = "She turned to him, \"This is great.\" She held the book out to show him."
# Same text, segmented by pySBD directly (no spaCy involved)
seg = pysbd.Segmenter(language="en", clean=False)
for sent in seg.segment(text):
    print(sent.strip() + '\n')
````
> She turned to him, "This is great."
> 
> She held the book out to show him.

The second output is the desired one (based on the rules, at least).
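In case it helps when turning this into a regression check, here is a minimal sketch of how the two results could be compared. It does not call spaCy or pySBD: the two lists are copied by hand from the outputs above, and the helper names `same_segmentation` and `same_text` are hypothetical, not part of either library.

```python
# Sentences copied from the two runs reported above (not re-computed here).
spacy_sents = [
    'She turned to him, "This is great." She held the book out to show him.',
]
direct_sents = [
    'She turned to him, "This is great."',
    'She held the book out to show him.',
]


def same_segmentation(a, b):
    """True when both runs yield the same sentences, ignoring outer whitespace."""
    return [s.strip() for s in a] == [s.strip() for s in b]


def same_text(a, b):
    """True when both runs cover the same text once the sentences are re-joined."""
    return " ".join(s.strip() for s in a) == " ".join(s.strip() for s in b)


print(same_segmentation(spacy_sents, direct_sents))  # sentence boundaries differ
print(same_text(spacy_sents, direct_sents))          # underlying text is the same
```

With the outputs above, the first check fails and the second passes, which suggests the difference lies only in where the boundary after the closing quote is placed, not in the text itself.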
