@nipunsadvilkar nipunsadvilkar commented Oct 25, 2019

The optional char_span parameter makes segment() return TextSpan objects instead of plain strings. Each TextSpan holds the sentence string together with the start and end character offsets (int) of that sentence within the original text.

Example:

import pysbd
text = "My name is Jonas E. Smith. Please turn to p. 55."
seg = pysbd.Segmenter(language="en", clean=False, char_span=True)
print(seg.segment(text))
# [TextSpan(sent='My name is Jonas E. Smith.', start=0, end=26),
# TextSpan(sent='Please turn to p. 55.', start=27, end=48)]
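
As a quick sanity check on what the offsets mean, the sketch below slices the original text with the start/end values from the output above. It does not call pysbd: the TextSpan class here is a hypothetical stand-in that only mirrors the printed fields (sent, start, end), and the helper spans_match_text is an illustrative name. It assumes, as the example output suggests, that end is exclusive.

```python
from typing import NamedTuple


class TextSpan(NamedTuple):
    # Hypothetical stand-in mirroring the fields printed above;
    # this is not pysbd's own TextSpan class.
    sent: str
    start: int
    end: int


text = "My name is Jonas E. Smith. Please turn to p. 55."

# Values copied from the example output above.
spans = [
    TextSpan(sent="My name is Jonas E. Smith.", start=0, end=26),
    TextSpan(sent="Please turn to p. 55.", start=27, end=48),
]


def spans_match_text(text, spans):
    """Return True if slicing text[start:end] reproduces every sentence."""
    return all(text[s.start:s.end] == s.sent for s in spans)


print(spans_match_text(text, spans))  # True
```

With these values the whitespace between the two sentences (index 26) belongs to neither span, so joining the sent strings alone would not rebuild the original text; the offsets are what tie each sentence back to its position.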

…to sentence-char-span

* 'master' of https://github.com/nipunsadvilkar/pySBD:
  🔖  Bump up version to v0.1.3
  📝  Update readme
  🚨  Fix DeprecationWarning: invalid escape sequence
  ✅  Add regression tests for issues
  🐛 ♻️  BugFix & Refactor replace_multi_period_abbreviations
  🚑  Update lists_item_replacer > strip char list
  🐛  Fix abbreviation_replacer
  🐛  Fix lists_item_replacer
…to sentence-char-span

* 'master' of https://github.com/nipunsadvilkar/pySBD:
  🔖  Bump up version to v0.1.4
  ✏️  typo
  ✏️  typo
  🐛 ✨ ✅  Handle intermittent punctuations
  🔇  Remove debugging print statements
@nipunsadvilkar nipunsadvilkar merged commit a2bb451 into master Oct 25, 2019
@nipunsadvilkar nipunsadvilkar deleted the sentence-char-span branch October 25, 2019 11:06
