cc.factorie.app.nlp.segment

DeterministicSentenceSegmenter

class DeterministicSentenceSegmenter extends DocumentAnnotator

Segments a sequence of tokens into sentences.


Instance Constructors

  1. new DeterministicSentenceSegmenter()

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val charOffsetBoundary: Int

    If there are more than this number of characters between the end of the previous token and the beginning of this one, force a sentence start. If negative, don't break sentences according to this criterion at all.

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. val closingContinuationRegex: Regex

    Matches the Token.string of tokens that may extend a sentence, such as quotes, closing parentheses, and even additional periods.

  10. val closingRegex: Regex

    Matches the Token.string of punctuation that always indicates the end of a sentence. It does not include possible additional tokens that may be appended to the sentence such as quotes and closing parentheses.

  11. def documentAnnotationString(document: Document): String

    How the annotation of this DocumentAnnotator should be printed as extra information after a one-word-per-line (OWPL) format. If there is no document annotation, return the empty string. Used in Document.owplString.

    Definition Classes
    DocumentAnnotator
  12. var doubleNewlineBoundary: Boolean

    If true, every double newline causes a sentence break.

  13. val emoticonRegex: Regex

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def mentionAnnotationString(mention: Mention): String

    Definition Classes
    DocumentAnnotator
  21. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. var newlineBoundary: Boolean

    If true, every newline causes a sentence break.

  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. def phraseAnnotationString(phrase: Phrase): String

    Definition Classes
    DocumentAnnotator
  26. val possibleClosingRegex: Regex

    Matches the Token.string of tokens that might possibly indicate the end of a sentence, such as an mdash. The sentence segmenter will only actually create a sentence end here if possibleSentenceStart is true for the following token.

  27. def possibleSentenceStart(s: String): Boolean

    Returns true for strings that probably start a sentence after a word that ends with a period.

  28. def postAttrs: Iterable[Class[_]]

  29. def prereqAttrs: Iterable[Class[_]]

  30. def process(document: Document): Document

  31. def processParallel(documents: Iterable[Document], nThreads: Int = ...): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  32. def processSequential(documents: Iterable[Document]): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  33. val spaceRegex: Regex

    Whitespace that should not be allowed between a closingRegex and closingContinuationRegex for a sentence continuation. For example: He ran. "You shouldn't run!"

  34. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  35. def toString(): String

    Definition Classes
    AnyRef → Any
  36. def tokenAnnotationString(token: Token): Null

    How the annotation of this DocumentAnnotator should be printed in one-word-per-line (OWPL) format. If there is no per-token annotation, return null. Used in Document.owplString.

    Definition Classes
    DeterministicSentenceSegmenter → DocumentAnnotator
  37. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
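
The boundary rules described for charOffsetBoundary, closingRegex, closingContinuationRegex and spaceRegex can be illustrated with a small standalone sketch. It does not use FACTORIE and is not the library's implementation: the Tok case class, the SegmenterSketch object, the sentenceStarts helper and both regex patterns are hypothetical, and the spaceRegex check is simplified to "the continuation token must directly follow the previous token, with no characters in between".

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.matching.Regex

// Hypothetical stand-in for a token: its text plus character offsets in the document.
final case class Tok(string: String, start: Int, end: Int)

object SegmenterSketch {
  // Illustrative patterns only; these are NOT the regexes FACTORIE ships with.
  val closingRegex: Regex = "[.?!]+".r
  val closingContinuationRegex: Regex = "[.?!\"')\\]]+".r

  // True when the whole string matches the pattern.
  private def fullMatch(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  /** Returns the indices of the tokens that start a new sentence.
    * A negative charOffsetBoundary disables the character-gap criterion.
    */
  def sentenceStarts(tokens: Seq[Tok], charOffsetBoundary: Int): Seq[Int] = {
    val starts = ArrayBuffer.empty[Int]
    var closed = false // true once the current sentence has seen closing punctuation
    for (i <- tokens.indices) {
      val t = tokens(i)
      if (i == 0) {
        starts += i
      } else {
        val gap = t.start - tokens(i - 1).end
        // Criterion 1: too many characters since the previous token.
        val forcedByGap = charOffsetBoundary >= 0 && gap > charOffsetBoundary
        // Criterion 2: closing punctuation was seen and this token does not
        // extend the sentence (a continuation must be adjacent, gap == 0).
        val continues = closed && gap == 0 && fullMatch(closingContinuationRegex, t.string)
        if (forcedByGap || (closed && !continues)) {
          starts += i
          closed = false
        }
      }
      if (fullMatch(closingRegex, t.string)) closed = true
    }
    starts.toSeq
  }
}
```

Under these assumptions, the tokens of He ran. "You shouldn't run!" yield sentence starts at token 0 (He) and token 3 (the opening quote): the space after the period keeps the quote from being treated as a continuation of the first sentence, while the final closing quote directly follows the exclamation mark and stays in the second sentence.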
