cc.factorie.app.nlp.segment

PhraseTokenizer

class PhraseTokenizer extends DocumentAnnotator

A tokenizer that merges existing tokens when they form one of the given phrases.

Uses a trie-like data structure to simulate the finite automaton for tokenization efficiently. When a long phrase and a short phrase share the same prefix, the longer one is picked greedily.

In this version, a merged phrase takes all of its attributes from the last token in the phrase.
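
The greedy longest-match behavior described above can be illustrated with a small, self-contained sketch. This is not FACTORIE's PhraseTrie implementation: the names TrieNode, PhraseTrieSketch, build, longestMatch, and merge are hypothetical, and tokens are modeled as plain strings rather than Token objects.

```scala
import scala.collection.mutable

// Hypothetical trie node for illustration; not FACTORIE's PhraseTrie.
final class TrieNode {
  val children = mutable.Map.empty[String, TrieNode]
  var isPhraseEnd = false
}

object PhraseTrieSketch {
  // Build a trie whose edges are whole tokens, one path per phrase.
  def build(phrases: Iterable[Seq[String]]): TrieNode = {
    val root = new TrieNode
    for (phrase <- phrases if phrase.nonEmpty) {
      var node = root
      for (tok <- phrase) node = node.children.getOrElseUpdate(tok, new TrieNode)
      node.isPhraseEnd = true
    }
    root
  }

  // Length of the longest phrase starting at `start`, or 0 if none matches.
  def longestMatch(root: TrieNode, tokens: IndexedSeq[String], start: Int): Int = {
    var node = root
    var i = start
    var best = 0
    var matching = true
    while (matching && i < tokens.length) {
      node.children.get(tokens(i)) match {
        case Some(next) =>
          node = next
          i += 1
          // Remember the longest complete phrase seen so far.
          if (node.isPhraseEnd) best = i - start
        case None =>
          matching = false
      }
    }
    best
  }

  // Group tokens so that each matched phrase becomes one group;
  // tokens that start no phrase stay in a group of their own.
  def merge(root: TrieNode, tokens: IndexedSeq[String]): Vector[Vector[String]] = {
    val out = Vector.newBuilder[Vector[String]]
    var i = 0
    while (i < tokens.length) {
      val len = longestMatch(root, tokens, i)
      val step = if (len > 0) len else 1
      out += tokens.slice(i, i + step).toVector
      i += step
    }
    out.result()
  }
}
```

With the phrases Seq("New", "York") and Seq("New", "York", "City"), the input tokens I, love, New, York, City, today are grouped so that "New York City" forms a single group, because the longer phrase wins over its shorter prefix.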

Linear Supertypes
DocumentAnnotator, AnyRef, Any

Instance Constructors

  1. new PhraseTokenizer(phrases: Iterable[Seq[String]], mode: PhraseTokenizerMode = PhraseTokenizerModes.ADD_SEPARATELY)

    phrases

    The set of phrases to be picked, each given as a sequence of token strings.

    mode

    The mode. If ADD_SEPARATELY, the new sections are added only to the attribute. If ADD_TO_SECTIONS, the new sections are added to the document. If REPLACE_SECTIONS, the existing sections in the document are replaced.
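
One possible reading of the three modes is sketched below with toy types. Everything here is hypothetical: ToyMode, ToyDocument, and ModeSketch.attach are stand-ins invented for illustration, sections are modeled as plain strings, and the assumption that every mode also records the new sections in the attribute is inferred from the wording above rather than confirmed by the library.

```scala
// Hypothetical stand-ins; these are not FACTORIE's mode, Document, or Section types.
sealed trait ToyMode
case object AddSeparately extends ToyMode
case object AddToSections extends ToyMode
case object ReplaceSections extends ToyMode

final case class ToyDocument(
  sections: Vector[String],       // sections visible on the document itself
  phraseSections: Vector[String]  // sections held in a separate attribute
)

object ModeSketch {
  def attach(doc: ToyDocument, newSections: Vector[String], mode: ToyMode): ToyDocument = {
    // Assumption: the new sections are recorded in the attribute in every mode.
    val withAttr = doc.copy(phraseSections = newSections)
    mode match {
      case AddSeparately   => withAttr                                            // document sections untouched
      case AddToSections   => withAttr.copy(sections = doc.sections ++ newSections) // appended to the document
      case ReplaceSections => withAttr.copy(sections = newSections)                // existing sections replaced
    }
  }
}
```

Under this reading, the default ADD_SEPARATELY leaves the document's own sections unchanged, which would make it the least invasive choice when other annotators depend on the original tokenization.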

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def documentAnnotationString(document: Document): String

    How the annotation of this DocumentAnnotator should be printed as extra information after a one-word-per-line (OWPL) format. If there is no document annotation, return the empty string. Used in Document.owplString.

    Definition Classes
    DocumentAnnotator
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def mentionAnnotationString(mention: Mention): String

    Definition Classes
    DocumentAnnotator
  16. val mode: PhraseTokenizerMode

    The mode. If ADD_SEPARATELY, the new sections are added only to the attribute. If ADD_TO_SECTIONS, the new sections are added to the document. If REPLACE_SECTIONS, the existing sections in the document are replaced.

  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def phraseAnnotationString(phrase: Phrase): String

    Definition Classes
    DocumentAnnotator
  21. def postAttrs: Seq[Class[PhraseSectionList]]

    Definition Classes
    PhraseTokenizer → DocumentAnnotator
  22. def prereqAttrs: Seq[Class[Token]]

    Definition Classes
    PhraseTokenizer → DocumentAnnotator
  23. def process(document: Document): Document

    Definition Classes
    PhraseTokenizer → DocumentAnnotator
  24. def processParallel(documents: Iterable[Document], nThreads: Int = ...): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  25. def processSequential(documents: Iterable[Document]): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  26. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  27. def toString(): String

    Definition Classes
    AnyRef → Any
  28. def tokenAnnotationString(token: Token): Null

    How the annotation of this DocumentAnnotator should be printed in one-word-per-line (OWPL) format. If there is no per-token annotation, return null. Used in Document.owplString.

    Definition Classes
    PhraseTokenizer → DocumentAnnotator
  29. val trie: PhraseTrie

  30. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
