cc.factorie.app.nlp.segment

TokenNormalizer1

class TokenNormalizer1[A <: TokenString] extends DocumentAnnotator

Cleans up Token.string according to various standard practices. The aim is to produce plain text, the way most people would write an email message, e.g. un-escaped asterisks, plain quote characters, etc.
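As a rough illustration of what "plain text" means here, the sketch below maps a few common escaped forms back to ordinary characters. It is a self-contained stand-in and not FACTORIE's implementation: the object name PlainTextSketch, the function normalize, and the specific patterns are all assumptions chosen for the example, based on the usual Penn Treebank escaping conventions.

```scala
// Illustrative stand-in only: these rules are assumptions, not FACTORIE's actual regexes.
object PlainTextSketch {
  // Penn Treebank bracket escapes and their plain-text equivalents.
  private val pennBrackets: Map[String, String] = Map(
    "-LRB-" -> "(", "-RRB-" -> ")",
    "-LSB-" -> "[", "-RSB-" -> "]",
    "-LCB-" -> "{", "-RCB-" -> "}"
  )

  /** Returns a plain-text form of a single token string. */
  def normalize(token: String): String = token match {
    case t if pennBrackets.contains(t)     => pennBrackets(t) // undo Penn brackets
    case "``" | "''" | "\u201C" | "\u201D" => "\""            // Treebank and curly quotes
    case "\u2026"                          => "..."           // single-character ellipsis
    case t =>
      // Un-escape slashes and asterisks anywhere inside the token.
      t.replace("\\/", "/").replace("\\*", "*")
  }
}
```

With these rules, a token such as -LRB- would come back as an opening parenthesis, and a token written as 1\/2 would come back as 1/2. The real class applies its own regexes and is controlled by the constructor flags listed below.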

Instance Constructors

  1. new TokenNormalizer1(newTokenString: (Token, String) ⇒ A, normalizeQuote: Boolean = true, normalizeApostrophe: Boolean = true, normalizeCurrency: Boolean = true, normalizeAmpersand: Boolean = true, normalizeFractions: Boolean = true, normalizeEllipsis: Boolean = true, undoPennParens: Boolean = true, unescapeSlash: Boolean = true, unescapeAsterisk: Boolean = true, normalizeMDash: Boolean = true, normalizeDash: Boolean = true, normalizeHtmlSymbol: Boolean = true, normalizeHtmlAccent: Boolean = true, americanize: Boolean = false)(implicit m: Manifest[A])
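The constructor takes a factory function followed by a long run of defaulted Boolean flags, so callers would typically pass the factory positionally and override only the flags they care about by name. The sketch below mirrors that shape with hypothetical stand-in types (Tok, NormString, Normalizer) so that it compiles without FACTORIE on the classpath. It carries only three of the documented flags and omits the implicit Manifest[A] parameter list.

```scala
// Hypothetical stand-ins; none of these types are part of FACTORIE.
object ConstructorSketch {
  final case class Tok(text: String)
  final case class NormString(token: Tok, value: String)

  // Same shape as the documented constructor: a factory, then defaulted flags.
  class Normalizer[A](
      val newTokenString: (Tok, String) => A,
      val normalizeQuote: Boolean = true,
      val undoPennParens: Boolean = true,
      val americanize: Boolean = false
  )

  // Every flag that is not named keeps its default.
  val defaults: Normalizer[NormString] =
    new Normalizer[NormString]((t: Tok, s: String) => NormString(t, s))

  // Named arguments override individual flags, in any order.
  val custom: Normalizer[NormString] =
    new Normalizer[NormString](
      (t: Tok, s: String) => NormString(t, s),
      americanize = true,
      normalizeQuote = false
    )
}
```

Because every flag has a default, adding one named argument changes a single behaviour and leaves the rest untouched, which is presumably why the real signature defaults all of them.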

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val americanize: Boolean

  7. val apostropheRegex: Regex

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. val currencyRegex: Regex

  11. val dashRegex: Regex

  12. def documentAnnotationString(document: Document): String

    How the annotation of this DocumentAnnotator should be printed as extra information after a one-word-per-line (OWPL) format. If there is no document annotation, return the empty string. Used in Document.owplString.

    Definition Classes
    DocumentAnnotator
  13. val ellipsisRegex: Regex

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. val htmlAccentRegex: Regex

  20. val htmlSymbolMap: HashMap[String, String]

  21. val htmlSymbolRegex: Regex

  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. val mdashRegex: Regex

  24. def mentionAnnotationString(mention: Mention): String

    Definition Classes
    DocumentAnnotator
  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. val newTokenString: (Token, String) ⇒ A

  27. val normalizeAmpersand: Boolean

  28. val normalizeApostrophe: Boolean

  29. val normalizeCurrency: Boolean

  30. val normalizeDash: Boolean

  31. val normalizeEllipsis: Boolean

  32. val normalizeFractions: Boolean

  33. val normalizeHtmlAccent: Boolean

  34. val normalizeHtmlSymbol: Boolean

  35. val normalizeMDash: Boolean

  36. val normalizeQuote: Boolean

  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. def phraseAnnotationString(phrase: Phrase): String

    Definition Classes
    DocumentAnnotator
  40. def postAttrs: Iterable[Class[_]]

    Definition Classes
    TokenNormalizer1 → DocumentAnnotator
  41. def prereqAttrs: Iterable[Class[_]]

    Definition Classes
    TokenNormalizer1 → DocumentAnnotator
  42. def process(document: Document): Document

    Definition Classes
    TokenNormalizer1 → DocumentAnnotator
  43. def processParallel(documents: Iterable[Document], nThreads: Int = ...): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  44. def processSequential(documents: Iterable[Document]): Iterable[Document]

    Definition Classes
    DocumentAnnotator
  45. def processToken(token: Token): Unit

  46. val quoteRegex: Regex

  47. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  48. def toString(): String

    Definition Classes
    AnyRef → Any
  49. def tokenAnnotationString(token: Token): String

    How the annotation of this DocumentAnnotator should be printed in one-word-per-line (OWPL) format. If there is no per-token annotation, return null. Used in Document.owplString.

    Definition Classes
    TokenNormalizer1 → DocumentAnnotator
  50. val undoPennParens: Boolean

  51. val unescapeAsterisk: Boolean

  52. val unescapeSlash: Boolean

  53. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
