cc.factorie.app.nlp

Token

class Token extends Observation[Token] with ChainLink[Token, Section] with DocumentSubstring with Attr

A word in a document, covering a substring of the Document. A Token is also a ChainLink in a Chain sequence; thus Tokens have "next" and "prev" methods returning neighboring Tokens. Token constructors that include a Section automatically add the Token to the Section (which is the Chain). Token constructors that include a Sentence automatically add the Token to the Sentence and its Section. Token constructors that include a tokenString automatically append the tokenString to the Document's string.
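The chain and append behaviour described above can be pictured with a small self-contained model. This is only a sketch: `ToyDocument`, `ToyToken`, and `addToken` are hypothetical stand-ins rather than FACTORIE classes, and the single space inserted between appended strings is an assumption that this page does not confirm.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins for illustration only; these are NOT the FACTORIE classes.
object ToyChain {
  final class ToyDocument {
    private val sb = new StringBuilder
    val tokens = ArrayBuffer.empty[ToyToken]
    def string: String = sb.toString

    // Append tokenString to the document text and register a token that
    // covers exactly those characters. The space separator is an assumption.
    def addToken(tokenString: String): ToyToken = {
      if (sb.nonEmpty) sb.append(' ')
      val start = sb.length
      sb.append(tokenString)
      val t = new ToyToken(this, tokens.length, start, sb.length)
      tokens += t
      t
    }
  }

  final class ToyToken(val doc: ToyDocument, val position: Int,
                       val stringStart: Int, val stringEnd: Int) {
    // stringEnd is exclusive, as in the stringEnd description on this page.
    def string: String = doc.string.substring(stringStart, stringEnd)
    def hasNext: Boolean = position + 1 < doc.tokens.length
    def hasPrev: Boolean = position > 0
    // Mirror the documented next(n)/prev(n) contract: null past either end.
    def next: ToyToken = if (hasNext) doc.tokens(position + 1) else null
    def prev: ToyToken = if (hasPrev) doc.tokens(position - 1) else null
  }
}
```

Because the neighbour accessors in this model can return null at a chain boundary, callers would normally guard them with `hasNext` and `hasPrev`, or wrap the result in `Option(...)`.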

Linear Supertypes
DocumentSubstring, ChainLink[Token, Section], ThisType[Token], Observation[Token], Attr, AbstractChainLink[Token], AnyRef, Any

Instance Constructors

  1. new Token(s: Sentence, tokenString: String)

  2. new Token(doc: Document, tokenString: String)

  3. new Token(sentence: Sentence, s: Int, e: Int)

  4. new Token(doc: Document, s: Int, e: Int)

    Token constructor that defaults to placing the Token in the special Section that encompasses the whole Document.

  5. new Token(sec: Section, s: Int, e: Int)

    Create a Token and also append it to the list of Tokens in the Section. There must not already be Tokens in the document with higher stringStart indices. Note that the start and end indices are character offsets into the Document string, not the Section string.

  6. new Token(stringStart: Int, stringEnd: Int)
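The offset conventions used by the integer-based constructors can be illustrated with a toy model. This is a sketch under stated assumptions: `ToySection`, `ToyToken`, and `canAppend` are hypothetical names, not part of FACTORIE, and the `require` check is the sketch's own sanity check rather than documented library behaviour.

```scala
// Toy model for illustration only; these are NOT the FACTORIE classes.
object ToyOffsets {
  // A section is modelled as the window [start, end) of the document text.
  final case class ToySection(docString: String, start: Int, end: Int) {
    def string: String = docString.substring(start, end)
  }

  // Token offsets index the whole document string, not the section string.
  // stringEnd is exclusive: the last character is docString(stringEnd - 1).
  final case class ToyToken(section: ToySection, stringStart: Int, stringEnd: Int) {
    require(section.start <= stringStart && stringStart <= stringEnd &&
            stringEnd <= section.end, "token must lie inside its section")
    def docSubstring: String = section.docString.substring(stringStart, stringEnd)
  }

  // The ordering rule stated for the Section constructor: no existing token
  // may have a higher stringStart than the token being appended.
  def canAppend(existingStarts: Seq[Int], newStart: Int): Boolean =
    existingStarts.forall(_ <= newStart)
}
```

For a document string `"Title\nHello world"` with a body section starting at offset 6, the token for "Hello" is created with offsets (6, 11), even though "Hello" sits at (0, 5) within the section's own string.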

Type Members

  1. type ThisType = Token

    Definition Classes
    ThisType

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def _setChainPosition(c: Section, p: Int): Unit

    This method should never be called outside Chain.+=, Chain.insert, or Chain.remove.

    Definition Classes
    ChainLink
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. object attr

    A collection of attributes, keyed by the attribute class.

  9. def between(other: Token): Seq[Token]

    Definition Classes
    ChainLink
  10. def chain: Section

    Definition Classes
    ChainLink
  11. def chainAfter: IndexedSeq[Token]

    Definition Classes
    ChainLink
  12. def chainBefore: IndexedSeq[Token]

    Definition Classes
    ChainLink
  13. def chainHead: Token

    Definition Classes
    AbstractChainLink
  14. def chainLast: Token

    Definition Classes
    AbstractChainLink
  15. def charNGrams(min: Int, max: Int): Seq[String]

    Return all the word's character subsequences of lengths between min and max.

  16. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  17. def containsDigit: Boolean

    Return true if the word contains at least one digit.

  18. def containsLowerCase: Boolean

    Return true if any character of the word is lower case.

  19. def containsUpperCase: Boolean

    Return true if any character of the word is upper case.

  20. def docSubstring: String

    Return the substring of the original Document string covered by the character indices stringStart to stringEnd. This may differ from the String returned by this.string if the TokenString attribute has been set. (Such substitutions are useful for de-hyphenation, downcasing, and other such modifications.)

  21. def document: Document

    The Document containing this Token's Section.

    Definition Classes
    Token → DocumentSubstring
  22. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  24. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  25. def firstInSeq: Token

    Definition Classes
    ChainLink
  26. def followsNewline: Boolean

    Return true if the character immediately preceding the start of this token is a newline. The beginning of the document counts as a newline.

  27. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  28. def getNext: Option[Token]

    Definition Classes
    ChainLink
  29. def getPrev: Option[Token]

    Definition Classes
    ChainLink
  30. def hasFollowingWhitespace: Boolean

    Return true if the character immediately following the end of this token is a whitespace character (such as a space, newline, or tab).

  31. def hasNext(n: Int): Boolean

    Definition Classes
    ChainLink
  32. def hasNext: Boolean

    Definition Classes
    ChainLink → AbstractChainLink
  33. def hasPrecedingWhitespace: Boolean

    Return true if the character immediately preceding the start of this token is a whitespace character (such as a space, newline, or tab).

  34. def hasPrev(n: Int): Boolean

    Definition Classes
    ChainLink
  35. def hasPrev: Boolean

    Definition Classes
    ChainLink → AbstractChainLink
  36. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  37. def isCapitalized: Boolean

    Return true if the first character of the word is upper case.

  38. def isDigits: Boolean

  39. def isInSentence: Boolean

  40. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  41. def isPunctuation: Boolean

  42. def isSentenceEnd: Boolean

  43. def isSentenceStart: Boolean

  44. def lemma: TokenLemma

  45. def lemmaString: String

    Return the lemma of the string contents of the Token, either from its attr[TokenLemma] variable or, if unset, from token.string.

  46. def matches(t2: Token): Boolean

  47. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  48. def nerTag: NerTag

  49. def next(n: Int): Token

    Return the ChainLink "n" positions ahead. If this goes past the end of the Chain, return null.

    Definition Classes
    ChainLink → AbstractChainLink
  50. def next: Token

    Definition Classes
    ChainLink → AbstractChainLink
  51. def nextWindow(n: Int): Seq[Token]

    Definition Classes
    ChainLink
  52. def normalizedString[C <: TokenString](attrClass: Class[C]): String

    Return the string contents of this Token, either from its specified attr[C], or if unset, directly as a substring of the Document.

  53. final def notify(): Unit

    Definition Classes
    AnyRef
  54. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  55. def parse: ParseTree

  56. def parseChildren: Seq[Token]

  57. def parseChildrenLabeled(label: CategoricalValue[String]): Seq[Token]

  58. def parseLabel: ParseTreeLabel

  59. def parseLeftChildren: Seq[Token]

  60. def parseLeftChildrenLabeled(label: CategoricalValue[String]): Seq[Token]

  61. def parseParent: Token

  62. def parseParentIndex: Int

  63. def parseRightChildren: Seq[Token]

  64. def parseRightChildrenLabeled(label: CategoricalValue[String]): Seq[Token]

  65. def posTag: PennPosTag

  66. def position: Int

    Definition Classes
    ChainLink → AbstractChainLink
  67. def positionInSection: Int

    Return the 0-start index of this token in its sentence. If not part of a sentence, return -1.

  68. def positionInSentence: Int

  69. def precedesNewline: Boolean

    Return true if the character immediately following the end of this token is a newline. The end of the document counts as a newline.

  70. def prev(n: Int): Token

    Return the ChainLink "n" positions behind. If this goes past the beginning of the Chain, return null.

    Definition Classes
    ChainLink → AbstractChainLink
  71. def prev: Token

    Definition Classes
    ChainLink → AbstractChainLink
  72. def prevWindow(n: Int): Seq[Token]

    Definition Classes
    ChainLink
  73. def section: Section

    An alias for the "chain" method.

  74. def sentence: Sentence

  75. def sentenceHasNext: Boolean

  76. def sentenceHasPrev: Boolean

  77. def sentenceNext: Token

  78. def sentencePrev: Token

  79. def string: String

    Return the string contents of this Token, either from its attr[TokenString] variable or, if unset, directly as a substring of the Document.

    Definition Classes
    Token → DocumentSubstring → Observation
  80. val stringEnd: Int

    The character offset into the Document.string at which this DocumentSubstring ends (exclusive). In other words, the last character of the DocumentSubstring is Document.string(this.stringEnd-1).

    Definition Classes
    Token → DocumentSubstring
  81. val stringStart: Int

    The character offset into the Document.string at which this DocumentSubstring begins.

    Definition Classes
    Token → DocumentSubstring
  82. def stringVar: TokenString

    Return the Token's string contents as a StringVariable. Repeated calls will return the same Variable (assuming that the attr[TokenString] is not changed).

  83. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  84. def toString(): String

    Returns a string representation of this Token object, including the prefix "Token(" and its starting character offset. If you instead want the string contents of the token, use the method "string".

    Definition Classes
    Token → AnyRef → Any
  85. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  86. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  87. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  88. def window(n: Int): Seq[Token]

    Definition Classes
    ChainLink
  89. def windowWithoutSelf(n: Int): Seq[Token]

    Definition Classes
    ChainLink
  90. def wordShape(maxRepetitions: Int = 2): String

    Return a string that captures the generic "shape" of the original word, mapping lowercase alphabetics to 'a', uppercase to 'A', digits to '1', whitespace to ' '. Consecutive characters of the same class beyond 'maxRepetitions' are skipped.
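The string-feature members listed above (charNGrams and wordShape) can be approximated with the standalone sketch below. It is not the library's implementation: the object name `ToyStringFeatures` is hypothetical, treating both n-gram length bounds as inclusive is an assumption, and passing characters outside the four named classes through unchanged is also an assumption.

```scala
// Standalone approximations for illustration; NOT the FACTORIE implementation.
object ToyStringFeatures {
  // All character substrings with length between min and max.
  // Both bounds are treated as inclusive (an assumption); min >= 1 is expected.
  def charNGrams(word: String, min: Int, max: Int): Seq[String] =
    for {
      n <- min to math.min(max, word.length)
      i <- 0 to word.length - n
    } yield word.substring(i, i + n)

  // Map each character to a class symbol and keep at most maxRepetitions
  // consecutive symbols of the same class.
  def wordShape(word: String, maxRepetitions: Int = 2): String = {
    val sb = new StringBuilder
    var prevClass = '\u0000' // sentinel, assumed not to occur in real tokens
    var run = 0
    for (ch <- word) {
      val cls =
        if (ch.isUpper) 'A'
        else if (ch.isLower) 'a'
        else if (ch.isDigit) '1'
        else if (ch.isWhitespace) ' '
        else ch // other characters stand for themselves (an assumption)
      if (cls == prevClass) run += 1 else { prevClass = cls; run = 1 }
      if (run <= maxRepetitions) sb.append(cls)
    }
    sb.toString
  }
}
```

Under these assumptions, `ToyStringFeatures.charNGrams("abc", 2, 3)` yields "ab", "bc", and "abc", and `ToyStringFeatures.wordShape("McCallum")` yields "AaAaa". Shape strings of this kind are typically used as features that let a model generalise across words with the same capitalisation and digit pattern.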
