Token constructor that by default places the Token in the special Section that encompasses the whole Document.
Create a Token and also append it to the list of Tokens in the Section. There must not already be Tokens in the document with higher stringStart indices. Note that the start and end indices are character offsets into the Document string, not the Section string.
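The append-on-construction behavior and the ordering requirement can be illustrated with a small, self-contained sketch. ToySection and ToyToken are hypothetical stand-ins written for this example, not the library's classes; the ordering check via require is an assumption about how such a constraint might be enforced.

```scala
object SectionSketch {
  // Hypothetical stand-in for a Section: holds the document text and an ordered token list.
  final class ToySection(val docText: String) {
    private val buf = scala.collection.mutable.ArrayBuffer.empty[ToyToken]
    def tokens: Seq[ToyToken] = buf.toSeq
    def append(t: ToyToken): Unit = {
      // Assumption: reject a token if an existing token starts at a higher offset.
      require(buf.isEmpty || buf.last.stringStart <= t.stringStart,
        "tokens must be appended in order of stringStart")
      buf += t
    }
  }

  // Hypothetical stand-in for a Token: offsets refer to the document text.
  final class ToyToken(section: ToySection, val stringStart: Int, val stringEnd: Int) {
    require(stringStart <= stringEnd)
    section.append(this) // constructing the token also registers it with the section
    def string: String = section.docText.substring(stringStart, stringEnd)
  }
}
```

With a ToySection over "Hello world", constructing tokens for offsets (0, 5) and then (6, 11) leaves the section holding "Hello" and "world" in that order; constructing a further token that starts at offset 0 fails the ordering check.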
This method should never be called outside of Chain.+=, Chain.insert, or Chain.remove.
A collection of attributes, keyed by the attribute class.
Return all the word's character subsequences of lengths between min and max.
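As a rough illustration, the sketch below enumerates the contiguous substrings of a word whose lengths fall between min and max. The object name CharNGramSketch, the inclusive treatment of both bounds, and the absence of any word-boundary markers are assumptions made for this example; the library's own method may behave differently.

```scala
object CharNGramSketch {
  // Hypothetical helper: all contiguous substrings of `word` with length in [min, max].
  // Lengths greater than the word itself are skipped.
  def charNGrams(word: String, min: Int, max: Int): Seq[String] =
    for {
      n     <- min to math.min(max, word.length) // each admissible length
      start <- 0 to word.length - n              // each admissible start offset
    } yield word.substring(start, start + n)
}
```

Under these assumptions, charNGrams("abcd", 2, 3) yields "ab", "bc", "cd", "abc", "bcd".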
Return true if the word contains at least one digit.
Return true if any character of the word is lower case.
Return true if any character of the word is upper case.
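The three character-class checks above amount to an "exists" test over the word's characters. A minimal sketch, with hypothetical method names chosen for this example and relying on the Scala standard library's Char predicates:

```scala
object WordPredicateSketch {
  // Hypothetical names; each returns true if at least one character matches.
  def containsDigit(word: String): Boolean     = word.exists(_.isDigit)
  def containsLowerCase(word: String): Boolean = word.exists(_.isLower)
  def containsUpperCase(word: String): Boolean = word.exists(_.isUpper)
}
```

For the word "iPhone7" all three return true; for "2014" only containsDigit does.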
Return the substring of the original Document string covered by the character indices stringStart to stringEnd. This may differ from the String returned by this.string if the TokenString attribute has been set. (Such substitutions are useful for de-hyphenation, downcasing, and other such modifications.)
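The difference between the raw document substring and a substituted token string can be sketched as follows. ToyToken and its overrideString field are hypothetical simplifications for this example; in particular, an Option field stands in for the TokenString attribute.

```scala
object TokenStringSketch {
  // Hypothetical stand-in for a token: offsets into the document text plus an optional override.
  final case class ToyToken(docText: String, stringStart: Int, stringEnd: Int,
                            overrideString: Option[String] = None) {
    // Always the raw document characters from stringStart up to, but not including, stringEnd.
    def docSubstring: String = docText.substring(stringStart, stringEnd)
    // The substituted string if one was set, otherwise the raw substring.
    def string: String = overrideString.getOrElse(docSubstring)
  }
}
```

For the text "The Quick fox" and offsets (4, 9) with a downcased override "quick", docSubstring is "Quick" while string is "quick". Without an override, both are "Quick".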
The Document containing this Token's Section.
Return true if the character immediately preceding the start of this token is a newline. The beginning of the document counts as a newline.
Return true if the character immediately following the end of this token is a whitespace character (such as a space, newline, or tab).
Return true if the character immediately preceding the start of this token is a whitespace character (such as a space, newline, or tab).
Return true if the first character of the word is upper case.
Return the lemma of the string contents of the Token, either from its attr[TokenLemma] variable or, if unset, from token.string.
Return the ChainLink "n" positions ahead. If this goes past the end of the Chain, return null.
Return the string contents of this Token, either from its specified attr[C], or if unset, directly as a substring of the Document.
Return the zero-based index of this token in its sentence. If the token is not part of a sentence, return -1.
Return true if the character immediately following the end of this token is a newline. The end of the document counts as a newline.
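The two newline checks, before the start and after the end of a token, treat the document boundaries as newlines. A minimal sketch over a plain document string, assuming that "newline" means the single character '\n' and using hypothetical function names:

```scala
object NewlineSketch {
  // True if the token starts the document or the character just before it is '\n'.
  def hasPrecedingNewline(docText: String, stringStart: Int): Boolean =
    stringStart == 0 || docText.charAt(stringStart - 1) == '\n'

  // True if the token ends the document or the character just after it is '\n'.
  // stringEnd is the offset one past the token's last character.
  def hasFollowingNewline(docText: String, stringEnd: Int): Boolean =
    stringEnd == docText.length || docText.charAt(stringEnd) == '\n'
}
```

In the text "one\ntwo three", the token "two" (offsets 4 to 7) has a preceding newline but no following newline, and the token "three" (offsets 8 to 13) has a following newline only because it ends the document.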
Return the ChainLink "n" positions behind. If this goes past the beginning of the Chain, return null.
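The bounds behavior of stepping forward and backward along a chain can be sketched with a toy link over an indexed sequence. The Link class below is a hypothetical simplification written for this example, not the library's ChainLink; it only mirrors the documented contract of returning null when a step leaves the chain.

```scala
object ChainSketch {
  // Hypothetical link: a position within a fixed sequence of values.
  final class Link(val chain: IndexedSeq[String], val position: Int) {
    def value: String = chain(position)
    // The link n positions ahead, or null if that is past the end of the chain.
    def next(n: Int): Link = at(position + n)
    // The link n positions behind, or null if that is before the beginning of the chain.
    def prev(n: Int): Link = at(position - n)
    private def at(i: Int): Link =
      if (i >= 0 && i < chain.length) new Link(chain, i) else null
  }
}
```

Because the result may be null, callers that walk a window of neighbors need a null check (or a wrapper such as Option(link.next(2))) before dereferencing.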
Just an alias for the "chain" method.
Return the string contents of this Token, either from its attr[TokenString] variable or, if unset, directly as a substring of the Document.
The character offset into the Document.string at which this DocumentSubstring ends. In other words, the last character of the DocumentSubstring is Document.string(this.stringEnd-1).
The character offset into the Document.string at which this DocumentSubstring begins.
Return the Token's string contents as a StringVariable. Repeated calls will return the same Variable (assuming that the attr[TokenString] is not changed).
Returns a string representation of this Token object, including the prefix "Token(" and its starting character offset. If you instead want the string contents of the token, use the method "string".
Return a string that captures the generic "shape" of the original word, mapping lowercase alphabetics to 'a', uppercase to 'A', digits to '1', whitespace to ' '. Within a run of the same character class, characters beyond the first 'maxRepetitions' are skipped.
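The shape mapping can be sketched as a single pass over the word. This is an illustrative reimplementation under stated assumptions, not the library's code: the object name ShapeSketch is hypothetical, characters outside the four listed classes are assumed to map to themselves, and a run is assumed to emit at most maxRepetitions symbols.

```scala
object ShapeSketch {
  def stringShape(word: String, maxRepetitions: Int): String = {
    val sb = new StringBuilder
    var prevClass: Option[Char] = None // class symbol of the previous character, if any
    var run = 0                        // length of the current run of one class
    for (ch <- word) {
      val cls =
        if (ch.isUpper) 'A'
        else if (ch.isLower) 'a'
        else if (ch.isDigit) '1'
        else if (ch.isWhitespace) ' '
        else ch // assumption: punctuation and other characters are kept as-is
      if (prevClass.contains(cls)) run += 1
      else { prevClass = Some(cls); run = 1 }
      // Emit the class symbol only while the run is within the allowed length.
      if (run <= maxRepetitions) sb.append(cls)
    }
    sb.toString
  }
}
```

Under these assumptions, stringShape("McCallum", 2) gives "AaAaa", stringShape("Hello World", 2) gives "Aaa Aaa", and stringShape("2014", 1) gives "1".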
A word in a document, covering a substring of the Document. A Token is also a ChainLink in a Chain sequence; thus Tokens have "next" and "prev" methods returning neighboring Tokens. Token constructors that include a Section automatically add the Token to the Section (which is the Chain). Token constructors that include a Sentence automatically add the Token to the Sentence and its Section. Token constructors that include a tokenString automatically append the tokenString to the Document's string.