Tokenizer used to tokenize the document. Defaults to DeterministicTokenizer.
Dictionary consulted to decide whether a hyphen-split word is eligible for merging.
If true, the other tokens in the document are also used to check merge eligibility.
How the annotation of this DocumentAnnotator should be printed as extra information after the one-word-per-line (OWPL) output. Returns the empty string if there is no document annotation. Used in Document.owplString.
How the annotation of this DocumentAnnotator should be printed in one-word-per-line (OWPL) format. Returns null if there is no per-token annotation. Used in Document.owplString.
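The two return conventions described here (null for a missing per-token annotation, the empty string for a missing document annotation) can be illustrated with a small standalone sketch. It does not use the library's classes: the trait OwplAnnotator, its method names, and the tab-separated layout are all assumptions made for illustration, and the real Document.owplString may assemble its output differently.

```scala
// Hypothetical stand-in for an annotator; not the library's actual trait.
trait OwplAnnotator {
  // Per-token column text, or null when this annotator has nothing per token.
  def tokenAnnotationString(token: String): String
  // Extra text appended after all token lines, or "" when there is none.
  def documentAnnotationString(tokens: Seq[String]): String
}

object OwplSketch {
  // Builds one line per token: index, token text, then every non-null column.
  def owplString(tokens: Seq[String], annotators: Seq[OwplAnnotator]): String = {
    val sb = new StringBuilder
    for ((token, index) <- tokens.zipWithIndex) {
      // A null return means "no column for this annotator", so it is skipped.
      val columns = annotators.map(_.tokenAnnotationString(token)).filter(_ != null)
      sb.append((Seq(index.toString, token) ++ columns).mkString("\t")).append("\n")
    }
    // An empty string here simply contributes nothing to the output.
    for (a <- annotators) sb.append(a.documentAnnotationString(tokens))
    sb.toString
  }
}
```

Under these assumptions, an annotator with no per-token output adds no column, and one with no document-level output leaves the text after the token lines unchanged.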
Concatenates words that were split by hyphens in the original text, based on a user-provided dictionary or on other words in the same document. It operates on the output of the tokenizer. Caution: it modifies the tokenizer's output by removing some tokens, so run it before any downstream tasks.
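The merge rule described above can be sketched on plain strings. This is a minimal illustration, not the library's implementation: the object DehyphenationSketch, the function mergeHyphenated, and its parameter names are hypothetical, the hyphen is assumed to arrive as its own "-" token, and the dictionary is assumed to hold lowercase entries.

```scala
object DehyphenationSketch {
  // Merges "left", "-", "right" token triples when the joined word is known.
  // `dictionary` stands in for the user-provided dictionary (lowercase entries);
  // when `useDocumentTokens` is true, the document's own tokens count as known too.
  def mergeHyphenated(
      tokens: Seq[String],
      dictionary: Set[String],
      useDocumentTokens: Boolean
  ): Seq[String] = {
    val known: Set[String] =
      if (useDocumentTokens) dictionary ++ tokens.map(_.toLowerCase) else dictionary
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    var i = 0
    while (i < tokens.length) {
      // A candidate split is a token followed by a lone hyphen and another token.
      val isSplit = i + 2 < tokens.length && tokens(i + 1) == "-"
      val joined = if (isSplit) tokens(i) + tokens(i + 2) else ""
      if (isSplit && known.contains(joined.toLowerCase)) {
        out += joined // three tokens collapse into one, so the token count shrinks
        i += 3
      } else {
        out += tokens(i)
        i += 1
      }
    }
    out.toList
  }
}
```

For the tokens "a", "tokenizer", "and", "a", "token", "-", "izer" with an empty dictionary, the trailing triple is merged into "tokenizer" only when useDocumentTokens is true, because the whole word appears earlier in the same document. Since merging removes tokens, any annotation attached to token positions beforehand would no longer line up, which is why the step belongs directly after tokenization.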