A simple concrete implementation of Section.
A Document holds a String containing the original raw string contents of a natural language document to be processed.
A sequence of DocumentAnnotators packaged as a single DocumentAnnotator.
Mapping from an annotation class (usually stored in an attr) to the DocumentAnnotator from which it can be obtained.
A Cubbie for serializing a Document, with separate slots for the Tokens, Sentences, and TokenSpans.
Used as an attribute on Document to hold the document's name.
A portion of the string contents of a Document.
A Map from annotation class to DocumentAnnotator that provides that annotation.
A part of a Document, delineated by character offsets into the Document's string, and which can hold a sequence of Tokens and a sequence of Sentences.
A span of Tokens making up a sentence within a Section of a Document.
Command-line options available on all NLP model trainers.
A word in a document, covering a substring of the Document.
A sub-sequence of Tokens within a Section (which is in turn part of a Document).
A mutable collection of TokenSpans, with various methods that return filtered subsets of spans based on position and class.
An immutable collection of TokenSpans, with various methods that return filtered subsets of spans based on position and class.
Used as an attribute of Token when the token.
A factory for creating DocumentAnnotatorPipelines given requirements about which annotations or which DocumentAnnotators are desired.
A command-line driver for DocumentAnnotators.
Used as a stand-in dummy DocumentAnnotator in the DocumentAnnotatorMap when an annotation was added but not by a real DocumentAnnotator.
Given a sequence of strings describing labels in IOB format, such as O I-PER I-LOC B-LOC I-LOC O I-ORG (where the I and B prefixes are separated by a dash from the type suffix), return a sequence of tuples indicating span start, length, and label suffix, such as (3, 2, "LOC").
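
The IOB conversion described in the last entry can be illustrated with a minimal sketch. This is not the library's own implementation: the names IobSpans and iobToSpans are hypothetical, and the sketch assumes that an I- label whose type differs from the open span starts a new span, and that any label without a B- or I- prefix is treated like O.

```scala
object IobSpans {
  /** Convert IOB labels into (start, length, type) tuples.
    * Hypothetical helper for illustration only. */
  def iobToSpans(labels: Seq[String]): Seq[(Int, Int, String)] = {
    val spans = scala.collection.mutable.ArrayBuffer.empty[(Int, Int, String)]
    var start = -1    // index where the open span began, or -1 if none is open
    var current = ""  // type of the open span; only meaningful when start >= 0

    // Close the open span (if any) so that it ends just before index `end`.
    def close(end: Int): Unit = {
      if (start >= 0) spans += ((start, end - start, current))
      start = -1
    }

    for ((label, i) <- labels.zipWithIndex) {
      label.split("-", 2) match {
        // B- always begins a new span, closing any span that is open.
        case Array("B", t) =>
          close(i); start = i; current = t
        // I- continues the open span only if the type matches (assumption).
        case Array("I", t) =>
          if (start < 0 || t != current) { close(i); start = i; current = t }
        // "O" or an unrecognized label ends the open span.
        case _ =>
          close(i)
      }
    }
    close(labels.length)
    spans.toSeq
  }
}
```

Under these assumptions, the example labels O I-PER I-LOC B-LOC I-LOC O I-ORG yield (1, 1, "PER"), (2, 1, "LOC"), (3, 2, "LOC"), and (6, 1, "ORG"); the third tuple is the one quoted in the description above.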