Overview
At its heart, FACTORIE is a toolkit for graphical models. All of its specific applications, including their data representations, inference methods, and learning methods, are built on a small set of common graphical-model primitives.
Introduction to Graphical Models
Graphical models are a formalism in which a graph denotes the conditional dependence structure between random variables. The formalism is the marriage of probability theory and graph theory. It provides an elegant framework that combines uncertainty (probabilities) and logical structure (independence constraints) so that complex joint probability distributions over multiple variables, which would otherwise be intractable to represent or manipulate, can instead be represented compactly and often manipulated efficiently. Since graphical models can straightforwardly express so many different probabilistic models, they have become a lingua franca for statistics, machine learning, and data mining.
In graphical models, variables are depicted by the nodes of a graph, drawn as circles, and dependencies among variables are depicted by edges, drawn either directed (with arrows) or undirected (without arrows).
There are two main types of graphical models.
* Directed graphical models (also known as Bayesian networks) represent a joint distribution over random variables by a product of normalized conditional probability distributions (one for each “child” variable, conditioned on the “parent” variables connected to it by incoming directed edges). They are convenient generative models when variable values can be generated by an ordered iteration over the graph nodes from “parent” variables to “child” variables.
* Undirected graphical models (also known as Markov random fields) represent a joint distribution over random variables by a product of unnormalized non-negative values (one value for each clique in the graph). They are convenient models for data in which it is not intuitive to impose an ordering on the variables’ generative process. They can represent different patterns of independence constraints than directed models can, and vice versa—neither one is strictly more expressive than the other.
* Factor graphs are a generalization of both directed and undirected graphical models, capable of representing both.
When drawn, factor graphs depict variables as circular nodes in the graph (just as in directed and undirected graphical models); however, rather than having edges that connect variables directly to each other, edges instead connect variables to factors (which are drawn as black squares). In other words, variables are connected to other variables only through factors. The variables connected to a factor are called the “neighbors” of the factor. A factor graph is a bipartite graph connecting variables and factors.
Factor graphs represent a joint distribution over random variables by a product of (normalized or unnormalized) non-negative values—one value for each factor in the graph. The factors can be understood as “compatibility functions” that take as input the values of the variables they neighbor and output a “compatibility score”: higher scores indicate that the combination of values is more likely, and lower scores indicate it is less likely. A score of 0 indicates the combination is impossible.
Directed graphical models can be represented by having one factor for each (child) variable, where the factor is also connected to the other (parent) variables on whose values we must condition when generating the child’s value. The factor’s scores are normalized probabilities.
Undirected graphical models can be represented with one factor per clique (or per arbitrary subsets of clique members). The factors’ scores are unnormalized non-negative values, and thus to obtain a normalized joint probability distribution over all the variables in the model, we must normalize the product of factor scores by the proper normalization constant (called the “partition function”).
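Written out, a factor graph with factors $\phi_1,\ldots,\phi_m$ defines the joint distribution

$$p(x_1,\ldots,x_n) \;=\; \frac{1}{Z}\prod_{j=1}^{m}\phi_j(x_{N(j)}), \qquad Z \;=\; \sum_{x}\prod_{j=1}^{m}\phi_j(x_{N(j)}),$$

where $x_{N(j)}$ denotes the values of the variables neighboring factor $j$, and the normalizer $Z$ is the partition function just mentioned.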
In practice, most implementations of graphical models use the log of the score values so that they can be summed rather than multiplied, and a larger dynamic range of scores can be represented with limited floating-point precision in hardware. Naturally these log-scores can then vary from negative infinity (indicating an impossible combination of values in the neighbors) to positive infinity (indicating an obligatory combination of values in the neighbors). Since the use of log-scores is so common (and used in our implementation) in the remainder of this documentation we will now simply use the term “score” to refer to log-scores.
Factor graphs in FACTORIE
The FACTORIE library defines Scala classes for representing the elements of a factor graph. Usually the names of our Scala classes and methods are simply the standard names used in English machine-learning or statistics vocabulary.
In this section we further describe factor graphs while introducing
FACTORIE class and method names indicated in mono-space font. Class
(and trait) names are capitalized. Method names begin with a
lowercase letter.
We provide here a high-level introduction to FACTORIE’s identifiers. Detailed explanations with code examples are given in subsequent chapters.
FACTORIE modular design philosophy
FACTORIE is explicitly designed to support independent, non-intertwined definitions of
- data representations with variables,
- models (value preferences, distributions, dependencies) with factors,
- inference methods, and
- parameter estimation.
This separation provides FACTORIE users with great flexibility to mix and match different choices in each of these four dimensions. For example, the data representation for some task may be written just once, while different choices of dependencies are explored for this task—there is no need to re-write the data representation for each experiment with a different model. Similarly, inference methods may be selected separately from the model definition.
Representing data with variables and values
The data constituting any machine learning task are represented by
variables. A variable can be understood as a container for a value.
A variable may also have certain relations to other variables (for
example, by being part of a sequence of variables corresponding to a
sequence of part-of-speech tags). FACTORIE has many classes for
representing different types of variables. The root of this class
hierarchy is the abstract trait Var.
An Assignment represents a mapping from variables to values, and can
be treated as a function. The value of a variable v1 in assignment
a2 can be obtained by a2(v1).
Note that an assignment stores a single value for a variable, not a probability distribution over values. (Of course FACTORIE has extensive facilities for representing distributions over values, but these are not intrinsic to a variable definition or assignment—rather they are the purview of inference, because different inference methods may choose to use different distribution representations for the same variable.)
There are multiple concrete sub-classes of Assignment. A
MutableAssignment can be modified. A TargetAssignment returns the
gold-standard (e.g. human-labeler-provided) value for a
variable when such a labeled value is available. An Assignment1 is
a compact efficient representation for the value assignment of a
single variable (similarly Assignment2..4 also exist). A
HashMapAssignment is an arbitrarily-sized mutable assignment with
storage based on a HashMap.
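As a minimal sketch (the cc.factorie.variable import path and the no-argument HashMapAssignment constructor are assumptions based on recent FACTORIE releases):

    import cc.factorie.variable.{IntegerVariable, HashMapAssignment}

    val v1 = new IntegerVariable(3)  // a variable whose initial global value is 3
    val a2 = new HashMapAssignment   // a mutable, HashMap-backed assignment
    a2.update(v1, 5)                 // record the value 5 for v1 in this assignment only
    val x = a2(v1)                   // look up v1's value in a2: yields 5, not 3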
Since we want to efficiently support the common case in which many
variables have a single, relatively static value assignment (because
they are observed, or because they are being sampled in a single MCMC
inference process), FACTORIE also has the GlobalAssignment object.
There may exist multiple instances of the other assignment classes,
each with its own values; however, there is only one
GlobalAssignment. For efficiency the GlobalAssignment values are
not stored in the GlobalAssignment, but rather in the corresponding
variable instances themselves.
All variables have a value method that returns the variable’s
currently assigned global value. In other words for variable v1
GlobalAssignment(v1) is equivalent to v1.value. Henceforth when
we refer to “a variable’s value” we mean “a variable’s
globally-assigned value”.
All variables also have methods for testing the equality (===) or
inequality (!==) of their global values. Thus v1 === v2 will
return true if variables v1 and v2 have the same values, and is a
simple shorthand for v1.value == v2.value. (Traditional equality
(with == or equals) on variables always tests the equality of the
system identity of the two variables, not their values.)
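For example (a sketch):

    val v1 = new IntegerVariable(3)
    val v2 = new IntegerVariable(3)
    v1 === v2  // true: the two variables hold equal values
    v1 == v2   // false: they are distinct variable instances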
If some non-global assignment a2 does not contain a value for v1
then looking up the value with a2(v1) may have different behavior in
different assignment subclasses. For example, a HashMapAssignment
will throw an error, but an Assignment1 (or Assignment2..4) will
simply return the variable’s globally-assigned value.
Many types of variables
FACTORIE has different subclasses of Var for holding values of
different types. There are a large number of traits and classes for
variables. The following naming conventions make their interpretation
easier. All abstract traits and classes end in Var, while concrete
classes end in Variable. Almost all classes ending in Variable
have an abstract Var counterpart that does not necessarily specify
the mechanism by which the variable stores its value in memory.
The type of the value is named immediately before the Var or
Variable suffix (for example, IntegerVariable has values of type
Int). Further modifiers may appear as prefixes.
All variables also have a member type Value indicating the Scala
type returned by its value method. In some variables the Value
type is bound, but not assigned. In other cases it is assigned.
Some variables have mutable value (inheriting from the trait
MutableVar). Almost all classes ending in Variable are mutable.
Mutable variable values can be set with the := method. For example,
if v1 has integer values, we can set the value of v1 to 3 with
v1 := 3.
The following is a selection of FACTORIE’s most widely-used variable classes.
- IntegerVariable - has value with Scala type Int.
- DoubleVariable - has value with Scala type Double.
- TensorVariable - has value of type Tensor, which is defined in the FACTORIE linear algebra package cc.factorie.la. This variable class places no restrictions on the dimensionality of the tensor, nor on the lengths of its dimensions.
- VectorVariable - has value of type Tensor1, which is a one-dimensional Tensor (traditionally called a “vector”). In addition, each VectorVariable is associated with a DiscreteDomain (described below) whose size matches the length of the variable’s vector value.
- DiscreteVariable extends VectorVar - has a value among N possible values, each of type DiscreteValue, and each associated with an integer 0 through N-1. DiscreteValue inherits from Tensor1 and can be interpreted as a “one-hot” vector with value 1.0 in one position and 0.0 everywhere else. Given a DiscreteValue dv1, its integer value can be obtained by dv1.intValue, and the length of the vector (in other words, the value of N) by dv1.length.
- CategoricalVariable[A] extends DiscreteVar - has value among N possible values, each of type CategoricalValue[A] (which inherits from DiscreteValue), each associated with an integer 0 through N-1 and also with a “category” (often of type String). These variables are often used for representing class labels and words, when a mapping from String category names to integer indices is desirable for efficiency (such as indexing into an array of parameters). Given a CategoricalValue[String] cv1, its integer value can be obtained by cv1.intValue and its categorical (String) value by cv1.categoryValue. A CategoricalVariable’s value may be set with an integer, v1 := 2, or by category string, v1 := "sports". (The mapping between Strings and integers is stored in a CategoricalDomain, described below.)
- CategoricalVectorVariable[A] extends VectorVar - has value of type Tensor1, a one-dimensional Tensor. In addition, each CategoricalVectorVariable is associated with a CategoricalDomain[A], which stores a mapping between values of type A (e.g. String) and integers. Thus each position in the vector is associated with a category. This variable type is useful for storing bag-of-words counts, for example.
- BooleanVariable extends CategoricalVar[Boolean] - has one of two possible values, each of type BooleanValue (which inherits from CategoricalValue[Boolean]); one is associated with integer 0 and boolean value false, the other with integer 1 and boolean value true. Given a BooleanValue bv1, its integer value can be obtained by bv1.intValue and its boolean value by bv1.booleanValue.
- MassesVariable extends TensorVar - has value of type Masses, a Tensor constrained to contain non-negative values. Masses are useful as the parameters of Dirichlet distributions.
- ProportionsVariable extends MassesVar - has value of type Proportions, which are Masses constrained to sum to 1.0. Proportions are useful as the parameters of discrete or multinomial distributions.
- RealVariable extends VectorVar - has a single real scalar value, stored in an object of type RealValue (which inherits from Tensor1). This variable is similar to DoubleVariable in that it stores a scalar value; however, since its value type inherits from Tensor1, it can be used in dot products.
All of the above variable classes have constructors in which their
initial value may be set. For example, new IntegerVariable(3) will
create a new variable whose initial value is 3.
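For example, a small sketch combining construction, reading and mutation:

    val i1 = new IntegerVariable(3)   // construct with initial value 3
    println(i1.value)                 // prints 3
    i1 := 4                           // := mutates the global value
    println(i1.value)                 // prints 4

    val d1 = new DoubleVariable(0.5)  // the same pattern works for other mutable types
    d1 := 1.5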
Some of the above variable types have specializations for the case in
which human-labeled gold-standard target values are known. These
specializations inherit from the LabeledVar trait which provides a
method targetValue that returns this gold-standard target value.
The target value is stored separately and may be different from the
variable’s current global assignment value. For example, if i1 is a
LabeledIntegerVariable we can determine if the variable is currently
assigned to its gold-standard target value by evaluating the boolean
expression i1.value == i1.targetValue. (The method valueIsTarget
is a convenient short-hand for this test.) In almost all cases the
target value is initialized to be the value provided in the variable
constructor.
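For example (a sketch, relying on the fact that the constructor argument also becomes the target value):

    val i1 = new LabeledIntegerVariable(3)  // initial value and target value are both 3
    i1 := 5
    i1.valueIsTarget  // false: current value 5 differs from target 3
    i1 := 3
    i1.valueIsTarget  // true: current value matches the target again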
Common LabeledVar sub-classes include:
LabeledDiscreteVariable, LabeledCategoricalVariable[A], LabeledBooleanVariable, LabeledStringVariable, LabeledIntegerVariable, LabeledDoubleVariable, and LabeledRealVariable.
All the above variable types are common in existing graphical models. However, FACTORIE also has random variables for representing less traditional values. Although these may seem like peculiar value types for a graphical model, they nonetheless can be scored by a factor, and are often useful in FACTORIE programs.
- StringVariable - has value with Scala type String.
- SeqVariable[A] - has value of type Seq[A], that is, a sequence of objects of type A.
- ChainVariable[A] extends SeqVar[A] - has value of type Seq[A], but the elements A must inherit from ChainLink, which provides next and prev methods.
- SpanVariable[A] extends SeqVar[A] - has value of type Seq[A], and is a subsequence of a ChainVar[A].
- SetVariable[A] - has value of type Set[A], that is, an unordered set of objects of type A.
- RefVariable[A] - has value of type A. In other words, it is a variable whose value is a pointer to a Scala object.
- EdgeVariable[A,B] - has value of type Tuple[A,B], that is, a pair of objects: a “source” of type A and a “destination” of type B.
- ArrowVariable[A,B] extends EdgeVar[A,B] - like EdgeVariable, has value of type Tuple[A,B], but only the “destination” is mutable, while the “source” is immutable.
Variable domains
Some (but not all) FACTORIE variables have a corresponding domain
object that contains information about the set of values the variable
may take on. Such variables inherit from VarWithDomain which
provides a domain method that returns the variable’s domain.
For example, DiscreteVariable subclasses have a corresponding
DiscreteDomain whose main role is to provide its size—the number
of possible values the variable can take on, i.e. from 0 to N-1.
CategoricalVariable[A] subclasses have a corresponding
CategoricalDomain[A] (inheriting from DiscreteDomain) which
provides not only its size but also a one-to-one mapping between the
categories and the integers 0 to N-1. For example, if cd1 is a
CategoricalDomain[String] then cd1.category(3) returns the String
corresponding to index 3 in the domain. cd1.index("apple") returns
the integer index corresponding to the category value "apple". If
"apple" is not yet in the domain, it will be automatically added, assigned the
next integer value, and the size of the domain will be increased by
one (assuming that cd1 has not previously been frozen).
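For example (a sketch; freeze is assumed to be the method that prevents further domain growth):

    import cc.factorie.variable.CategoricalDomain

    val cd1 = new CategoricalDomain[String]
    val i = cd1.index("apple")  // "apple" is added and assigned the next integer index
    cd1.category(i)             // returns "apple"
    cd1.size                    // has grown by one for each newly indexed category
    cd1.freeze                  // afterwards, unknown categories are no longer added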
Undoable changes
FACTORIE also provides the ability to undo a collection of changes to variable values. (This is especially useful in Metropolis-Hastings inference, in which a proposed change to variable values is made, but may be rejected, requiring a reversion to previous values.)
A Diff instance represents a change in value to a single variable.
It has methods undo and redo, as well as the variable method which
returns the variable whose value was changed.
A DiffList stores an ordered list of Diffs.
The := method for assigning the value of a MutableVar was described above.
An alternative method for changing the value of a MutableVar is
set, which, in addition to the new value, also takes a DiffList as
an implicit argument. The set method will then automatically
construct a Diff object and append it to the given DiffList.
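A minimal sketch of this pattern (assuming DiffList itself exposes undo and redo over its collected Diffs, applying them in the appropriate order):

    import cc.factorie.variable.{DiffList, IntegerVariable}

    val v1 = new IntegerVariable(3)
    val d = new DiffList
    v1.set(7)(d)  // changes the value and appends a Diff to d
    d.undo        // v1.value is 3 again
    d.redo        // v1.value is 7 again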
Expressing preferences with factors and models
A collection of variables and their values represents a possible state of the world. To express a degree of preference for one possible world over another—or a probability distribution over possible worlds—we need a mechanism for scoring the different worlds, such as a collection of factors in a factor graph.
Factors
In FACTORIE, factors are represented by instances of the trait
Factor. All Factors have certain basic methods. The
numVariables method returns the number of variables neighboring the
factor. The list of neighboring variables themselves is returned by
the method variables. Calling currentScore calculates the factor’s
score for the neighboring variables’ values in the current global
assignment. You can obtain the score under alternative assignments by
calling assignmentScore(a1), where a1 is some Assignment.
Subclasses of the Factor trait include Factor1, Factor2,
Factor3, and Factor4, which have one, two, three or four
neighboring variables respectively. (FACTORIE does not currently
support factors with more than four neighbors; these could be added in
the future if necessary, but thus far we have not needed them because
(a) TensorVariable, SeqVariable and other composite-valued variables
handle cases in which many values are necessary, and (b) a “var-args”
ContainerVariable helps similarly.)
The only abstract method in Factor1 (and its other numbered
relations) is score, which takes as arguments the values of its
neighbors and returns a Double.
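For instance, a hypothetical “agreement” factor over two integer variables might be sketched as follows (the cc.factorie.model import path is an assumption; the constructor arguments and score signature follow the description above):

    import cc.factorie.model.Factor2
    import cc.factorie.variable.IntegerVariable

    val a = new IntegerVariable(1)
    val b = new IntegerVariable(1)

    // Log-score 0.0 when the two neighbors agree, -1.0 otherwise.
    val f = new Factor2[IntegerVariable, IntegerVariable](a, b) {
      def score(v1: Int, v2: Int): Double = if (v1 == v2) 0.0 else -1.0
    }
    f.currentScore  // 0.0, since a and b currently agree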
Some Factor subclasses define a statistics method that takes as
arguments the values of its neighbors and returns the sufficient
statistics of the factor. (Often the sufficient statistics will
simply be the combination of neighbor values, but this method provides
users with an opportunity to instead perform various useful
transformations from values to sufficient statistics.) In these cases
the score method is then defined to be the result of the
statisticsScore method, whose only argument is the statistics, thus
ensuring that the factor score can be calculated using only the
sufficient statistics.
For example, in the class DotFactor2 the abstract method
statistics must return sufficient statistics of type Tensor. The
statisticsScore method is then defined to be the dot-product of these
sufficient statistics with another Tensor of scoring parameters, which
is returned by the abstract method weights.
The DotFactorWithStatistics2 class inherits from DotFactor2 but
requires that both its neighbors inherit from TensorVar. It then
defines its statistics method to return the outer-product of the
tensor values of its two neighbors.
The naming convention is that classes having the suffix WithStatistics
define the statistics method for the user. Most other classes have
an abstract statistics method that must be defined by the user.
Templated Factors
In many situations we want to model “relational data” in which the variables and factors appear in repeated patterns of relations to each other.
For example, in a hidden Markov model (HMM) there is a sequence of observations (random variables with observed values) and a corresponding sequence of hidden states (random variables whose value is the identity of the state of a finite-state machine that was responsible for generating that observation). Each hidden state variable has “next” and “previous” relations in the chain of hidden states. Each also has a corresponding observation variable (the one that generated it, and occurs in the same sequence position).
The standard HMM is “time invariant” (sometimes called “stationary”), meaning that the hidden state-transition probabilities and the observation-from-state generation probabilities do not depend on their position in the sequence. Thus, although each transition needs its own factor in the factor graph (because each factor has different neighboring variables), each of these factors can share the same scoring function.
In FACTORIE factors that share the same sufficient statistics
function, score function, and other similar attributes are said to
belong to the same “family” of factors. The trait Family defines a
Factor inner class such that its instances rely on methods in the
Family class for various shared functionality.
For example, the DotFamily provides a weights method for accessing
the dot-product parameters shared by all member factors.
Analogous to the numbered Factor subclasses there are numbered
families, Family1, Family2, Family3, Family4, each defining an
inner factor class with the corresponding number of neighbors.
A Template is a subclass of Family that is also able to
automatically construct its factors on the fly in response to requests
to “unroll” part of the graphical model. A Template’s unroll
method takes as input a variable, and uses its knowledge about the
“pattern of relational data” to create and return the Template’s
factors that neighbor the given variable.
Note that for factors with more than one neighbor this requires traversing some
relational pattern to find the other variables that are also
neighbors of these factors. This knowledge is flexibly encoded in the
implementation of numbered “unroll” methods. For example Template2
implements its unroll method in terms of two abstract methods:
unroll1 and unroll2. The unroll1 method takes as input a
variable with type matching the first neighbor of the factor, and is
responsible for creating and returning a (possibly empty) collection of
templated factors that touch this variable—finding the second
neighbor in each case by traversing some relational structure among
the variables.
Implementing position-specific unroll methods may seem cumbersome,
but in most cases these method definitions are extremely succinct, and
moreover the ability to specify the relational pattern through
arbitrary Turing-complete code provides a great deal of flexibility.
Nonetheless, in the future, we plan to implement a simpler template
language on top of this unroll framework.
As with Family there are numerous specialized subclasses of
Template. For example a DotTemplateWithStatistics3 is a template
for a factor with three neighboring variables, each of which must
inherit from TensorVar; the template defines a factor scoring
function by the dot-product between the outer-product of the three
variables and the template’s weights. The only abstract methods are
unroll1, unroll2, unroll3 and weights.
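To make this concrete, here is a sketch of a transition template over adjacent labels in a chain (the Label class, its prev/next links, and the addTemplates idiom follow the FACTORIE tutorials, but details such as import paths are assumptions; the domain is assumed to be populated before the model is constructed):

    import cc.factorie._
    import cc.factorie.variable._
    import cc.factorie.model._
    import cc.factorie.la.DenseTensor2

    object LabelDomain extends CategoricalDomain[String]
    class Label(c: String) extends CategoricalVariable[String](c) {
      def domain = LabelDomain
      var prev: Label = null  // chain links, assumed to be set elsewhere
      var next: Label = null
    }

    val model = new TemplateModel with Parameters {
      addTemplates(
        new DotTemplateWithStatistics2[Label, Label] {
          val weights = Weights(new DenseTensor2(LabelDomain.size, LabelDomain.size))
          def unroll1(l: Label) = if (l.next ne null) Factor(l, l.next) else Nil
          def unroll2(l: Label) = if (l.prev ne null) Factor(l.prev, l) else Nil
        }
      )
    }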
Models
For the sake of flexible modularity, in FACTORIE, a “model” does not
include the data, or inference method, or learning mechanism—a model
is purely a source of factors. The key method of the Model class is
factors which takes as input a collection of variables, and returns
a collection of Factors that neighbor those variables. (This
definition allows efficient implementation of large relational models
in which factors are created on-the-fly in response to specific
requests instead of needing to be “unrolled” entirely in advance.)
The Model trait also provides alternative factors methods for
obtaining the factors that neighbor an individual variable, a Diff
or a DiffList. There are also methods providing a filtered
collection of factors, such as factorsOfClass, factorsOfFamily or
factorsOfFamilyClass. For example, the latter may be used to obtain
only those factors that are members of the DotFamily class, and thus
conducive to gradient ascent parameter estimation.
Model also has convenience methods that return the total score of
all factors neighboring the method arguments. For example, its
assignmentScore method returns the sum of the factors’ scores
according to the given assignment. Similarly currentScore does so
for the global assignment.
There are several concrete subclasses of Model. A TemplateModel’s
factors come entirely from a collection of Templates. By contrast,
an ItemizedModel stores a set of factors created beforehand. A
CombinedModel returns the union of the sets of factors returned by
multiple submodels. There are also specialized models, such as the
ChainModel for linear-chain conditional random fields, with
specialized inference and learning algorithms.
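For example (a sketch; f is a factor built as in the Factor2 sketch above, with neighbors a and b, and ItemizedModel's constructor is assumed to accept initial factors):

    val m = new ItemizedModel(f)  // a model that simply stores pre-built factors
    m.factors(Seq(a, b))          // the stored factors neighboring a and b
    m.currentScore(a)             // total score of a's factors under the global assignment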
Parameters and Weights
The situation in which templated factor scores are calculated by dot-products is so common (and so relevant to our typical parameter estimation procedures) that FACTORIE provides special support for this case.
In a templated model the parameters are a set of weights tensors, one
per factor template. The Weights trait is a TensorVar that holds
the parameters of one family. For example, DotFamily has a
weights method that returns the Weights used to calculate its
factor scores.
A WeightsSet is then a collection of Weights, typically holding
all the parameters of a model (across multiple families) based on dot-products.
To declare that an object has a WeightsSet holding parameters we use
the Parameters trait, which defines a parameters method returning
the WeightsSet. (Thus this trait should not be thought of as
“being” parameters, but “holding” parameters.) Its typical use-case
is Model with Parameters, declaring that the object is a source of
factors, and also has a WeightsSet available through its
parameters method.
During parameter estimation one typically needs additional tensors
having the same structure as the parameters, but that hold other
quantities, such as gradients or expectations. For example the
blankSparseMap method on WeightsSet will create a “blank”
(initially zero-filled) WeightsMap, which is a collection of tensors
having the same structure as the WeightsSet that created it. This
WeightsMap can then be filled with a gradient through some
computation on training data. It is called a “map” because the
parameter-value-containing Weights objects are used as “keys” to
obtain the gradient tensor corresponding to that Weights object.
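A sketch of the typical pattern (the model class and weights names here are hypothetical, the import paths are assumptions, and factor construction is omitted):

    import cc.factorie._
    import cc.factorie.model._
    import cc.factorie.variable.Var
    import cc.factorie.la.{DenseTensor1, DenseTensor2}

    class MyModel extends Model with Parameters {
      val transition = Weights(new DenseTensor2(10, 10))  // one Weights per family
      val bias = Weights(new DenseTensor1(10))
      def factors(variables: Iterable[Var]): Iterable[Factor] = Nil  // omitted in this sketch
    }

    val model = new MyModel
    val gradient = model.parameters.blankSparseMap  // zero-filled WeightsMap, same structure
    // accumulate into gradient(model.bias), gradient(model.transition), then update parameters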
In summary, both WeightsSet and WeightsMap make available a
collection of Tensors. In WeightsSet the tensors are stored inside
the constituent Weights (TensorVars) themselves. In WeightsMap
the tensors are stored in a map internal to the WeightsMap; they are
accessible by lookup using the corresponding Weights as keys.
The abstract superclass of both WeightsSet and WeightsMap is
TensorSet. Although a TensorSet is not a single tensor but a
collection of tensors, it has a variety of useful methods allowing it
to be treated somewhat like a single tensor. These methods include
oneNorm and twoNorm for calculating norms, += for incremental
addition, dot for the sum of dot-products over all constituent
tensors, and different for determining whether the element-wise
distance between two TensorSets is larger than a threshold.
Searching for solutions with inference
Once we have a way to represent a possible world (variables, with their possible values) and a way to represent preferences over different possible worlds (factors provided by a model) we can address the problem of searching for the most preferred possible world (MAP inference) or finding the probabilities of subsets of all possible worlds (marginal inference).
In FACTORIE terminology, inference is a process that takes as input a
list of variables and a Model and produces as output a Summary, which
contains a collection of Marginals and a normalization constant,
accessible as logZ.
Marginals
A Marginal represents a joint distribution over a subset of the
variables. For example, a DiscreteMarginal2 specifies two
DiscreteVars, and a Proportions2 which contains their marginal
distribution. A RealGaussianMarginal1 represents a univariate
Gaussian distribution with a specified mean and variance. Naturally a
marginal that results from maximum a posteriori inference will simply
have a spike with all probability on the maximum-scoring values. The
most widely used marginals are those over a single variable,
inheriting from Marginal1, because marginals over multiple variables
are more typically represented by FactorMarginals.
A FactorMarginal is a marginal associated with a Factor (note,
however, that for technical reasons it does not inherit from the
Marginal trait). These are mainly used for learning, providing a
tensor of sufficient statistics which are used to calculate parameter
gradients.
Its marginal distribution may cover all the neighboring variables of
the factor or, when inference varied only some of the neighbors, just
a subset of them. Its tensorStatistics method returns the expected
sufficient statistics of the factor, which always cover all the
neighboring variables of the factor; this is useful for parameter
estimation.
There is infrastructure for automatically creating a FactorMarginal
from Marginals that cover a subset of the neighbors of a Factor.
For example, if we want a FactorMarginal for a Factor2 in which one
of the factor’s two neighbors is varying but the other is treated as
constant, we can use a DiscreteMarginal1 to construct a
DiscreteMarginal1Factor2 whose sufficient statistics will cover both
neighboring variables.
A request for inference on a model results in a Summary, which is a
collection of Marginals.
Summary
When inference is run the result is a Summary which is a container
for (a) the collection of variables that varied during inference, (b)
a collection of the corresponding single-variable Marginal1s, (c) a
collection of FactorMarginals for the factors used during inference,
and (d) a normalization constant, logZ. Using a Summary you can
look up a Marginal1 given a variable, and can look up a
FactorMarginal given a Factor.
Several specialized implementations of Summary are provided. For
example MAPSummary puts all of its probability on one Assignment
of the variables’ values. Because it also contains a collection of
Factors and FactorMarginals, it can also be used for (max-margin)
parameter estimation. Other examples include BPSummary for holding
the intermediate and final states of belief propagation, and
IndependentDiscreteSummary for holding an IID distribution over
discrete variables.
Inference
The primary interface for making inference requests is in the method
infer in the Infer trait. This method takes two or three
arguments: (1) the collection of non-constant variables whose values
should be varied and whose marginal distributions should be
calculated; (2) the Model whose factors will score these different
assignments; and optionally, (3) a Summary which may indicate
additional variables that should be marginalized (and the functional
form of that marginalization), or which may alternatively provide
information about the functional form to be used for the marginals of
the first argument variables.
The Maximize trait is a subtype of Infer for objects that implement
maximum a posteriori (MAP) inference. It adds a method maximize,
which calls infer and then assigns the variables to their maximizing
values.
Generally it is easy to implement your own inferencer and make it
conform to this API (which then immediately enables its use for
parameter estimation and other tasks). Typically much of the existing
infrastructure for Summarys and Marginals will be helpful in your
implementation.
FACTORIE comes with a large collection of already-implemented marginal
and MAP inference procedures, whose names start with InferBy and
MaximizeBy respectively.
There are many variants of belief propagation (BP) for both
marginalization and maximization on graphical models of discrete
variables. These include InferByBPChain, an efficient
implementation of forward-backward in linear-chain structured models;
InferByBPTree for exact inference on tree-shaped graphical models;
InferByBPLoopy for approximate BP inference on graphical models
with loops; and InferByBPLoopyTreewise for more robust approximate BP
inference with tree-reparameterization [Wainwright, Jaakkola, Willsky,
2001].
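For example, a sketch of running chain BP (labels is assumed to be a sequence of chained label variables and model a linear-chain model, as in the template sketch above):

    val summary = InferByBPChain.infer(labels, model)
    val z = summary.logZ                    // the log normalization constant
    val m1 = summary.marginal(labels.head)  // the Marginal1 for the first label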
Other variational inference methods exist as well, including
InferByMeanField, an implementation of marginal inference by mean
field for discrete variables; MaximizeByMPLP, which uses dual
coordinate ascent on the LP relaxation for MAP inference.
FACTORIE also provides infrastructure for Monte Carlo inference. This
includes a generic Sampler as well as its special-case subclass, a
ProposalSampler which selects among a collection of Proposal
objects (each of which provides a score and a DiffList for undoing
and redoing the proposed changes). Examples of ProposalSamplers
include the GibbsSampler and MHSampler (Metropolis-Hastings), and
IteratedConditionalModes, which is the maximization analogue of Gibbs
sampling.
These underlying sampling mechanisms are wrapped into objects that
implement the Infer interface, providing InferByGibbsSampling,
InferByMeanField, MaximizeByIteratedConditionalModes,
MaximizeByMPLP, etc.
Finally, Expectation-Maximization (EM) and dual decomposition are
implemented outside of the Infer API, by the EMInferencer and
DualDecomposition classes.
There is currently limited support for inference that directly leverages the special case of directed graphical models. We have a preliminary implementation of a system for storing and selecting among a collection of “recipes” for handling different patterns of random variables and factor types (similar to BUGS), but only a few such “recipes” have been implemented so far. One of those cases, however, provides an extremely efficient collapsed Gibbs sampler for latent Dirichlet allocation (LDA)—the FACTORIE implementation of LDA is actually faster than the widely-used MALLET implementation. See the section on “Topic Modeling” below.
FACTORIE does not currently provide automated methods of selecting the best inference method given a model and collection of variables. This is an area of future work.
Estimating parameters
(To be provided from Luke’s text.)
Application packages
In addition to its general support for factor graphs, FACTORIE provides pre-built models and other infrastructure for particular application areas.
Classification
Regression
Clustering
This package is not yet implemented, but will be in the future.
