Note: This page is currently obsolete. See the tutorials for the correct description.
The root of the variable class hierarchy is cc.factorie.Variable.
Naming conventions: Except for cc.factorie.Variable, Variable means mutable. Observation means immutable. The names Var and Vars could be either mutable or immutable. The plural Vars indicates that objects of the class might actually hold multiple values of that type (as in BinaryVectorVariable). The singular Var indicates that the object holds only one value, such as a single integer or a single real number.
Each class of Variable is associated with its own Domain, a representation of the set of valid values for variables of that type. For a class declared as Token extends EnumVariable[String], the domain is accessible by Domain[Token]. The default Domain class provides little functionality. An important subclass is CategoricalDomain, which stores a bi-directional mapping between the valid values (categories) of its variables and their integer indices, 0 to N-1 for N categories. A CategoricalDomain is also a scala.Seq containing its constituent categories. Thus, for example, since CategoricalVariables have a CategoricalDomain, you can print all unique Token strings by Domain[Token].foreach(println(_)).
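The bi-directional mapping described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions: SimpleCategoricalDomain, index, category, and DomainDemo are hypothetical names, not FACTORIE's CategoricalDomain API, and the sketch extends Iterable rather than scala.Seq to stay short.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a categorical domain; not FACTORIE's CategoricalDomain.
final class SimpleCategoricalDomain[C] extends Iterable[C] {
  private val categories = mutable.ArrayBuffer.empty[C] // index -> category
  private val indices = mutable.HashMap.empty[C, Int]   // category -> index

  // Return the index of a category, assigning the next free index on first sight.
  def index(category: C): Int =
    indices.getOrElseUpdate(category, { categories += category; categories.length - 1 })

  // Return the category stored at index i.
  def category(i: Int): C = categories(i)

  // Iterate over the categories in index order.
  def iterator: Iterator[C] = categories.iterator
}

object DomainDemo {
  val domain = new SimpleCategoricalDomain[String]
  // Repeated strings map to the same index.
  val ids: Seq[Int] = Seq("the", "cat", "the", "mat").map(domain.index)
}
```

Because the sketch is an Iterable, DomainDemo.domain.foreach(println(_)) prints each unique category once, which mirrors the Domain[Token].foreach(println(_)) idiom above.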
A GeneratedVariable is a variable that knows its "parent" source. A Parameter knows its "children". These variable classes encapsulate not only their value type, but also the distribution from which they were generated. Following the convention in statistics (unlike computer science), they are named after their parent distribution. Hence the value of a Dirichlet variable is a sequence of floating-point numbers that sum to one, and which was generated from a Dirichlet distribution.
Mixture models. Trait MixtureComponent and MixtureChoice.
A factor in a factor graph measures the "compatibility" of values in its neighboring variables. This compatibility is expressed as a real-valued score, which corresponds to an unnormalized log-probability. The score of an entire factor graph is the sum of the scores of all its factors.
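As a minimal sketch of the last sentence, the following plain Scala sums factor scores to obtain the score of a graph. Since scores are unnormalized log-probabilities, the exponential of this sum is proportional to the probability of the configuration. SimpleFactor, AgreementFactor, and GraphScore are hypothetical names chosen for illustration; they are not FACTORIE classes.

```scala
// Hypothetical minimal types; these are not FACTORIE's Factor or Model classes.
trait SimpleFactor { def score: Double }

// A factor over two boolean values that rewards agreement with a fixed weight.
final case class AgreementFactor(a: Boolean, b: Boolean, weight: Double) extends SimpleFactor {
  def score: Double = if (a == b) weight else 0.0
}

object GraphScore {
  // Score of a whole graph: the sum of the scores of its factors.
  def total(factors: Seq[SimpleFactor]): Double = factors.map(_.score).sum

  // Two agreeing factors (1.5 and 0.5) and one disagreeing factor (0.0).
  val example: Double = total(Seq(
    AgreementFactor(true, true, 1.5),
    AgreementFactor(true, false, 2.0),
    AgreementFactor(false, false, 0.5)))
}
```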
In many cases useful factor graphs have multiple factors with the same cardinality and types of variable neighbors, which also share the same parameters. A "factor template" efficiently captures these common attributes. It is a template, or "generator", of individual factors neighboring particular variables. A factor template consists of (1) a description of the arbitrary relationship among its variable neighbors, (2) a sufficient statistics function that maps those neighbors to the statistics necessary to return a real-valued score (and optionally a vector of sufficient statistics), (3) an aggregator for multiple statistics of the same template, (4) a function mapping those aggregated statistics to a real-valued score, and (5) optionally, the parameters used in the function to calculate that score (alternatively, the score may be calculated in some fixed way without learned parameters).
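The five parts listed above can be sketched for a simple chain of labels. This is a hedged illustration in plain Scala, not FACTORIE's template API: Label, TransitionTemplate, and all of its members are hypothetical names, and the weights are made-up numbers.

```scala
// Hypothetical sketch; names are illustrative, not FACTORIE's Template API.
final case class Label(position: Int, value: String)

object TransitionTemplate {
  type Stats = Map[(String, String), Double]

  // (5) Parameters: one weight per (previous label, next label) pair.
  val weights: Stats = Map(("DET", "NOUN") -> 2.0, ("NOUN", "VERB") -> 1.0)

  // (1) Relationship among neighbors: every pair of adjacent labels.
  def neighbors(labels: Seq[Label]): Seq[(Label, Label)] = labels.zip(labels.drop(1))

  // (2) Sufficient statistics of one factor: an indicator for its label pair.
  def statistics(pair: (Label, Label)): Stats = Map((pair._1.value, pair._2.value) -> 1.0)

  // (3) Aggregate the statistics of several factors by summing their counts.
  def aggregate(all: Seq[Stats]): Stats =
    all.flatten.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  // (4) Score: dot product of the aggregated statistics with the weights.
  def score(stats: Stats): Double =
    stats.map { case (k, count) => count * weights.getOrElse(k, 0.0) }.sum

  val labels: Seq[Label] =
    Seq("DET", "NOUN", "VERB", "DET", "NOUN").zipWithIndex.map { case (v, i) => Label(i, v) }
  val aggregated: Stats = aggregate(neighbors(labels).map(statistics))
  val total: Double = score(aggregated)
}
```

One template here stands for four factors that share the same weights; label pairs without a weight contribute zero to the score.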
In FACTORIE the "description of the arbitrary relationship among its variable neighbors" can be defined in several alternative ways. There is support for an entity-attribute-relationship language that can be used to describe relations among variables. Factor templates with boolean sufficient statistics can also be defined using this entity-attribute-relationship language plus formulas in first-order logic. But most flexibly, you can use a Turing-complete language (actually the full power of Scala) to define the relationship among a template's variable neighbors.
Probability distribution between two alternative possible worlds can be calculated by...
Efficiently score only the changes to a possible world...
The DiffList and its management...
Finding the factors that touch changed variables, and the other variable neighbors of those factors...
Given one changed variable, the unroll methods find the other variable neighbors for the factor template, construct one or more Factors, and return them...
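The idea of scoring only what a change touches can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: Spin, PairFactor, ChainModel, and the unroll and scoreChange signatures are hypothetical and do not reproduce FACTORIE's DiffList or its unroll methods. The sketch also assumes that each variable's id equals its position in the chain.

```scala
// Hypothetical sketch; not FACTORIE's DiffList, Template, or unroll signatures.
final class Spin(val id: Int, var value: Int) // mutable variable; id is its chain position

final case class PairFactor(a: Spin, b: Spin) {
  // Reward equal neighboring values, penalize unequal ones.
  def score: Double = if (a.value == b.value) 1.0 else -1.0
}

object ChainModel {
  // Given one changed variable, build only the factors that touch it.
  def unroll(chain: IndexedSeq[Spin], changed: Spin): Seq[PairFactor] = {
    val i = changed.id
    val left = if (i > 0) Seq(PairFactor(chain(i - 1), changed)) else Seq.empty
    val right = if (i < chain.length - 1) Seq(PairFactor(changed, chain(i + 1))) else Seq.empty
    left ++ right
  }

  // Score of the whole chain, used here only for comparison.
  def fullScore(chain: IndexedSeq[Spin]): Double =
    chain.zip(chain.drop(1)).map { case (a, b) => PairFactor(a, b).score }.sum

  // Score difference from setting `changed` to `newValue`, using local factors only.
  def scoreChange(chain: IndexedSeq[Spin], changed: Spin, newValue: Int): Double = {
    val factors = unroll(chain, changed)
    val before = factors.map(_.score).sum
    val old = changed.value
    changed.value = newValue
    val after = factors.map(_.score).sum
    changed.value = old // undo the change so the world is left as it was
    after - before
  }
}
```

The difference returned by scoreChange equals the difference between the full scores of the two worlds, because factors that do not touch the changed variable contribute the same amount to both.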
Steps of scoring a change to the model
Types of factor templates