FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference. Key features:
The structure of generative models can be expressed as a program that describes the generative storyline. The structure of undirected graphical models can be specified in an entity-relationship language, in which the factor templates are expressed as compatibility functions on arbitrary entity-relationship expressions; alternatively, factor templates may be specified as formulas in first-order logic. Most generally, however, data can be stored in arbitrary data structures (much as one would in deterministic programming), and the connectivity patterns of factor templates can be specified in a Turing-complete imperative style. This use of imperative programming to define various aspects of factor graph construction and operation is an innovation that originated in FACTORIE; we term this approach imperatively-defined factor graphs. The above three methods for specifying relational factor graph structure can be mixed in the same model.
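To make the imperative style concrete, here is a minimal sketch in plain Scala. It does not use the FACTORIE API: every name in it (Token, Factor, Template, Emission, Transition, Model) is hypothetical and chosen only for illustration, and the scores are made-up constants rather than learned parameters. It shows data held in ordinary objects and factor templates whose connectivity is decided by ordinary code that follows pointers.

```scala
// Illustrative sketch only: all names are hypothetical, not the FACTORIE API.

// Data lives in ordinary mutable objects, as in deterministic programming.
final class Token(val word: String, var label: String) {
  var next: Option[Token] = None
}

// A factor carries the score of one joint assignment of the variables it touches.
final case class Factor(name: String, score: Double)

// A template decides, in plain imperative code, which factors touch a token.
trait Template {
  def unroll(t: Token): Seq[Factor]
}

// Compatibility between a token's word and its own label (made-up scores).
object Emission extends Template {
  def unroll(t: Token): Seq[Factor] = {
    val capitalized = t.word.headOption.exists(_.isUpper)
    val s = if (capitalized == (t.label == "NAME")) 1.0 else -1.0
    Seq(Factor(s"emit(${t.word})", s))
  }
}

// Compatibility between neighbouring labels, found by following a pointer.
object Transition extends Template {
  def unroll(t: Token): Seq[Factor] =
    t.next.toList.map { n =>
      val s = if (t.label == n.label) 0.5 else 0.0
      Factor(s"trans(${t.word},${n.word})", s)
    }
}

object Model {
  val templates: Seq[Template] = Seq(Emission, Transition)

  // Unnormalized log-score of the current labeling: sum over all unrolled factors.
  def score(tokens: Seq[Token]): Double =
    tokens.flatMap(t => templates.flatMap(_.unroll(t))).map(_.score).sum

  // Build a chain of tokens and link each one to its successor.
  def sentence(words: Seq[String], labels: Seq[String]): Seq[Token] = {
    val tokens = words.zip(labels).map { case (w, l) => new Token(w, l) }
    tokens.zip(tokens.drop(1)).foreach { case (a, b) => a.next = Some(b) }
    tokens
  }
}
```

For example, Model.score(Model.sentence(Seq("Andrew", "writes", "code"), Seq("NAME", "O", "O"))) sums three emission factors and two transition factors. Because each template's unroll method is ordinary code, changing the graph's connectivity (say, linking a token to an earlier mention of the same word) means changing only that method, not the data structures or the scoring loop.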
FACTORIE has been successfully applied to various tasks in natural language processing and information integration, including
The current recommended FACTORIE source code is version 1.0.0-M3, available for download here. You can also obtain our latest code changes through the Mercurial repository. Although pre-1.0, it is already extremely useful and quite stable. You can also browse our list of outstanding issues.
FACTORIE has been released under the Apache License 2.0, and is free to use for commercial or academic purposes. However, please acknowledge its use with a citation:
Andrew McCallum, Karl Schultz, Sameer Singh. "FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs".
Neural Information Processing Systems (NIPS), 2009.
Research and development of FACTORIE is supported in part by the UMass Center for Intelligent Information Retrieval; in part by Google; in part by the National Science Foundation under NSF grants #IIS-0803847, #IIS-0326249 and #CNS-0551597; in part by Army prime contract number W911NF-07-1-0216 and University of Pennsylvania subaward number 103-548106; and in part by SRI International subcontract #27-001338 and AFRL prime contract #FA8750-09-C-0181. Any opinions, findings and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.