FACTORIE supports several styles of modeling, including conditional random fields (undirected graphical models) and generative latent-variable models (directed graphical models). Documented examples are still sparse; more documentation is planned.
Examples currently available in draft form
Note: For the most up-to-date code, see the examples/ directory in the FACTORIE source.
Undirected graphical models
Examples that will be added in the future
Undirected graphical models
- Linear-chain CRF for word segmentation
- Coreference of people names with toy data (entity resolution)
- Coreference of named entity mentions in ACE data (entity resolution)
- Schema matching (information integration)
Directed graphical models
- Generating from and fitting Multinomials and Dirichlets
- Latent Dirichlet allocation
- Poly-lingual latent Dirichlet allocation
Quick samples of model definitions in FACTORIE
Linear-chain CRF in Entity-Relationship specification
val model = new Model(
  Foreach[Label] { label => Score(label) },
  Foreach[Label] { label => Score(label.prev, label, label.token) }
)
Linear-chain CRF in raw imperative specification
val model = new Model(
  new TemplateWithDotStatistics1[Label],
  new TemplateWithDotStatistics3[Label, Label, Token] {
    // label is the first neighbor (the previous label) of the factor that follows it
    def unroll1(label:Label) = if (label.hasNext) Factor(label, label.next, label.token.next) else Nil
    // label is the second neighbor (the current label) of its own factor
    def unroll2(label:Label) = if (label.hasPrev) Factor(label.prev, label, label.token) else Nil
    def unroll3(token:Token) = throw new Error("Token values shouldn't change")
  }
)
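The two specifications above appear to describe the same linear-chain CRF: one score per label, and one score per (previous label, label, token) triple. As a reading aid, that model can be written in standard notation. The symbols below are introduced here for illustration and are not FACTORIE names: y_i is the label at position i, x_i is its token, n is the sequence length, and w^{(1)}, w^{(2)} are the weight tables of the two templates, assuming labels and tokens are treated as categorical values.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i=1}^{n} w^{(1)}_{y_i} + \sum_{i=2}^{n} w^{(2)}_{y_{i-1},\, y_i,\, x_i} \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{i=1}^{n} w^{(1)}_{y'_i} + \sum_{i=2}^{n} w^{(2)}_{y'_{i-1},\, y'_i,\, x_i} \Big)
```

The second sum starts at i = 2 because the first label has no predecessor, which matches the hasPrev and hasNext checks in the imperative specification.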
Smoking, cancer and friends in simple logic
val model = new Model(
  // A priori, you are 10 times more likely not to have cancer
  Forany[Person] { p => Not(p.cancer) } * 10,
  // If you smoke, you are 2 times more likely to have cancer
  Forany[Person] { p => p.smokes ==> p.cancer } * 2.0,
  // You are 2 times more likely to have the same smoking habit as each of your friends
  Forany[Person] { p => p.friends.smokes <==> p.smokes } * 2,
  // If your mother doesn't smoke, for each of your children who doesn't smoke, you are 1.5 times less likely to smoke
  Forany[Person] { p => Not(p.mother.smokes) ^ Not(p.children.smokes) ==> Not(p.smokes) } * 1.5
)
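The comments in the listing above read each weight as a multiplicative factor ("10 times more likely"). The following is a minimal sketch of that reading in plain Scala, with no FACTORIE dependency. It assumes that a world's unnormalized score is the product of the weights of all satisfied rule instances; that assumption is taken from the comments, not from FACTORIE's implementation. Person, Rule, WeightedLogicSketch, implies and score are hypothetical names used only for illustration, and only the first two rules are modeled.

```scala
// Plain Scala, no FACTORIE dependency. All names here are hypothetical.
final case class Person(name: String, smokes: Boolean, cancer: Boolean)

// A weighted rule: a Boolean formula over one person plus a multiplicative weight.
final case class Rule(weight: Double, holds: Person => Boolean)

object WeightedLogicSketch {
  // Material implication, written out because Boolean has no ==> operator.
  def implies(a: Boolean, b: Boolean): Boolean = !a || b

  // The first two rules from the listing above, under the assumed semantics.
  val rules: Seq[Rule] = Seq(
    Rule(10.0, p => !p.cancer),
    Rule(2.0, p => implies(p.smokes, p.cancer))
  )

  // Unnormalized score: multiply in a rule's weight for every person who satisfies it.
  def score(world: Seq[Person]): Double =
    (for (p <- world; r <- rules if r.holds(p)) yield r.weight).product

  def main(args: Array[String]): Unit = {
    val healthy = Seq(Person("Ann", smokes = false, cancer = false))
    val sick = Seq(Person("Ann", smokes = false, cancer = true))
    // Ratio of unnormalized scores for a non-smoker without and with cancer.
    println(score(healthy) / score(sick)) // prints 10.0
  }
}
```

For a non-smoker, the two worlds differ only in the first rule, so the ratio of their scores is exactly that rule's weight. For a smoker, the second rule pulls in the opposite direction and the ratio drops to 5.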
A generative model: Latent Dirichlet allocation
Define the variable classes that will hold our model:
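import scala.collection.mutable.ArrayBuffer  // needed by Document below
object Beta extends SymmetricDirichlet[Word](0.01)  // prior over words, shared by all topics
class Topic extends DirichletMultinomial[Word] with MixtureComponent[Topic]  // a topic's distribution over words
class Z extends MixtureChoice[Topic,Z]  // per-word topic assignment
object Alpha extends SymmetricDirichlet[Z](1.0)  // prior over topic assignments
class Theta extends DirichletMultinomial[Z]  // per-document distribution over topics
class Word(s:String) extends EnumObservation(s) with CategoricalOutcome[Word]  // an observed word
class Document(val file:String) extends ArrayBuffer[Word] { var theta:Theta = _ }  // a sequence of words plus its theta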
Then, after the data has been read into Documents, define the generative storyline:
val numTopics = 5
val topics = for (i <- 1 to numTopics) yield new Topic ~ Beta
for (document <- documents) {
  document.theta = new Theta ~ Alpha
  for (word <- document) {
    val z = new Z ~ document.theta
    word ~ z
  }
}
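The storyline above corresponds to the usual generative process for latent Dirichlet allocation. In standard notation it can be summarized as follows. The symbols are introduced here for illustration and are not FACTORIE names: \phi_k is the word distribution of topic k (Topic), \theta_d is the topic distribution of document d (Theta), z_{d,n} is the topic assignment of the n-th word in document d (Z), and w_{d,n} is that word (Word). From the listing, K = 5 (numTopics), \beta = 0.01 (Beta) and \alpha = 1.0 (Alpha).

```latex
\begin{aligned}
\phi_k   &\sim \mathrm{Dirichlet}(\beta),        & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha),       & d &= 1, \dots, D \\
z_{d,n}  &\sim \mathrm{Multinomial}(\theta_d),   & n &= 1, \dots, N_d \\
w_{d,n}  &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{aligned}
```

Each Multinomial here is a single draw. Since the words are observations, the line word ~ z presumably attaches each observed word to its source distribution rather than sampling a new value.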