
Examples

FACTORIE supports several styles of modeling, including conditional random fields (undirected graphical models) and generative latent-variable models (directed graphical models). Documented examples are still sparse; more will be added in the near future.

Examples currently available in draft form

Note: For the most up-to-date code, see the examples/ directory in the FACTORIE source.

Undirected graphical models

Examples that will be added in the future

Undirected graphical models

  • Linear-chain CRF for word segmentation
  • Coreference of people names with toy data (entity resolution)
  • Coreference of named entity mentions in ACE data (entity resolution)
  • Schema matching (information integration)

Directed graphical models

  • Generating from and fitting Multinomials and Dirichlets
  • Latent Dirichlet allocation
  • Poly-lingual latent Dirichlet allocation

Quick samples of model definitions in FACTORIE

Linear-chain CRF in Entity-Relationship specification

  val model = new Model(
    Foreach[Label] { label => Score(label) },
    Foreach[Label] { label => Score(label.prev, label, label.token) }
  )
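
To make the two templates above concrete, here is a minimal sketch in plain Scala of the quantity such a model assigns to a label sequence: the sum of one score per label plus one score per (previous label, label, token) triple. It does not use FACTORIE; the names LinearChainScore, Bias, Triple and score, and all weights, are hypothetical and chosen only for illustration.

```scala
object LinearChainScore {
  // Hypothetical parameters, not FACTORIE API: one weight per label,
  // and one weight per (previous label, label, token) triple.
  type Bias = Map[String, Double]
  type Triple = Map[(String, String, String), Double]

  /** Unnormalized log-score of a label sequence: the sum of all factor scores. */
  def score(tokens: Seq[String], labels: Seq[String], bias: Bias, triple: Triple): Double = {
    require(tokens.length == labels.length, "one label per token")
    // First template: one factor per label.
    val local = labels.map(l => bias.getOrElse(l, 0.0)).sum
    // Second template: one factor per (label.prev, label, label.token);
    // the first label has no predecessor, so it contributes no such factor.
    val chain = (1 until labels.length).map { i =>
      triple.getOrElse((labels(i - 1), labels(i), tokens(i)), 0.0)
    }.sum
    local + chain
  }

  def main(args: Array[String]): Unit = {
    val bias = Map("B" -> 0.5, "I" -> -0.5)
    val triple = Map(("B", "I", "York") -> 2.0)
    println(score(Seq("New", "York"), Seq("B", "I"), bias, triple)) // prints 2.0
  }
}
```

Inference and learning then work with this score; the sketch only shows how the factors add up.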

Linear-chain CRF in raw imperative specification

  val model = new Model(
    new TemplateWithDotStatistics1[Label],
    new TemplateWithDotStatistics3[Label, Label, Token] {
      def unroll1(label:Label) = if (label.hasNext) Factor(label, label.next, label.token.next) else Nil
      def unroll2(label:Label) = if (label.hasPrev) Factor(label.prev, label, label.token) else Nil
      def unroll3(token:Token) = throw new Error("Token values shouldn't change")
    }
  )
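
The unroll methods above answer one question: given a variable that changed, which factors of the template touch it? The following plain-Scala sketch illustrates that idea with chain positions instead of FACTORIE variables. The names UnrollSketch, Factor and factorsTouching are hypothetical and are not FACTORIE's API.

```scala
object UnrollSketch {
  // A factor of the three-argument template, identified by chain positions:
  // (position of previous label, position of label, position of token).
  final case class Factor(prev: Int, label: Int, token: Int)

  /** Factors of the (Label, Label, Token) template that touch the label at position i. */
  def factorsTouching(i: Int, length: Int): Seq[Factor] = {
    require(i >= 0 && i < length, "position out of range")
    // Like unroll1: label i plays the "previous label" role, if a next label exists.
    val asPrev = if (i + 1 < length) Seq(Factor(i, i + 1, i + 1)) else Nil
    // Like unroll2: label i plays the "current label" role, if a previous label exists.
    val asCurrent = if (i > 0) Seq(Factor(i - 1, i, i)) else Nil
    asPrev ++ asCurrent
  }
}
```

For a chain of length 3, a change to the middle label touches two factors, while a change to either end touches only one.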

Smoking, cancer and friends in simple logic

  val model = new Model(
    // A priori, you are 10 times more likely not to have cancer
    Forany[Person] { p => Not(p.cancer) } * 10,
     
    // If you smoke, you are 2 times more likely to have cancer
    Forany[Person] { p => p.smokes ==> p.cancer } * 2.0,
     
    // You are 2 times more likely to have the same smoking habit as each of your friends
    Forany[Person] { p => p.friends.smokes <==> p.smokes } * 2,

    // If your mother doesn't smoke, for each of your children who doesn't smoke, you are 1.5 times less likely to smoke
    Forany[Person] { p => Not(p.mother.smokes) ^ Not(p.children.smokes) ==> Not(p.smokes) } * 1.5
  )
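
One way to read the comments above is that each formula carries a multiplicative weight: a world in which the formula holds is that many times as likely as an otherwise identical world in which it does not. The plain-Scala sketch below applies that reading to the first two formulas for a single person. This interpretation is an assumption drawn from the comments, not a statement of how FACTORIE treats the weights internally, and the names WeightedLogicSketch, Weighted, weight and probCancerGiven are hypothetical.

```scala
object WeightedLogicSketch {
  final case class Person(smokes: Boolean, cancer: Boolean)

  // A formula paired with a multiplicative weight (assumed reading, see above).
  final case class Weighted(formula: Person => Boolean, weight: Double)

  val formulas: Seq[Weighted] = Seq(
    Weighted(p => !p.cancer, 10.0),            // Not(p.cancer) * 10
    Weighted(p => !p.smokes || p.cancer, 2.0)  // p.smokes ==> p.cancer * 2.0
  )

  /** Unnormalized weight of a world: the product of the weights of satisfied formulas. */
  def weight(p: Person): Double =
    formulas.filter(_.formula(p)).map(_.weight).product

  /** P(cancer | smokes), found by enumerating the two possible values of cancer. */
  def probCancerGiven(smokes: Boolean): Double = {
    val yes = weight(Person(smokes, cancer = true))
    val no = weight(Person(smokes, cancer = false))
    yes / (yes + no)
  }
}
```

Under this reading, a non-smoker has cancer with odds 2:20 and a smoker with odds 2:10, so smoking doubles the odds, in line with the second comment.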

A generative model: latent Dirichlet allocation

Define the variable classes that will hold our model:

  import scala.collection.mutable.ArrayBuffer  // needed by Document below

  object Beta extends SymmetricDirichlet[Word](0.01)
  class Topic extends DirichletMultinomial[Word] with MixtureComponent[Topic]
  class Z extends MixtureChoice[Topic,Z]
  object Alpha extends SymmetricDirichlet[Z](1.0)
  class Theta extends DirichletMultinomial[Z]
  class Word(s:String) extends EnumObservation(s) with CategoricalOutcome[Word]
  class Document(val file:String) extends ArrayBuffer[Word] { var theta:Theta = _ }

Then, after the data has been read into Documents, define the generative story:

  val numTopics = 5
  val topics = for (i <- 1 to numTopics) yield new Topic ~ Beta
  for (document <- documents) {
    document.theta = new Theta ~ Alpha
    for (word <- document) {
      val z = new Z ~ document.theta
      word ~ z
    }
  }
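
The generative story above can also be run forward as a sampler. The sketch below does this in plain Scala, without FACTORIE, using the same hyperparameters (0.01 for the topic-word Dirichlet, 1.0 for the document-topic Dirichlet). Note the difference: in the FACTORIE code the words are observed and merely attached to their parents, whereas this sketch draws the words as well. All names (LdaGenerativeSketch, gamma, symmetricDirichlet, sampleIndex, generate) are hypothetical, and the Gamma sampler is the standard Marsaglia-Tsang method, swapped in here because the Scala standard library has none.

```scala
import scala.util.Random

object LdaGenerativeSketch {
  /** Gamma(shape, 1) sample via Marsaglia-Tsang; shapes below 1 use
    * the identity Gamma(a) = Gamma(a + 1) * U^(1/a). */
  def gamma(shape: Double, rng: Random): Double =
    if (shape < 1.0) gamma(shape + 1.0, rng) * math.pow(rng.nextDouble(), 1.0 / shape)
    else {
      val d = shape - 1.0 / 3.0
      val c = 1.0 / math.sqrt(9.0 * d)
      var result = -1.0
      while (result < 0.0) {
        val x = rng.nextGaussian()
        val t = 1.0 + c * x
        val v = t * t * t
        if (v > 0.0) {
          val u = rng.nextDouble()
          if (math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v)) result = d * v
        }
      }
      result
    }

  /** Symmetric Dirichlet sample: independent Gamma draws, normalized to sum to 1. */
  def symmetricDirichlet(dim: Int, alpha: Double, rng: Random): Vector[Double] = {
    val g = Vector.fill(dim)(gamma(alpha, rng))
    val total = g.sum
    g.map(_ / total)
  }

  /** Draws an index from a discrete distribution by inverting its running totals. */
  def sampleIndex(probs: Vector[Double], rng: Random): Int = {
    val u = rng.nextDouble()
    val cumulative = probs.scanLeft(0.0)(_ + _).tail
    val i = cumulative.indexWhere(u < _)
    if (i >= 0) i else probs.length - 1 // guard against rounding error
  }

  /** For each document, returns its sampled (topic index, word index) pairs. */
  def generate(numTopics: Int, vocabSize: Int, docLengths: Seq[Int], rng: Random): Seq[Seq[(Int, Int)]] = {
    // Topic ~ Beta: one word distribution per topic.
    val topics = Vector.fill(numTopics)(symmetricDirichlet(vocabSize, 0.01, rng))
    docLengths.map { n =>
      // Theta ~ Alpha: one topic distribution per document.
      val theta = symmetricDirichlet(numTopics, 1.0, rng)
      Seq.fill(n) {
        val z = sampleIndex(theta, rng)      // Z ~ theta
        val w = sampleIndex(topics(z), rng)  // word ~ z
        (z, w)
      }
    }
  }
}
```

For example, LdaGenerativeSketch.generate(5, 6, Seq(4, 7), new Random(42)) yields two documents of lengths 4 and 7 over a six-word vocabulary.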