News

Factorie 1.0.0-M3 Released

posted Jan 25, 2013, 3:59 PM by Sameer Singh   [ updated Jan 25, 2013, 4:07 PM ]

This is the third milestone release for Factorie 1.0. Detailed changelog attached.

New in version 1.0.0-M3:

  • Documentation
    • improved existing tutorials
    • new tutorial on Inference and Learning
    • better TUI
    • better comments and error messages
    • Parser Demo
    • site can be generated at the users’ end
  • Models and Templates
    • support for feature hashing
    • Massive renaming of Variables and Domains
  • NLP
    • Classifier based POS tagger
    • added port of ClearNLP tokenizer/segmenter
    • Faster Bibtex parser
    • REST API for Parsers
  • Inference
    • support efficient inference for ChainModels
    • Sampler can return a DiffList of all the changes
    • bugfixes in MHSampler
    • BP logZ implemented to enable likelihood learning
  • Optimization and Training
    • Removed redundant SampleRank
    • Added Pegasos, Pseudo-likelihood, Contrastive Divergence, StructSVM, AdaGrad
    • new ClassifierTrainer to support all types of losses, trainers and optimizers
    • better multi-threaded support
    • bugfixes and efficiency improvements
  • Tensors
    • speed enhancements and bugfixes
    • more operations implemented
    • new tests for Tensors
  • Serialization
    • all new serialization based on Cubbies

Factorie 1.0.0-M2 released

posted Oct 27, 2012, 10:18 PM by Sameer Singh   [ updated Oct 27, 2012, 10:32 PM ]

This is the second milestone release for Factorie 1.0. Detailed changelog attached.
New in version 1.0.0-M2:
---

* Documentation
        - markdown based website, the source for which is checked into the repository
        - Tutorial on Domains
        - more assertions throughout the code (including tutorials)
        - better Tutorial prettifier

* Models and Templates
        - Factors can provide statistics and scores on any Assignment and valueTensors
        - trait Model independent of context, ModelWithContext[C] can unroll given any context

* NLP
        - Abstracted dependency parser prediction for easily dropping in alternative classifiers.
        - Bootstrapping for improved dependency parser training.

* Inference
        - BPSummary is more efficient, includes an abstract version

* Optimization and Training
        - Pieces are now Examples, Learners are Trainers
        - MaxlikelihoodExample is efficient in computing constraints
        - SampleRankExample replaces old trainer, almost as efficient

* Classifiers
    - Added DecisionTree, AdaBoost, SVM classifiers in app.classify

* Tensors
        - Filled in more of the missing cases in Tensors
        - Fixed indexing bugs in a few Tensor types
        - OuterTensors that efficiently represent the outer product between Tensors

* Serialization
        - gzip support

Factorie 1.0.0-M1 released

posted Oct 11, 2012, 1:06 PM by Sameer Singh   [ updated Oct 11, 2012, 1:13 PM ]

This is the first milestone release for Factorie 1.0. This version comes with many core API changes, a complete rewrite of the factorie.la package, a reimplementation of BP, modifications to the optimization package, and more. Detailed changelog attached.
New in version 1.0.0-M1:
* Models and Templates
  - All templates are now Models
  - Models are now parameterized by the type of things they can score
  - It is possible to write code that does not deduplicate factors

* NLP
  - new Ontonotes Loader
  - new Nonprojective Dependency parser

* Inference
  - Summary class now maintains the marginals, and is common to Samplers and BP
  - Reimplementation of BP to be more efficient

* Optimization & Training
  - efficient L2-regularized SVM training
     - integration with app.classify
  - support for parallel batch and online training with a Piece API
  - support for Hogwild (including Hogwild SampleRank)

* Tensors
  - all new la package that replaces the earlier Vector classes with Tensors
  - Tensors can be multi-dimensional, with implementations that independently choose sparsity/singleton for each dimension
  - weights and features now use Tensors

* Serialization
  - Serialization exists in a different class

* Misc
  - Added Tutorials to walk through model construction
  - Cleaned examples so that they work (added a test that makes sure they do)

Factorie 0.10.2 released

posted May 11, 2012, 12:35 PM by Sameer Singh   [ updated May 11, 2012, 12:59 PM ]

This release comes with a number of enhancements to the inference techniques, a developed NLP package, a flexible persistence layer (Cubbie), and a novel hierarchical model. Detailed changelog attached.
Changelog:
New in version 0.10.2:
* NLP
  - Customized forward-backward and viterbi for chain models
  - changes to the coreference data structures that support hierarchical models
  - new data loaders
  - models can be loaded from JARs (POS model in IESL Nexus)
  - initial dependency parser

* BP
  - Refactored for speed and a cleaner interface, with bugfixes
  - Caching of scores and values
  - MaxProduct works even when there are multiple MAP states
  - TimingBP to compare performance of the different variants of BP in the codebase
  - maxMarginal with threshold, to support PR curves
  - some initial parallelization

* Max likelihood training
  - convenience constructors for selecting which families to update
  - pieces can use families for inference that are not updated

* Trainer that uses Stochastic gradient descent

* Cubbie
  - new unified interface for serialization/persistence (including mongodb support)

* Hierarchical Coref Model
  - added model that supports an arbitrarily deep and wide hierarchy of entities, as described in Wick, Singh, McCallum, ACL 2012

* Gzip saving/loading of models
* Data loaders for bibtex, dblp, etc.
* Better support for limitedValues and sparse domains on factors
* Code cleanup, including deletion of inner/outer factors 

Factorie 0.10.1 Released

posted Feb 9, 2012, 2:46 PM by Sameer Singh   [ updated Oct 29, 2012, 5:29 PM ]

After a long delay, the latest version of Factorie has been released. Numerous major features have been added, and significant renames and refactoring have been performed.
Here's an incomplete changelog:

New in version 0.10.1:

* Many renames, new features and refactors; the list below is partially complete.

* Initial support for sparse value iteration in factor/families

* Data representation for app.nlp like Tokens, ParseTrees, Spans, Sentences, etc.

* Initial version for POS, NER, within-doc coref for app.nlp

* Additional vectors that mix sparse and dense representations (SparseOuterVector) in factorie.la

* Added Families that represent sets of factors. Templates are a type of Family now.

* Initial support for MaxLikelihood and Piecewise Training using the new BP framework

* Added a more flexible, modular BP framework

* DiscreteVector and CategoricalVector

The old names "DiscretesValue", "DiscretesVariable", etc. were
deemed too easily misread (easy to miss the little "s" in the middle)
and have been renamed "DiscreteVectorValue", "DiscreteVectorVariable",
etc.

* Factors independent of Templates

* Models independent of Templates

* Redesigned cc.factorie.generative package





New in version 0.10.0:

* Variable 'value' methods:

All variables must now have a 'value' method with return type
'this.Value'. By default this type is Any. If you want to override
it, use the VarAndValueType trait, which sets the covariant types
'VariableType' and 'ValueType'. 'Value' is magically defined from
these to be pseudo-invariant.

The canonical representation of DiscreteVariable (and
CategoricalVariable) values used to be an Int. Now it is a
DiscreteValue (or CategoricalValue) object, which is a wrapper around
an integer (and its corresponding categorical value). These objects
are created automatically in the DiscreteDomain (or
CategoricalDomain), and are guaranteed to be unique for each integer
value, and thus can be compared by pointer equality.

For example, if 'label' is a CategoricalVariable[String]:
  label.value is a CategoricalValue
  label.intValue == label.value.index, which is an integer
  label.categoryValue == label.value.category, which is a String
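
The uniqueness guarantee above can be illustrated with a minimal, self-contained sketch. The names CatValue, CatDomain and CatVariable are hypothetical stand-ins invented for this illustration, not Factorie's actual classes; the sketch only assumes that a domain hands out one value object per category and that variables store references to those objects.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a categorical value: wraps an index and its category.
final class CatValue[C](val index: Int, val category: C)

// Hypothetical stand-in for a domain that creates each value object exactly once.
class CatDomain[C] {
  private val byCategory = mutable.LinkedHashMap.empty[C, CatValue[C]]
  // Returns the unique value object for a category, creating it on first use.
  def value(category: C): CatValue[C] =
    byCategory.getOrElseUpdate(category, new CatValue[C](byCategory.size, category))
  def size: Int = byCategory.size
}

// Hypothetical variable whose 'value' is an object owned by its domain.
class CatVariable[C](initial: C, val domain: CatDomain[C]) {
  private var current: CatValue[C] = domain.value(initial)
  def value: CatValue[C] = current
  def intValue: Int = current.index
  def categoryValue: C = current.category
  def set(category: C): Unit = { current = domain.value(category) }
}

object InternedValuesDemo {
  def main(args: Array[String]): Unit = {
    val domain = new CatDomain[String]
    val a = new CatVariable("NOUN", domain)
    val b = new CatVariable("VERB", domain)
    val c = new CatVariable("NOUN", domain)
    assert(a.value eq c.value)          // same category, same object
    assert(!(a.value eq b.value))       // different categories, different objects
    assert(a.intValue == a.value.index)
    assert(c.categoryValue == "NOUN")
  }
}
```

Because every value object is created by the domain, reference comparison (eq) is enough to decide whether two variables hold the same value.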



* Discrete variables and vectors

DiscreteValues has been renamed DiscretesValue. Similarly there are
now classes DiscretesVariable, CategoricalsValue and
CategoricalsVariable. These plural names refer to vector values and
their variables. For example, CategoricalsVariable is a superclass of
the BinaryFeatureVectorVariable.

The singular DiscreteValue, DiscreteVariable, CategoricalValue and
CategoricalVariable hold single values (i.e. which could be mapped to
single integers), but are subclasses of their plural counterparts, with
values that are singleton vectors.

The domains of the plural types (i.e. vectors, not necessarily
singleton vectors) are DiscretesDomain and CategoricalsDomain. The
length of these vectors is determined by an inner DiscreteDomain or
CategoricalDomain. Hence to create a domain for vectors of length 10:

new DiscretesDomain {
  val dimensionDomain = new DiscreteDomain { def count = 10 }
}



* TrueSetting renamed to TargetValue

Now that all variables have a 'value', the name 'setting' is
deprecated. Also, "true" and "truth" were deemed confusable with
boolean values, and are now deprecated. The preferred alternative is
"target". Hence, the "TrueSetting" trait has been renamed
"TargetValue", and various methods renamed:
setToTruth => setToTarget
valueIsTruth => valueIsTarget
trueIntValue => targetIntValue



* Domains:

Previously there was a one-to-one correspondence between variable
classes and domains; the variable looked up its domain in a global
hashtable whose keys were the variable classes. Furthermore Domain
objects were often created for the user auto-magically. This scheme
lacked flexibility and was sometimes confusing. The one-to-one
correspondence has now been removed. The 'domain' method in Variable
is now abstract. Some subclasses of Variable define this method, such
as RealVariable; others still leave it abstract. For example, in
subclasses of DiscreteVariable and CategoricalVariable you must define
the 'domain' method. In these cases you must also create your domain
objects explicitly. Thus we have sacrificed a little brevity for
clarity and flexibility. Here is an example of typical code for
creating class labels:

object MyLabelDomain extends CategoricalDomain[String]
class MyLabel(theValue:String) extends CategoricalVariable(theValue) {
  def domain = MyLabelDomain
}
or
class MyLabel(theValue:String, val domain: CategoricalDomain[String] = MyLabelDomain) extends CategoricalVariable(theValue)

The type argument for domains used to be the variable class; now it is
the 'ValueType' type of the domain (and its variables).

Templates now automatically gather the domains of the neighbor
variables. VectorTemplates also gather the domains of their
statistics values. [TODO: Discuss the dangers of this automatic
mechanism and consider other mechanisms.]
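
One consequence of dropping the one-to-one correspondence is that several variable classes can point at the same explicitly created domain. The following is a minimal sketch of that idea using toy types (ToyDomain, ToyVariable, GoldPos, PredictedPos and PosDomain are hypothetical names invented here, not part of Factorie):

```scala
import scala.collection.mutable

// Hypothetical toy domain: assigns consecutive indices to categories as they appear.
class ToyDomain[C] {
  private val categories = mutable.ArrayBuffer.empty[C]
  def index(c: C): Int = {
    val i = categories.indexOf(c)
    if (i >= 0) i else { categories += c; categories.size - 1 }
  }
  def size: Int = categories.size
}

// Hypothetical toy variable: 'domain' is abstract, so each subclass must name its domain.
abstract class ToyVariable[C](val category: C) {
  def domain: ToyDomain[C]
  def intValue: Int = domain.index(category)
}

// The domain object is created explicitly by the user.
object PosDomain extends ToyDomain[String]

// Two distinct variable classes share one domain, which a global
// class-to-domain table with one entry per class could not express.
class GoldPos(c: String) extends ToyVariable[String](c) {
  def domain: ToyDomain[String] = PosDomain
}
class PredictedPos(c: String) extends ToyVariable[String](c) {
  def domain: ToyDomain[String] = PosDomain
}

object SharedDomainDemo {
  def main(args: Array[String]): Unit = {
    val gold = new GoldPos("NN")
    val predicted = new PredictedPos("NN")
    // Both classes map the same category to the same index.
    assert(gold.intValue == predicted.intValue)
    assert(gold.domain eq predicted.domain)
  }
}
```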



* Template statistics:

Previously the constructor arguments of Stat objects were Variables.
They have now been changed to Variable *values* instead. Furthermore,
whereas the old Template.statistics method took as arguments a list
of variables, the new Template.statistics method takes a "Values"
object, which is a simple Tuple-style case class containing variable values.

For example, old code:
new Template2[Label,Label] with DotStatistics1[BooleanVariable] {
  def statistics(y1:Label, y2:Label) =
    Stat(new BooleanVariable(y1.intValue == y2.intValue))
}
might be re-written as:
new Template2[Label,Label] with DotStatistics1[BooleanValue] {
  def statistics(values:Values) = Stat(values._1 == values._2)
}
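
A practical effect of computing statistics from values rather than variables is that a hypothetical assignment can be evaluated without touching any variable. The sketch below uses toy types only: Stat1, Label, AgreementTemplate and the tuple alias Values are hypothetical names invented for this illustration and do not reproduce Factorie's Template API.

```scala
// Hypothetical toy statistic holding a single value.
final case class Stat1[S](value: S)

// Hypothetical toy variable with a mutable current value.
class Label(var value: String)

object AgreementTemplate {
  // Assumption: a plain tuple stands in for the tuple-style "Values" case class.
  type Values = (String, String)

  // Statistics depend only on values, never on the variables themselves.
  def statistics(values: Values): Stat1[Boolean] =
    Stat1(values._1 == values._2)

  // Reading the current values out of the variables recovers the old behaviour.
  def currentStatistics(y1: Label, y2: Label): Stat1[Boolean] =
    statistics((y1.value, y2.value))
}

object ValueStatisticsDemo {
  def main(args: Array[String]): Unit = {
    val y1 = new Label("B")
    val y2 = new Label("I")
    // Statistics for the current assignment.
    assert(AgreementTemplate.currentStatistics(y1, y2) == Stat1(false))
    // Statistics for a hypothetical assignment, computed without changing y2.
    assert(AgreementTemplate.statistics((y1.value, "B")) == Stat1(true))
    assert(y2.value == "I")
  }
}
```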



* VectorTemplate

VectorStatistics1, VectorStatistics2, VectorStatistics3 used to take
VectorVar type arguments. They now take DiscretesValue type
arguments. The method 'statsize' has been renamed
'statisticsVectorLength' for clarity.



* Generative modeling package

The probability calculations and sampling routines are no longer
implemented in the variable, but in templates instead. Each
GeneratedVar must have a value "generativeTemplate" and a method
"generativeFactor". Many changes have been made to the generative
modeling package, but they are not yet finished or usable. The code
is being checked in now in order to facilitate others' work on the
undirected models.
