AdaGrad does not increase the size of the weight vector during learning. The weight vector's dimension should grow when new features are seen while extracting features from previously unseen training examples.
Cause:
https://github.com/IllinoisCogComp/lbjava/blob/master/lbjava/src/main/java/edu/illinois/cs/cogcomp/lbjava/learn/AdaGrad.java#L189
Example feature:

```
discrete MyTestFeature(MyData d) <- {
    return d.isCapitalized() ? "YES" : "NO";
}
```
For this example, the weight vector should have size 3 (YES, NO, and the bias term), but `exampleFeatures.length` is only 1 here.
Compare with the implementation of StochasticGradientDescent.
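To illustrate the expected behavior, here is a minimal sketch (not the actual LBJava API; class and method names are hypothetical) of a weight vector that grows on demand when an update touches a feature index beyond its current size:

```java
import java.util.Arrays;

// Hypothetical sketch: a weight vector that resizes itself when a
// previously unseen feature index appears during training.
public class GrowableWeightVector {
    private double[] weights = new double[1]; // room for the bias term only

    // Grow the array so featureIndex is a valid slot; new slots default to 0.
    public void ensureCapacity(int featureIndex) {
        if (featureIndex >= weights.length) {
            weights = Arrays.copyOf(weights, featureIndex + 1);
        }
    }

    // Apply a gradient update, growing the vector first if needed.
    public void update(int featureIndex, double delta) {
        ensureCapacity(featureIndex);
        weights[featureIndex] += delta;
    }

    public int size() { return weights.length; }
    public double get(int i) { return weights[i]; }

    public static void main(String[] args) {
        GrowableWeightVector w = new GrowableWeightVector();
        w.update(0, 0.5);   // bias term
        w.update(2, -1.0);  // new feature forces growth to size 3
        System.out.println(w.size());  // 3
        System.out.println(w.get(2));  // -1.0
    }
}
```

In the running example, the vector would end up with size 3 once both feature values (YES, NO) and the bias term have been seen.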