v1.1.0

@chschroeder chschroeder released this 01 Oct 10:50

This release adds a conda package, more convenient imports, and improves many aspects of the classification functionality. Moreover, one new query strategy and three stopping criteria have been added.

Added

General

  • Small-Text package is now available via conda-forge.
  • Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text):
    from small_text import PoolBasedActiveLearner
    

Classification

  • All classifiers now support weighting of training samples.
  • Early stopping has been reworked, improved, and documented (#18).
  • Model selection has been reworked and documented.
  • [!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 to match TransformerBasedClassification.
  • [!] Removed weight renormalization after gradient clipping.

Datasets

  • A warning is now raised if the target_labels keyword argument is not passed to __init__().
  • Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
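
The two dataset changes above follow a common pattern, sketched here in plain Python. ToyDataset, its featurize parameter, and the warning text are all hypothetical and only illustrate the pattern; the actual signatures of from_arrays() in SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset are not given in these notes and should be taken from the library documentation.

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not part of small-text."""

    def __init__(self, x, y, target_labels=None):
        if target_labels is None:
            # Warn instead of failing, then fall back to the labels seen in y.
            warnings.warn(
                "target_labels was not passed; inferring labels from y.",
                UserWarning,
                stacklevel=2,
            )
            target_labels = sorted(set(y))
        self.x = list(x)
        self.y = list(y)
        self.target_labels = list(target_labels)

    @classmethod
    def from_arrays(cls, texts, y, featurize, target_labels=None):
        """Build a dataset directly from raw texts and labels."""
        return cls([featurize(t) for t in texts], y, target_labels=target_labels)


dataset = ToyDataset.from_arrays(
    ["good movie", "bad plot"],
    [1, 0],
    featurize=str.split,  # toy featurizer: whitespace tokenization
    target_labels=[0, 1],
)
```

Passing target_labels explicitly, as in the call above, avoids the warning and keeps the label set stable even when a label is absent from y.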

Query Strategies

Stopping Criteria

Deprecated

  • small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
  • TransformerBasedClassification and KimCNNClassifier:
    The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping
    keyword argument in the fit() method instead (#18).
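
The migration described above, from constructor arguments to a fit() argument, can be pictured with a toy class. ToyClassifier is hypothetical and not a small-text class. In particular, this sketch treats early_stopping as an integer patience, which is an assumption made for brevity; these notes do not state which type fit() expects in the library.

```python
import warnings


class ToyClassifier:
    """Hypothetical stand-in for a classifier; not part of small-text."""

    def __init__(self, early_stopping_no_improvement=None):
        if early_stopping_no_improvement is not None:
            warnings.warn(
                "early_stopping_no_improvement is deprecated; "
                "pass early_stopping to fit() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        self._legacy_patience = early_stopping_no_improvement

    def fit(self, val_losses, early_stopping=None):
        """Consume per-epoch validation losses and return the epochs run.

        Training stops once the loss has not improved for `early_stopping`
        consecutive epochs (assumption of this sketch: an integer patience).
        """
        patience = early_stopping
        if patience is None:
            patience = self._legacy_patience
        best, stale, epochs_run = float("inf"), 0, 0
        for loss in val_losses:
            epochs_run += 1
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1
            if patience is not None and stale >= patience:
                break
        return epochs_run


# New style: configure early stopping per call to fit().
clf = ToyClassifier()
epochs = clf.fit([1.0, 0.8, 0.9, 0.85, 0.7], early_stopping=2)
```

Moving the setting into fit() lets the same classifier object be trained with different stopping behavior on each call.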

Fixed

Classification

  • KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly
    process the scheduler keyword argument (#16).

Removed

  • Removed the strict check that every target label has to occur in the training data.
    (This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.)
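
Since the library no longer enforces this, a check along the following lines can be run before training if full label coverage matters. This is a plain-Python sketch; missing_labels is a hypothetical helper and not part of small-text.

```python
def missing_labels(y_train, target_labels):
    """Return the target labels that never occur in the training annotations."""
    seen = set()
    for labels in y_train:
        # Accept a single label or a collection of labels per sample.
        if isinstance(labels, (list, tuple, set)):
            seen.update(labels)
        else:
            seen.add(labels)
    return [label for label in target_labels if label not in seen]


# Multi-label toy data: one sample has no labels, labels 1 and 3 never occur.
y_train = [[0, 2], [2], [], [0]]
absent = missing_labels(y_train, target_labels=[0, 1, 2, 3])
```

An empty result means every target label occurs at least once; otherwise the returned labels are the ones the model never sees during training.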