Predictability

Reference

An early version of this software was used in Bonami and Beniamine (2016), and a more recent one in Beniamine, Bonami and Luís (2021). The latest implementation (including token frequencies and probability of success) is described in Bouton and Bonami (2026).

By default, the predictability computation starts by computing patterns. To reuse pre-computed patterns instead, pass the path to the pattern computation metadata with patterns=<path/to/metadata.json>.

Warning

Patterns and entropies computed with Qumin 2.0 cannot be imported into Qumin 3.0 because of a breaking change in the output format. When importing computation results, Qumin 3.0+ expects the path to the metadata.json file, which contains relative paths to the output files.

Predictability measures from one cell

/$ qumin action=pred data=<dataset.package.json>
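
The measures reported here are entropy-based. As a rough illustration only (not Qumin's implementation), the sketch below computes the conditional entropy of a pattern given the class of the predictor form on toy data. The class labels, pattern names and the `conditional_entropy` helper are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(pattern | predictor class) in bits, from (class, pattern) observations."""
    pairs = list(pairs)
    total = len(pairs)
    class_counts = Counter(cls for cls, _ in pairs)
    joint_counts = Counter(pairs)
    entropy = 0.0
    for (cls, _pattern), count in joint_counts.items():
        p_joint = count / total
        # Probability of the pattern among lexemes of this predictor class
        p_pattern_given_class = count / class_counts[cls]
        entropy -= p_joint * log2(p_pattern_given_class)
    return entropy


# Hypothetical toy data: one (predictor class, pattern) pair per lexeme
observations = [
    ("ends_in_a", "a->e"),
    ("ends_in_a", "a->e"),
    ("ends_in_a", "a->i"),
    ("ends_in_a", "a->i"),
    ("ends_in_o", "o->i"),
    ("ends_in_o", "o->i"),
]
print(conditional_entropy(observations))
```

In this toy data, forms in the `ends_in_o` class are fully predictable, while forms in the `ends_in_a` class hesitate between two equally likely patterns, so only that class contributes uncertainty.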

Entropies for other numbers of predictors:

/$ qumin action=pred pred.n=2 data=<dataset.package.json>
/$ qumin action=pred pred.n="[2,3]" data=<dataset.package.json>

Warning

With n and N>2, the computation can take a long time on large datasets, and it may be better to run Qumin on a server. In addition, the probability of success is not supported when N>1.
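
To get a feel for why more predictors are costly, the sketch below counts prediction tasks under a simplifying assumption: every combination of predictor cells is used to predict each remaining cell. The `prediction_tasks` helper and the 48-cell paradigm are hypothetical, and Qumin's actual workload may differ.

```python
from math import comb


def prediction_tasks(n_cells, n_predictors):
    """Count (predictor set, target cell) pairs.

    Assumes every combination of n_predictors cells is used to predict
    each of the remaining cells.
    """
    return comb(n_cells, n_predictors) * (n_cells - n_predictors)


# Hypothetical paradigm with 48 cells
for n in (1, 2, 3):
    print(n, prediction_tasks(48, n))
```

Under this assumption, the number of tasks grows combinatorially with the number of predictors, before even accounting for the size of the lexicon.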

Predictability measures with known features

Predicting with known form-wise features (such as syllable count) or lexeme-wise features (such as gender or inflection class) is also possible. Lexeme-wise features were used in Pellegrini (2023) and form-wise features in Bouton (2026, in press). Form-wise features can be used to hard-code complex phonological properties that Qumin fails to infer by itself. To use features, pass the name of any column(s) from the lexemes or forms table:

/$ qumin action=pred pred.features.lexemes="[inflection_class,gender]" patterns=<metadata.json> data=<dataset.package.json>
/$ qumin action=pred pred.features.forms=tone patterns=<metadata.json> data=<dataset.package.json>
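
As a hedged illustration of what treating a feature as known means, the sketch below compares a conditional entropy computed with and without a known feature on toy data. The column names, values and the `conditional_entropy` helper are hypothetical, and this is not how Qumin is implemented internally.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, known):
    """H(pattern | known columns) in bits; rows are dicts with a 'pattern' key."""
    total = len(rows)
    # How often each combination of known values occurs
    context = Counter(tuple(r[k] for k in known) for r in rows)
    # How often each pattern occurs within each combination of known values
    joint = Counter((tuple(r[k] for k in known), r["pattern"]) for r in rows)
    return sum(
        (count / total) * log2(context[ctx] / count)
        for (ctx, _pattern), count in joint.items()
    )


# Hypothetical toy data: gender fully disambiguates the pattern
rows = [
    {"predictor_class": "ends_in_a", "gender": "f", "pattern": "a->e"},
    {"predictor_class": "ends_in_a", "gender": "f", "pattern": "a->e"},
    {"predictor_class": "ends_in_a", "gender": "m", "pattern": "a->i"},
    {"predictor_class": "ends_in_a", "gender": "m", "pattern": "a->i"},
]
without_feature = conditional_entropy(rows, ["predictor_class"])
with_feature = conditional_entropy(rows, ["predictor_class", "gender"])
print(without_feature, with_feature)
```

In this toy case, adding the feature to the conditioning context removes all remaining uncertainty. With real data, the reduction depends on how informative the feature is.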

Using token frequencies to weight results

Qumin can weight results at several levels using type or token frequencies:

/$ qumin action=pred data=<dataset.package.json> pred.token_freq.patterns=True pred.token_freq.predictors=True
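
The general difference between type and token weighting can be sketched as follows. This is a toy illustration with hypothetical patterns and counts, and the `pattern_probabilities` helper is not part of Qumin. It only shows how the relative weight of a pattern shifts when each form counts by its corpus frequency rather than once.

```python
from collections import defaultdict


def pattern_probabilities(forms, use_token_freq):
    """Relative weight of each pattern.

    Each form counts once (type frequency) or by its corpus count
    (token frequency).
    """
    weights = defaultdict(float)
    for pattern, token_count in forms:
        weights[pattern] += token_count if use_token_freq else 1
    total = sum(weights.values())
    return {pattern: weight / total for pattern, weight in weights.items()}


# Hypothetical (pattern, token count) pairs, one per form
forms = [("a->e", 90), ("a->i", 5), ("a->i", 5)]
by_type = pattern_probabilities(forms, use_token_freq=False)
by_token = pattern_probabilities(forms, use_token_freq=True)
print(by_type)
print(by_token)
```

Here the pattern `a->i` is the majority pattern by type, but the minority pattern by token, because the single `a->e` form is much more frequent.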

For more details on predictability measures and weighting, turn to the predictability How-To.

Full reference

Predictability measure options are under the pred namespace. Available options (see also the common options):

class qumin.config.config.PredictabilityConfig(*, vis=True, n=<factory>, features=<factory>, importResults=None, exportApplicable=False, export_log=False, token_freq=<factory>)

Configuration for entropy calculations.

Parameters:

vis (bool) – Whether to create a heatmap of the metrics and of interpredictability zones.
n (List[int]) – Compute entropy for prediction from n predictors.
features (FeaturesConfig) – Feature columns in the lexemes or forms table. Features will be considered known in conditional probabilities: P(X~Y|X,f1,f2…)
importResults (str | None) – Import previous entropy computation results. Any imported file can be used to compute the entropy heatmap; results for n-1 predictors speed up the entropy computation for n predictors.
exportApplicable (bool) – Whether to export applicable patterns for further entropy computations with different settings.
export_log (bool) – Whether to export a full human-readable markdown log of intermediate calculations (competition classes of alternation patterns and their respective probabilities).
token_freq (TokenFreqConfig) – Whether to use token frequencies for…

Values for the token_freq keyword:

class qumin.config.config.TokenFreqConfig(*, patterns=False, predictors=False, overabundant=False, cells=False)

Whether to use token frequencies for…

Parameters:

patterns (bool) – The probability of the patterns.
predictors (bool) – The probability of the predictor classes and of the predictor forms.
overabundant (bool) – The weighting of overabundant cellmates.
cells (bool) – The weighting of the measures across different cells.

The default configuration for these keys looks like this:

pred:
  vis: true
  'n':
  - 1
  features:
    forms: null
    lexemes: null
  importResults: null
  exportApplicable: false
  export_log: false
  token_freq:
    patterns: false
    predictors: false
    overabundant: false
    cells: false
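
The dotted keys used on the command line (for example pred.token_freq.patterns=True) correspond to nested keys in this configuration. The sketch below mirrors the defaults as a plain dictionary and sets one nested key. The `with_override` helper is hypothetical and only illustrates the correspondence; it is not Qumin's actual configuration machinery.

```python
from copy import deepcopy

# Defaults copied from the configuration shown above
DEFAULTS = {
    "pred": {
        "vis": True,
        "n": [1],
        "features": {"forms": None, "lexemes": None},
        "importResults": None,
        "exportApplicable": False,
        "export_log": False,
        "token_freq": {
            "patterns": False,
            "predictors": False,
            "overabundant": False,
            "cells": False,
        },
    }
}


def with_override(config, dotted_key, value):
    """Return a copy of config with the nested key named by dotted_key set to value."""
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node[name]
    if leaf not in node:
        # Reject keys that are not part of the known configuration
        raise KeyError(dotted_key)
    node[leaf] = value
    return updated


config = with_override(DEFAULTS, "pred.token_freq.patterns", True)
print(config["pred"]["token_freq"])
```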

See also the full configuration.