Predictability
Reference
An early version of this software was used in Bonami and Beniamine (2016), and a more recent one in Beniamine, Bonami and Luís (2021). The latest implementation (including token frequencies and probability of success) is described in Bouton and Bonami (2026).
By default, the pred action starts by computing patterns. To work with pre-computed patterns, pass the path to the pattern computation metadata with patterns=<path/to/metadata.json>.
Warning
Patterns and entropies computed with Qumin 2.0 are not importable in Qumin 3.0 due to a breaking change in the output format. When importing computation results, Qumin 3.0+ expects a path to the metadata.json file, which contains relative paths to the output files.
Predictability measures from one cell
/$ qumin action=pred data=<dataset.package.json>
Entropies for other numbers of predictors:
/$ qumin action=pred pred.n=2 data=<dataset.package.json>
/$ qumin action=pred pred.n="[2,3]" data=<dataset.package.json>
Warning
With n and N>2, the computation can take a long time on large datasets, and it may be better to run Qumin on a server. In addition, the probability of success is not supported when N>1.
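To give a rough sense of why larger predictor sets are costly, the sketch below counts how many (predictor set, target cell) combinations exist for a given paradigm size. It assumes that every unordered set of n cells is paired with every remaining cell as a target; the 50-cell paradigm and the function name `n_predictions` are hypothetical, and Qumin's actual bookkeeping may differ.

```python
from math import comb


def n_predictions(n_cells: int, n_predictors: int) -> int:
    """Count (predictor set, target cell) combinations.

    Assumption: each unordered set of n_predictors cells is paired
    with every remaining cell as a target.
    """
    return comb(n_cells, n_predictors) * (n_cells - n_predictors)


# Hypothetical paradigm with 50 cells.
for n in (1, 2, 3):
    print(n, n_predictions(50, n))
# 1 2450
# 2 58800
# 3 921200
```

Under these assumptions, going from one to three predictors multiplies the number of combinations by several hundred, before counting the per-combination work on each lexeme.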
Predictability measures with known features
Predicting with known lexeme-wise features (such as gender or inflection class) is also possible. This feature was used in Pellegrini (2023). To use features, pass the name of any column(s) from the lexemes table:
/$ qumin action=pred pred.features=inflection_class patterns=<metadata.json> data=<dataset.package.json>
/$ qumin action=pred pred.features="[inflection_class,gender]" patterns=<metadata.json> data=<dataset.package.json>
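As an illustration of what treating a feature as known means, the following sketch estimates a conditional entropy from type counts, first given only the shape of the predictor form, then given that shape plus a gender feature. The toy lexemes, the column names (`ending`, `gender`, `pattern`) and the helper `conditional_entropy` are hypothetical; this is not Qumin's implementation, only a minimal instance of conditioning on extra features.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, given, target):
    """Estimate H(target | given) in bits from type counts over rows."""
    joint = Counter((tuple(r[k] for k in given), r[target]) for r in rows)
    context = Counter(tuple(r[k] for k in given) for r in rows)
    total = len(rows)
    # Sum of p(context, target) * log2(1 / p(target | context)).
    return sum(
        (count / total) * log2(context[ctx] / count)
        for (ctx, _), count in joint.items()
    )


# Hypothetical toy lexemes: the ending of the predictor form, a known
# lexeme-wise feature, and the alternation pattern to predict.
lexemes = [
    {"ending": "a", "gender": "f", "pattern": "a~e"},
    {"ending": "a", "gender": "f", "pattern": "a~e"},
    {"ending": "a", "gender": "m", "pattern": "a~i"},
    {"ending": "a", "gender": "m", "pattern": "a~i"},
    {"ending": "o", "gender": "m", "pattern": "o~i"},
    {"ending": "o", "gender": "m", "pattern": "o~i"},
]

h_plain = conditional_entropy(lexemes, ["ending"], "pattern")
h_feat = conditional_entropy(lexemes, ["ending", "gender"], "pattern")
print(f"{h_plain:.3f}")  # 0.667
print(f"{h_feat:.3f}")   # 0.000
```

In this toy data, the ending alone leaves the pattern of forms in "a" uncertain, while adding gender as a known feature removes the uncertainty entirely.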
Using token frequencies to weight results
Qumin can weight results at several levels using type or token frequencies:
/$ qumin action=pred data=<dataset.package.json> pred.token_freq.patterns=True pred.token_freq.predictors=True
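The difference between type and token weighting can be sketched on a toy distribution of patterns. In the example below, each lexeme counts once under type weighting and counts as many times as its token frequency under token weighting. The lexemes, frequencies and the helper `entropy` are hypothetical, and the sketch covers only the weighting of patterns; it does not reproduce how Qumin applies weights at its other levels.

```python
from collections import Counter
from math import log2


def entropy(weights):
    """Shannon entropy in bits of the distribution given by the weights."""
    total = sum(weights.values())
    return sum((w / total) * log2(total / w) for w in weights.values() if w > 0)


# Hypothetical lexemes: (pattern, token frequency).
lexemes = [("a~e", 900), ("a~e", 50), ("a~i", 25), ("a~i", 25)]

type_weights = Counter()
token_weights = Counter()
for pattern, freq in lexemes:
    type_weights[pattern] += 1      # each lexeme counts once
    token_weights[pattern] += freq  # each lexeme counts freq times

h_type = entropy(type_weights)
h_token = entropy(token_weights)
print(f"{h_type:.3f}")   # 1.000
print(f"{h_token:.3f}")  # 0.286
```

Here the two patterns are equally common across lexemes, but one of them accounts for most tokens, so the token-weighted estimate is much lower than the type-weighted one.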
For more details on predictability measures and weighting, turn to the predictability How-To.
Full reference
Predictability measure options are under the pred namespace. Available options (see also the common options):
- class qumin.utils.config.PredictabilityConfig(*, vis=True, n=<factory>, features=None, importResults=None, token_freq=<factory>)
Configuration for entropy calculations.
- Parameters:
vis (bool) – Whether to create a heatmap of the metrics and of interpredictability zones.
n (List[int]) – Compute entropy for prediction with n predictors.
features (Any | None) – Feature column in the Lexeme table. Features will be considered known in conditional probabilities: P(X~Y|X,f1,f2…)
importResults (str | None) – Import previous entropy computation results. Any results file can be used to compute the entropy heatmap; results with n-1 predictors accelerate the entropy computation with n predictors.
token_freq (TokenFreqConfig) – Whether to use token frequencies for…
Values for the token_freq keyword:
- class qumin.utils.config.TokenFreqConfig(*, patterns=False, predictors=False, overabundant=False, cells=False)
Whether to use token frequencies for…
The default configuration for these keys looks like this:
pred:
  vis: true
  'n':
    - 1
  features: null
  importResults: null
  token_freq:
    patterns: false
    predictors: false
    overabundant: false
    cells: false
See the full Default configuration