Patterns
Reference
An early version of the patterns algorithm is described in Beniamine (2017). An updated description appears in Beniamine, Bonami and Luís (2021).
Qumin's default action is to compute patterns only, so these two commands are equivalent:
/$ qumin data=<dataset.package.json>
/$ qumin action=patterns data=<dataset.package.json>
By default, Qumin will ignore defective lexemes and overabundant forms.
For paradigm entropy, it is possible to explicitly keep defective lexemes:
/$ qumin pats.defective=True data=<dataset.package.json>
For inflection class lattices, both can be kept:
/$ qumin pats.defective=True pats.overabundant.keep=True data=<dataset.package.json>
Warning
Patterns and entropies computed with Qumin 2.0 are not importable in Qumin 3.0 due to a breaking change in the output format. When importing computation results, Qumin 3.0 now expects a path to the metadata.json file, which contains relative paths to the output files.
This action generates alternation patterns. Further Qumin actions can consume them if given the path to the metadata.json file produced by the computation. It also writes human-readable patterns to the patterns/human_readable folder; these are intended for manual inspection.
Values for configuration keys can be given on the command line, e.g.:
/$ qumin verbose=True cells="[ind.prs.1.sg,ind.fut.1.sg]" pats.defective=True data=<dataset.package.json>
Pattern kinds
Qumin can compute two kinds of patterns for use in entropy calculations. Both consist of an alternation and a generalized context:
- edits: Aligned with simple edit distance.
- phon: Aligned with edit distances based on phonological similarity.
The default, phon, is recommended in most cases. To avoid relying on your phonological feature files for alignment scores, use edits. Only these two kinds are full patterns, with generalization in both the context and the alternation.
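The difference between the two alignment strategies can be illustrated with a generic dynamic-programming edit distance whose substitution cost is pluggable. This is only a minimal sketch, not Qumin's implementation: the function names, the toy feature table and the Jaccard-based cost are all assumptions made for illustration.

```python
def edit_distance(a, b, sub_cost=None, gap=1.0):
    """Edit distance between two segment sequences (standard DP table)."""
    if sub_cost is None:
        # "Simple" edit distance: any substitution costs 1, a match costs 0.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * gap  # delete every segment of a
    for j in range(1, cols):
        dist[0][j] = j * gap  # insert every segment of b
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + gap,  # deletion
                dist[i][j - 1] + gap,  # insertion
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
            )
    return dist[-1][-1]


# Hypothetical toy feature table, NOT a real Qumin features file.
TOY_FEATURES = {
    "p": {"labial", "stop"},
    "b": {"labial", "stop", "voiced"},
    "s": {"coronal", "fricative"},
    "a": {"vowel", "low"},
}


def feature_cost(x, y):
    """Assumed similarity-based cost: Jaccard distance between feature sets."""
    fx, fy = TOY_FEATURES[x], TOY_FEATURES[y]
    return 1.0 - len(fx & fy) / len(fx | fy)


simple_pb = edit_distance("pa", "ba")
simple_ps = edit_distance("pa", "sa")
phon_pb = edit_distance("pa", "ba", sub_cost=feature_cost)
phon_ps = edit_distance("pa", "sa", sub_cost=feature_cost)
```

With the simple cost, replacing p by b and replacing p by s are equally expensive. With the feature-based cost, the p/b substitution is cheaper because the two segments share most of their features, so similar segments are more likely to end up aligned with each other.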
Warning
Additional strategies were implemented for comparison purposes during Sacha Beniamine’s PhD. Support for these was discontinued after Qumin 2.0.1.
Full reference
Pattern options are under the pats namespace. Available pattern options (see also the common options):
- class qumin.utils.config.PatternsConfig(*, kind=Kind.phon, defective=False, gap_proportion=0.4, optim_mem=False, overabundant=<factory>)
Configuration for the patterns action.
Parameters:
kind (Kind) – Options are (see docs): phon, edits
defective (bool) – Whether to keep defective entries
gap_proportion (float) – Proportion of the median score used to set the gap score
optim_mem (bool) – Attempt to use a little bit less memory
overabundant (OverabundantPatternsConfig) – Configuration for overabundance
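Read literally, the description of gap_proportion means that the gap score is obtained by scaling the median of a set of scores. The sketch below shows only that arithmetic. Which scores Qumin takes the median of is not stated here, so the toy_scores list and the gap_score helper are hypothetical.

```python
from statistics import median


def gap_score(scores, gap_proportion=0.4):
    """Gap score as a fixed proportion of the median of the given scores."""
    return gap_proportion * median(scores)


# Hypothetical alignment scores, for illustration only.
toy_scores = [0.2, 0.5, 1.0, 0.5, 0.8]
default_gap = gap_score(toy_scores)       # 0.4 * median
higher_gap = gap_score(toy_scores, 0.8)   # a larger proportion gives costlier gaps
```

Under this reading, raising gap_proportion makes insertions and deletions more expensive relative to substitutions.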
Values for the overabundant keyword:
- class qumin.utils.config.OverabundantPatternsConfig(*, keep=False, freq=True, tags=None)
Configuration for the processing of overabundant forms.
The default configuration for these keys looks like this:
pats:
  kind: phon
  defective: false
  gap_proportion: 0.4
  optim_mem: false
  overabundant:
    keep: false
    freq: true
    tags: null
See the full Default configuration
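Dotted keys given on the command line, such as pats.overabundant.keep, correspond to nesting levels in the default configuration above. The toy sketch below shows that mapping on a plain dictionary. It is not Qumin's own configuration parser: apply_overrides and parse_value are hypothetical helpers written for this illustration, and the real command line may accept syntax that this sketch does not handle.

```python
import copy

# Defaults copied from the configuration shown above.
DEFAULTS = {
    "pats": {
        "kind": "phon",
        "defective": False,
        "gap_proportion": 0.4,
        "optim_mem": False,
        "overabundant": {"keep": False, "freq": True, "tags": None},
    }
}


def parse_value(raw):
    """Turn a command-line string into a bool, None, number, or plain string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Return a copy of config with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down one nesting level per dotted component.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


merged = apply_overrides(
    DEFAULTS, ["pats.defective=True", "pats.overabundant.keep=True"]
)
```

Here merged keeps every default except the two overridden keys, while DEFAULTS itself is left unchanged because the function works on a deep copy.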