Patterns

Reference

An early version of the patterns algorithm is described in Beniamine (2017). An updated description appears in Beniamine, Bonami and Luís (2021).

The default action for Qumin is to compute patterns only, so these two commands are equivalent:

/$ qumin data=<dataset.package.json>
/$ qumin action=patterns data=<dataset.package.json>

By default, Qumin will ignore defective lexemes and overabundant forms.

For paradigm entropy, it is possible to explicitly keep defective lexemes:

/$ qumin pats.defective=True data=<dataset.package.json>

For inflection class lattices, both can be kept:

/$ qumin pats.defective=True pats.overabundant.keep=True data=<dataset.package.json>
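To make the effect of these two switches concrete, here is a minimal standalone sketch of the filtering they describe. It does not use Qumin's code: the `filter_paradigms` function, the `#DEF#` marker for a missing form, and the toy rows are all assumptions made for illustration, and keeping the first listed form is only a stand-in for the selection rules described under the overabundant options.

```python
# Standalone illustration, not Qumin's implementation.
DEFECTIVE = "#DEF#"  # assumed marker for a missing form (not taken from this page)


def filter_paradigms(rows, keep_defective=False, keep_overabundant=False):
    """Filter (lexeme, cell, form) rows given in file order."""
    if not keep_defective:
        # Drop every row of a lexeme that has at least one defective cell.
        defective_lexemes = {lex for lex, _, form in rows if form == DEFECTIVE}
        rows = [r for r in rows if r[0] not in defective_lexemes]
    if not keep_overabundant:
        # Keep a single form per (lexeme, cell); here simply the first one seen.
        seen = set()
        kept = []
        for lex, cell, form in rows:
            if (lex, cell) not in seen:
                seen.add((lex, cell))
                kept.append((lex, cell, form))
        rows = kept
    return rows


# Invented toy data.
rows = [
    ("dream", "pst", "dreamed"),
    ("dream", "pst", "dreamt"),   # overabundant: second form for the same cell
    ("dream", "prs", "dream"),
    ("beware", "prs", "beware"),
    ("beware", "pst", DEFECTIVE),  # defective cell
]

print(filter_paradigms(rows))
# [('dream', 'pst', 'dreamed'), ('dream', 'prs', 'dream')]
```

With `keep_defective=True` the two rows for the defective lexeme survive, and with both switches set the input is returned unchanged.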

Warning

Patterns and entropies computed with Qumin 2.0 are not importable in Qumin 3.0 due to a breaking change in the output format. When importing computation results, Qumin 3.0 now expects a path to the metadata.json file, which contains relative paths to the output files.

This script generates alternation patterns. They can be consumed by further Qumin scripts by passing the path to the metadata.json file produced by a computation. It also writes human-readable patterns in the patterns/human_readable folder, which are intended for manual inspection.

Values for configuration keys can be given on the command line, e.g.:

/$ qumin verbose=True cells="[ind.prs.1.sg,ind.fut.1.sg]" pats.defective=True data=<dataset.package.json>

Pattern kinds

Qumin can compute two kinds of patterns for use in entropy calculations. Both consist of an alternation and a generalized context:

  • edits: Aligned with simple edit distance.

  • phon: Aligned with edit distances based on phonological similarity.

The default, phon, is recommended in most cases. To avoid relying on your phonological features files for alignment scores, use edits. Only these two kinds produce full patterns, with generalization in both the context and the alternation.
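The difference between the two kinds lies in how forms are aligned. The sketch below is a generic dynamic-programming alignment with a pluggable substitution cost, offered only to illustrate the contrast: `unit_cost` mimics a simple edit distance, while `feature_cost` scores segments by overlap in a tiny invented feature table. The function names, the feature table, and the Jaccard-based cost are all assumptions; Qumin's actual scoring may differ.

```python
# Generic alignment sketch, not Qumin's implementation.
def align(a, b, sub_cost, gap_cost=1.0):
    """Return (total cost, aligned pairs); None in a pair marks a gap."""
    n, m = len(a), len(b)
    # dp[i][j] = minimal cost of aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
                dp[i - 1][j] + gap_cost,                          # gap in b
                dp[i][j - 1] + gap_cost,                          # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap_cost):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def unit_cost(x, y):
    """Simple edit distance: every mismatch costs the same."""
    return 0.0 if x == y else 1.0


# Invented toy feature table (hypothetical, for illustration only).
FEATURES = {
    "p": {"labial", "stop"},
    "b": {"labial", "stop", "voiced"},
    "s": {"coronal", "fricative"},
    "a": {"vowel", "low"},
}


def feature_cost(x, y):
    """Jaccard distance between feature sets: similar sounds are cheap to substitute."""
    fx, fy = FEATURES[x], FEATURES[y]
    return 1.0 - len(fx & fy) / len(fx | fy)


print(align("pa", "ba", unit_cost))
# (1.0, [('p', 'b'), ('a', 'a')])
cost, pairs = align("pa", "ba", feature_cost)
print(round(cost, 3), pairs)
# 0.333 [('p', 'b'), ('a', 'a')]
```

Under the unit cost, substituting p with b is as expensive as substituting p with a; under the feature-based cost it is much cheaper, which is the kind of information a phonologically informed alignment can exploit.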

Warning

Additional strategies were implemented for comparison purposes during Sacha Beniamine’s PhD. Support for these was discontinued after Qumin 2.0.1.

Full reference

Pattern options are under the pats namespace. Available pattern options (see also the common options):

class qumin.utils.config.PatternsConfig(*, kind=Kind.phon, defective=False, gap_proportion=0.4, optim_mem=False, overabundant=<factory>)

Configuration for the patterns action.

Parameters:
  • kind (Kind) – Options are (see docs): phon, edits

  • defective (bool) – Whether to keep defective entries

  • gap_proportion (float) – Proportion of the median score used to set the gap score

  • optim_mem (bool) – Attempt to use a little bit less memory

  • overabundant (OverabundantPatternsConfig) – Configuration for overabundance
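As a reading aid for gap_proportion, the following sketch shows the arithmetic the description suggests: the gap score is a fixed fraction of the median score. Which scores the median is taken over is not stated on this page; the sketch assumes a plain list of substitution scores, and both the `gap_score` helper and the numbers are invented for illustration.

```python
from statistics import median


def gap_score(substitution_scores, gap_proportion=0.4):
    """Gap score as a proportion of the median substitution score (assumed reading)."""
    return gap_proportion * median(substitution_scores)


# Invented scores, for illustration only; their median is 0.5.
scores = [0.2, 0.5, 0.5, 0.8, 1.0]
default_gap = gap_score(scores)                      # 0.4 * 0.5
higher_gap = gap_score(scores, gap_proportion=0.8)   # 0.8 * 0.5
```

Under this reading, raising gap_proportion makes insertions and deletions more expensive relative to substitutions, so alignments would tend to prefer substituting segments over leaving gaps.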

Values for the overabundant keyword:

class qumin.utils.config.OverabundantPatternsConfig(*, keep=False, freq=True, tags=None)

Configuration for the processing of overabundant forms.

Parameters:
  • keep (bool) – Whether to keep overabundant entries

  • freq (bool) – Whether to prioritize overabundant forms by frequency (fallback on file order)

  • tags (Any | None) – Tags to prefer when dropping overabundance (fallback on freq)
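The fallback chain described by these parameters (preferred tags, then frequency, then file order) can be pictured as a single sort key. The sketch below is an assumption-laden illustration, not Qumin's code: the `pick_form` function, the dictionary fields `form`, `freq` and `tags`, and the toy data are all hypothetical.

```python
# Illustration of the documented priority order, not Qumin's implementation.
def pick_form(candidates, tags=None, use_freq=True):
    """Choose one form among overabundant candidates listed in file order."""
    preferred = set(tags or ())

    def key(item):
        index, cand = item
        has_tag = bool(preferred & set(cand.get("tags", ())))
        freq = (cand.get("freq") or 0) if use_freq else 0
        # Tuples sort left to right: preferred tag first, then higher
        # frequency, then earlier position in the file.
        return (not has_tag, -freq, index)

    return min(enumerate(candidates), key=key)[1]["form"]


# Invented toy data.
candidates = [
    {"form": "dreamed", "freq": 80, "tags": {"regular"}},
    {"form": "dreamt", "freq": 120, "tags": {"irregular"}},
]

print(pick_form(candidates))                    # dreamt (most frequent)
print(pick_form(candidates, use_freq=False))    # dreamed (first in file order)
print(pick_form(candidates, tags=["regular"]))  # dreamed (preferred tag wins over frequency)
```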

The default configuration for these keys looks like this:

pats:
  kind: phon
  defective: false
  gap_proportion: 0.4
  optim_mem: false
  overabundant:
    keep: false
    freq: true
    tags: null
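The dotted keys used on the command line, such as pats.overabundant.keep=True, address the nested structure shown above. The sketch below illustrates that correspondence with a small hand-written parser; it is not the parser Qumin uses, and `apply_overrides` and `parse_value` are hypothetical helpers that only handle simple scalar values.

```python
import copy

# The defaults listed above, written as a nested dictionary.
DEFAULTS = {
    "pats": {
        "kind": "phon",
        "defective": False,
        "gap_proportion": 0.4,
        "optim_mem": False,
        "overabundant": {"keep": False, "freq": True, "tags": None},
    }
}


def parse_value(raw):
    """Convert a command-line string to bool, None, number or str (simplified)."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    try:
        return float(raw) if "." in raw else int(raw)
    except ValueError:
        return raw


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted, _, raw = item.partition("=")
        *parents, leaf = dotted.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # walk down, creating levels if needed
        node[leaf] = parse_value(raw)
    return result


config = apply_overrides(
    DEFAULTS, ["pats.defective=True", "pats.overabundant.keep=True"]
)
print(config["pats"]["defective"], config["pats"]["overabundant"]["keep"])
# True True
```

Keys that are not overridden keep their default values, and the defaults themselves are left untouched because the function works on a deep copy.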

See the full Default configuration