Changelog

Qumin follows the semver principles for versioning. This changelog only refers to MAJOR and MINOR versions.

Development

  • [features]

  • [bugfix]

  • [other]

Version 3.3

  • [features]
    • Add option checkSegments to disable the segment check, which can be slow on large datasets.

  • [bugfix]
    • Phonological systems are now represented internally as Inventory instances (rather than a single static Inventory class). This fixes issue #103, allowing for parallel computations on Windows and Mac.

    • Increase the speed of lattice computations (in some cases, all lexemes were kept, instead of the microclasses only).

  • [other]
    • Enable pandas>=3.0.0 and numpy>=2.4.

    • Handle the #MISSING# tag (same usage as #DEF#) introduced in paralex 2.3.

    • Higher verbosity in lattice computations.

    • Fix documentation for features in entropy.

Hotfixes

  • 3.3.1: Explicitly convert incidence table to str dtype to pass the tests.

  • 3.3.2:
    • Fix regression: Qumin only read the first forms table of multipart datasets.

    • Fix an issue with the sorting of overabundant forms.

  • 3.3.3:
    • Restore multiprocessing (broken when sound inventories became large)

    • Minor fix for the frequency of forms with identical sound shapes.

    • Fix memory leak (introduced in 3.3)

    • Cleanup in Patterns code

Version 3.2

  • [features]
    • Add option to take token frequencies of patterns and predictors into account and change keywords for token frequencies.

    • Add computation of probability of success besides entropy.

    • Allow predictability computations for overabundant paradigms.

    • Expose onehot representation of patterns through the to_onehot function & faster implementation.

  • [other]
    • Move default config from a YAML file to a proper OmegaConf StructuredConfig and autogenerate the CLI documentation.

    • Improve documentation of qumin.lattice.lattice.ICLattice

    • Allow lexemes argument overwrite in downstream scripts.

  • [deprecation] (removal planned in 4.0.0)
    • Actions H and ent_heatmap are replaced with pred and pred_heatmap

    • Entropy options are now under the namespace pred (instead of entropy)

    • Configuration keywords that expect lists will no longer allow strings in place of a list containing one item.
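To illustrate how the two predictability measures listed in the features of this version differ, here is a minimal sketch that computes an entropy and a probability of success from pattern counts for a single predictor class. The function names and the toy counts are hypothetical. The probability of success is assumed here to be the expected accuracy of a speaker who picks each pattern in proportion to its frequency; Qumin's actual definition and implementation may differ.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def probability_of_success(counts):
    """Expected accuracy under probability matching: sum of squared probabilities.

    Assumption made for illustration only, not necessarily Qumin's definition.
    """
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical toy data: within one predictor class,
# the number of lexemes that use each candidate pattern.
pattern_counts = [2, 1, 1]

print(entropy(pattern_counts))                 # 1.5
print(probability_of_success(pattern_counts))  # 0.375
```

Both values summarize the same distribution: entropy grows with uncertainty, while the probability of success shrinks with it.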

Hotfixes

  • 3.2.1: Fix n-pred computations (broken since 3.0.0)

Version 3.1

  • Small UX improvements on lattice scripts.

  • Add an option to pass a list of lexeme ids to use for the next computation (instead of all lexemes).

  • More consistent import of paradigms / patterns.

  • Modernize the build process: move to pyproject.toml and remove setup.cfg.

  • Default to cpus=1 due to incompatibility with Mac & Windows.

  • Add option to autosort cells in heatmaps, according to result similarity.

  • Rewrite docs and change docs theme.

Hotfixes

  • 3.1.1: Add readme to PyPI

  • 3.1.2: RST tweaks to render readme on PyPI

Version 3.0

  • Moved the project to GitLab!

  • Implement parallelisation for finding patterns and finding applicable patterns. See cpus config option.

  • [breaking] Switch patterns management to long format everywhere.

  • [breaking] Change management of imports/exports. Outputs from Qumin are now shipped as a Frictionless DataPackage. To import them, a path to the computation metadata.json must be provided.

  • Addition of frequencies:
    • Read frequencies from as many sources as possible in the Paralex package (Frequencies class).

    • Weight cells based on the predictor-target pair frequency.

  • Paralex compliance changes:
    • Usage of the form_id introduced in Paralex.

    • Sounds file in conformity with Paralex (Segs -> sound_id)

  • User interface improvements:
    • Change the format of human readable patterns to a more readable markdown export

    • Add keywords pos, sample_cells and sample_lexemes to filter paradigms.

    • Prevent Matplotlib font manager from spamming the log in debug mode.

    • Improve and extend visualisations of Entropy distributions.

  • Sampling management changes:
    • most_freq=False replaced by force_random=True.

    • Addition of a seed option to make sampling deterministic. By default, sample by frequency.

    • Rename sample to sample_lexemes and add sample_cells with the same behaviour for cells.

    • Removal of overabundant forms is now done by frequency by default (pats.overabundant.freq=True), and can optionally prioritize some tags (pats.overabundant.tags="[standard_form,preferred_form]").

  • [breaking] Removal:
    • Support for sound table “Seg.” column.

    • Bipartite entropy computations

    • All alternation algorithms except phon and edits (former patternsPhonsim and patternsLevenshtein).

    • Patterns evaluation with action=eval.

  • And various smaller bug fixes!
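The sampling options introduced in this version can be pictured with the following minimal sketch. The helper name, its signature and the toy frequencies are hypothetical; only the option names force_random and seed come from this changelog. The sketch also assumes that sampling by frequency means keeping the most frequent lexemes. It is not Qumin's implementation.

```python
import random


def sample_lexemes(frequencies, n, force_random=False, seed=None):
    """Pick n lexeme ids from a {lexeme_id: frequency} mapping (hypothetical helper)."""
    if force_random:
        # A dedicated generator keeps the draw reproducible for a given seed.
        rng = random.Random(seed)
        return sorted(rng.sample(sorted(frequencies), n))
    # Default (assumed): keep the n most frequent lexemes, ties broken alphabetically.
    ranked = sorted(frequencies, key=lambda lex: (-frequencies[lex], lex))
    return ranked[:n]


# Hypothetical toy frequencies.
freqs = {"aller": 900, "boire": 120, "chanter": 300, "dormir": 80}

print(sample_lexemes(freqs, 2))  # ['aller', 'chanter']

# With force_random=True, the same seed always yields the same sample.
first = sample_lexemes(freqs, 2, force_random=True, seed=42)
second = sample_lexemes(freqs, 2, force_random=True, seed=42)
print(first == second)  # True
```

Fixing the seed is what makes a random sample repeatable across runs, which matters when results computed on a sample need to be compared.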

Hotfixes

  • 3.0.1: Fixes errors when a dataset had one-phoneme words and the phoneme had multiple graphemes.

  • 3.0.2: Fixes two bugs:
    • Process killed by the kernel when run with cell frequencies, due to generating the cross product of categories (issue #92).

    • Bug occurring in debug mode, where indexes of predicted frequencies were duplicated.

  • 3.0.3: Fixes generated forms having extra spaces due to not resegmenting anymore (issue #93)

  • 3.0.4: Fixes broken cell probabilities with cell samples (issue #96)

  • 3.0.5: Fixes misalignment in patterns dfs due to merging/unmerging of cells (issue #93 again).

  • 3.0.6: Moves integration tests to makefile to use less minutes (issue #97).

  • 3.0.7: Fix unwanted reordering of cells, leading to shuffled entropy results.

  • 3.0.8: Fixes a few bugs in entropy visualizations.

  • 3.0.9: Remove tokens_freq.cells from CLI and always compute weighted entropies if available. Store the results in metadata.json.

  • 3.0.10:
    • Ensure the Metadata class (qumin.utils.Metadata) can find the current run directory, when Hydra is not in use, e.g. in custom scripts. See the class documentation for additional details.

    • Fix broken features in entropy computations.

    • Various small fixes and better warnings.

  • 3.0.11: Fix another source of extra spaces/empty strings in patterns (issue #101)

  • 3.0.12 to 14: Fix Frequencies assuming that the dataset always has cells and lexemes tables (Qumin should run even with just sounds and forms).

  • 3.0.15: Fix broken visualisations when no cells table and no order provided.

  • 3.0.16: Rewrite some checks when importing patterns.

  • 3.0.17: Fix broken microclass heatmaps.

  • 3.0.18: Fix inf cell name interpreted as a numeric value.

  • 3.0.19: On Windows, disable multiprocessing. Otherwise, cap cpus at 10. Add Windows integration tests. Ensure UTF-8 console output.

  • 3.0.20/21: Fix PyPI readme display.

  • 3.0.22: Fix lattice computations.

  • 3.0.23: Fix reading of paradigms from multiple tables (frictionless multipath resources).

  • 3.0.24: Fix metadata error when no html export of lattices is possible due to mpld3 not being installed.

  • 3.0.25: In CI/CD and on PyPI, limit Python to 3.13, because Hydra doesn’t work with 3.14.

  • 3.0.26: Fix typos in the publication list

  • 3.0.27: Fix html export of lattices

Version 2.0

  • Support for the Paralex standard.

  • Automatic generation of heatmaps for entropy computations.

  • Add a cells keyword to filter paradigms on cells.

  • Several bugfixes

  • Removal:
    • Support for wide paradigms.

Version 1.1

  • Several bugfixes

Version 1.0

  • Initial release