Changelog
Qumin follows the semver principles for versioning. This changelog only refers to MAJOR and MINOR versions.
Development
[features]
[bugfix]
[other]
Version 3.3
- [features]
Add option checkSegments to disable the segment check, which can be slow on large datasets.
- [bugfix]
Phonological systems of sound are now represented internally as Inventory instances (rather than a single static Inventory class). This fixes issue #103, allowing for parallel computations on Windows and Mac.
Increase lattice computation speed (in some cases, all lexemes were kept, instead of the microclasses only).
- [other]
Enable pandas>=3.0.0 and numpy>=2.4.
Handle the #MISSING# tag (same usage as #DEF#) introduced in Paralex 2.3.
Higher verbosity in lattice computations.
Fix documentation for features in entropy.
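The [bugfix] entry above replaces a single static Inventory class with Inventory instances. The sketch below is not Qumin's code: both classes and the feature values are hypothetical. It only illustrates the general difference between the two designs: class-level state is global to a process, whereas instances can coexist and be passed explicitly to worker functions. On Windows and macOS, Python's multiprocessing starts workers with the spawn method by default, so workers re-import modules and do not inherit class attributes assigned at runtime in the parent. Whether this is exactly what issue #103 was about is an assumption.

```python
class StaticInventory:
    """Hypothetical 'before' design: all state lives on the class itself."""
    sounds = {}

    @classmethod
    def load(cls, sounds):
        cls.sounds = dict(sounds)


class Inventory:
    """Hypothetical 'after' design: each instance owns its own state."""

    def __init__(self, sounds):
        self.sounds = dict(sounds)


# Static design: a second load replaces the first one for every user of the class.
StaticInventory.load({"p": "-voiced", "b": "+voiced"})
StaticInventory.load({"a": "+low"})
static_keys = sorted(StaticInventory.sounds)

# Instance design: two inventories coexist, and each one can be handed
# to a worker function as an ordinary argument.
inv_a = Inventory({"p": "-voiced", "b": "+voiced"})
inv_b = Inventory({"a": "+low"})
instance_keys = (sorted(inv_a.sounds), sorted(inv_b.sounds))
```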
Hotfixes
3.3.1: Explicitly convert incidence table to str dtype to pass the tests.
- 3.3.2:
Fix regression: Qumin only took the first forms table in case of a multipart dataset.
Fix an issue with the sorting of overabundant forms.
- 3.3.3:
Restore multiprocessing (broken when sound inventories became large)
Minor fix for the frequency of forms with identical sound shapes.
Fix RAM consumption leak (broken in 3.3)
Cleanup in Patterns code
Version 3.2
- [features]
Add option to take token frequencies of patterns and predictors into account and change keywords for token frequencies.
Add computation of probability of success besides entropy.
Allow predictability computations for overabundant paradigms.
Expose onehot representation of patterns through the to_onehot function & faster implementation.
- [other]
Move default config from a YAML file to a proper OmegaConf StructuredConfig and autogenerate the CLI documentation.
Improve documentation of qumin.lattice.lattice.ICLattice.
Allow lexemes argument overwrite in downstream scripts.
- [deprecation] (removal planned in 4.0.0)
Actions H and ent_heatmap are replaced with pred and pred_heatmap.
Entropy options are now under the namespace pred (instead of entropy).
Configuration keywords that expect lists will no longer allow strings in place of a list containing one item.
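The last deprecation above concerns configuration keywords that expect lists: a bare string will no longer stand in for a one-item list. The check below is a minimal sketch of that rule, not Qumin's implementation. The helper name expect_list and the cell name "prs.1sg" are hypothetical.

```python
def expect_list(name, value):
    """Return value as a list, rejecting a bare string (hypothetical helper)."""
    if isinstance(value, str):
        raise TypeError(
            f"{name} expects a list; write {name}=[{value}] instead of {name}={value}"
        )
    if not isinstance(value, (list, tuple)):
        raise TypeError(f"{name} expects a list, got {type(value).__name__}")
    return list(value)


# A one-item list is accepted as-is.
accepted = expect_list("cells", ["prs.1sg"])

# A bare string, which older versions tolerated, is rejected.
try:
    expect_list("cells", "prs.1sg")
    rejected = False
except TypeError:
    rejected = True
```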
Hotfixes
3.2.1: fix n-pred computations (didn’t work since 3.0.0)
Version 3.1
Small UX improvements on lattice scripts.
Add an option to pass a list of lexeme ids to use for the next computation (instead of all lexemes).
More consistent import of paradigms / patterns.
Modernize the build process: move to pyproject.toml and remove setup.cfg.
Default to cpus=1 due to incompatibility with Mac & Windows.
Add option to autosort cells in heatmaps, according to result similarity.
Rewrite docs and change docs theme.
Hotfixes
3.1.1: Add readme to PyPI
3.1.2: Rst tweaks to render readme on PyPI
Version 3.0
Moved the project to GitLab!
Implement parallelisation for finding patterns and finding applicable patterns. See the cpus config option.
[breaking] Switch patterns management to long format everywhere.
[breaking] Change management of imports/exports. Outputs from Qumin are now shipped as a Frictionless DataPackage. To import them, a path to the computation metadata.json must be provided.
- Addition of frequencies:
Read frequencies from as many sources as possible in the Paralex package (Frequencies class).
Weight cells based on the predictor-target pair frequency.
- Paralex compliance changes:
Usage of the form_id introduced in Paralex.
Sounds file in conformity with Paralex (Segs -> sound_id).
- User interface improvements:
Change the format of human-readable patterns to a more readable markdown export.
Add keywords pos, sample_cells, sample_lexemes to filter paradigms.
Prevent Matplotlib font manager from spamming the log in debug mode.
Improve and extend visualisations of Entropy distributions.
- Sampling management changes:
most_freq=False replaced by force_random=True.
Addition of a seed option to make sampling deterministic. By default, sample by frequency.
Change sample for sample_lexemes and add sample_cells with the same behaviour for cells.
Removal of overabundant forms is now by default done by frequency (pats.overabundant.freq=True), and can optionally prioritize some tags (pats.overabundant.tags="[standard_form,preferred_form]").
- [breaking] Removal:
Support for sound table “Seg.” column.
Bipartite entropy computations
All alternation algorithms except phon and edits (former patternsPhonsim and patternsLevenshtein).
Patterns evaluation with action=eval.
And various smaller bug fixes!
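The sampling changes listed above (a seed option, force_random, and sampling by frequency by default) can be pictured with the following sketch. It is not Qumin's implementation, and it rests on an assumption the changelog does not spell out: that "sample by frequency" means keeping the most frequent items, while force_random draws uniformly at random. The function name sample_ids and the toy frequencies are hypothetical.

```python
import random


def sample_ids(freqs, n, seed=None, force_random=False):
    """Pick n ids from a {id: frequency} mapping (hypothetical helper)."""
    ids = sorted(freqs)  # fixed order, so a given seed always yields the same draw
    if force_random:
        # Uniform draw without replacement, reproducible through the seed.
        return sorted(random.Random(seed).sample(ids, n))
    # Assumed default: keep the n most frequent ids, ties broken by id.
    return sorted(ids, key=lambda i: (-freqs[i], i))[:n]


freqs = {"be": 900, "go": 500, "sing": 40, "mow": 3}

by_frequency = sample_ids(freqs, 2)
random_first = sample_ids(freqs, 2, seed=1, force_random=True)
random_again = sample_ids(freqs, 2, seed=1, force_random=True)
```

With a fixed seed, random_first and random_again contain the same ids, which is the point of a seed option.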
Hotfixes
3.0.1: Fixes errors when dataset had 1-phoneme long words, and the phoneme had multiple graphemes.
- 3.0.2: Fixes two bugs:
process killed by kernel when run with cell frequencies, due to generating crossproduct of categories (issue #92).
Fix bug occurring with debug, where the index of predicted frequencies was duplicated.
3.0.3: Fixes generated forms having extra spaces due to not resegmenting anymore (issue #93)
3.0.4: Fixes broken cell probabilities with cell samples (issue #96)
3.0.5: Fixes misalignment in patterns dfs due to merging/unmerging of cells (issue #93 again).
3.0.6: Moves integration tests to makefile to use less minutes (issue #97).
3.0.7: Fix unwanted reordering of cells, leading to shuffled entropy results.
3.0.8: Fixes a few bugs in entropy visualizations.
3.0.9: Remove tokens_freq.cells from CLI and always compute weighted entropies if available. Store the results in metadata.json.
- 3.0.10:
Ensure the Metadata class (qumin.utils.Metadata) can find the current run directory when Hydra is not in use, e.g. in custom scripts. See the class documentation for additional details.
Fix broken features in entropy computations.
Various small fixes and better warnings.
3.0.11: Fix another source of extra spaces/empty strings in patterns (issue #101)
3.0.12 to 14: Fix Frequencies expecting that the dataset necessarily has cells and lexemes tables (Qumin should run even with just sounds and forms).
3.0.15: Fix broken visualisations when no cells table and no order provided.
3.0.16: Rewrite some checks when importing patterns.
3.0.17: Fix broken microclass heatmaps.
3.0.18: Fix inf cell name interpreted as a numeric value.
3.0.19: On Windows, no multiprocessing. Otherwise, cap cpus at 10. Windows integration tests. Ensure utf8 console output.
3.0.20/21: Fix PyPI readme display.
3.0.22: Fix lattice computations.
3.0.23: Fix reading of paradigms from multiple tables (frictionless multipath resources).
3.0.24: Fix metadata error when no html export of lattices is possible due to mpld3 not being installed.
3.0.25: In CI/CD and PyPI, limit Python to 3.13; this is due to Hydra, which doesn’t work with 3.14.
3.0.26: Typos in the publication list
3.0.27: Fix html export of lattices
Version 2.0
Support for the Paralex standard.
Automatic generation of heatmaps for entropy computations.
Add a cells keyword to filter paradigms on cells.
Several bugfixes
- Removal:
Support for wide paradigms.
Version 1.1
Several bugfixes
Version 1.0
Initial release