Changelog

Qumin follows the semver principles for versioning. This changelog only refers to MAJOR and MINOR versions.
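As a rough illustration of how a full version number maps onto the sections below, this sketch splits a version string into its semver parts and names the changelog section where it would be listed. The names `Version`, `parse_version` and `changelog_section` are hypothetical and are not part of Qumin.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Split a 'MAJOR.MINOR[.PATCH]' string; a missing PATCH defaults to 0."""
    parts = [int(p) for p in text.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected MAJOR.MINOR[.PATCH], got {text!r}")
    major, minor, *rest = parts
    return Version(major, minor, rest[0] if rest else 0)


def changelog_section(text: str) -> str:
    """Name the changelog section in which a given release is described."""
    v = parse_version(text)
    section = f"Version {v.major}.{v.minor}"
    # Assumption: patch releases appear under the "Hotfixes" heading
    # of their MINOR version, as in the layout of this changelog.
    return section if v.patch == 0 else f"{section}, Hotfixes"
```

For instance, `changelog_section("3.3.5")` points to the Hotfixes list of Version 3.3, while `changelog_section("4.0")` points to the Version 4.0 section itself.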

Version 4.0

[features]
- [breaking] Reorganize the lattice API:
  
  New options for lattice (horizontal and layout) and macroclasses (horizontal, square, topdown).
  
  More customizable API.
  
  The lattice drawing algorithm can now be controlled with the seed keyword argument, making the results reproducible.
  
  Drop duplicated lexemes / patterns when computing the lattice, to ensure faster computations.
  
  PatternStore now has the following export methods: .to_incidence_table(), .to_microclasses(), .to_lattice()
- [breaking] Add form-level features for conditioning in predictability calculations. Move former lexeme-level features to pred.features.lexemes.
- [breaking] Remove deprecated H and ent_heatmap actions.
- [breaking] Action heatmap is renamed to microclasses; the old name is removed.
- [breaking] Namespace for macroclasses action is corrected from macroclass to macroclasses.
- Add the possibility to weight onehot encodings of patterns for overabundant patterns, using frequencies (through the API).
- Add a pred.export_log keyword, which exports human-readable details about distributions to dedicated markdown files (these were previously dumped to stdout in verbose mode). Avoid the double calculations that used to be performed when creating the log. No more “_debug” outputs are generated.
[bugfix]: various small bugfixes
[other]
- When several scripts are chained, load paradigms and segments only once.
- Documentation improvements, explicitly documenting how to run Qumin on data subsets, adding CLI links to the home page, adding optional installs to the tutorial.
- Several keywords that used to allow both strings and lists of strings (e.g. pos) now enforce lists of strings, to facilitate user input type checking.
- Moved config.py file from utils/ module to config/.
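To illustrate the stricter typing of list keywords such as pos, here is a minimal sketch of the kind of check this implies. The helper `ensure_str_list` is hypothetical; Qumin's actual validation may be implemented differently.

```python
def ensure_str_list(name: str, value: object) -> list[str]:
    """Return value unchanged if it is a list of strings, else raise TypeError."""
    if isinstance(value, str):
        # A bare string is rejected rather than silently wrapped in a list.
        raise TypeError(
            f"{name} must be a list of strings, e.g. [{value!r}], not a bare string"
        )
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError(f"{name} must be a list of strings")
    return value
```

With this helper, `ensure_str_list("pos", ["verb"])` returns the list as is, whereas `ensure_str_list("pos", "verb")` raises a `TypeError`.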

Version 3.3

[features]
- Add option checkSegments to disable the segment check, which can be slow on large datasets.
[bugfix]
- Phonological systems of sound are now represented internally as Inventory instances (rather than a single static Inventory class). This fixes issue #103, allowing for parallel computations on Windows and Mac.
- Increase lattice computations speed (in some cases, all lexemes were kept, instead of the microclasses only).
[other]
- Enable pandas>=3.0.0 and numpy>=2.4.
- Handle the #MISSING# tag (same usage as #DEF#) introduced in paralex 2.3.
- Higher verbosity in lattice computations.
- Fix documentation for features in entropy.

Hotfixes

3.3.1: Explicitly convert incidence table to str dtype to pass the tests.
3.3.2:
- Fix regression: Qumin only took the first forms table in case of multipart dataset;
- Fix an issue with the sorting of overabundant forms.
3.3.3:
- Restore multiprocessing (broken when sound inventories became large)
- Minor fix for the frequency of forms with identical sound shapes.
- Fix RAM consumption leak (broken in 3.3)
- Cleanup in Patterns code
3.3.4:
- Fix heatmap norms when all measures are equal to 0 or 1
- Update API to matplotlib==3.11
3.3.5:
- Documentation improvements, explicitly documenting how to run Qumin on data subsets.

Version 3.2

[features]
- Add option to take token frequencies of patterns and predictors into account and change keywords for token frequencies.
- Add computation of probability of success besides entropy.
- Allow predictability computations for overabundant paradigms.
- Expose the onehot representation of patterns through the to_onehot function, with a faster implementation.
[other]
- Move default config from a YAML file to a proper OmegaConf StructuredConfig and autogenerate the CLI documentation.
- Improve documentation of qumin.lattice.lattice.ICLattice
- Allow lexemes argument overwrite in downstream scripts.
[deprecation] (removal planned in 4.0.0)
- Actions H and ent_heatmap are replaced with pred and pred_heatmap
- Entropy options are now under the namespace pred (instead of entropy)
- Configuration keywords that expect lists will no longer allow strings in place of a list containing one item.

Hotfixes

3.2.1: fix n-pred computations (didn’t work since 3.0.0)

Version 3.1

Small UX improvements on lattice scripts.
Add an option to pass a list of lexeme ids to use for the next computation (instead of all lexemes).
More consistent import of paradigms / patterns.
Modernize the build process: move to pyproject.toml and remove setup.cfg.
Default to cpus=1 due to incompatibility with Mac & Windows.
Add option to autosort cells in heatmaps, according to result similarity.
Rewrite docs and change docs theme.

Hotfixes

3.1.1: Add readme to PyPI
3.1.2: Rst tweaks to render readme on PyPI

Version 3.0

Moved the project to GitLab!
Implement parallelisation for finding patterns and finding applicable patterns. See cpus config option.
[breaking] Switch patterns management to long format everywhere.
[breaking] Change management of imports/exports. Outputs from Qumin are now shipped as a Frictionless DataPackage. To import them, a path to the computation metadata.json must be provided.
Addition of frequencies:
- Read frequencies from as many sources as possible in the Paralex package (Frequencies class).
- Weight cells based on the predictor-target pair frequency.
Paralex compliance changes:
- Usage of the form_id introduced in Paralex.
- Sounds file in conformity with Paralex (Segs -> sound_id)
User interface improvements:
- Change the format of human readable patterns to a more readable markdown export
- Add keywords pos, sample_cells, sample_lexemes to filter paradigms.
- Prevent Matplotlib font manager from spamming the log in debug mode.
- Improve and extend visualisations of Entropy distributions.
Sampling management changes:
- most_freq=False replaced by force_random=True.
- Addition of a seed option to make sampling deterministic. By default, sample by frequency.
- Change sample for sample_lexemes and add sample_cells with the same behaviour for cells.
- Removal of overabundant forms is now done by frequency by default (pats.overabundant.freq=True), and can optionally prioritize some tags (pats.overabundant.tags="[standard_form,preferred_form]").
[breaking] Removal:
- Support for sound table “Seg.” column.
- Bipartite entropy computations
- All alternation algorithms except phon and edits (former patternsPhonsim and patternsLevenshtein).
- Patterns evaluation with action=eval.
Plus various smaller bug fixes.

Hotfixes

3.0.1: Fixes errors when dataset had 1-phoneme long words, and the phoneme had multiple graphemes.
3.0.2: Fixes two bugs:
- Process killed by kernel when run with cell frequencies, due to generating the cross product of categories (issue #92).
- Bug occurring in debug mode, where the index of predicted frequencies was duplicated.
3.0.3: Fixes generated forms having extra spaces due to not resegmenting anymore (issue #93)
3.0.4: Fixes broken cell probabilities with cell samples (issue #96)
3.0.5: Fixes misalignment in patterns dfs due to merging/unmerging of cells (issue #93 again).
3.0.6: Moves integration tests to makefile to use less minutes (issue #97).
3.0.7: Fix unwanted reordering of cells, leading to shuffled entropy results.
3.0.8: Fixes a few bugs in entropy visualizations.
3.0.9: Remove tokens_freq.cells from CLI and always compute weighted entropies if available. Store the results in metadata.json.
3.0.10:
- Ensure the Metadata class (qumin.utils.Metadata) can find the current run directory, when Hydra is not in use, e.g. in custom scripts. See the class documentation for additional details.
- Fix broken features in entropy computations.
- Various small fixes and better warnings.
3.0.11: Fix another source of extra spaces/empty strings in patterns (issue #101)
3.0.12 to 14: Fix Frequencies assuming that the dataset necessarily has cells and lexemes tables (Qumin should run even with just sounds and forms).
3.0.15: Fix broken visualisations when no cells table and no order provided.
3.0.16: Rewrite some checks when importing patterns.
3.0.17: Fix broken microclass heatmaps.
3.0.18: Fix inf cell name interpreted as a numeric value.
3.0.19: On Windows, no multiprocessing. Otherwise, cap cpus at 10. Windows integration tests. Ensure utf8 console output.
3.0.20/21: Fix PyPI readme display.
3.0.22: Fix lattice computations.
3.0.23: Fix reading of paradigms from multiple tables (frictionless multipath resources).
3.0.24: Fix metadata error when no html export of lattices is possible due to mpld3 not being installed.
3.0.25: In CI/CD and PyPI, limit Python to 3.13, because Hydra doesn’t work with 3.14.
3.0.26: Typos in the publication list
3.0.27: Fix html export of lattices

Version 2.0

Support for the Paralex standard.
Automatic generation of heatmaps for entropy computations.
Add a cells keyword to filter paradigms on cells.
Several bugfixes
Removal:
- Support for wide paradigms.

Version 1.1

Several bugfixes

Version 1.0

Initial release