How to get faster computations?
Importing results
Some steps in Qumin, such as pattern computation, can take a long time. For these steps, the results can be saved to files and imported in later runs instead of being recomputed. This is the case for:
- pattern computations (for entropy, lattice, etc.: patterns=<path/metadata.json>)
- entropy computations (for entropy with n predictors: pred.importResults=<path/metadata.json>)
Warning
Patterns and entropies computed with Qumin 2.0 are not importable in Qumin 3.0 due to a breaking change in the output format. When importing computation results, Qumin 3.0 now expects a path to the metadata.json file, which contains relative paths to the output files.
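Because the metadata file stores relative paths, a results folder can be moved or shared as a whole. The Python sketch below illustrates that idea. The `{"outputs": {name: relative_path}}` layout and the `resolve_result_paths` helper are hypothetical and do not describe Qumin's actual metadata schema.

```python
import json
from pathlib import Path


def resolve_result_paths(metadata_path):
    """Return absolute paths to the output files listed in a metadata file.

    ASSUMPTION: the file holds a hypothetical {"outputs": {name: relative_path}}
    mapping; Qumin's real metadata.json schema may differ.
    """
    metadata_path = Path(metadata_path)
    with metadata_path.open(encoding="utf-8") as f:
        metadata = json.load(f)
    # Relative paths are resolved against the folder that contains the
    # metadata file, not against the current working directory.
    base = metadata_path.parent
    return {name: (base / rel).resolve() for name, rel in metadata["outputs"].items()}
```

Resolving against the metadata file's own folder is what would let a results folder be imported from wherever the command is launched.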
Multiprocessing
Since version 3.0.0, Qumin supports multiprocessing for several time-consuming computations on datasets with many cells. The basic unit for multiprocessing is the pair of cells: without multiprocessing, pairs are handled one after another, whereas with multiprocessing several pairs are processed in parallel.
Tip
Let us consider as a time unit the time required to perform a computation on one pair of cells.
If a dataset contains 10 cells, there are (10 * 9 / 2) = 45 pairs of distinct cells (45 time units). If the computer has 10 available threads, ten pairs are processed at a time, so the computation ideally takes about 4.5 time units instead of 45, a saving of about 40 time units (45 * 9 / 10).
If a dataset contains 50 cells, there are (50 * 49 / 2) = 1225 pairs of distinct cells. With 10 available threads, the computation ideally takes about 122.5 time units instead of 1225, a saving of about 1100 time units (1225 * 9 / 10).
Since the number of pairs grows quadratically with the number of cells, multiprocessing is especially beneficial for datasets with many cells.
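The pair counts and the idealised run times can be checked with a few lines of Python. This sketch counts unordered pairs of distinct cells, n(n-1)/2, and divides by the number of threads, ignoring process start-up costs and the fact that some pairs take longer than others. The function names are illustrative only.

```python
from math import comb


def pair_count(n_cells):
    # Unordered pairs of distinct cells: n * (n - 1) / 2
    return comb(n_cells, 2)


def ideal_parallel_time(n_cells, n_threads):
    # Idealised estimate in "one pair" time units: assumes every pair takes
    # the same time and parallelism has no overhead.
    return pair_count(n_cells) / n_threads


for n in (10, 50):
    print(n, pair_count(n), ideal_parallel_time(n, 10))
# 10 45 4.5
# 50 1225 122.5
```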
Multiprocessing is available for the following steps:
- Pattern inference (action=pattern)
- Applicable patterns computation (action=pred)
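Because each pair of cells can be handled independently, the work parallelises naturally. Below is a minimal sketch using Python's standard `concurrent.futures` module. `compare_cells` is a hypothetical placeholder for the per-pair work and is not Qumin's internal code; the cell names are made up.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def compare_cells(pair):
    """Hypothetical stand-in for the per-pair work (e.g. inferring patterns)."""
    a, b = pair
    return pair, f"{a}~{b}"


def run_on_all_pairs(cells, max_workers=None):
    pairs = list(combinations(cells, 2))
    # Pairs are independent, so they can be distributed across worker
    # processes; executor.map returns results in input order.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return dict(executor.map(compare_cells, pairs))


if __name__ == "__main__":
    results = run_on_all_pairs(["prs.1sg", "prs.2sg", "prs.3sg", "inf"], max_workers=2)
    print(len(results))  # 6 pairs for 4 cells
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.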
Sampling
Qumin automatically identifies syncretic cells that can be merged. However, some cells are fully interpredictable without being strictly identical. You may also not need results for your whole dataset. In such cases, we recommend sampling your dataset by cell name using the provided keywords.
If you import previously computed patterns, Qumin should be able to retrieve the sampling used in the earlier run. The available options let you limit the number of cells and/or lexemes, either randomly or with lists of cells.
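The two sampling strategies (an explicit list of cell names, or a random draw) can be sketched as follows. `sample_cells` and its parameters are hypothetical and do not correspond to Qumin's actual sampling keywords. Fixing a seed is shown as one way to make a random sample reproducible across runs.

```python
import random


def sample_cells(cells, keep=None, n=None, seed=None):
    """Hypothetical helper: restrict a list of cells by name and/or at random.

    keep: optional iterable of cell names to retain.
    n: optional number of cells to draw at random from what remains.
    seed: fixes the random draw so the same sample can be reproduced.
    The original order of `cells` is preserved in the result.
    """
    selected = list(cells)
    if keep is not None:
        wanted = set(keep)
        selected = [c for c in selected if c in wanted]
    if n is not None and n < len(selected):
        rng = random.Random(seed)
        chosen = set(rng.sample(selected, n))
        selected = [c for c in selected if c in chosen]
    return selected
```

For example, `sample_cells(["prs.1sg", "prs.2sg", "pst.1sg", "inf"], keep=["inf", "prs.1sg"])` returns `["prs.1sg", "inf"]`.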
Tip
Have a look at the CLI reference to see all available options for sampling.