Get faster computations

Importing results

Some steps in Qumin, such as pattern computation, take a long time. For several of these steps, Qumin can save the results to files and import them in later runs. This is the case for pattern computations and predictability measures (see below).

Warning

Patterns and entropies computed with Qumin 2.0 cannot be imported into Qumin 3.0 because of a breaking change in the output format. When importing computation results, Qumin 3.0 expects a path to the metadata.json file, which contains relative paths to the output files.

Pattern computations

Patterns are used in many scripts (predictability, lattice, etc.). If no patterns are provided, they are recomputed on every run. If you run multiple scripts on the same data, you can compute patterns once and pass the result to the other scripts:

qumin action=<lattice/pred/...> data=<mydata.package.json> patterns=<path/metadata.json>

In addition, predictability computations identify which patterns can be applied to each form. This step is also costly: if you run several predictability computations with different settings on the same patterns, it is worth exporting the applicable patterns:

qumin action=pred data=<mydata.package.json> patterns=<mypatterns/metadata.json> exportApplicable=True

To speed up the next predictability computation, pass the result of this last run with patterns=<path to pred results/metadata.json>. The results from the previous computation encapsulate both regular patterns and applicable patterns:

qumin action=pred data=<mydata.package.json> patterns=<pred/metadata.json>
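The reuse workflow above can be sketched as a small Python helper that assembles the command lines. This is only an illustration: the helper name `qumin_command` and the three paths are hypothetical placeholders, and the only options used are the ones shown in the commands above. The script prints the commands rather than running them.

```python
import shlex


def qumin_command(action, data, options=None):
    """Build the argument list for one Qumin run, using key=value arguments."""
    args = ["qumin", f"action={action}", f"data={data}"]
    for key, value in (options or {}).items():
        args.append(f"{key}={value}")
    return args


# Hypothetical paths: replace them with your own package and result locations.
DATA = "mydata.package.json"
PATTERNS = "mypatterns/metadata.json"
PRED = "pred/metadata.json"

runs = [
    # First predictability run: reuse the patterns and export applicable patterns.
    qumin_command("pred", DATA, {"patterns": PATTERNS, "exportApplicable": True}),
    # Later runs: pass the metadata.json of the previous predictability run.
    qumin_command("pred", DATA, {"patterns": PRED}),
]

for run in runs:
    # shlex.join quotes arguments where needed so the line is safe to paste in a shell.
    print(shlex.join(run))
```

To execute a run instead of printing it, one option is to pass the argument list to `subprocess.run(run, check=True)`.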

Predictability measures

Computing entropies with n predictors can be slow. If you have already computed entropies with n-1 predictors, passing them to Qumín can significantly reduce the computation time:

qumin action=pred data=<mydata.package.json> pred.n=[2] pred.importResults=<pred-n-1/metadata.json>

Multiprocessing

Since version 3.0.0, Qumín supports multiprocessing for several time-consuming computations, which helps on datasets with a large number of cells. The basic unit for multiprocessing is the pair of cells: without multiprocessing, pairs are handled one after another, whereas multiprocessing handles several pairs in parallel.

Tip

Take as a time unit the time required to perform a computation on one pair of cells, and assume the pairs are spread evenly over the available threads.

If a dataset contains 10 cells, there are (10 * 11 / 2) = 55 pairs of cells (55 time units). With 10 available threads, the computation time is reduced by approximately 50 time units (55 * 9 / 10 = 49.5) to about 5.5 time units.
If a dataset contains 50 cells, there are (50 * 51 / 2) = 1275 pairs of cells (1275 time units). With 10 available threads, the computation time is reduced by approximately 1148 time units (1275 * 9 / 10 = 1147.5) to about 127.5 time units.

Multiprocessing is therefore especially beneficial for datasets with many cells.
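The arithmetic in this tip can be reproduced with a few lines of Python. This is a rough estimate only: it assumes that pairs are spread perfectly evenly over the threads and ignores any multiprocessing overhead, so real speedups will be lower. The function names are illustrative and not part of Qumin.

```python
def pair_count(n_cells: int) -> int:
    """Number of cell pairs, using the n * (n + 1) / 2 count from the tip."""
    return n_cells * (n_cells + 1) // 2


def ideal_time(n_cells: int, threads: int) -> float:
    """Time units needed if pairs were spread perfectly evenly over the threads."""
    return pair_count(n_cells) / threads


for n_cells in (10, 50):
    pairs = pair_count(n_cells)            # sequential cost, one time unit per pair
    parallel = ideal_time(n_cells, threads=10)
    print(f"{n_cells} cells: {pairs} pairs, {parallel} time units with 10 threads, "
          f"{pairs - parallel} time units saved")
```

Because the number of pairs grows quadratically with the number of cells, the absolute time saved grows quickly as the dataset gets larger.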

Multiprocessing is available for the following steps:

Pattern inference (action=pattern)
Applicable patterns computation (action=pred)

Sampling

Qumin automatically identifies syncretic cells that can be merged together. However, some cells are entirely interpredictable without being strictly identical. In addition, you might not be interested in computing results on your whole dataset. In such cases, it is recommended to sample your dataset based on the names of the cells, using the provided keywords. For this, see Use a data subset.