qumin.utils package¶
Submodules¶
qumin.utils.metadata module¶
- class qumin.utils.metadata.Metadata(path=None, cfg=None, rundir_path=None)[source]¶
Bases: object
Metadata manager for Qumin scripts. Wrapper around the Frictionless Package class.
Basic usage:
Register Metadata manager;
Get an absolute path to the metadata folder;
Write to that path;
After writing a file, register it and set metadata (description, custom dict);
Export the JSON descriptor.
The Metadata class can easily be used in scripts that reuse Qumin results. In that case, a value must be passed to rundir_path if hydra is not set up (which is very likely in a simple script).
Examples
    import omegaconf
    from qumin.utils.metadata import Metadata

    cfg = omegaconf.dictconfig.DictConfig({"data": "myparalex/package.json"})
    md = Metadata(cfg=cfg, path="myprevious_run/metadata.json")
    name = 'path/myfile.txt'
    filename = md.get_path(name)
    # Open an IO stream and write to ``filename``.
    md.register_file(name, description="My nice file", custom={"property": "value"})
    md.save_metadata(path)
- Variables:
start (datetime) – timestamp at the beginning of the run.
prefix (Path) – normalized prefix for the output files
cfg (OmegaConf) – all arguments passed to the python script
paralex (frictionless.Package) – a frictionless Package representing a dataset.
- get_paradigm_conf(cfg)[source]¶
Load paradigm creation keywords from previous run.
A few sanity checks ensure that the user didn’t pass contradictory arguments. If contradictory arguments are detected, a warning is issued and the old arguments are kept.
Under some conditions, arguments can be overwritten (e.g. cells list).
- Parameters:
cfg (OmegaConf.dictconfig.DictConfig) – Arguments passed to the current run. These arguments might override arguments from the previous run, under specific conditions.
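The reconciliation described above can be pictured with a minimal sketch. It is not Qumin's implementation: `merge_run_conf`, the `overridable` set and the example keys are hypothetical, and plain dicts stand in for the OmegaConf objects.

```python
import warnings


def merge_run_conf(previous, current, overridable=("cells",)):
    """Hypothetical helper: reconcile arguments of a previous run with new ones."""
    merged = dict(previous)
    for key, value in current.items():
        if value is None or previous.get(key) == value:
            continue  # nothing new was requested for this key
        if key in overridable or key not in previous:
            merged[key] = value  # override allowed under this sketch's rules
        else:
            # Contradictory argument: warn and keep the old value.
            warnings.warn(
                f"Ignoring {key}={value!r}; keeping {previous[key]!r} from the previous run."
            )
    return merged


previous = {"pos": ["verb"], "cells": ["prs.1sg", "prs.2sg", "prs.3sg"]}
current = {"pos": ["noun"], "cells": ["prs.1sg", "prs.3sg"]}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merged = merge_run_conf(previous, current)
```

In this sketch the contradictory `pos` value triggers one warning and is discarded, while the narrower `cells` list replaces the old one.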
- get_paradigms(md, **kwargs)[source]¶
Creates paradigms with a stable config strategy.
- Parameters:
md (qumin.utils.metadata.Metadata) – Metadata handler of the current run.
kwargs (dict) – Additional keyword arguments are passed to qumin.representations.paradigms.Paradigms.
- get_path(rel_path)[source]¶
Return an absolute path to a file and create parent directories.
- Parameters:
rel_path (str) – relative path to the file or folder.
- Returns:
absolute path to the file or folder.
- Return type:
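As an illustration of the behaviour documented for get_path, here is a standalone sketch built on pathlib. The function below is a hypothetical stand-in that takes the run directory explicitly; the real method presumably resolves paths against the prefix stored on the Metadata instance.

```python
import tempfile
from pathlib import Path


def get_path(prefix, rel_path):
    """Hypothetical stand-in: resolve rel_path under prefix and create parent folders."""
    target = (Path(prefix) / rel_path).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as run_dir:
    out = get_path(run_dir, "path/myfile.txt")
    # The parent folder exists now, so the file can be written directly.
    out.write_text("some results\n", encoding="utf-8")
    parent_created = out.parent.is_dir()
    is_absolute = out.is_absolute()
```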
- get_pattern_conf()[source]¶
Load pattern creation keywords from previous run. No sanity checks are performed: all relevant arguments have already been checked when loading the paradigms.
- get_patterns(paradigms, **kwargs)[source]¶
Creates patterns with a stable config strategy.
- Parameters:
paradigms (qumin.representations.paradigms.Paradigms) – Paradigms representation.
kwargs (dict) – Additional keyword arguments are passed to patterns.from_file().
- get_resource_path(resource)[source]¶
Return the full path to a resource.
- Parameters:
resource (str) – A resource name
- Returns:
a path to the resource.
- Return type:
- register_file(rel_path, name=None, custom=None, **kwargs)[source]¶
Add a file as a frictionless resource.
- Parameters:
rel_path (str or pathlib.Path) – the relative path to the file.
name (str) – name of the resource. By default, this will be the name of the file without the extension.
custom (dict) – Custom properties to save.
**kwargs (dict) – Optional keyword arguments passed to Resource, e.g. description.
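The default naming rule can be sketched with pathlib. `describe_file` is a hypothetical helper that only builds a plain dict; the real method registers a frictionless Resource, and the exact layout of its descriptor is not shown here.

```python
from pathlib import Path


def describe_file(rel_path, name=None, custom=None, **kwargs):
    """Hypothetical helper: assemble resource properties as a plain dict."""
    path = Path(rel_path)
    descriptor = {
        "path": path.as_posix(),
        # Default name: the file name without its (last) extension.
        "name": name if name is not None else path.stem,
    }
    if custom:
        descriptor["custom"] = dict(custom)
    descriptor.update(kwargs)  # e.g. description="..."
    return descriptor


resource = describe_file(
    "path/myfile.txt",
    description="My nice file",
    custom={"property": "value"},
)
```

Note that `Path.stem` strips only the last suffix, so a file such as `table.tar.gz` would get the default name `table.tar` under this sketch.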
Module contents¶
qumin.utils.config module¶
- class qumin.utils.config.Actions(*values)[source]¶
Available actions. Each action triggers a different script.
Changed in version 3.2.0: Actions H and ent_heatmap are replaced by pred and pred_heatmap.
- H = 'H'¶
- ent_heatmap = 'ent_heatmap'¶
- heatmap = 'heatmap'¶
- lattice = 'lattice'¶
- macroclasses = 'macroclasses'¶
- patterns = 'patterns'¶
- pred = 'pred'¶
- pred_heatmap = 'pred_heatmap'¶
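To make the 3.2.0 renaming concrete, the following sketch maps the deprecated action names to their replacements. The enum mirrors the values listed above, but `REPLACEMENTS` and `resolve_action` are hypothetical names, and Qumin may handle the deprecation differently.

```python
import warnings
from enum import Enum


class Actions(Enum):
    H = "H"
    ent_heatmap = "ent_heatmap"
    heatmap = "heatmap"
    lattice = "lattice"
    macroclasses = "macroclasses"
    patterns = "patterns"
    pred = "pred"
    pred_heatmap = "pred_heatmap"


# Deprecated action -> replacement, following the 3.2.0 change note.
REPLACEMENTS = {
    Actions.H: Actions.pred,
    Actions.ent_heatmap: Actions.pred_heatmap,
}


def resolve_action(name):
    """Hypothetical helper: return the current action for a possibly deprecated name."""
    action = Actions(name)
    if action in REPLACEMENTS:
        new = REPLACEMENTS[action]
        warnings.warn(
            f"Action {action.value!r} is deprecated; use {new.value!r}.",
            DeprecationWarning,
        )
        return new
    return action


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    resolved_h = resolve_action("H")
    resolved_heatmap = resolve_action("ent_heatmap")
resolved_lattice = resolve_action("lattice")
```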
- class qumin.utils.config.HeatmapConfig(*, label=None, cmap=None, exhaustive_labels=False, dense=False, annotate=False, order=None, cols=None, display=<factory>)[source]¶
Bases: object
- Parameters:
label (str | None) – Lexeme column to use as label (for microclass heatmap, e.g. inflection_class)
cmap (str | None) – Colormap name
exhaustive_labels (bool) – By default, seaborn shows only some labels on the heatmap for readability. This forces seaborn to print all labels.
dense (bool) – Use initials instead of full labels (only for entropy heatmap)
annotate (bool) – Display values on the heatmap (only for entropy heatmap)
order (Any | None) – Priority list for sorting features (for entropy heatmap), e.g. [number, case]. If no features-values file is available, the key cells can be used to provide an ordered list of cells to display. The special value “autosort” sorts by cell similarity.
cols (Any | None) – List of features to show in columns (for zones heatmap), e.g. [Mode, Tense]. All other features will constitute rows.
display (HeatmapDisplayConfig) – Options to switch on/off additional heatmaps.
- display: HeatmapDisplayConfig¶
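One way to read the order parameter is as a multi-key sort over the features of each cell. The sketch below is only an illustration under that assumption: the cell-to-feature mapping and `sort_cells` are hypothetical, and values are compared alphabetically here, whereas Qumin may rely on the canonical order from a features-values file.

```python
def sort_cells(cells, order):
    """Hypothetical helper: sort cell names by features, in priority order.

    cells maps each cell name to a dict of feature -> value;
    order lists feature names, highest priority first.
    """
    return sorted(
        cells,
        key=lambda cell: tuple(cells[cell].get(feature, "") for feature in order),
    )


cells = {
    "nom.sg": {"case": "nom", "number": "sg"},
    "acc.pl": {"case": "acc", "number": "pl"},
    "acc.sg": {"case": "acc", "number": "sg"},
    "nom.pl": {"case": "nom", "number": "pl"},
}

# Group by number first, then by case within each number.
sorted_cells = sort_cells(cells, ["number", "case"])
```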
- class qumin.utils.config.HeatmapDisplayConfig(*, n_pairs=True, freq_margins=True)[source]¶
Bases: object
Set to True/False to show or hide detailed information on the heatmap
- Parameters:
- class qumin.utils.config.Kind(*values)[source]¶
Kind of algorithm for the patterns.
- Parameters:
phon – phonological distance
edits – simple edit distance
- edits = 'edits'¶
- phon = 'phon'¶
- class qumin.utils.config.LatticeConfig(*, shorten=False, aoc=False, html=False, ctxt=False, stat=False, pdf=True, png=False)[source]¶
Bases: object
Configuration for the lattice action.
- Parameters:
shorten (bool) – Drop redundant columns altogether. Useful for big contexts, but loses information. The lattice shape and stats will be the same. Avoid using with --html.
aoc (bool) – Only attribute and object concepts
html (bool) – Export to html
ctxt (bool) – Export as a context
stat (bool) – Output stats about the lattice
pdf (bool) – Export as pdf
png (bool) – Export as png
- class qumin.utils.config.OverabundantPatternsConfig(*, keep=False, freq=True, tags=None)[source]¶
Bases: object
Configuration for the processing of overabundant forms.
- Parameters:
- class qumin.utils.config.PatternsConfig(*, kind=Kind.phon, defective=False, gap_proportion=0.4, optim_mem=False, overabundant=<factory>)[source]¶
Bases: object
Configuration for the patterns action.
- Parameters:
kind (Kind) – Options are (see docs): phon, edits
defective (bool) – Whether to keep defective entries
gap_proportion (float) – Proportion of the median score used to set the gap score
optim_mem (bool) – Attempt to use a little bit less memory
overabundant (OverabundantPatternsConfig) – Configuration for overabundance
- overabundant: OverabundantPatternsConfig¶
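The gap_proportion parameter can be read as a simple scaling of a median. The sketch below assumes the median is taken over a list of substitution scores; which scores Qumin actually uses is not stated here, and `gap_score` is a hypothetical name.

```python
from statistics import median


def gap_score(scores, gap_proportion=0.4):
    """Hypothetical helper: gap score as a proportion of the median score."""
    return gap_proportion * median(scores)


# Made-up scores, for illustration only; their median is 1.0.
scores = [0.2, 0.5, 1.0, 1.5, 3.0]
gap = gap_score(scores)
```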
- class qumin.utils.config.PredictabilityConfig(*, vis=True, n=<factory>, features=None, importResults=None, token_freq=<factory>)[source]¶
Bases: object
Configuration for entropy calculations.
- Parameters:
vis (bool) – Whether to create a heatmap of the metrics and of interpredictability zones.
n (List[int]) – Compute entropy for prediction with n predictors.
features (Any | None) – Feature column in the Lexeme table. Features will be considered known in conditional probabilities: P(X~Y|X,f1,f2…)
importResults (str | None) – Import previous entropy computation results. Any results file can be used to compute the entropy heatmap; results computed with n-1 predictors speed up entropy computation with n predictors.
token_freq (TokenFreqConfig) – Whether to use token frequencies for…
- token_freq: TokenFreqConfig¶
- class qumin.utils.config.QuminConfig(*, action=Actions.patterns, data, patterns=None, pos=None, lexemes=None, cells=None, sample_lexemes=None, sample_cells=None, force_random=False, seed=1, force=False, cpus=1, resegment=False, checkSegments=True, pats=<factory>, lattice=<factory>, heatmap=<factory>, pred=<factory>, entropy='${oc.deprecated:pred}')[source]¶
Bases: object
- Parameters:
action (Actions) – Action, one of: patterns, pred (H is deprecated), lattice, macroclasses, heatmap, pred_heatmap (ent_heatmap is deprecated)
data (str) – Path to paralex.package.json paradigms, segments
cells (Any | None) – Cells to use (subset)
pos (Any | None) – Parts of speech to use (subset)
patterns (str | None) – Path to pattern computation metadata. If null, will compute patterns.
lexemes (str | None) – Lexemes to use (subset), path to a file with one lexeme id per row
sample_lexemes (int | None) – A number of lexemes to sample, for debug purposes.
sample_cells (int | None) – A number of cells to sample, for debug purposes. Samples by frequency if possible, otherwise randomly.
force_random (bool) – Whether to force random sampling.
seed (int) – Random seed for reproducible random effects.
force (bool) – Whether to bypass the RAM usage safety limit (2GB)
cpus (int) – Number of CPUs to use for big computations. Defaults to 1. A value of 0 uses the number of available CPUs minus 2. WARNING: cpus > 1 is currently unavailable on Windows and Mac.
resegment (bool) – Whether to resegment phonological forms, i.e. ignore spaces in phon forms and re-compute phonemic segmentation.
checkSegments (bool) – Whether to check that all forms contain licit segments.
pats (PatternsConfig) – Configuration for the patterns action.
lattice (LatticeConfig) – Configuration for the lattice action.
heatmap (HeatmapConfig) – Configuration for the pred_heatmap action.
pred (PredictabilityConfig) – Configuration for the pred action.
entropy (PredictabilityConfig)
Changed in version 3.2.0: Namespace entropy is replaced by pred.
- entropy: PredictabilityConfig = '${oc.deprecated:pred}'¶
- heatmap: HeatmapConfig¶
- lattice: LatticeConfig¶
- pats: PatternsConfig¶
- pred: PredictabilityConfig¶
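The semantics documented for the cpus parameter (0 meaning "all available CPUs minus 2") can be sketched as follows. `resolve_cpus` is a hypothetical helper; clamping the result between 1 and the number of available CPUs is an assumption of this sketch, not documented behaviour.

```python
import os


def resolve_cpus(requested, available=None):
    """Hypothetical helper: turn the cpus setting into a worker count."""
    if available is None:
        available = os.cpu_count() or 1
    if requested == 0:
        # Documented special value: maximum minus 2 (kept at 1 or more here).
        return max(1, available - 2)
    # Assumption: never exceed what the machine offers.
    return max(1, min(requested, available))


workers_default = resolve_cpus(1, available=8)
workers_auto = resolve_cpus(0, available=8)
workers_small_machine = resolve_cpus(0, available=2)
```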