How to write your own script?
You may want to write a script that makes use of Qumín’s built-in utilities. While some of the functions are easy to use independently, many are specific to Qumín and require preparatory steps. In this How-to, we focus on the basic operation of reading pre-computed alternation patterns. This is often the first step before writing a more complex script.
Setting the environment
Qumín scripts are launched from the command line through Hydra, with a set of options. In our own script, we instead create a DictConfig object that stores all the options we want to pass. These options must strictly follow the hierarchy of the CLI reference.
from omegaconf.dictconfig import DictConfig
cfg = DictConfig({'data': "source/vlexique/vlexique.package.json"})
To store metadata about the current run (input Paralex dataset, Hydra options, current directory), Qumín uses a Metadata class. Each run has its own Metadata object, which is then saved in the metadata.json file.
We initialize a Metadata object for the current run, and we pass as options:
rundir_path : A directory for the current run
cfg : the command line configuration
from qumin.utils.metadata import Metadata
md = Metadata(cfg=cfg, rundir_path="results/my_run")
Similarly, we load the Metadata object of the run that computed the patterns. The second line is a bit tricky: we manually restore the command-line configuration of that run from its metadata file.
patterns_md = Metadata(path="results/patterns/metadata.json")
patterns_md.cfg = DictConfig(patterns_md.package.custom['omega_conf'])
Reading the patterns
We now turn to reading the morphological data itself.
We first load the paradigms from the Paralex dataset. Paradigms are stored in a Paradigms object. Loading them also initializes the segment inventory for phonological computations. Phonological information is retrieved from the Paralex dataset using the Metadata of the current run. Additional explanations for manipulating paradigms can be found in the corresponding API documentation.
paradigms = patterns_md.get_paradigms(md, segcheck=True)
Then, we instantiate a PatternStore object, and we populate it with data saved from the previous run. Additional explanations for manipulating patterns can be found in the corresponding API documentation.
patterns = patterns_md.get_patterns(paradigms)
Using the patterns
The patterns object now contains all you need for your research. ParadigmPatterns subclasses dict: each key is a tuple containing a pair of cells, and each value a DataFrame of patterns, making it fairly easy to manipulate.
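As a rough illustration of this dict-of-DataFrames layout, the sketch below uses a plain dict as a stand-in for the loaded patterns object. The cell names, the column names (`lexeme`, `pattern`) and the pattern strings are all invented for the example; inspect the real DataFrames to see which columns Qumín actually provides.

```python
import pandas as pd

# Stand-in for the loaded patterns object: a plain dict keyed by cell pairs.
# Cell names, column names and pattern strings are hypothetical.
toy_patterns = {
    ("inf", "prs.3sg"): pd.DataFrame({
        "lexeme": ["walk", "talk", "go"],
        "pattern": ["_ > _s", "_ > _s", "_ > _es"],
    }),
    ("inf", "pst"): pd.DataFrame({
        "lexeme": ["walk", "talk", "go"],
        "pattern": ["_ > _ed", "_ > _ed", "go > went"],
    }),
}


def pattern_counts(patterns):
    """Return, for each cell pair, how many lexemes follow each pattern."""
    return {pair: df["pattern"].value_counts() for pair, df in patterns.items()}


counts = pattern_counts(toy_patterns)
for (cell_a, cell_b), freq in counts.items():
    print(f"{cell_a} ~ {cell_b}: {len(freq)} distinct patterns")

# Keys are ordinary tuples, so selecting the pairs that involve one cell
# is a simple membership test.
pst_pairs = [pair for pair in toy_patterns if "pst" in pair]
print(pst_pairs)
```

Because the function only relies on dict iteration and DataFrame column access, the same loop should carry over to the real object once the column name is adapted.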
The best way to understand how patterns can be used in further computations is probably to read the scripts provided with Qumín. They are available at the root of Qumín’s source code.
All-in-one
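from qumin.utils.metadata import Metadata
from omegaconf.dictconfig import DictConfig
# Options for the current run, following the hierarchy of the CLI reference
cfg = DictConfig({'data': "source/vlexique/vlexique.package.json"})
# Metadata of the current run
md = Metadata(cfg=cfg, rundir_path="results/my_run")
# Metadata of the patterns run, with its configuration restored manually
patterns_md = Metadata(path="results/patterns/metadata.json")
patterns_md.cfg = DictConfig(patterns_md.package.custom['omega_conf'])
# Load the paradigms, then the pre-computed patterns
paradigms = patterns_md.get_paradigms(md, segcheck=True)
patterns = patterns_md.get_patterns(paradigms)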