Predictability heatmaps

Since Qumin 2.0, results are shipped as long tables. This makes it possible to store several metrics, and the results of several runs, in the same file. Results files now look like this:

predictor,predicted,measure,value,n_pairs,n_preds,dataset,pair_proba,pred_proba,target_proba,proba_source
<cell1>,<cell2>,cond_entropy,0.39,500,1,<dataset_name>,...
<cell1>,<cell2>,cond_entropy,0.35,500,1,<dataset_name>,...
<cell1>,<cell2>,cond_entropy,0.2,500,1,<dataset_name>,...
<cell1>,<cell2>,cond_entropy,0.43,500,1,<dataset_name>,...
<cell1>,<cell2>,cond_entropy,0.6,500,1,<dataset_name>,...
<cell1>,<cell2>,cond_entropy,0.1,500,1,<dataset_name>,...

When Qumin is run with the probability settings, additional columns report the probabilities of cells and of their combinations.

All results are in the same file, including results for different numbers of predictors (indicated in the n_preds column) and for different measures (indicated in the measure column).
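Because one file mixes measures and numbers of predictors, rows usually need filtering before they are reshaped into a matrix. The following is a minimal sketch using only the Python standard library. It assumes the column names shown above; the cell names, the values in the inline sample, and the to_wide helper are invented for illustration and are not part of Qumin.

```python
import csv
import io

# Invented sample in the long format described above (first columns only).
SAMPLE = """predictor,predicted,measure,value,n_pairs,n_preds,dataset
nom.sg,gen.sg,cond_entropy,0.39,500,1,demo
gen.sg,nom.sg,cond_entropy,0.35,500,1,demo
nom.sg,nom.pl,cond_entropy,0.2,500,1,demo
nom.pl,nom.sg,cond_entropy,0.43,500,1,demo
"""


def to_wide(rows, measure="cond_entropy", n_preds=1):
    """Pivot long-format rows into {predictor: {predicted: value}}."""
    wide = {}
    for row in rows:
        # Keep a single measure and a single number of predictors.
        if row["measure"] != measure or int(row["n_preds"]) != n_preds:
            continue
        wide.setdefault(row["predictor"], {})[row["predicted"]] = float(row["value"])
    return wide


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matrix = to_wide(rows)
print(matrix["nom.sg"]["gen.sg"])
```

To read an actual results file, replace io.StringIO(SAMPLE) with an open file handle.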

To give a quick overview of the results, Qumin outputs an entropy heatmap in wide matrix format. This behaviour can be disabled by passing pred.heatmap=False. The heatmap uses the Paralex features-values table to sort the cells in a canonical order. The heatmap.order setting specifies which features take priority in the sorting:

/$ qumin action=pred data=<dataset.package.json> heatmap.order="[number, case]"
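To illustrate what a priority list such as [number, case] means for the cell order, here is a small sketch. It is not Qumin's implementation: it assumes that cells are dot-separated feature values and that each value has a feature and a canonical rank. The FEATURES mapping and the sort_cells helper are hypothetical.

```python
# Hypothetical features-values data: value -> (feature, canonical rank).
FEATURES = {
    "sg": ("number", 0),
    "pl": ("number", 1),
    "nom": ("case", 0),
    "acc": ("case", 1),
    "gen": ("case", 2),
}


def sort_cells(cells, order):
    """Sort cells by feature priority, then by the rank of each value."""

    def key(cell):
        ranks = {}
        for value in cell.split("."):
            feature, rank = FEATURES[value]
            ranks[feature] = rank
        # Features listed first in `order` dominate the comparison.
        return tuple(ranks.get(feature, -1) for feature in order)

    return sorted(cells, key=key)


cells = ["gen.pl", "nom.sg", "acc.pl", "gen.sg", "nom.pl", "acc.sg"]
print(sort_cells(cells, ["number", "case"]))
# ['nom.sg', 'acc.sg', 'gen.sg', 'nom.pl', 'acc.pl', 'gen.pl']
```

Swapping the list to ["case", "number"] would instead group the cells by case first, with singular before plural inside each case.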

It is also possible to draw an entropy heatmap without running entropy computations:

/$ qumin action=ent_heatmap pred.importResults=<metadata.json>

Newer versions of Qumin automatically compute zones of interpredictability and display them as a heatmap. The heatmap.cols argument is required to tell Qumin which features should be put in columns:

/$ qumin action=pred data=<dataset.package.json> heatmap.cols="[case]"
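The effect of choosing column features can be pictured with the following sketch, in which the features listed in cols form the column labels and all remaining features form the row labels. This is only an illustration under the assumption that cells are dot-separated feature values. The FEATURE_OF mapping and both helpers are hypothetical and do not reflect how Qumin builds the figure.

```python
# Hypothetical mapping from feature values to their feature.
FEATURE_OF = {"sg": "number", "pl": "number", "nom": "case", "acc": "case"}


def split_cell(cell, cols):
    """Return (row_label, col_label) for one cell."""
    values = cell.split(".")
    col = [v for v in values if FEATURE_OF[v] in cols]
    row = [v for v in values if FEATURE_OF[v] not in cols]
    return ".".join(row), ".".join(col)


def layout(cells, cols):
    """Group cells into {row_label: {col_label: cell}}."""
    grid = {}
    for cell in cells:
        row, col = split_cell(cell, cols)
        grid.setdefault(row, {})[col] = cell
    return grid


grid = layout(["nom.sg", "acc.sg", "nom.pl", "acc.pl"], cols=["case"])
print(grid)
```

With cols=["case"], the two numbers become the rows and the two cases become the columns.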

Full reference

Heatmap options are under the heatmap namespace. Available heatmap options (see also the common options):

class qumin.utils.config.HeatmapConfig(*, label=None, cmap=None, exhaustive_labels=False, dense=False, annotate=False, order=None, cols=None, display=<factory>)
Parameters:
  • label (str | None) – Lexeme column to use as label (for microclass heatmap, e.g. inflection_class)

  • cmap (str | None) – Colormap name

  • exhaustive_labels (bool) – By default, seaborn shows only some labels on the heatmap for readability. This option forces seaborn to print all labels.

  • dense (bool) – Use initials instead of full labels (only for entropy heatmap)

  • annotate (bool) – Display values on the heatmap. (only for entropy heatmap)

  • order (Any | None) – Priority list for sorting features (for entropy heatmap), e.g. [number, case]. If no features-values file is available, you can use the cells key to provide an ordered list of cells to display. The special value “autosort” sorts cells by similarity.

  • cols (Any | None) – List of features to show in columns (for zones heatmap), e.g. [Mode, Tense]. All other features will constitute rows.

  • display (HeatmapDisplayConfig) – Options to switch on/off additional heatmaps.

Values for the heatmap.display key:

class qumin.utils.config.HeatmapDisplayConfig(*, n_pairs=True, freq_margins=True)

Set to True/False to show or hide detailed information on the heatmap

Parameters:
  • n_pairs (bool) – Whether to display the number of pairs.

  • freq_margins (bool) – Whether to display frequency margins on heatmaps.

The default configuration for these keys looks like this:

heatmap:
  label: null
  cmap: null
  exhaustive_labels: false
  dense: false
  annotate: false
  order: null
  cols: null
  display:
    n_pairs: true
    freq_margins: true

See the full Default configuration