Configuration Reference¶

This page explains the meaning of every configuration parameter exposed through the unified YAML entrypoint.

For the configuration fields of the sibling autoindexers, such as num_bits, num_tables, use_median_thresholds, and num_iterations, see Indexers.

The YAML file has the following top-level layout:

dataset:
  name: ...
  config: ...
model:
  name: ...
  config: ...
encoder:
  name: ...
  config: ...
decoder:
  name: ...
  config: ...
trainer:
  ...

Reading This Page¶

"Base" sections define fields inherited by many concrete configs.
Concrete model sections only list fields added on top of the base family.
Backbone fields belong to encoder.config and decoder.config.
Trainer fields belong to the flat trainer block.
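
As a rough illustration of the placement rules above, the sketch below puts one model-family field, a few backbone fields, and a few trainer fields where this page says they belong. The numeric values are illustrative only, and `...` is kept wherever this page does not list a registry name.

```yaml
# Placement sketch only: values are illustrative, `...` marks names not listed on this page.
model:
  name: ...
  config:
    latent_dim: 64              # model-family field (BaseAutoencoderConfig)
    reconstruction_loss: mse
encoder:
  name: ...
  config:
    hidden_dims: [512, 256]     # backbone field (MLPModuleConfig)
    activation: gelu
decoder:
  name: ...
  config:
    hidden_dims: [256, 512]     # backbone field (MLPModuleConfig)
    activation: gelu
trainer:                        # flat block: fields sit directly under `trainer`
  epochs: 50
  batch_size: 256
  learning_rate: 0.001
```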

Dataset Configs¶

BaseDatasetConfig¶

Field	Meaning
`max_vectors`	Optional cap on how many prepared samples are materialized from a downloadable embedding-style dataset. Use it to shorten experiments or smoke tests.

GloVeDatasetConfig¶

Field	Meaning
`dim`	Embedding width to extract from the Stanford GloVe archive. Valid values are `50`, `100`, `200`, and `300`.
`max_vectors`	Optional cap on how many word vectors to keep from the chosen file.
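
A minimal sketch of a GloVe dataset block, assuming dataset fields sit under `dataset.config` as in the top-level layout. The registry name is left as `...` because this page does not state it, and the cap is an arbitrary smoke-test value.

```yaml
dataset:
  name: ...              # GloVe registry name is not listed on this page
  config:
    dim: 100             # one of 50, 100, 200, 300
    max_vectors: 50000   # illustrative cap for a quick experiment
```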

FastTextEnglishDatasetConfig¶

Field	Meaning
`max_vectors`	Optional cap on how many English fastText vectors are loaded from the source file.

ConceptNetNumberbatchDatasetConfig¶

Field	Meaning
`max_vectors`	Optional cap on how many Numberbatch vectors are loaded.

EncoderBackedTextDatasetConfig¶

Used by snli and multinli.

Field	Meaning
`encoder`	Text encoder model name used to materialize sentence embeddings, typically a Sentence-Transformers identifier.
`encoder_batch_size`	Batch size used while converting raw texts into embeddings during preprocessing.
`normalize_embeddings`	Whether to L2-normalize encoder outputs before saving them as dataset samples.
`max_vectors`	Optional cap on how many raw text examples are embedded.
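
The fields above might be combined as follows for `snli`. The encoder identifier is only one example of a Sentence-Transformers model name, and the batch size and cap are illustrative values, not documented defaults.

```yaml
dataset:
  name: snli
  config:
    encoder: sentence-transformers/all-MiniLM-L6-v2   # illustrative identifier
    encoder_batch_size: 128
    normalize_embeddings: true
    max_vectors: 20000
```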

CLIPBackedDatasetConfig¶

Used by flickr30k.

Field	Meaning
`encoder`	CLIP backbone name, such as `ViT-B-32`.
`clip_pretrained`	CLIP checkpoint tag paired with the backbone, such as `laion2b_s34b_b79k`.
`encoder_batch_size`	Batch size used while extracting CLIP embeddings.
`clip_device`	Device override for CLIP preprocessing, such as `cpu`, `cuda`, or `mps`.
`normalize_embeddings`	Whether image/text embeddings are normalized to unit length before saving.
`clip_modality`	Which modality to materialize: `image`, `text`, or `both`.
`max_vectors`	Optional cap on how many records or caption embeddings are materialized.
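
A sketch of a `flickr30k` block using the backbone and checkpoint tag mentioned in the table. The batch size, device, modality choice, and cap are illustrative assumptions.

```yaml
dataset:
  name: flickr30k
  config:
    encoder: ViT-B-32
    clip_pretrained: laion2b_s34b_b79k
    encoder_batch_size: 64
    clip_device: cpu               # or cuda / mps
    normalize_embeddings: true
    clip_modality: image           # image, text, or both
    max_vectors: 5000
```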

CIFAR10DatasetConfig¶

Field	Meaning
`max_examples`	Optional cap on how many CIFAR-10 images are retained after download and preprocessing.

Backbone Configs¶

BaseAutoencoderModuleConfig¶

This is the structural base for built-in backbones. It does not currently add standalone YAML fields beyond what concrete modules define.

MLPModuleConfig¶

Field	Meaning
`hidden_dims`	Ordered list of layer widths. For an encoder, this is the path from input features to the module output. For an explicit decoder, this is the path from decoder input features back to sample space.
`activation`	Nonlinearity inserted after each non-final linear layer. Supported values: `relu`, `gelu`, `silu`, `tanh`.
`use_bias`	Whether each linear layer uses a bias term.
`dropout`	Dropout probability applied after non-final activations.
`norm`	Optional normalization after each non-final linear layer: `none`, `layernorm`, or `batchnorm`.
`weight_init`	Initialization strategy for linear weights: `default`, `xavier_uniform`, or `xavier_normal`.
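
For reference, an MLP encoder block that sets every field in the table could look like the sketch below. The backbone name is left as `...` since it is not listed here, and the widths and dropout rate are illustrative.

```yaml
encoder:
  name: ...
  config:
    hidden_dims: [512, 256, 128]
    activation: silu               # relu, gelu, silu, or tanh
    use_bias: true
    dropout: 0.1
    norm: layernorm                # none, layernorm, or batchnorm
    weight_init: xavier_uniform    # default, xavier_uniform, or xavier_normal
```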

CNNModuleConfig¶

Field	Meaning
`channels`	Output channel count for each convolutional stage.
`kernel_sizes`	Kernel size per stage. Each value may be one integer or one `(height, width)` pair.
`strides`	Stride per stage. Each value may be one integer or one `(height, width)` pair.
`paddings`	Padding per stage. Each value may be one integer or one `(height, width)` pair.
`activation`	Nonlinearity after each non-final convolution. Supported values: `relu`, `gelu`, `silu`, `tanh`.
`use_bias`	Whether convolution layers use bias terms.
`transpose`	If `true`, build explicit upsampling layers with `ConvTranspose2d`. Use this for image decoders declared explicitly in YAML.
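
The following sketch pairs a downsampling CNN encoder with an explicit transposed-convolution decoder. It assumes that a `(height, width)` pair is written as a two-element YAML list, which this page does not confirm. The channel counts and the 3-channel output are illustrative.

```yaml
encoder:
  name: ...
  config:
    channels: [32, 64, 128]
    kernel_sizes: [3, 3, [3, 3]]   # integer or pair per stage (list form is an assumption)
    strides: [2, 2, 2]
    paddings: [1, 1, 1]
    activation: relu
    use_bias: true
decoder:
  name: ...
  config:
    channels: [64, 32, 3]
    kernel_sizes: [4, 4, 4]
    strides: [2, 2, 2]
    paddings: [1, 1, 1]
    activation: relu
    use_bias: true
    transpose: true                # build ConvTranspose2d upsampling layers
```

Under the standard PyTorch size rules, a kernel of 3 with stride 2 and padding 1 halves an even spatial size per stage, while a transposed kernel of 4 with stride 2 and padding 1 doubles it, so the two stacks above mirror each other.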

VisionTransformerModuleConfig¶

Field	Meaning
`patch_size`	Patch height and width used to turn images into patch tokens, given as one integer or one `(height, width)` pair.
`hidden_dim`	Transformer token width after patch projection.
`num_layers`	Number of transformer encoder layers.
`num_heads`	Attention heads per transformer layer. `hidden_dim` must be divisible by this value.
`mlp_ratio`	Feed-forward expansion ratio inside each transformer block.
`dropout`	Dropout probability used inside transformer layers.
`use_bias`	Whether patch projection, output projection, and transformer linear layers use bias terms.
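
A small Vision Transformer encoder might be declared as in this sketch. The values are illustrative and chosen so that `hidden_dim` divides evenly by `num_heads`.

```yaml
encoder:
  name: ...
  config:
    patch_size: 4
    hidden_dim: 256
    num_layers: 6
    num_heads: 8        # 256 / 8 = 32 dimensions per head
    mlp_ratio: 4.0
    dropout: 0.1
    use_bias: true
```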

Model Configs¶

BaseAutoencoderConfig¶

Field	Meaning
`latent_dim`	Width of the core latent space when the family uses a single latent width. In deterministic AEs this is the latent width after `project_to_core`; in quantized models it is usually the codebook embedding width.
`reconstruction_loss`	Reconstruction objective. Current built-in choices are intended for dense tensors and typically use `mse`.

Deterministic AE Family¶

AutoencoderConfig¶

No extra fields beyond BaseAutoencoderConfig.

SemanticHashingAutoencoderConfig¶

Field	Meaning
`binarization`	Binary bottleneck mode. Use `ste` for hard straight-through binary codes during the forward pass, or `tanh` to train against soft bounded codes and threshold only at export/inference time.
`binarization_weight`	Weight on the penalty that pushes latent activations toward binary endpoints (`-1` and `+1`).
`balance_weight`	Weight on the bit-balance regularizer that keeps each hash dimension centered around an even split across a batch.
`decorrelation_weight`	Weight on the penalty that suppresses covariance between different bits so the learned code dimensions carry less redundant information.
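
As a sketch, a semantic hashing model that trains with hard straight-through codes and light regularization could be configured like this. The weights are illustrative starting points, not documented defaults, and the model name is left as `...`.

```yaml
model:
  name: ...
  config:
    latent_dim: 64                # number of hash bits in this sketch
    reconstruction_loss: mse
    binarization: ste             # or tanh for soft codes thresholded at export
    binarization_weight: 0.1
    balance_weight: 0.01
    decorrelation_weight: 0.01
```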

DenoisingAutoencoderConfig¶

Field	Meaning
`noise_type`	Corruption mode applied to inputs before reconstruction.
`noise_std`	Standard deviation for additive Gaussian noise when that corruption mode is used.
`masking_ratio`	Fraction of features dropped or masked when a masking corruption mode is used.

ContractiveAutoencoderConfig¶

Field	Meaning
`contractive_weight`	Strength of the Jacobian contraction penalty added to the reconstruction objective.

SparseAutoencoderConfig¶

Field	Meaning
`sparsity_weight`	Penalty weight encouraging sparse latent activations.
`target_activation`	Desired mean activation level used by the sparsity penalty.

TopKSparseAutoencoderConfig¶

Field	Meaning
`topk`	Number of latent units retained per sample when using top-k sparsification.

KLSparseAutoencoderConfig¶

Field	Meaning
`sparsity_weight`	Weight for the KL sparsity term.
`target_activation`	Target average activation used inside the KL sparsity penalty.

WassersteinAutoencoderConfig¶

Field	Meaning
`mmd_weight`	Strength of the MMD regularizer matching latent codes to the chosen prior.
`kernel_bandwidths`	Kernel bandwidth list used by the MMD estimator.

AdversarialAutoencoderConfig¶

Field	Meaning
`adversarial_weight`	Strength of the adversarial latent-matching objective.
`discriminator_hidden_dims`	Hidden widths for the latent discriminator network.

Variational Family¶

BaseVariationalAutoencoderConfig¶

Field	Meaning
`kl_weight`	Multiplier on the KL term after warmup is complete.
`free_bits`	Minimum KL contribution retained per latent dimension or block to reduce posterior collapse.
`kl_warmup_epochs`	Number of epochs over which the KL weight ramps from `kl_start_weight` to `kl_weight`.
`kl_start_weight`	Initial KL multiplier before warmup progresses.
`use_mean_in_eval`	If `true`, evaluation and export use posterior means instead of sampling noise.
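
The warmup fields interact, so a combined sketch may help. Here the KL multiplier starts at `0.0` and reaches `1.0` after ten epochs. All values are illustrative, and the exact ramp shape is not specified on this page.

```yaml
model:
  name: ...
  config:
    latent_dim: 32
    kl_start_weight: 0.0     # multiplier at the start of training
    kl_weight: 1.0           # multiplier once warmup is complete
    kl_warmup_epochs: 10
    free_bits: 0.05
    use_mean_in_eval: true
```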

VariationalAutoencoderConfig¶

No extra fields beyond the base variational family.

BetaVariationalAutoencoderConfig¶

No new fields. Use kl_weight to express beta-style scaling.

DenoisingVariationalAutoencoderConfig¶

Field	Meaning
`noise_type`	Corruption mode for the denoising encoder input.
`noise_std`	Standard deviation for additive Gaussian corruption.
`masking_ratio`	Fraction of masked features when using masking noise.

HierarchicalVariationalAutoencoderConfig¶

Field	Meaning
`top_latent_dim`	Width of the upper latent level in hierarchical VAE models.

VampPriorVariationalAutoencoderConfig¶

Field	Meaning
`num_pseudo_inputs`	Number of learned pseudo-inputs used to define the VampPrior mixture.

InformationVariationalAutoencoderConfig / MMDVariationalAutoencoderConfig¶

Field	Meaning
`mmd_weight`	Weight for the additional MMD regularizer.
`kernel_bandwidths`	Bandwidth list for the MMD kernel mixture.

DIPVariationalAutoencoderConfig¶

Field	Meaning
`dip_type`	Which DIP-VAE covariance penalty variant to use.
`lambda_diag`	Weight on diagonal covariance matching.
`lambda_offdiag`	Weight on off-diagonal covariance suppression.

BetaTCVariationalAutoencoderConfig¶

Field	Meaning
`tc_weight`	Weight on the total-correlation penalty.

FactorVariationalAutoencoderConfig¶

Field	Meaning
`tc_weight`	Weight on the total-correlation penalty estimated through the discriminator.
`discriminator_hidden_dims`	Hidden widths for the auxiliary discriminator.

Quantized Family¶

BaseVectorQuantizedAutoencoderConfig¶

Field	Meaning
`codebook_size`	Number of discrete codes per learned codebook.
`commitment_weight`	Weight that pulls encoder outputs toward selected codes.
`codebook_weight`	Weight that pulls learned code vectors toward encoder outputs when explicit codebook loss is used.
`assignment_strategy`	How discrete indices are selected from codebook distances: `nearest` or `sinkhorn`.
`sinkhorn_epsilon`	Entropic regularization strength for Sinkhorn assignment. It may be one float shared across codebooks or one list with one value per codebook slot. A slot set to `0.0` falls back to nearest-neighbor assignment.
`sinkhorn_iters`	Number of Sinkhorn normalization iterations when `assignment_strategy` is `sinkhorn`.
`kmeans_init`	Whether learned codebooks are initialized from the first training batch of encoder latents instead of uniform random weights.
`kmeans_iters`	Number of Lloyd iterations used during codebook k-means initialization.
`use_ema_codebook`	Whether learned codebooks are updated by exponential moving averages instead of gradient updates.
`ema_decay`	EMA decay factor for codebook statistics. Lower values adapt faster; higher values are smoother.
`ema_epsilon`	Numerical stabilizer used when normalizing EMA cluster sizes.
`dead_code_reset`	Whether rarely used codes are reinitialized at the end of training epochs or steps.
`dead_code_threshold`	Usage-count threshold below which a code is considered dead for reset purposes.
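
One possible combination of the base quantized fields is sketched below: Sinkhorn assignment with a shared epsilon, k-means initialization, EMA codebook updates, and dead-code resets. The values are illustrative. `codebook_weight` is omitted on the assumption that EMA updates replace the explicit codebook loss.

```yaml
model:
  name: ...
  config:
    latent_dim: 64
    codebook_size: 512
    commitment_weight: 0.25
    assignment_strategy: sinkhorn   # or nearest
    sinkhorn_epsilon: 0.05          # one float shared across codebooks
    sinkhorn_iters: 5
    kmeans_init: true
    kmeans_iters: 10
    use_ema_codebook: true
    ema_decay: 0.99
    ema_epsilon: 1.0e-5
    dead_code_reset: true
    dead_code_threshold: 1
```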

VectorQuantizedAutoencoderConfig¶

No extra fields beyond the base quantized family.

GumbelQuantizedAutoencoderConfig¶

Field	Meaning
`temperature`	Initial Gumbel-softmax temperature.
`min_temperature`	Lower bound for the annealed temperature.
`anneal_rate`	Multiplicative decay controlling how fast the temperature cools.
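
A sketch of the three annealing fields together. The values are illustrative. When the decay is applied (per step or per epoch) is not stated on this page, so the comment on `anneal_rate` is an assumption.

```yaml
model:
  name: ...
  config:
    codebook_size: 256
    temperature: 1.0        # starting temperature
    min_temperature: 0.1    # annealing never goes below this
    anneal_rate: 0.95       # assumed to act as a multiplier below 1.0
```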

FiniteScalarQuantizedAutoencoderConfig¶

Field	Meaning
`num_levels`	Number of scalar quantization levels per latent feature.

ResidualFiniteScalarQuantizedAutoencoderConfig¶

Field	Meaning
`num_levels`	Number of scalar levels per residual quantizer.
`num_quantizers`	Number of residual scalar quantization stages.

ProductQuantizedAutoencoderConfig¶

Field	Meaning
`num_codebooks`	Number of product-quantization subspaces. `latent_dim` must be divisible by this value.

OptimizedProductQuantizedAutoencoderConfig¶

No extra fields beyond ProductQuantizedAutoencoderConfig.

OPQVAE keeps the same codebook and assignment settings as PQVAE, but learns an orthogonal rotation over the latent space before subspace splitting. In practice this means:

num_codebooks still controls how many PQ subspaces are used.
sinkhorn_epsilon still accepts either one shared value or one value per codebook slot.
kmeans_init, use_ema_codebook, and dead_code_reset behave the same way as in PQVAE.
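
The points above can be sketched as one config. `latent_dim` is divisible by `num_codebooks`, and `sinkhorn_epsilon` is given per codebook slot, with the last two slots set to `0.0` so that they fall back to nearest-neighbor assignment. The values are illustrative and the model name is left as `...`.

```yaml
model:
  name: ...
  config:
    latent_dim: 64
    num_codebooks: 4                            # 64 / 4 = 16 dimensions per subspace
    codebook_size: 256
    assignment_strategy: sinkhorn
    sinkhorn_epsilon: [0.05, 0.05, 0.0, 0.0]    # one value per codebook slot
    sinkhorn_iters: 5
    kmeans_init: true
    use_ema_codebook: true
    dead_code_reset: true
```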

ResidualQuantizedAutoencoderConfig¶

Field	Meaning
`num_quantizers`	Number of residual vector quantization stages applied in sequence.

HierarchicalVectorQuantizedAutoencoderConfig¶

Field	Meaning
`top_latent_dim`	Width of the top-level codebook latents before they are combined with bottom-level latents.

Trainer Configs¶

TrainingConfig¶

The evaluation metric short names exposed by the terminal logger, display_metrics, and save_best_by are:

loss
recon
binary
bal
decor
sparse
topk
kl-sparse
contract
mmd
adv
disc
commit
book
kl
free-kl
codes
usage
ppl
dead
coll

Field	Meaning
`output_dir`	Directory where checkpoints, exported models, and metrics are written.
`epochs`	Maximum training epochs. Use `0` together with `patience` for early-stop-only training.
`patience`	Early stopping patience in epochs without validation improvement.
`learning_rate`	Base optimizer learning rate.
`optimizer_name`	Optimizer choice: `adam`, `adamw`, `sgd`, `rmsprop`, or `adagrad`.
`weight_decay`	Weight decay passed to the optimizer.
`lr_scheduler_type`	Learning-rate schedule: `none`, `constant`, `linear`, or `cosine`.
`warmup_epochs`	Number of epochs used for learning-rate warmup before the main scheduler takes over.
`grad_clip_norm`	If set, clip gradient norm to this value after backpropagation.
`batch_size`	Batch size used to build train/validation/test dataloaders.
`full_dataset_as_splits`	If `true`, reuse the full dataset for train, validation, and test instead of splitting it.
`device`	Device target such as `auto`, `cpu`, `cuda`, or `mps`.
`seed`	Global random seed for reproducibility.
`save_best_by`	Validation metric short names used to save best checkpoints. `loss` writes to `best/`; extra names such as `commit` write to `best-commit/`. Each metric is minimized exactly as named, with no fallback aliases.
`display_metrics`	Optional mapping from metric short names to booleans. Omitted names default to `true`; set a name to `false` to hide it from batch, epoch, and final evaluation logs.
`log_memory`	If `true`, emit `MEM` lines after train, validation, and final test passes. CUDA logs allocated, reserved, and peak memory; MPS logs best-effort allocated/reserved figures when available.
`show_only_best_epochs`	If `true`, only emit compact summaries for best-validation epochs instead of every epoch.
`advice`	If `true`, append trainer-generated tuning suggestions after the run.
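
A sketch of a flat trainer block follows. It assumes `save_best_by` takes a YAML list of short names, which the plural wording suggests but this page does not confirm. The output path and all numeric values are hypothetical.

```yaml
trainer:
  output_dir: runs/example        # hypothetical path
  epochs: 100
  patience: 10
  learning_rate: 0.001
  optimizer_name: adamw
  weight_decay: 0.01
  lr_scheduler_type: cosine
  warmup_epochs: 5
  grad_clip_norm: 1.0
  batch_size: 256
  device: auto
  seed: 42
  save_best_by: [loss, commit]    # list form assumed: writes best/ and best-commit/
  display_metrics:
    usage: false                  # hide these two; omitted names default to true
    dead: false
  log_memory: false
  show_only_best_epochs: true
  advice: false
```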

AdversarialAutoencoderTrainingConfig¶

Field	Meaning
`discriminator_learning_rate`	Optional learning rate override for the discriminator optimizer.
`generator_learning_rate`	Optional learning rate override for the generator or autoencoder optimizer.
`discriminator_steps`	Number of discriminator updates per generator-style update.
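
For an adversarial autoencoder, the model-side and trainer-side fields might be combined as in this sketch. It assumes the adversarial trainer fields sit in the same flat `trainer` block as the `TrainingConfig` fields. The values are illustrative.

```yaml
model:
  name: ...
  config:
    latent_dim: 32
    adversarial_weight: 1.0
    discriminator_hidden_dims: [256, 128]
trainer:
  learning_rate: 0.001
  discriminator_learning_rate: 0.0002   # optional override
  generator_learning_rate: 0.001        # optional override
  discriminator_steps: 1
```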

FactorVariationalAutoencoderTrainingConfig¶

Field	Meaning
`discriminator_learning_rate`	Optional learning rate override for the FactorVAE discriminator.
`discriminator_steps`	Number of discriminator updates per training iteration.