Configuration Reference
This page explains the meaning of every configuration parameter exposed through the unified YAML entrypoint.
For the configuration fields of the sibling autoindexers, such as num_bits, num_tables, use_median_thresholds, and num_iterations, see Indexers.
dataset:
  name: ...
  config: ...
model:
  name: ...
  config: ...
encoder:
  name: ...
  config: ...
decoder:
  name: ...
  config: ...
trainer:
  ...
Reading This Page
- "Base" sections define fields inherited by many concrete configs.
- Concrete model sections only list fields added on top of the base family.
- Backbone fields belong to encoder.config and decoder.config.
- Trainer fields belong to the flat trainer block.
Dataset Configs
BaseDatasetConfig
| Field | Meaning |
| --- | --- |
| max_vectors | Optional cap on how many prepared samples are materialized from a downloadable embedding-style dataset. Use it to shorten experiments or smoke tests. |
GloVeDatasetConfig
| Field | Meaning |
| --- | --- |
| dim | Embedding width to extract from the Stanford GloVe archive. Valid values are 50, 100, 200, and 300. |
| max_vectors | Optional cap on how many word vectors to keep from the chosen file. |
FastTextEnglishDatasetConfig
| Field | Meaning |
| --- | --- |
| max_vectors | Optional cap on how many English fastText vectors are loaded from the source file. |
ConceptNetNumberbatchDatasetConfig
| Field | Meaning |
| --- | --- |
| max_vectors | Optional cap on how many Numberbatch vectors are loaded. |
EncoderBackedTextDatasetConfig
Used by snli and multinli.
| Field | Meaning |
| --- | --- |
| encoder | Text encoder model name used to materialize sentence embeddings, typically a Sentence-Transformers identifier. |
| encoder_batch_size | Batch size used while converting raw texts into embeddings during preprocessing. |
| normalize_embeddings | Whether to L2-normalize encoder outputs before saving them as dataset samples. |
| max_vectors | Optional cap on how many raw text examples are embedded. |
CLIPBackedDatasetConfig
Used by flickr30k.
| Field | Meaning |
| --- | --- |
| encoder | CLIP backbone name, such as ViT-B-32. |
| clip_pretrained | CLIP checkpoint tag paired with the backbone, such as laion2b_s34b_b79k. |
| encoder_batch_size | Batch size used while extracting CLIP embeddings. |
| clip_device | Device override for CLIP preprocessing, such as cpu, cuda, or mps. |
| normalize_embeddings | Whether image/text embeddings are normalized to unit length before saving. |
| clip_modality | Which modality to materialize: image, text, or both. |
| max_vectors | Optional cap on how many records or caption embeddings are materialized. |
CIFAR10DatasetConfig
| Field | Meaning |
| --- | --- |
| max_examples | Optional cap on how many CIFAR-10 images are retained after download and preprocessing. |
Backbone Configs
BaseAutoencoderModuleConfig
This is the structural base for built-in backbones. It does not currently add standalone YAML fields beyond what concrete modules define.
MLPModuleConfig
| Field | Meaning |
| --- | --- |
| hidden_dims | Ordered list of layer widths. For an encoder, this is the path from input features to the module output. For an explicit decoder, this is the path from decoder input features back to sample space. |
| activation | Nonlinearity inserted after each non-final linear layer. Supported values: relu, gelu, silu, tanh. |
| use_bias | Whether each linear layer uses a bias term. |
| dropout | Dropout probability applied after non-final activations. |
| norm | Optional normalization after each non-final linear layer: none, layernorm, or batchnorm. |
| weight_init | Initialization strategy for linear weights: default, xavier_uniform, or xavier_normal. |
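To make the hidden_dims and activation rules concrete, here is a minimal Python sketch that lists the linear layers implied by a width list. The helper name mlp_layer_plan is hypothetical, and the sketch assumes the input feature width is supplied separately and prepended to hidden_dims; the library may derive shapes differently.

```python
def mlp_layer_plan(input_dim, hidden_dims, activation="relu"):
    """Return (in_features, out_features, activation_or_None) per linear layer.

    Assumption: input_dim is known from the data and is not part of hidden_dims.
    The activation is attached to every layer except the final one.
    """
    widths = [input_dim, *hidden_dims]
    plan = []
    for i in range(len(hidden_dims)):
        is_final = i == len(hidden_dims) - 1
        plan.append((widths[i], widths[i + 1], None if is_final else activation))
    return plan


plan = mlp_layer_plan(300, [256, 128, 64])
```

With three entries in hidden_dims the plan contains three linear layers, and only the first two carry an activation.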
CNNModuleConfig
| Field | Meaning |
| --- | --- |
| channels | Output channel count for each convolutional stage. |
| kernel_sizes | Kernel size per stage. Each value may be one integer or one (height, width) pair. |
| strides | Stride per stage. Each value may be one integer or one (height, width) pair. |
| paddings | Padding per stage. Each value may be one integer or one (height, width) pair. |
| activation | Nonlinearity after each non-final convolution. Supported values: relu, gelu, silu, tanh. |
| use_bias | Whether convolution layers use bias terms. |
| transpose | If true, build explicit upsampling layers with ConvTranspose2d. Use this for image decoders declared explicitly in YAML. |
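When choosing kernel_sizes, strides, and paddings it helps to know the spatial size each stage produces. The sketch below applies the standard output-size formula for an ordinary (non-transposed) 2D convolution with dilation 1. The helper names are hypothetical, and the sketch does not cover transpose: true, which follows a different formula.

```python
def _pair(value):
    """Accept one integer or one (height, width) pair, as the config fields do."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv_output_hw(height, width, kernel_sizes, strides, paddings):
    """Spatial size after stacking plain convolutions (dilation assumed to be 1)."""
    for k, s, p in zip(kernel_sizes, strides, paddings):
        (kh, kw), (sh, sw), (ph, pw) = _pair(k), _pair(s), _pair(p)
        height = (height + 2 * ph - kh) // sh + 1
        width = (width + 2 * pw - kw) // sw + 1
    return height, width


# Two stride-2 stages on a 32x32 input halve the resolution twice.
size = conv_output_hw(32, 32, kernel_sizes=[3, 3], strides=[2, 2], paddings=[1, 1])
```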

| Field | Meaning |
| --- | --- |
| patch_size | Patch height and width used to turn images into patch tokens, given as one integer or one (height, width) pair. |
| hidden_dim | Transformer token width after patch projection. |
| num_layers | Number of transformer encoder layers. |
| num_heads | Attention heads per transformer layer. hidden_dim must be divisible by this value. |
| mlp_ratio | Feed-forward expansion ratio inside each transformer block. |
| dropout | Dropout probability used inside transformer layers. |
| use_bias | Whether patch projection, output projection, and transformer linear layers use bias terms. |
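The patch_size, hidden_dim, and num_heads fields above constrain each other. As an illustrative sketch, the hypothetical helper below computes the token count and per-head width for a given image size. The requirement that the image size be divisible by the patch size is an assumption of this sketch, not something stated by the library.

```python
def patch_grid(image_hw, patch_size, hidden_dim, num_heads):
    """Return the number of patch tokens and the per-head width."""
    ph, pw = (patch_size, patch_size) if isinstance(patch_size, int) else patch_size
    height, width = image_hw
    if height % ph or width % pw:
        # Assumption: no padding or cropping is applied to partial patches.
        raise ValueError("image size must be divisible by patch_size")
    if hidden_dim % num_heads:
        raise ValueError("hidden_dim must be divisible by num_heads")
    return {
        "num_patches": (height // ph) * (width // pw),
        "head_dim": hidden_dim // num_heads,
    }


grid = patch_grid((32, 32), patch_size=4, hidden_dim=128, num_heads=4)
```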
Model Configs
BaseAutoencoderConfig
| Field | Meaning |
| --- | --- |
| latent_dim | Width of the core latent space when the family uses a single latent width. In deterministic AEs this is the latent width after project_to_core; in quantized models it is usually the codebook embedding width. |
| reconstruction_loss | Reconstruction objective. Current built-in choices are intended for dense tensors and typically use mse. |
Deterministic AE Family
AutoencoderConfig
No extra fields beyond BaseAutoencoderConfig.
SemanticHashingAutoencoderConfig
| Field | Meaning |
| --- | --- |
| binarization | Binary bottleneck mode. Use ste for hard straight-through binary codes during the forward pass, or tanh to train against soft bounded codes and threshold only at export/inference time. |
| binarization_weight | Weight on the penalty that pushes latent activations toward binary endpoints (-1 and +1). |
| balance_weight | Weight on the bit-balance regularizer that keeps each hash dimension centered around an even split across a batch. |
| decorrelation_weight | Weight on the penalty that suppresses covariance between different bits so the learned code dimensions carry less redundant information. |
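The difference between the two binarization modes can be sketched in a few lines of plain Python. This is an illustration only: the function name binarize and the threshold at zero are assumptions, and the gradient pass-through that gives ste its name is not modeled here because the sketch has no autograd.

```python
import math


def binarize(latent, mode, training):
    """Forward-pass values of the bottleneck under each binarization mode."""
    if mode not in ("ste", "tanh"):
        raise ValueError(f"unknown binarization mode: {mode}")
    if mode == "ste" or not training:
        # Hard codes at the binary endpoints -1 and +1 (threshold at 0 assumed).
        return [1.0 if x >= 0 else -1.0 for x in latent]
    # tanh mode during training: soft codes bounded in (-1, 1).
    return [math.tanh(x) for x in latent]


soft = binarize([0.5, -2.0], mode="tanh", training=True)
hard = binarize([0.5, -2.0], mode="tanh", training=False)
```

In this sketch both modes produce the same hard codes at export time; they differ only in what the decoder sees during training.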
DenoisingAutoencoderConfig
| Field | Meaning |
| --- | --- |
| noise_type | Corruption mode applied to inputs before reconstruction. |
| noise_std | Standard deviation for additive Gaussian noise when that corruption mode is used. |
| masking_ratio | Fraction of features dropped or masked when a masking corruption mode is used. |
ContractiveAutoencoderConfig
| Field | Meaning |
| --- | --- |
| contractive_weight | Strength of the Jacobian contraction penalty added to the reconstruction objective. |
SparseAutoencoderConfig
| Field | Meaning |
| --- | --- |
| sparsity_weight | Penalty weight encouraging sparse latent activations. |
| target_activation | Desired mean activation level used by the sparsity penalty. |
TopKSparseAutoencoderConfig
| Field | Meaning |
| --- | --- |
| topk | Number of latent units retained per sample when using top-k sparsification. |
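As a small illustration of what topk controls, the sketch below keeps the largest activations of one latent vector and zeroes the rest. The helper name is hypothetical, and ranking by raw value (rather than by absolute value) is an assumption of this sketch.

```python
def topk_sparsify(latent, topk):
    """Keep the topk largest activations of one sample and zero the others."""
    if not 0 < topk <= len(latent):
        raise ValueError("topk must be between 1 and the latent width")
    ranked = sorted(range(len(latent)), key=lambda i: latent[i], reverse=True)
    keep = set(ranked[:topk])
    return [value if i in keep else 0.0 for i, value in enumerate(latent)]


sparse = topk_sparsify([0.1, 0.9, -0.3, 0.5], topk=2)
```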
KLSparseAutoencoderConfig
| Field | Meaning |
| --- | --- |
| sparsity_weight | Weight for the KL sparsity term. |
| target_activation | Target average activation used inside the KL sparsity penalty. |
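For orientation, the classic KL sparsity penalty compares the target activation with the batch-mean activation of each latent unit, treating both as Bernoulli parameters. The sketch below implements that textbook form; whether the library uses exactly this expression, and the clamping constant eps, are assumptions.

```python
import math


def kl_sparsity_penalty(mean_activations, target_activation, sparsity_weight, eps=1e-8):
    """Weighted sum over units of KL(Bernoulli(target) || Bernoulli(mean activation))."""
    rho = target_activation
    total = 0.0
    for rho_hat in mean_activations:
        # Clamp to keep the logarithms finite (eps is a sketch-level choice).
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return sparsity_weight * total


# The penalty vanishes when every unit already matches the target.
matched = kl_sparsity_penalty([0.05, 0.05], target_activation=0.05, sparsity_weight=1.0)
too_active = kl_sparsity_penalty([0.5, 0.5], target_activation=0.05, sparsity_weight=1.0)
```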
WassersteinAutoencoderConfig
| Field | Meaning |
| --- | --- |
| mmd_weight | Strength of the MMD regularizer matching latent codes to the chosen prior. |
| kernel_bandwidths | Kernel bandwidth list used by the MMD estimator. |
AdversarialAutoencoderConfig
| Field | Meaning |
| --- | --- |
| adversarial_weight | Strength of the adversarial latent-matching objective. |
| discriminator_hidden_dims | Hidden widths for the latent discriminator network. |
Variational Family
BaseVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| kl_weight | Multiplier on the KL term after warmup is complete. |
| free_bits | Minimum KL contribution retained per latent dimension or block to reduce posterior collapse. |
| kl_warmup_epochs | Number of epochs over which the KL weight ramps from kl_start_weight to kl_weight. |
| kl_start_weight | Initial KL multiplier before warmup progresses. |
| use_mean_in_eval | If true, evaluation and export use posterior means instead of sampling noise. |
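The interaction between kl_start_weight, kl_weight, and kl_warmup_epochs, and the effect of free_bits, can be sketched as follows. The function names are hypothetical. The linear shape of the ramp and the per-dimension clamp are assumptions; the source only states that the weight ramps and that a minimum KL contribution is retained.

```python
def kl_weight_at(epoch, kl_weight, kl_start_weight, kl_warmup_epochs):
    """KL multiplier for a given epoch, assuming a linear warmup ramp."""
    if kl_warmup_epochs <= 0 or epoch >= kl_warmup_epochs:
        return kl_weight
    fraction = epoch / kl_warmup_epochs
    return kl_start_weight + fraction * (kl_weight - kl_start_weight)


def apply_free_bits(kl_per_dim, free_bits):
    """Floor each per-dimension KL value at free_bits (clamp form assumed)."""
    return [max(kl, free_bits) for kl in kl_per_dim]


schedule = [kl_weight_at(e, kl_weight=1.0, kl_start_weight=0.0, kl_warmup_epochs=10)
            for e in (0, 5, 10)]
floored = apply_free_bits([0.01, 0.5], free_bits=0.1)
```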
VariationalAutoencoderConfig
No extra fields beyond the base variational family.
BetaVariationalAutoencoderConfig
No new fields. Use kl_weight to express beta-style scaling.
DenoisingVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| noise_type | Corruption mode for the denoising encoder input. |
| noise_std | Standard deviation for additive Gaussian corruption. |
| masking_ratio | Fraction of masked features when using masking noise. |
HierarchicalVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| top_latent_dim | Width of the upper latent level in hierarchical VAE models. |
VampPriorVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| num_pseudo_inputs | Number of learned pseudo-inputs used to define the VampPrior mixture. |

| Field | Meaning |
| --- | --- |
| mmd_weight | Weight for the additional MMD regularizer. |
| kernel_bandwidths | Bandwidth list for the MMD kernel mixture. |
DIPVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| dip_type | Which DIP-VAE covariance penalty variant to use. |
| lambda_diag | Weight on diagonal covariance matching. |
| lambda_offdiag | Weight on off-diagonal covariance suppression. |
BetaTCVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| tc_weight | Weight on the total-correlation penalty. |
FactorVariationalAutoencoderConfig
| Field | Meaning |
| --- | --- |
| tc_weight | Weight on the total-correlation penalty estimated through the discriminator. |
| discriminator_hidden_dims | Hidden widths for the auxiliary discriminator. |
Quantized Family
BaseVectorQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| codebook_size | Number of discrete codes per learned codebook. |
| commitment_weight | Weight that pulls encoder outputs toward selected codes. |
| codebook_weight | Weight that pulls learned code vectors toward encoder outputs when explicit codebook loss is used. |
| assignment_strategy | How discrete indices are selected from codebook distances: nearest or sinkhorn. |
| sinkhorn_epsilon | Entropic regularization strength for Sinkhorn assignment. It may be one float shared across codebooks or one list with one value per codebook slot. A slot set to 0.0 falls back to nearest-neighbor assignment. |
| sinkhorn_iters | Number of Sinkhorn normalization iterations when assignment_strategy is sinkhorn. |
| kmeans_init | Whether learned codebooks are initialized from the first training batch of encoder latents instead of uniform random weights. |
| kmeans_iters | Number of Lloyd iterations used during codebook k-means initialization. |
| use_ema_codebook | Whether learned codebooks are updated by exponential moving averages instead of gradient updates. |
| ema_decay | EMA decay factor for codebook statistics. Lower values adapt faster; higher values are smoother. |
| ema_epsilon | Numerical stabilizer used when normalizing EMA cluster sizes. |
| dead_code_reset | Whether rarely used codes are reinitialized at the end of training epochs or steps. |
| dead_code_threshold | Usage-count threshold below which a code is considered dead for reset purposes. |
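Two of the fields above are easy to illustrate in isolation: the nearest assignment strategy and the dead_code_threshold rule. The sketch below uses plain Python lists and squared Euclidean distance. Both helper names are hypothetical, and the distance metric and the window over which usage is counted are assumptions.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(vector, codebook[i]))


def dead_codes(indices, codebook_size, dead_code_threshold):
    """Codes whose usage count falls below dead_code_threshold."""
    counts = [0] * codebook_size
    for index in indices:
        counts[index] += 1
    return [code for code, count in enumerate(counts) if count < dead_code_threshold]


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
batch = [[0.1, 0.0], [0.9, 1.2], [0.2, 0.1]]
assigned = [nearest_code(v, codebook) for v in batch]
unused = dead_codes(assigned, codebook_size=3, dead_code_threshold=1)
```

Here the third code is never selected, so it is the only candidate for a reset when dead_code_reset is enabled.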
VectorQuantizedAutoencoderConfig
No extra fields beyond the base quantized family.
GumbelQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| temperature | Initial Gumbel-softmax temperature. |
| min_temperature | Lower bound for the annealed temperature. |
| anneal_rate | Multiplicative decay controlling how fast the temperature cools. |
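One plausible reading of these three fields is a geometric schedule clipped from below. The sketch assumes the temperature is multiplied by anneal_rate once per step; whether the library anneals per step or per epoch, and the exact decay form, are not stated in the source.

```python
def annealed_temperature(step, temperature, min_temperature, anneal_rate):
    """Geometric decay from temperature, never dropping below min_temperature."""
    return max(min_temperature, temperature * anneal_rate ** step)


temps = [annealed_temperature(s, temperature=1.0, min_temperature=0.1, anneal_rate=0.5)
         for s in (0, 2, 10)]
```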
FiniteScalarQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| num_levels | Number of scalar quantization levels per latent feature. |
ResidualFiniteScalarQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| num_levels | Number of scalar levels per residual quantizer. |
| num_quantizers | Number of residual scalar quantization stages. |
ProductQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| num_codebooks | Number of product-quantization subspaces. latent_dim must be divisible by this value. |
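The divisibility requirement follows from how product quantization splits the latent vector into equal-width subspaces, one per codebook. A minimal sketch, with a hypothetical helper name and contiguous slicing assumed:

```python
def split_subspaces(latent, num_codebooks):
    """Split one latent vector into num_codebooks contiguous, equal-width chunks."""
    if len(latent) % num_codebooks:
        raise ValueError("latent_dim must be divisible by num_codebooks")
    width = len(latent) // num_codebooks
    return [latent[i * width:(i + 1) * width] for i in range(num_codebooks)]


chunks = split_subspaces([1, 2, 3, 4, 5, 6], num_codebooks=3)
```

Each chunk would then be quantized against its own codebook.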
OptimizedProductQuantizedAutoencoderConfig
No extra fields beyond ProductQuantizedAutoencoderConfig.
OPQVAE keeps the same codebook and assignment settings as PQVAE, but learns an orthogonal rotation over the latent space before subspace splitting. In practice this means:
- num_codebooks still controls how many PQ subspaces are used.
- sinkhorn_epsilon still accepts either one shared value or one value per codebook slot.
- kmeans_init, use_ema_codebook, and dead_code_reset behave the same way as in PQVAE.
ResidualQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| num_quantizers | Number of residual vector quantization stages applied in sequence. |
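Residual quantization applies its stages in sequence: each stage quantizes whatever the previous stages left unexplained. The sketch below shows the idea with one small codebook per stage, so the number of codebooks passed in plays the role of num_quantizers. The helper name and the use of nearest-neighbor assignment at every stage are assumptions.

```python
def residual_quantize(vector, codebooks):
    """Quantize vector stage by stage; returns (indices, reconstruction)."""
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:  # one codebook per residual stage
        index = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        code = codebook[index]
        indices.append(index)
        # Accumulate the chosen code and pass the remaining error to the next stage.
        reconstruction = [a + c for a, c in zip(reconstruction, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, reconstruction


stages = [
    [[1.0, -1.0], [-1.0, 1.0]],      # coarse stage
    [[0.25, 0.25], [-0.25, -0.25]],  # refinement stage
]
indices, approx = residual_quantize([1.2, -0.7], stages)
```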
HierarchicalVectorQuantizedAutoencoderConfig
| Field | Meaning |
| --- | --- |
| top_latent_dim | Width of the top-level codebook latents before they are combined with bottom-level latents. |
Trainer Configs
TrainingConfig
The evaluation metric short names exposed by the terminal logger, display_metrics, and save_best_by are:
- loss
- recon
- binary
- bal
- decor
- sparse
- topk
- kl-sparse
- contract
- mmd
- adv
- disc
- commit
- book
- kl
- free-kl
- codes
- usage
- ppl
- dead
- coll
| Field | Meaning |
| --- | --- |
| output_dir | Directory where checkpoints, exported models, and metrics are written. |
| epochs | Maximum training epochs. Use 0 together with patience for early-stop-only training. |
| patience | Early stopping patience in epochs without validation improvement. |
| learning_rate | Base optimizer learning rate. |
| optimizer_name | Optimizer choice: adam, adamw, sgd, rmsprop, or adagrad. |
| weight_decay | Weight decay passed to the optimizer. |
| lr_scheduler_type | Learning-rate schedule: none, constant, linear, or cosine. |
| warmup_epochs | Number of epochs used for learning-rate warmup before the main scheduler takes over. |
| grad_clip_norm | If set, clip gradient norm to this value after backpropagation. |
| batch_size | Batch size used to build train/validation/test dataloaders. |
| full_dataset_as_splits | If true, reuse the full dataset for train, validation, and test instead of splitting it. |
| device | Device target such as auto, cpu, cuda, or mps. |
| seed | Global random seed for reproducibility. |
| save_best_by | Validation metric short names used to save best checkpoints. loss writes to best/; extra names such as commit write to best-commit/. Each metric is minimized exactly as named, with no fallback aliases. |
| display_metrics | Optional mapping from metric short names to booleans. Omitted names default to true; set a name to false to hide it from batch, epoch, and final evaluation logs. |
| log_memory | If true, emit MEM lines after train, validation, and final test passes. CUDA logs allocated, reserved, and peak memory; MPS logs best-effort allocated/reserved figures when available. |
| show_only_best_epochs | If true, only emit compact summaries for best-validation epochs instead of every epoch. |
| advice | If true, append trainer-generated tuning suggestions after the run. |
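The naming rule for save_best_by and the default behavior of display_metrics can be summarized in a short sketch. Both helper names are hypothetical, and the sketch assumes save_best_by is given as a list of short names and that the directories are relative to output_dir.

```python
def best_checkpoint_dirs(save_best_by):
    """Map each metric short name to its checkpoint directory name."""
    return {name: "best/" if name == "loss" else f"best-{name}/" for name in save_best_by}


def is_displayed(metric, display_metrics=None):
    """Metrics are shown unless display_metrics explicitly sets them to false."""
    return (display_metrics or {}).get(metric, True)


dirs = best_checkpoint_dirs(["loss", "commit"])
show_kl = is_displayed("kl", {"kl": False})
show_ppl = is_displayed("ppl", {"kl": False})
```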
AdversarialAutoencoderTrainingConfig
| Field | Meaning |
| --- | --- |
| discriminator_learning_rate | Optional learning rate override for the discriminator optimizer. |
| generator_learning_rate | Optional learning rate override for the generator or autoencoder optimizer. |
| discriminator_steps | Number of discriminator updates per generator-style update. |
FactorVariationalAutoencoderTrainingConfig
| Field | Meaning |
| --- | --- |
| discriminator_learning_rate | Optional learning rate override for the FactorVAE discriminator. |
| discriminator_steps | Number of discriminator updates per training iteration. |