Component Selection Patterns#
One of the most important architectural decisions in rompy is the use of two different selection patterns for different types of functionality. Both patterns use entry points for discovery, but differ in when and how selection occurs. Understanding when and why to use each pattern is crucial for effective rompy development.
Overview#
rompy uses two distinct approaches for component selection:
Pydantic Discriminated Union Pattern for model configurations (
CONFIG_TYPES
)Runtime String Selection Pattern for execution backends (run, postprocess, pipeline)
Both patterns use Python entry points for plugin discovery, but serve fundamentally different purposes. This document explains the rationale behind this dual approach and provides guidance on when to use each pattern.
The Two Patterns#
Both patterns use Python entry points for plugin discovery, but differ in when and how selection occurs.
Pydantic Discriminated Union Pattern (CONFIG_TYPES)#
Model configurations use entry points to build a discriminated union:
from typing import Union
from pydantic import Field
from rompy.utils import load_entry_points
# Load config types from entry points at import time
CONFIG_TYPES = load_entry_points("rompy.config")
class ModelRun(RompyBaseModel):
config: Union[CONFIG_TYPES] = Field(
default_factory=BaseConfig,
description="The configuration object",
discriminator="model_type", # Selection via discriminator field
)
Selection happens at model instantiation time via the model_type
discriminator field in the configuration data.
Runtime String Selection Pattern (Backends)#
Execution backends use entry points for runtime selection:
from rompy.utils import load_entry_points
# Load backends from entry points at import time
def _load_backends():
run_backends = {}
for backend in load_entry_points("rompy.run"):
name = backend.__name__.lower().replace('runbackend', '')
run_backends[name] = backend
return run_backends
RUN_BACKENDS = _load_backends()
def run(self, backend: str = "local", **kwargs) -> bool:
# Selection happens at execution time via string parameter
backend_class = RUN_BACKENDS[backend]
backend_instance = backend_class()
return backend_instance.run(self, **kwargs)
Selection happens at execution time via string parameters passed to methods.
Comparative Analysis#
Pydantic Discriminated Union Approach#
✅ Strengths
- Strong Type Safety
Full Pydantic validation happens at model instantiation time, catching configuration errors early in the workflow.
# Validation happens here - invalid configs rejected immediately model = ModelRun(config={"model_type": "swan", "grid": {...}})
- IDE Support & Developer Experience
Excellent autocomplete, type checking, and refactoring support in modern IDEs.
- Serialization & Reproducibility
Configuration is part of the model state and fully serializable, enabling reproducible science.
# Complete model configuration saved as YAML config: model_type: swan grid: x0: 115.68 y0: -32.76 # ... full configuration preserved
- Schema Documentation
Clear, declarative schema with automatic documentation generation and validation rules.
- Immutability
Once instantiated, configurations are immutable, preventing accidental modification during execution.
- Plugin Support
Uses entry points for discovery, allowing third-party configuration types.
# Third-party configs discovered via entry points CONFIG_TYPES = load_entry_points("rompy.config")
❌ Limitations
- Selection Timing
Configuration type must be known at model instantiation time.
- State Coupling
Configuration choice becomes part of persistent model state.
- Validation Completeness
All possible configurations must be validated upfront, even if unused.
Runtime String Selection Approach#
✅ Strengths
- Execution-Time Flexibility
Backend choice can be made based on runtime conditions and environment.
# Different backends for different environments backend = "docker" if has_docker() else "local" model.run(backend=backend)
- Operational Independence
Backend choice is independent of scientific configuration.
- Environment Adaptation
Same model configuration can use different backends based on deployment environment.
# Same config, different execution strategies model.run(backend="local") # Development model.run(backend="slurm") # HPC cluster model.run(backend="k8s") # Cloud deployment
- Plugin Support
Uses entry points for discovery, allowing third-party backends.
# Third-party backends discovered via entry points RUN_BACKENDS = dict(load_entry_points("rompy.run"))
- Lazy Instantiation
Only instantiate backends when actually needed.
- Optional Dependencies
Graceful handling when optional backends aren’t available.
❌ Limitations
- Reduced Type Safety
Backend selection via strings means errors are only caught at execution time.
# Error only discovered when run() is called model.run(backend="typo_backend") # ValueError at runtime
- Late Validation
Backend availability and parameter validation happens during execution, not configuration.
- Non-Serializable Choice
Backend choice is not part of the serializable model configuration.
- Discovery Complexity
Harder to know what backends are available during development.
Why Different Patterns for Different Concerns?#
The architectural decision reflects the fundamental difference in purpose between these two types of selection, despite both using entry points:
State vs Behavior Separation#
Configuration Represents Persistent Domain State
Model configurations encode scientific and mathematical knowledge that must be preserved:
What physics to simulate (wave propagation, hydrodynamics)
Where to simulate it (grid definition, boundaries)
When to simulate it (time periods, forcing data)
This domain state needs:
Strong validation (incorrect physics parameters = invalid science)
Reproducibility (same config = same results)
Serialization (configurations must be saveable and shareable)
Immutability (configurations shouldn’t change during execution)
Early validation (catch errors before expensive computation starts)
Execution Represents Runtime Behavior
Execution backends handle operational and deployment behavior:
How to run the model (local process, container, HPC queue)
Where to run it (laptop, cluster, cloud)
With what resources (CPU cores, memory, time limits)
This runtime behavior needs:
Environment flexibility (different options in different deployments)
Late binding (choose backend based on current conditions)
Optional availability (some backends may not be installed)
Operational parameters (that vary per execution, not per model)
Ephemeral choice (backend selection shouldn’t be saved with scientific config)
Practical Examples#
Configuration Example (Discriminated Union)#
Scientific parameters are validated, serialized, and preserved:
# This represents scientific intent - must be validated and preserved
config:
model_type: swan # ← Discriminator field for Pydantic union selection
grid:
x0: 115.68 # Geographic coordinate - must be valid
y0: -32.76 # Geographic coordinate - must be valid
dx: 0.001 # Grid resolution - affects numerical accuracy
dy: 0.001 # Grid resolution - affects numerical accuracy
physics:
friction: MAD # Physics model choice - affects results
friction_coeff: 0.1 # Physics parameter - must be scientifically valid
The model_type
field triggers Pydantic’s discriminated union to select the correct configuration class. Any error in these parameters would produce scientifically invalid results, so they must be validated at instantiation time.
Execution Example (Runtime String Selection)#
Operational parameters vary by environment and are not serialized:
# Same config object, different execution environments
config_data = load_yaml("scientific_config.yaml") # Contains model_type discriminator
model = ModelRun(**config_data) # Pydantic selects config class
# Development environment - runtime string selection
model.run(
backend="local", # ← String parameter for runtime selection
timeout=600,
env_vars={"OMP_NUM_THREADS": "2"}
)
# Production HPC environment - same config, different backend
model.run(
backend="slurm", # ← Different string, same config
partition="compute",
nodes=4,
time_limit="24:00:00",
env_vars={"OMP_NUM_THREADS": "16"}
)
# Cloud deployment - same config, cloud backend
model.run(
backend="kubernetes", # ← Runtime choice, not saved
image="rompy/swan:v1.2.3",
resources={"cpu": "8", "memory": "32Gi"}
)
The same scientific configuration (with its model_type
discriminator) runs in all environments, but with different runtime backend selections that are not part of the serializable state.
Design Patterns in Practice#
When to Use Discriminated Union Pattern#
Use the discriminated union pattern when extending rompy with components that need to be:
- ✅ Part of Serializable State
Components that must be saved, shared, and reproduced exactly.
- ✅ Validated at Instantiation
Components where early validation prevents expensive failures later.
- ✅ Scientifically Critical
Components where incorrect parameters lead to invalid scientific results.
- ✅ Model Configuration Types
New model types (SCHISM, XBeach, FVCOM) that define scientific computation.
- ✅ Grid Definitions
New grid types that define spatial discretization approaches.
- ✅ Physics Parameterizations
New physics options that require parameter validation and documentation.
Example - Adding a new model type with entry point registration:
class XBeachConfig(BaseConfig):
"""XBeach model configuration."""
model_type: Literal["xbeach"] = "xbeach" # Discriminator field
# Validated scientific parameters
grid: XBeachGrid
physics: XBeachPhysics
outputs: XBeachOutputs
# Strong validation rules
@validator('physics')
def validate_physics_consistency(cls, v, values):
# Ensure physics parameters are scientifically consistent
return v
# Register via entry points for discovery
[project.entry-points."rompy.config"]
xbeach = "mypackage.config:XBeachConfig"
When to Use Runtime String Selection Pattern#
Use the runtime string selection pattern when extending rompy with components that are:
- ✅ Environment-Specific
Components that vary based on where the code is running.
- ✅ Operationally Focused
Components that handle execution, processing, or infrastructure concerns.
- ✅ Optional Dependencies
Components that may not be available in all environments.
- ✅ Execution Environments
New ways to run models (HPC schedulers, cloud platforms, containers).
- ✅ Output Processing
New analysis, visualization, or data transformation capabilities.
- ✅ Workflow Orchestration
New ways to coordinate multi-stage model workflows.
Example - Adding a new execution backend with entry point registration:
class SlurmBackend:
"""Execute models via SLURM job scheduler."""
def run(self, model_run, partition="compute", nodes=1, **kwargs):
"""Submit model to SLURM queue."""
# Generate SLURM job script
job_script = self._create_slurm_script(
model_run, partition, nodes, **kwargs
)
# Submit job and monitor execution
job_id = self._submit_job(job_script)
return self._wait_for_completion(job_id)
# Register via entry points for discovery
[project.entry-points."rompy.run"]
slurm = "rompy_hpc.backends:SlurmBackend"
Best Practices#
For Discriminated Union Extensions (Configuration)#
- Comprehensive Validation
Implement validators that check scientific and mathematical consistency.
@validator('grid_resolution') def validate_resolution(cls, v): if v <= 0: raise ValueError("Grid resolution must be positive") if v > 0.1: warnings.warn("Very coarse resolution may affect accuracy") return v
- Clear Documentation
Provide detailed docstrings explaining scientific meaning and valid ranges.
- Immutable Design
Avoid mutable state that could change during model execution.
- Schema Versioning
Plan for configuration schema evolution and backward compatibility.
- Entry Point Registration
Register new configuration types via entry points for automatic discovery.
[project.entry-points."rompy.config"] mymodel = "mypackage.config:MyModelConfig"
For Runtime String Selection Extensions (Backends)#
- Robust Error Handling
Handle missing dependencies and environment issues gracefully.
def run(self, model_run, **kwargs): try: return self._execute_backend(model_run, **kwargs) except ImportError as e: raise RuntimeError(f"Backend dependencies not available: {e}") except Exception as e: logger.exception(f"Backend execution failed: {e}") return False
- Environment Detection
Check if the backend can run in the current environment.
- Parameter Validation
Validate backend-specific parameters at execution time.
- Resource Cleanup
Ensure proper cleanup of resources on success and failure.
- Entry Point Registration
Register new backends via entry points for automatic discovery.
[project.entry-points."rompy.run"] mybackend = "mypackage.backends:MyBackend"
Conclusion#
The dual selection pattern in rompy reflects a sophisticated understanding of different types of component selection requirements:
State-based selection (configurations) needs early validation, serialization, and reproducibility
Behavior-based selection (backends) needs late binding, environment adaptation, and optional availability
Both patterns use entry points for plugin discovery, but differ fundamentally in when selection occurs and what gets serialized:
Configurations: Selected at instantiation time via discriminator fields, become part of persistent state
Backends: Selected at execution time via string parameters, remain ephemeral operational choices
This architectural decision enables rompy to maintain scientific rigor while supporting diverse computational environments. When extending rompy, carefully consider whether your extension represents:
Persistent domain state → Use discriminated unions with entry point discovery
Runtime behavior choice → Use string selection with entry point discovery
The pattern demonstrates that the same plugin discovery mechanism can serve different selection patterns, and a well-designed system should choose the selection timing and state management approach that best fits the component’s purpose.
Further Reading#
../extending/custom_backends - Practical guide to creating new backends
../extending/custom_models - Guide to adding new model configurations
../api_design/entry_points - Technical details on the entry point system
configuration_patterns - Deep dive into configuration design patterns