# Architecture Overview

This document provides a comprehensive overview of Rompy's architecture, explaining its advanced design patterns and component interactions. For basic concepts, see the Getting Started Guide.

Rompy follows a modular, plugin-based architecture that separates concerns between configuration, execution, and post-processing:
```mermaid
graph TB
    subgraph "User Interface Layer"
        CLI["CLI Commands<br/>rompy.cli:main"]
        API["Python API<br/>ModelRun instantiation"]
    end
    subgraph "Core Orchestration Layer"
        ModelRun["ModelRun<br/>rompy.model:ModelRun"]
        Config["Configuration System<br/>rompy.core.config:BaseConfig"]
        TimeRange["TimeRange<br/>rompy.core.time:TimeRange"]
    end
    subgraph "Data Management System"
        DataGrid["DataGrid<br/>rompy.core.data:DataGrid"]
        DataBlob["DataBlob<br/>rompy.core.data:DataBlob"]
        DataPoint["DataPoint<br/>rompy.core.data:DataPoint"]
        Sources["Data Sources<br/>SourceFile, SourceIntake,<br/>SourceDatamesh, SourceWavespectra"]
        Grid["Grid System<br/>BaseGrid, RegularGrid"]
    end
    subgraph "Execution Backend System"
        LocalBackend["LocalRunBackend<br/>rompy.run:LocalRunBackend"]
        DockerBackend["DockerRunBackend<br/>rompy.run.docker:DockerRunBackend"]
        BackendConfig["Backend Configs<br/>LocalConfig, DockerConfig"]
    end
    subgraph "Processing & Support Systems"
        Postprocessor["Postprocessors<br/>NoopPostprocessor"]
        Pipeline["Pipeline Backends<br/>LocalPipelineBackend"]
        Logging["Logging System<br/>RompyLogger + BoxFormatter"]
    end
    CLI --> ModelRun
    API --> ModelRun
    ModelRun --> Config
    ModelRun --> TimeRange
    ModelRun --> DataGrid
    ModelRun --> BackendConfig
    ModelRun --> Pipeline
    Config --> Sources
    DataGrid --> Sources
    DataGrid --> Grid
    DataBlob --> Sources
    DataPoint --> Sources
    BackendConfig --> LocalBackend
    BackendConfig --> DockerBackend
    LocalBackend --> ModelRun
    DockerBackend --> ModelRun
    Pipeline --> LocalBackend
    Pipeline --> Postprocessor
    ModelRun -. logs via .-> Logging
```
## Advanced Architecture Patterns

### 1. Separation of Concerns

- Configuration (what to compute) is separate from execution (how to compute)
- Model setup is independent of the execution environment
- Data sources are abstracted from the model implementation

### 2. Composition over Inheritance

- Complex configurations are built by composing simpler components
- Backends compose different capabilities rather than inheriting behavior

### 3. Type Safety with Pydantic

- Configuration objects are validated at runtime
- Clear interfaces with type hints
- Automatic serialization/deserialization

### 4. Late Binding

- Execution backends are resolved at runtime
- The same configuration can run in different environments
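The late-binding pattern can be illustrated with a minimal, dependency-free sketch. All names below (`LocalBackend`, `DockerBackend`, `BACKENDS`, `execute`) are hypothetical stand-ins, not Rompy's actual API; they show only the principle of resolving the backend class at runtime from the configuration:

```python
# Minimal sketch of late binding: the configuration says *what* to run,
# while the backend implementation is looked up at runtime by name.


class LocalBackend:
    def run(self, command: str) -> str:
        return f"local: {command}"


class DockerBackend:
    def run(self, command: str) -> str:
        return f"docker: {command}"


# Registry mapping backend names to implementations (resolved at runtime).
BACKENDS = {"local": LocalBackend, "docker": DockerBackend}


def execute(config: dict) -> str:
    """Resolve the backend named in the config and delegate execution to it."""
    backend_cls = BACKENDS[config["backend"]]
    return backend_cls().run(config["command"])


# The same command runs unchanged in two environments:
print(execute({"backend": "local", "command": "run-model"}))   # local: run-model
print(execute({"backend": "docker", "command": "run-model"}))  # docker: run-model
```

Because the configuration only names the backend, switching execution environments is a one-field change rather than a code change.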
## Core Components

The following diagram shows the core component architecture, highlighting the Pydantic foundation and how the core and backend abstractions inherit from `RompyBaseModel`.
```mermaid
graph TB
    subgraph "Pydantic Foundation"
        RompyBaseModel["RompyBaseModel<br/>(rompy.core.types)"]
    end
    subgraph "Core Abstractions"
        ModelRun["ModelRun<br/>(rompy.model)"]
        BaseConfig["BaseConfig<br/>(rompy.core.config)"]
        TimeRange["TimeRange<br/>(rompy.core.time)"]
        BaseGrid["BaseGrid<br/>(rompy.core.grid)"]
        DataGrid["DataGrid<br/>(rompy.core.data)"]
        DataBlob["DataBlob<br/>(rompy.core.data)"]
        DataPoint["DataPoint<br/>(rompy.core.data)"]
        SourceBase["SourceBase<br/>(rompy.core.source)"]
    end
    subgraph "Backend Abstractions"
        BaseBackendConfig["BaseBackendConfig<br/>(rompy.backends.config)"]
        BaseProcessor["BaseProcessor<br/>(rompy.postprocess)"]
        BasePipeline["BasePipeline<br/>(rompy.pipeline)"]
    end
    RompyBaseModel --> ModelRun
    RompyBaseModel --> BaseConfig
    RompyBaseModel --> TimeRange
    RompyBaseModel --> BaseGrid
    RompyBaseModel --> DataGrid
    RompyBaseModel --> DataBlob
    RompyBaseModel --> DataPoint
    RompyBaseModel --> SourceBase
    RompyBaseModel --> BaseBackendConfig
    RompyBaseModel --> BaseProcessor
    RompyBaseModel --> BasePipeline
    ModelRun --> TimeRange
    ModelRun --> BaseConfig
    ModelRun --> BaseBackendConfig
    DataGrid --> SourceBase
    DataBlob --> SourceBase
    DataPoint --> SourceBase
    DataGrid --> BaseGrid
```
## Plugin Architecture

Rompy uses Python entry points for extensibility:

### Entry Points

- `rompy.config`: Model configuration classes
- `rompy.run`: Execution backends
- `rompy.postprocess`: Post-processing modules
- `rompy.source`: Data source implementations
### Extending Functionality

New components can be added through the plugin system:

```python
# Example: adding a new model configuration.
# Register it in setup.py/pyproject.toml:
# [project.entry-points."rompy.config"]
# mymodel = "mypackage.config:MyModelConfig"
from typing import Literal

from rompy.core.config import BaseConfig


class MyModelConfig(BaseConfig):
    model_type: Literal["mymodel"] = "mymodel"
    # Custom model configuration attributes go here
```
## Data Flow Architecture

This flowchart illustrates the data processing pipeline, from input sources through abstraction and processing into the model execution context.
```mermaid
flowchart TB
    subgraph "Data Input Sources"
        Files["Local Files<br/>NetCDF, CSV"]
        Catalogs["Intake Catalogs<br/>YAML definitions"]
        Datamesh["Datamesh API<br/>Remote service"]
    end
    subgraph "Data Abstraction Layer"
        SourceFile["SourceFile<br/>rompy.core.source"]
        SourceIntake["SourceIntake<br/>rompy.core.source"]
        SourceDatamesh["SourceDatamesh<br/>rompy.core.source"]
        SourceWavespectra["SourceWavespectra<br/>rompy.core.source"]
        DataBlob["DataBlob<br/>Generic file container"]
        DataGrid["DataGrid<br/>+ spatial/temporal filters"]
        DataPoint["DataPoint<br/>Point time series"]
        SourceFile --o DataBlob
        SourceFile --o DataGrid
        SourceIntake --o DataGrid
        SourceDatamesh --o DataGrid
        SourceWavespectra --o DataPoint
    end
    subgraph "Data Processing"
        XArray["XArray Dataset<br/>Multi-dimensional arrays"]
        Filters["Filter Operations<br/>sort, crop, subset"]
        GridOps["Grid Operations<br/>bbox, boundary, meshgrid"]
    end
    subgraph "Model Execution Context"
        ModelRun["ModelRun<br/>rompy.model:ModelRun"]
        Config["BaseConfig<br/>rompy.core.config"]
        Templates["Jinja2 Templates<br/>Input file generation"]
    end
    subgraph "Output Generation"
        Staging["Staging Directory<br/>Generated inputs"]
        ModelOutput["Model Output Files<br/>NetCDF, custom formats"]
        Postprocess["Post-processed Results"]
    end
    Files --> SourceFile
    Catalogs --> SourceIntake
    Datamesh --> SourceDatamesh
    DataGrid --> XArray
    DataGrid --> Filters
    DataGrid --> GridOps
    XArray --> ModelRun
    Filters --> XArray
    GridOps --> XArray
    Config --> Templates
    Templates --> Staging
    ModelRun --> Config
    ModelRun --> Staging
    Staging --> ModelOutput
    ModelOutput --> Postprocess
```
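Rompy's actual filter operations act on `xarray` datasets along spatial and temporal dimensions. The idea behind the `sort` and `crop` filters can be sketched without any dependencies, using a toy time series in place of a gridded dataset (the record layout and helper names here are illustrative only):

```python
from datetime import datetime

# Toy time series standing in for a dataset's time dimension.
records = [
    (datetime(2023, 1, 3), 2.1),
    (datetime(2023, 1, 1), 1.4),
    (datetime(2023, 1, 5), 3.0),
    (datetime(2023, 1, 2), 1.9),
]


def sort_filter(recs):
    """Order records along the time coordinate (like a 'sort' filter)."""
    return sorted(recs, key=lambda r: r[0])


def crop_filter(recs, start, end):
    """Keep only records inside [start, end] (like a temporal 'crop')."""
    return [r for r in recs if start <= r[0] <= end]


# Filters compose: sort first, then crop to the model run period.
cropped = crop_filter(sort_filter(records), datetime(2023, 1, 1), datetime(2023, 1, 3))
print([value for _, value in cropped])  # [1.4, 1.9, 2.1]
```

In the real pipeline the same composition happens lazily on multi-dimensional arrays, so only the cropped subset is ever loaded from the source.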
## Model Execution Pipeline

This sequence diagram details the three-stage execution lifecycle of ModelRun: generation, execution, and post-processing.
```mermaid
sequenceDiagram
    participant User
    participant ModelRun as ModelRun<br/>rompy.model
    participant Config as BaseConfig
    participant TimeRange
    participant DataGrid
    participant Backend as Backend<br/>Local/Docker
    participant Model as Model Process
    participant Postprocessor
    User->>ModelRun: Initialize(run_id, period, output_dir, config)
    ModelRun->>TimeRange: Validate and parse period
    TimeRange-->>ModelRun: start, end, interval
    User->>ModelRun: generate()
    ModelRun->>Config: Render configuration
    Config->>DataGrid: Fetch required data
    DataGrid->>DataGrid: Apply filters (crop, sort, buffer)
    DataGrid-->>Config: Filtered xarray.Dataset
    Config->>Config: Render Jinja2 templates
    Config-->>ModelRun: staging_dir with input files
    User->>ModelRun: run(backend=BackendConfig)
    ModelRun->>Backend: backend.get_backend_class()
    Backend-->>ModelRun: LocalRunBackend or DockerRunBackend
    ModelRun->>Backend: run(model_run, config, workspace_dir)
    alt Local Execution
        Backend->>Model: subprocess.run(command)
        Model-->>Backend: exit code
    else Docker Execution
        Backend->>Backend: _prepare_image (build or pull)
        Backend->>Backend: _get_run_command (with mpiexec if needed)
        Backend->>Backend: _prepare_volumes (mount staging_dir)
        Backend->>Model: docker.containers.run()
        Model-->>Backend: container exit code
    end
    Backend-->>ModelRun: success/failure boolean
    opt Post-processing
        User->>ModelRun: postprocess(processor='custom')
        ModelRun->>Postprocessor: process(model_run)
        Postprocessor->>Postprocessor: Validate outputs, create archives
        Postprocessor-->>ModelRun: results dict
        ModelRun-->>User: processed results
    end
```
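The generate/run/postprocess lifecycle shown above can be sketched end to end with the standard library. `TinyModelRun` is a hypothetical stand-in: the real `ModelRun` renders Jinja2 templates and delegates execution to backend classes, whereas this sketch uses `string.Template` and a fake model step purely to show the shape of the three stages:

```python
import tempfile
from pathlib import Path
from string import Template


class TinyModelRun:
    """Illustrative three-stage lifecycle: generate -> run -> postprocess."""

    def __init__(self, run_id: str, output_dir: Path, params: dict):
        self.run_id = run_id
        self.output_dir = output_dir
        self.params = params

    def generate(self) -> Path:
        """Render an input-file template into a staging directory."""
        staging = self.output_dir / self.run_id
        staging.mkdir(parents=True, exist_ok=True)
        template = Template("run_id=$run_id\nstart=$start\nend=$end\n")
        text = template.substitute(run_id=self.run_id, **self.params)
        (staging / "input.nml").write_text(text)
        return staging

    def run(self, staging: Path) -> bool:
        """Pretend to execute the model against the staged inputs."""
        (staging / "output.txt").write_text("model finished\n")
        return True

    def postprocess(self, staging: Path) -> dict:
        """Validate outputs and report results."""
        return {"success": (staging / "output.txt").exists()}


with tempfile.TemporaryDirectory() as tmp:
    mr = TinyModelRun("demo", Path(tmp), {"start": "20230101", "end": "20230102"})
    staging = mr.generate()
    ok = mr.run(staging)
    print(mr.postprocess(staging))  # {'success': True}
```

Note how each stage only communicates through the staging directory, which is what lets the run stage be swapped between local and Docker backends.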
## Design Principles

### 1. Reproducibility

- Model configurations are fully serializable
- The same configuration produces identical results across environments
- Execution context is tracked and logged
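The "fully serializable" point can be illustrated with a plain round trip. Rompy configurations are Pydantic models with their own dump/validate methods; this sketch uses a bare dict and `json` from the standard library to show the principle only, and the field names are illustrative:

```python
import json

# Illustrative configuration dict standing in for a Pydantic config model.
config = {
    "model_type": "base",
    "period": {"start": "2023-01-01", "end": "2023-01-02", "interval": "1H"},
    "output_dir": "./simulations",
}

# Serialize the configuration (e.g. to archive alongside model outputs) ...
serialized = json.dumps(config, sort_keys=True)

# ... and later restore an identical configuration for a reproducible re-run.
restored = json.loads(serialized)
print("round-trip identical:", restored == config)  # round-trip identical: True
```

Archiving the serialized configuration with each run is what makes a run reproducible on any machine that can resolve the same backends.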
### 2. Extensibility

- Plugin system allows adding new models without changing core code
- Backend-agnostic design supports multiple execution environments
- Hook points available for custom processing

### 3. Environment Agnostic

- The same model configuration can run in multiple environments (local, HPC, cloud)
- Execution backends are resolved at runtime based on configuration
## Module Structure

### Core Modules

- `rompy.model`: Contains the main `ModelRun` class
- `rompy.core`: Basic abstractions (config, grid, data, source)
- `rompy.backends`: Backend implementations and configuration
- `rompy.run`: Run backend implementations
- `rompy.postprocess`: Post-processing implementations
- `rompy.pipeline`: Pipeline orchestration
- `rompy.logging`: Logging and formatting framework
### Key Classes and Their Responsibilities

| Class | Module | Responsibility |
|---|---|---|
| `ModelRun` | `rompy.model` | Main orchestrator for model runs |
| `BaseConfig` | `rompy.core.config` | Base configuration model |
| `BaseGrid` | `rompy.core.grid` | Base grid definition |
| `DataGrid` | `rompy.core.data` | Data grid abstraction |
| `SourceBase` | `rompy.core.source` | Base data source interface |
| `LocalConfig` | `rompy.backends` | Local backend configuration |
| `DockerConfig` | `rompy.backends` | Docker backend configuration |
| `LocalRunBackend` | `rompy.run` | Local execution backend |
| `DockerRunBackend` | `rompy.run.docker` | Docker execution backend |
## Integration Points

Rompy integrates with various external systems:

### Data Systems

- NetCDF files via xarray
- Intake catalogs for data discovery
- Various file formats through fsspec

### Execution Systems

- Docker for containerized execution
- HPC systems via job schedulers
- Cloud platforms for distributed computing

### Development Tools

- Pydantic for configuration validation
- Cookiecutter for template-based generation
- Standard Python logging
## Future Architecture Considerations

### Scalability

- Support for distributed model components
- Parallel execution of ensemble members
- Asynchronous job submission

### Extensibility

- Additional plugin interfaces for custom workflows
- Machine learning integration for parameter estimation
- Enhanced visualization capabilities
## Next Steps

- Review the Plugin Architecture for more details on extending Rompy
- Check the Developer Guide for advanced development topics
- Look at the API Reference for detailed class documentation