# STAR-Bench: Satellite Testbed for Alignment and Registration Benchmarking
Hybrid CPU + GPU + FPGA image registration benchmark combining intensity-based (Faber, ETNA_Multi) and feature-based (LightGlue, XFeat, MINIMA, RIFT2, SIFT/SURF/ORB) pipelines into a single, reproducible evaluation harness.
STAR-Bench was designed to compare classical, learning-based, and hardware-accelerated registration methods on satellite image pairs (e.g. optical-SAR, visible-infrared, multi-band optical) and to report per-method accuracy, runtime, and power figures side by side across three target platforms:
- CPU — any x86_64 / aarch64 host, pure PyTorch + OpenCV fallback.
- GPU — CUDA-accelerated path; tuned for NVIDIA Jetson (Orin / Xavier) as well as desktop GPUs.
- FPGA — Xilinx Kria (KV260 / KR260) via the `wax_mi_accel` PYNQ overlay for the Mutual-Information kernel.
STAR-Bench evaluates one registration backend per run entry (SIFT, XFeat,
LightGlue, ETNA_Multi, ...). Multi-stage chains — e.g. a coarse XFeat
alignment followed by a pyramidal ETNA_Multi refinement — live inside ETNA
(see ETNA/pipeline.py). ETNA is the bundle that packages pyramidal Faber,
XFeat, and their combinations behind a single PipelineExecutor API.
**Authors:** Claudio Di Salvo, Emanuele Del Sozzo, Giuseppe Sorrentino, Eleonora D'Arnese, Paolo Panicucci, Davide Conficconi.
```
STAR-Bench/
├── starbench-main.py                    # Main entry point
├── starbench_processing.py              # High-level per-folder processing loop
├── starbench_registration.py            # Single-backend registration orchestration
├── starbench_detectors.py               # Strategy pattern: one Detector per backend
├── starbench_metrics.py                 # Similarity / registration-quality metrics
├── starbench_utils.py                   # I/O helpers, transform composition
├── starbench_reporting.py               # CSV / summary report writers
├── starbench_visualization.py           # Optional match / warp visualisations
├── starbench_augmentation.py            # Offline dataset augmentation driver
├── starbench_performance_evaluation.py  # Pareto frontier (latency/accuracy/energy)
├── create_final_table.py                # Aggregate multiple runs into a final table
├── fabersw-unique.py                    # Legacy standalone ETNA driver
├── run.sh                               # Example experiment launcher
├── requirements.txt                     # Python dependencies
├── .gitmodules                          # Submodule pinning
│
├── ETNA/                          # *** ETNA bundle (pyramidal Faber + pipeline runner) ***
│   ├── __init__.py                # Public API
│   ├── hyperparams.py             # Optimiser hyper-parameters
│   ├── fpga_accelerator.py        # PYNQ overlay driver for wax_mi_accel
│   ├── registrators_pyramidal.py  # EtnaMultiMetric (pyramidal MI / MSE / CC)
│   ├── optimizers_pyramidal.py    # EtnaMultiPowell / EtnaMultiOnePlusOne
│   ├── pipeline.py                # PipelineExecutor — chains XFeat + ETNA_Multi etc.
│   ├── README.md                  # ETNA bundle documentation
│   ├── etna.bit                   # Xilinx Kria bitstream
│   └── etna.hwh                   # Hardware description file
│
├── wrappers/                      # Thin shims around third-party submodules
│   ├── __init__.py
│   ├── minima_wrapper.py          # MINIMA (RoMa / LoFTR / SP+LG)
│   ├── lightglue_wrapper.py       # LightGlue + SuperPoint / DISK / ALIKED / ...
│   ├── rift2_wrapper.py           # RIFT2 multimodal rotation-invariant
│   ├── xfeat_wrapper.py           # XFeat (accelerated_features)
│   ├── faber_wrapper.py           # Faber (non-pyramidal) from necst/faber_fpga
│   └── _faber_local/              # Local fork of the non-pyramidal Faber
│
└── (submodules, pulled via git submodule update --init)
    ├── lightglue/                 # github.com/cvg/LightGlue
    ├── MINIMA/                    # github.com/LSXI7/MINIMA
    ├── rift2/                     # github.com/canyagmur/RIFT2-MULTIMODAL-MATCHING-ROTATION-PYTHON
    ├── accelerated_features/      # github.com/verlab/accelerated_features
    └── faber_fpga/                # github.com/necst/faber_fpga (non-pyramidal Faber)
```
```sh
git clone --recurse-submodules <this-repo>
cd STAR-Bench
# or, after a plain clone:
git submodule update --init --recursive
```

| Backend | Kind | Notes |
|---|---|---|
| `sift`, `surf`, `orb` | Classical | OpenCV; SURF needs `opencv-contrib-nonfree` |
| `lightglue_<ext>` | Learning | `ext` = superpoint, disk, aliked, doghardnet, sift |
| `xfeat` | Learning | Uses `accelerated_features` |
| `minima_<v>` | Learning | `v` = roma, loftr, sp_lg; checkpoints in `MINIMA/weights/` |
| `rift` | Multimodal classical | RIFT2 rotation-invariant descriptors |
| `faber_<opt>_<metric>[_fpga]` | Intensity (non-pyramidal) | Consumes necst/faber_fpga via `wrappers/faber_wrapper.py` |
| `etna_multi_<opt>[_fpga]` | Intensity (pyramidal) | ETNA_Multi, shipped in-tree under `ETNA/` |
| `etna_<opt>_<DL>[_fpga]` | Intensity (pyramidal) + DL | ETNA_Multi (`opt`) coarse + DL refinement (via `ETNA.pipeline.PipelineExecutor`) |
| `etna_<DL>_<opt>[_fpga]` | DL + Intensity (pyramidal) | DL coarse + ETNA_Multi (`opt`) refinement (via `ETNA.pipeline.PipelineExecutor`) |
`opt` ∈ {powell, oneplusone}; `DL` ∈ {xfeat, lightglue_<ext>, minima_<v>}. Append `_fpga` to offload the Mutual-Information kernel to the `wax_mi_accel` overlay.
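The naming scheme is regular enough to decompose mechanically. The helper below is a hypothetical illustration of that decomposition, not part of the STAR-Bench API:

```python
# Hypothetical helper: split a backend name such as "etna_multi_powell_fpga"
# into its components. Illustration only; not part of the STAR-Bench API.
OPTS = {"powell", "oneplusone"}

def parse_backend(name: str) -> dict:
    parts = name.split("_")
    fpga = parts[-1] == "fpga"        # trailing _fpga requests the MI offload
    if fpga:
        parts = parts[:-1]
    return {
        "tokens": parts,
        "fpga": fpga,
        "optimizer": next((p for p in parts if p in OPTS), None),
    }

print(parse_backend("etna_multi_powell_fpga"))
# → {'tokens': ['etna', 'multi', 'powell'], 'fpga': True, 'optimizer': 'powell'}
```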
STAR-Bench runs one backend per entry from the CLI. Multi-stage
chaining (e.g. XFeat coarse alignment followed by ETNA_Multi refinement)
is an advanced, Python-only API — it is not exposed by
starbench-main.py and must be invoked by importing
ETNA.pipeline.PipelineExecutor directly:
```python
from ETNA import PipelineExecutor, StageCache
from starbench_detectors import DetectorFactory

exec_ = PipelineExecutor(
    pipeline_config="xfeat,etna_multi_powell_mi_fpga",
    detector_factory=DetectorFactory(),
    device="cpu",
    cache=StageCache(),  # optional; reuses detector instances across pairs
)
result = exec_.run("fixed.png", "moving.png")
```

`StageCache` is an optional in-process cache for expensive detectors (XFeat, LightGlue, MINIMA checkpoints). It is only useful when you call `exec_.run()` in a loop; single-shot Python callers can drop the `cache=` argument entirely.
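The caching idea is simply keyed, lazily constructed detector instances reused across calls. A self-contained sketch of that pattern (not the actual `StageCache` implementation, which may differ):

```python
# Minimal sketch of the detector-caching pattern behind StageCache
# (illustrative; the real ETNA.StageCache may differ in detail).
class DetectorCache:
    def __init__(self):
        self._instances = {}

    def get(self, backend_name, factory):
        # Build the expensive detector once, then reuse it for every pair.
        if backend_name not in self._instances:
            self._instances[backend_name] = factory(backend_name)
        return self._instances[backend_name]

calls = []
cache = DetectorCache()
for _ in range(3):  # e.g. one registration pair per iteration
    det = cache.get("xfeat", lambda name: calls.append(name) or object())
    # ... run detection / matching with `det` here ...
print(len(calls))  # → 1: the detector was constructed only once
```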
```sh
python starbench-main.py \
    -i /path/to/image_sets \
    -o results/ \
    -p etna_multi \
    --intensity-engines etna_multi \
    --metric mi \
    --fpga \
    --visualize
```

Use `-p all` to sweep every registered backend. Combine with `--tag` to filter which image sets are processed. See `python starbench-main.py --help` for the full option list.
`starbench_augmentation.py` is a standalone driver that turns a raw satellite dataset (SAR/VV, SAR/VH, optical RGB + band splits) into the SO-numbered registration pairs expected by STAR-Bench. It applies a random rigid transform per pair and writes the TRE-ready ground-truth matrix `T_reg_gt` alongside the full-image `T` in the `.mat` payload.
```sh
python starbench_augmentation.py
```

Edit the `BASE_CONFIG` / `NOMINAL_CONFIG` / `OFF_NOMINAL_CONFIG` / `ANOMALY_CONFIG` dictionaries at the top of the module to pick input / output directories, difficulty level, and the random seed for reproducible runs.
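For reference, a seeded random rigid (rotation + translation) ground-truth matrix of the kind the augmenter stores can be built in a few lines of NumPy. This is a sketch of the concept, not the module's actual code:

```python
import numpy as np

def random_rigid_transform(rng, max_angle_deg=10.0, max_shift_px=20.0):
    """Sample a 3x3 homogeneous rigid transform (rotation + translation)."""
    theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    tx, ty = rng.uniform(-max_shift_px, max_shift_px, size=2)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

rng = np.random.default_rng(42)  # fixed seed -> reproducible pairs
T = random_rigid_transform(rng)
# A rigid transform preserves lengths: its 2x2 linear block is orthogonal.
assert np.allclose(T[:2, :2] @ T[:2, :2].T, np.eye(2))
```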
Once at least one run has finished, `starbench_performance_evaluation.py` computes the Pareto frontiers across latency (`Avg ProcTime(s)`), accuracy (`Avg Final RMSE`), and energy (`Energy (calc)`) directly from the `summary_report_overall.csv` files produced by the harness:
```sh
python starbench_performance_evaluation.py \
    --input results/results_YYYYMMDD_HHMMSS \
    --output results/results_YYYYMMDD_HHMMSS/pareto \
    --plot --per-dataset
```

It writes the 3D frontier plus every 2D projection, an annotated full table with boolean `is_pareto_*` columns, and — when `--plot` is given and matplotlib is installed — scatter plots with the Pareto points highlighted. Runs without power logging (no `Energy (calc)` column) automatically fall back to the latency-vs-accuracy 2D frontier instead of failing.
`create_final_table.py --compute-pareto` wires the same analysis into the final step of table aggregation, so a full evaluation plus Pareto breakdown fits in a single command.
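A 2D Pareto frontier over (latency, RMSE), the fallback case when no energy column is available, reduces to a simple dominance check. A minimal NumPy sketch of that check (not the script's actual implementation):

```python
import numpy as np

def pareto_mask(points):
    """Boolean mask of Pareto-optimal rows; lower is better on every axis."""
    pts = np.asarray(points, dtype=float)
    mask = np.ones(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        if not mask[i]:
            continue
        # p is dominated if some point is <= everywhere and < somewhere.
        dominated_by = np.all(pts <= p, axis=1) & np.any(pts < p, axis=1)
        dominated_by[i] = False
        if dominated_by.any():
            mask[i] = False
    return mask

# (latency s, RMSE px) per backend — made-up illustrative numbers
runs = [(0.9, 2.1), (0.4, 3.5), (1.5, 1.8), (1.0, 2.5)]
print(pareto_mask(runs))  # → [ True  True  True False]
```

The last point, (1.0, 2.5), is dominated by (0.9, 2.1), which is both faster and more accurate.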
STAR-Bench targets three device classes out of the box. Pick the one you
want with the global --device flag (and optionally --fpga):
| Target | Flag | Notes |
|---|---|---|
| CPU | `--device cpu` | Pure PyTorch + OpenCV, available everywhere |
| GPU (desktop) | `--device cuda` | Any CUDA-capable card; used by XFeat, LightGlue, MINIMA, ETNA_Multi MI |
| GPU (Jetson) | `--device cuda` | Same CUDA path, validated on NVIDIA Jetson Orin / Xavier |
| FPGA (Xilinx Kria) | `--device cpu --fpga` | Offloads the MI kernel to wax_mi_accel, runs everything else on CPU |
`--device auto` picks `cuda` when a CUDA runtime is visible, otherwise `cpu`. On a Jetson this lets the same command line run unchanged.
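The auto-selection logic amounts to a CUDA-availability probe. A sketch of the behaviour described above (the helper name is made up):

```python
def resolve_device(flag: str) -> str:
    """Resolve --device: 'auto' probes for a CUDA runtime, else pass through."""
    if flag != "auto":
        return flag
    try:
        import torch  # only needed for the probe
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"  # no PyTorch at all -> CPU fallback

print(resolve_device("cpu"))   # → cpu
print(resolve_device("auto"))  # cpu on CUDA-less hosts, cuda otherwise
```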
Every learning-based backend (XFeat, LightGlue, MINIMA, RIFT2) runs on CUDA
when --device cuda is selected. ETNA_Multi additionally executes its
Mutual-Information metric on the GPU via PyTorch kernels, so the full
coarse-to-fine pyramidal search benefits from CUDA acceleration. The path
has been profiled on NVIDIA Jetson Orin / Xavier modules to report runtime
and power side-by-side with the FPGA numbers.
The FPGA path targets the wax_mi_accel IP block on Xilinx Kria SOMs via
PYNQ. The bitstream (ETNA/etna.bit) and hardware handoff (ETNA/etna.hwh)
are loaded by FaberFPGAAccelerator (see ETNA/fpga_accelerator.py).
If PYNQ is unavailable the framework transparently falls back to the
PyTorch/CUDA software implementation of Mutual Information, so the same
backend name can be benchmarked on CPU, GPU, or FPGA by toggling the
--device / --fpga flags.
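The software fallback computes Mutual Information from the joint intensity histogram. A self-contained NumPy sketch of that metric (the in-tree PyTorch/FPGA versions differ in detail):

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Histogram-based MI between two equally shaped intensity images."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = joint / joint.sum()             # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of fixed
    py = pxy.sum(axis=0, keepdims=True)   # marginal of moving
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
# Perfect alignment maximises MI; an unrelated image scores near zero.
assert mutual_information(img, img) > mutual_information(img, rng.random((64, 64)))
```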
Third-party backends are consumed as unmodified git submodules. Any
STAR-Bench-specific glue (for instance the dict-returning
test_relative_pose_demo that MINIMA needs to integrate with the pipeline)
lives under wrappers/. This keeps upstream clean and makes upgrades a
simple git submodule update --remote.
The two Faber flavours are kept in separate places on purpose:

- Non-pyramidal `faber` — pulled as a submodule from github.com/necst/faber_fpga and exposed via `wrappers/faber_wrapper.py`. STAR-Bench adaptations (optional FPGA offload, simplified `compute()` signature) live in `wrappers/_faber_local/` as a local fork so that the upstream submodule stays pristine.
- Pyramidal `etna_multi` — lives in-tree under `ETNA/` because STAR-Bench needs tight control over the multi-resolution scheduler and FPGA overlay coupling.
```bibtex
@article{disalvo2026satelliteautonomy,
  author  = {Di Salvo, Claudio and Del Sozzo, Emanuele and Sorrentino, Giuseppe and D'Arnese, Eleonora and Panicucci, Paolo and Conficconi, Davide},
  title   = {Are We Ready to Enable Satellite Autonomy Through On-Board Image Registration?},
  journal = {Proc. ACM Meas. Anal. Comput. Syst.},
  volume  = {10},
  number  = {2},
  pages   = {30:1--30:41},
  month   = jun,
  year    = {2026},
  doi     = {10.1145/3805628}
}
```