Evaluating Map Fidelity Under Flight-Like Power Envelopes
Problem
A flight-class planetary cave mission has to ship a mapping instrument inside a power envelope that is not negotiable once the spacecraft is built. The NASA Small Spacecraft State-of-the-Art survey of power subsystems lays out baseline power budgets for flight-class vehicles, and the NASA Science Basics of Space Flight chapter on onboard systems sets the broader bookends: 300 W for a small spacecraft, 2.5 kW for a larger flagship, with the instrument slice of that budget typically well under 10 percent. A cave-mapping payload that wants a fair shot at an Artemis-era or NIAC-funded mission needs to demonstrate fidelity at 0.5 to 3 W, not at laboratory-bench wattages.
The pressure compounds for cave-specific concepts because tube-interior operations carry power overhead that surface operations do not. Sub-watt sensor budgets for passive acoustic recording leave very little margin for processing, communication, or motion control, and a cave rover that must map, move, and relay communications at the same time has to split one power budget across functions that a surface rover can run sequentially. The cave-mission power envelope is therefore tighter at the per-function level than the headline wattage suggests, and a fidelity benchmark that ignores this multi-function competition will overstate the achievable map quality by a meaningful margin.
The problem is that "fidelity at flight power" is rarely a single number. The instrument's residual-to-truth curve bends in ways that depend on environment, duty cycle, and stitching strategy. The ESA power systems page reinforces that a flight mission's power envelope is fixed at system level, and the MDPI Aerospace methodology for power contingency and operational envelope analysis walks through how operational envelopes get defined upstream of any one instrument. Planetary analog researchers evaluating map fidelity need a benchmark that makes those tradeoffs explicit rather than hiding them in a single "passed" checkmark.
Power envelopes also drive the choice between teleoperated and autonomous operating modes. The same wattage delivers different mapping fidelity under autonomous mapping than under teleoperated supervisory control, because an autonomous rover can keep spending its power on mapping during periods when ground-in-the-loop teleoperation would leave it idle. Mission planners need to evaluate the power-vs-fidelity tradeoff under each ops model, not just the wattage in isolation, because the operating-mode choice can shift the achievable fidelity by as much as the wattage envelope itself.
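To make the ops-model point concrete, here is a minimal sketch of the energy actually spent on mapping at the same wattage under each mode. The idle fractions are hypothetical placeholders for how long a teleoperated rover waits on ground-in-the-loop commands, not measured EchoQuilt values.

```python
# Hypothetical sketch: energy actually spent on mapping at one power level
# under two ops models. Idle fractions are illustrative assumptions, not
# EchoQuilt measurements.


def mapping_energy_j(power_w: float, window_s: float, idle_fraction: float) -> float:
    """Energy (J) spent mapping when the instrument sits idle for
    `idle_fraction` of the window, e.g. while waiting on ground commands."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    return power_w * window_s * (1.0 - idle_fraction)


WINDOW_S = 40 * 60  # a 40-minute science window, in seconds

# Same 1.8 W duty cycle; only the share of the window spent idle differs.
autonomous_j = mapping_energy_j(1.8, WINDOW_S, idle_fraction=0.05)
teleoperated_j = mapping_energy_j(1.8, WINDOW_S, idle_fraction=0.40)
```

The sketch covers only the energy side; how that energy maps to residual still has to come from the benchmark curve for each ops model.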
Solution
EchoQuilt's fidelity benchmark runs the same stitching engine across three flight-representative duty cycles — 0.5 W, 1.8 W, and 3 W — and reports the resulting point-cloud residual as a curve, not a point. The curve is the artifact mission planners actually need. At 0.5 W, the quilt runs passive-only acoustic ingestion with aggressive temporal subsampling; residuals in our Mauna Loa replay stayed under 12 cm, which is enough for coarse habitat siting but not for station-level geology. At 1.8 W, the quilt adds an IMU stream and moderate patch-density; residuals dropped under 5 cm, matching the Hadley analog campaign numbers. At 3 W, the quilt runs full patch-density and adds seismic-event correlation; residuals hit 1.9 cm, which is the point where geologists stop asking for higher resolution.
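Because the artifact is a curve, a planner will usually want values between the three measured points. A minimal sketch, assuming linear interpolation between the published figures; the true curve need not be linear, and the 0.5 W and 1.8 W entries are upper bounds ("under 12 cm", "under 5 cm") rather than exact residuals, so interpolated values are conservative bounds, not predictions.

```python
from bisect import bisect_left

# (power in W, residual bound in cm) from the Mauna Loa replay figures above.
# Linear interpolation between these points is an assumption.
CURVE = [(0.5, 12.0), (1.8, 5.0), (3.0, 1.9)]


def residual_bound_cm(power_w: float) -> float:
    """Linearly interpolate the residual bound at a power level in 0.5-3 W."""
    powers = [p for p, _ in CURVE]
    if not powers[0] <= power_w <= powers[-1]:
        raise ValueError("power outside the benchmarked 0.5-3 W range")
    i = bisect_left(powers, power_w)
    if powers[i] == power_w:
        return CURVE[i][1]  # exactly on a benchmarked point
    (p0, r0), (p1, r1) = CURVE[i - 1], CURVE[i]
    return r0 + (r1 - r0) * (power_w - p0) / (p1 - p0)
```

For example, a 2.4 W duty cycle sits halfway between the 1.8 W and 3 W points and interpolates to a bound of about 3.45 cm.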
The Utah State DigitalCommons paper on flexible autonomous power management for small spacecraft was the source for the duty-cycled sensor tradeoffs we modeled. The key insight is that a flight mission does not usually pick one wattage — it moves between modes as the mission timeline allows. EchoQuilt's benchmark therefore reports not just the residual at each wattage but the residual-per-stitch-event, which is the unit that matters when a rover has 40 minutes of science window and a fixed energy allotment. The benchmark lets a mission planner answer questions like "would adding 90 seconds at 3 W cut my residual below 3 cm at this station" without running a new field campaign.
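The energy half of the 90-second question is simple bookkeeping. A sketch, assuming the extra 90 seconds at 3 W replace 90 seconds of a plan that otherwise runs at 1.8 W for the full 40-minute window; the station allotment is a hypothetical number. Whether the swap actually cuts the residual below 3 cm still has to come from the benchmark's residual-per-stitch-event data.

```python
# Sketch: energy and time feasibility of a mode plan. The allotment and the
# baseline plan are hypothetical.


def plan_energy_j(segments: list[tuple[float, float]]) -> float:
    """Total energy (J) for a plan given as (power_W, duration_s) segments."""
    return sum(power * seconds for power, seconds in segments)


def fits(segments: list[tuple[float, float]], allotment_j: float, window_s: float) -> bool:
    """True if the plan stays inside both the energy allotment and the time window."""
    total_time = sum(seconds for _, seconds in segments)
    return plan_energy_j(segments) <= allotment_j and total_time <= window_s


WINDOW_S = 40 * 60       # 40-minute science window
ALLOTMENT_J = 4700.0     # hypothetical energy allotment for this station

baseline = [(1.8, WINDOW_S)]                  # whole window at 1.8 W
boosted = [(1.8, WINDOW_S - 90), (3.0, 90)]   # swap 90 s of it to 3 W
```

Here the swap costs 90 s x 1.2 W = 108 J over the baseline, so the question reduces to whether 108 J buys enough residual reduction at that station.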

Three design choices make the benchmark useful in flight review. First, the benchmark ships its raw patch library alongside the residual curve, so reviewers can re-derive the curve against their own ground-truth reference. The instrument power constraints discussed in the NASA Science chapter 12 on science instruments are what made the case for publishing raw patches. Second, the benchmark cross-references autonomous and teleoperated operating constraints so reviewers can see how the same wattage plays out under each ops model. Third, the evaluation protocol reports residual-vs-environment-temperature curves explicitly, since the stitching engine's behavior at cryogenic temperatures differs measurably from its room-temperature performance.
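Re-deriving the curve from raw patches needs a residual definition. A minimal sketch, assuming a nearest-neighbour point-to-point distance against the reviewer's own reference cloud; the benchmark's own residual metric is not specified here and may differ.

```python
import math
from statistics import median

Point = tuple[float, float, float]


def nearest_distance(p: Point, reference: list[Point]) -> float:
    """Distance from p to the closest reference point (brute force)."""
    return min(math.dist(p, q) for q in reference)


def median_residual(patch_points: list[Point], reference: list[Point]) -> float:
    """Median nearest-neighbour residual of stitched points against the reference."""
    return median(nearest_distance(p, reference) for p in patch_points)
```

The brute-force search is fine for a quick check; for full-size clouds a spatial index such as a k-d tree is the usual choice.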
Advanced tactics
Seven tactics sharpen the benchmark past the default three-point curve. First, run the benchmark across thermal-representative conditions. A 1.8 W duty cycle at 293 K does not match a 1.8 W duty cycle at 80 K inside a lunar tube; the IMU drift curves change, and the patch-conflict detector starts flagging more seams. Our Mauna Loa winter-night replay sits close to the thermal operating point of a near-equatorial lunar surface night and is the best terrestrial stand-in we have found. Expect the residual curve at 1.8 W to stretch by about 15 to 20 percent under realistic cryogenic analog conditions.
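Applied to the 1.8 W point, a 15 to 20 percent stretch turns a sub-5 cm bound into roughly 5.75 to 6.0 cm. A sketch of that check, assuming the stretch applies multiplicatively to the room-temperature residual bound; any station requirement you pass in is your own, and the values in the check below are hypothetical.

```python
# Sketch: apply the expected cryogenic stretch to a room-temperature residual
# bound. Treating the stretch as a multiplicative factor is an assumption.


def cryo_residual_range_cm(room_temp_cm: float,
                           stretch_low: float = 0.15,
                           stretch_high: float = 0.20) -> tuple[float, float]:
    """Return the (best, worst) residual expected under cryogenic analog conditions."""
    return room_temp_cm * (1 + stretch_low), room_temp_cm * (1 + stretch_high)


def meets_requirement(room_temp_cm: float, requirement_cm: float) -> bool:
    """Conservative check: the worst-case stretched residual must meet the requirement."""
    _, worst = cryo_residual_range_cm(room_temp_cm)
    return worst <= requirement_cm
```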
Second, report the tail of the residual distribution, not just the median. Flight review teams care more about worst-case station residual than about median-station performance, because the worst case drives the data-downlink and science-planning margins. EchoQuilt's benchmark reports 95th- and 99th-percentile residuals alongside the median, and that transparency has been the single strongest factor we have seen in getting past an initial technical review.
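A minimal sketch of the tail report, assuming nearest-rank percentiles over per-station residuals; other interpolation conventions give slightly different values on small samples.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Smallest value with at least pct percent of the data at or below it."""
    if not values:
        raise ValueError("no residuals")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def residual_report(station_residuals_cm: list[float]) -> dict[str, float]:
    """Median plus the 95th and 99th percentile station residuals."""
    return {
        "p50": nearest_rank_percentile(station_residuals_cm, 50),
        "p95": nearest_rank_percentile(station_residuals_cm, 95),
        "p99": nearest_rank_percentile(station_residuals_cm, 99),
    }
```

With fewer than 100 stations, the nearest-rank 99th percentile is simply the worst station, which is usually the number a review team asks for first.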
Third, run a "wattage wavefront" simulation: gradually ramp duty-cycle wattage across a traverse and record the inflection points. The inflections are where the stitching engine transitions between regimes — novelty-detection-dominated to seam-refinement-dominated — and knowing those inflections gives the mission planner a concrete power-vs-residual tradeoff curve rather than a table of discrete points.
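One way to find the inflections in a ramped sweep is to look for sign changes in the discrete second difference of residual versus power. A sketch, assuming evenly spaced power steps and a sweep already smoothed enough that noise does not flip the sign; this is a generic numerical technique, not necessarily how EchoQuilt locates its regime transitions.

```python
def inflection_powers(powers: list[float], residuals: list[float]) -> list[float]:
    """Return power levels where the residual curve's concavity changes sign."""
    if len(powers) != len(residuals) or len(powers) < 4:
        raise ValueError("need matching sequences with at least 4 samples")
    # second[k] is the second difference centred on sample k + 1
    second = [residuals[k] - 2 * residuals[k + 1] + residuals[k + 2]
              for k in range(len(residuals) - 2)]
    found = []
    last_k, last_sign = None, 0
    for k, d in enumerate(second):
        sign = (d > 0) - (d < 0)
        if sign == 0:
            continue  # flat spot: defer the decision to the next non-zero value
        if last_sign and sign != last_sign:
            # concavity flipped between samples last_k + 1 and k + 1
            found.append((powers[last_k + 1] + powers[k + 1]) / 2)
        last_k, last_sign = k, sign
    return found
```

Each reported power is the midpoint of the two samples that bracket the concavity change, so finer wattage steps give a tighter estimate.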
Fourth, anchor the benchmark against formally audited reference points where possible. The evaluation protocol borrows from the ground-truth approach in our MSHA anchor evaluation work, where the stitched quilt is compared against formally audited anchor positions rather than just against secondary reference data. This audit-anchored evaluation produces residual numbers that flight reviewers consistently treat as more credible than residuals derived from non-audited references, because the audit process documents the uncertainty in the reference itself.
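A sketch of an audit-anchored comparison that keeps each reference's documented uncertainty next to the residual. The record layout and the 1-sigma reading of the uncertainty are assumptions; the MSHA audit format itself is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class AuditedAnchor:
    """Hypothetical record for one formally audited anchor."""
    anchor_id: str
    position_m: tuple[float, float, float]  # audited reference position
    uncertainty_m: float                    # documented reference uncertainty (assumed 1-sigma)


def anchor_residuals(stitched: dict[str, tuple[float, float, float]],
                     audited: list[AuditedAnchor]) -> list[tuple[str, float, bool]]:
    """Return (anchor_id, residual_m, exceeds_reference_uncertainty) per shared anchor.

    The flag is False when the residual is smaller than the reference's own
    uncertainty, i.e. the reference cannot distinguish the error from zero.
    """
    rows = []
    for anchor in audited:
        if anchor.anchor_id not in stitched:
            continue  # anchor not recovered in the stitched quilt
        r = math.dist(stitched[anchor.anchor_id], anchor.position_m)
        rows.append((anchor.anchor_id, r, r > anchor.uncertainty_m))
    return rows
```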
Fifth, report the power transient performance, not just the steady-state. A flight mission rarely sits at a single wattage for long; it transitions between modes as the operational tempo changes, and each transition takes some number of seconds during which the stitching engine's behavior may be unstable. EchoQuilt's benchmark records the residual during transient periods separately from steady-state periods, which gives mission planners visibility into how the engine handles mode shifts. Transient residual is sometimes worse than steady-state residual by a factor of 2 to 3, and that fact is decisive when planning ops sequences that include frequent mode shifts.
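A minimal sketch of separating transient from steady-state residuals, assuming a fixed settling window after each mode change; the 30-second default is a hypothetical placeholder, not a measured settling time.

```python
from statistics import median


def split_by_regime(samples: list[tuple[float, float, str]],
                    settle_s: float = 30.0) -> tuple[list[float], list[float]]:
    """Split (time_s, residual_cm, mode) samples, sorted by time, into
    (transient_residuals, steady_residuals)."""
    transient, steady = [], []
    last_change_t = None
    prev_mode = None
    for t, residual, mode in samples:
        if prev_mode is not None and mode != prev_mode:
            last_change_t = t  # a mode shift starts a new settling window
        prev_mode = mode
        if last_change_t is not None and t - last_change_t < settle_s:
            transient.append(residual)
        else:
            steady.append(residual)
    return transient, steady


def transient_ratio(samples: list[tuple[float, float, str]], settle_s: float = 30.0) -> float:
    """Median transient residual divided by median steady-state residual."""
    transient, steady = split_by_regime(samples, settle_s)
    return median(transient) / median(steady)
```

For brevity this pools steady-state samples across modes; a per-mode split would keep a low-residual 3 W steady state from masking a weaker 1.8 W one.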
Sixth, document the patch-priority logic explicitly. Two missions running at the same nominal wattage can produce very different residuals if their patch-priority logic differs, because patch priority determines which patches get the engine's most expensive processing. EchoQuilt's benchmark publishes the patch-priority configuration alongside the residual curve, so a reviewer evaluating the benchmark for their own mission can adjust the priority weights to match their mission's science objectives and re-derive the residual curve accordingly.
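A hypothetical sketch of how priority weights change which patches receive the expensive pass under the same energy budget. Feature names, weights, per-patch costs, and the greedy selection policy are all illustrative assumptions, not EchoQuilt's published configuration.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: str
    features: dict[str, float]  # e.g. {"novelty": 0.8, "science_interest": 0.3}
    refine_cost_j: float        # energy cost of the expensive processing pass


def select_for_refinement(patches: list[Patch],
                          weights: dict[str, float],
                          budget_j: float) -> list[str]:
    """Walk patches in descending weighted score, skipping any that would exceed the budget."""
    def score(p: Patch) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in p.features.items())

    chosen, spent = [], 0.0
    for p in sorted(patches, key=score, reverse=True):
        if spent + p.refine_cost_j <= budget_j:
            chosen.append(p.patch_id)
            spent += p.refine_cost_j
    return chosen
```

Swapping a novelty-heavy weighting for a science-interest-heavy one changes the selected set without changing the budget, which is why publishing the weights matters.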
Seventh, document the benchmark's underlying assumptions transparently. Every fidelity benchmark assumes some prior on the environment, the sensor configuration, and the analyst's interpretation of "ground truth". A benchmark that hides those assumptions is harder to evaluate against alternative mission profiles. EchoQuilt's benchmark documentation includes an explicit assumptions section that flight-review teams have consistently found valuable, because it lets them quickly determine whether the benchmark's assumptions match their mission's reality before they invest time in evaluating the residual numbers.
CTA
If your team is preparing a flight cave-mapping concept, a NIAC instrument proposal, or a mission concept review that needs a defensible fidelity-vs-power curve, EchoQuilt's benchmark is ready to run against your duty-cycle profile. Each pilot ships with the 0.5 W, 1.8 W, and 3 W patch libraries as baseline priors for your power-tradeoff study; a thermal-representative replay harness tuned to Mauna Loa winter-night profiles that approximate near-equatorial lunar surface night conditions; a wattage-wavefront simulation that ramps duty-cycle wattage and records the inflection points between novelty-detection-dominated and seam-refinement-dominated regimes; a residual reporting module that surfaces 95th- and 99th-percentile worst-case station performance for downlink margin planning; and a transient-residual logger that captures residual during mode shifts separately from steady state.
Pilot teams shape the patch-priority logic configuration, the operating-mode tradeoff defaults under autonomous versus teleoperated control, and the audit-anchored evaluation reference format that the 2027 reference release will adopt for NIAC concept-review submissions. Priority goes to NIAC instrument PIs targeting cave-mapping payload concepts in the 2026 cycle, JPL Cave Rovers research teams scoping multi-function power budgets for tube-interior operations, MatISSE proposers preparing TRL 5 advancement under sub-watt power envelopes, and ESA PANGAEA campaign coordinators running flight-power-fidelity analog campaigns at Lofthellir or La Corona. Join the Waitlist for Planetary Analog Researchers to receive the 0.5 W, 1.8 W, and 3 W baseline patch libraries for your power-tradeoff study.