Skip to content

physicaldrivegetter: add storcli2 physical drive getter#58

Open
ezekiel-alexrod wants to merge 3 commits into
mainfrom
feature/storcli2-physical-drive-getter
Open

physicaldrivegetter: add storcli2 physical drive getter#58
ezekiel-alexrod wants to merge 3 commits into
mainfrom
feature/storcli2-physical-drive-getter

Conversation

@ezekiel-alexrod

@ezekiel-alexrod ezekiel-alexrod commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What

Implements ports.PhysicalDrivesGetter for storcli2 / perccli2 (physicaldrivegetter/storcli2.go), mirroring the ssacli getter and the megaraid v1 physical-drive parsing. Also fixes the storcli2/perccli2 command runners to surface in-JSON errors when the binary exits non-zero, and aligns the stale storcli2 package docs with that corrected exit-code contract.

How

  • PhysicalDrives(ctrl) → a single /cN/eall/sall show all (its Drives List already carries each drive's detailed information).
  • PhysicalDrive(meta)/cN/eX/sY show all (or /cN/sY without an enclosure, as in megaraid v1), with a not-found guard on an empty list.
  • Each Drives List entry maps to a PhysicalDrive: vendor/serial/WWN (0x-prefixed as in megaraid v1) from Drive Detailed Information, model and type (Med) from Drive Information, size via utils.ConvertSizeBytes, status from the State/Status pair, and Reason carrying the raw status (as in the ssacli getter).
  • One implementation serves both binaries via an injected commandrunner.CommandRunner. Section structs live in the getter file and consume the shared storcli2.Decode + utils.UnmarshalToSlice.

State/Status mapping

The mapping follows the drive-state legend printed by the binary, verified on a live MegaRAID 9660-16i. Two storcli2 specifics drive the design:

  • storcli2 has no Failed drive state (storcli1 had one): failure is reported in Status. A Failed, Offline or Missing status therefore takes precedence over any state, so a non-functioning drive is never reported as in use — or worse, available for LV creation.
  • Every base state comes in suffixed variants (Shld shielded diagnostics, Sntz sanitize, Dgrd degraded — e.g. ConfDgrd, JBODSntz, UConfShld), so states are matched by family rather than exactly. The lone exception is Shld (“JBOD Shielded”), which belongs to the JBOD family without the prefix.

Resulting map: Conf* / JBOD* / Shld / GHS* / DHS* → used; UConf* → unassigned good, unless the status is Bad or the state is UConfUnsp (unsupported) → unassigned bad; Failed / Offline / Missing status → failed; anything else (Unusbl, Various, …) → unknown with the raw status in Reason.

Device paths are computed only for healthy JBOD drives (ComputePaths reads the real filesystem): a dead JBOD drive whose device node is gone must not fail the whole inventory. A synthetic-payload test pins this guard at the entity level (the captured fixtures contain no JBOD drive).

Command runner fix

storcli2/perccli2 do not use the exit code as a reliable success signal: some failures (invalid drive) exit non-zero while still writing their JSON error payload to stdout, others (invalid controller/VD) exit zero. cmd.Output() turned the former into a bare exit status N and discarded the payload. The runners now return stdout as-is on a non-zero exit with a payload, letting storcli2.Decode surface the in-JSON error ("No PD found."); a non-zero exit with no stdout stays a hard error. The storcli2 package docs (envelope.go, decode.go), which still claimed the CLI "exits 0 even on failure", are reworded accordingly.

Tests

Table-driven storcli2_test.go, reusing the testify mock runner and testdata/storcli2/ fixtures (real server captures: 24-drive inventory, single drives, UConf/Good, invalid-drive failure payload). Cases: full inventory, single drive (configured / unconfigured / not found), empty-list guard, JBOD entity-level mapping with the ComputePaths guard, and unit tables for the State/Status mapping (families, Failed/Offline/Missing precedence, hot spares, unsupported), JBOD family detection, disk type, WWN formatting and selector building. Runner tests cover both non-zero-exit paths. The full flow was also validated end-to-end against the live 9660-16i captures.

UG v1.1 alignment (second commit)

Cross-checking against the StorCLI2 User Guide v1.1 surfaced three gaps, fixed in physicaldrivegetter: align storcli2 typing and statuses with UG v1.1:

  • NVMe typing: storcli2 reports an NVMe drive with Med=SSD and the transport in Intf (the guide's sample shows Intf NVMe, Med SSD), so deriving the type from Med alone could never yield NVMe. The type now checks Intf first.
  • Unusable status: a documented drive status; an unconfigured drive with it was reported unassigned-good (selectable for VD creation). It now maps to unassigned-bad, like Bad.
  • Empty inventory: per the guide, showing a nonexistent object reports success, so an absent Drives List section is an empty inventory, not an error (mirroring the logical-volume getter's zero-VD contract). Replace (copyback) is also pinned as in-use in the status table.

storcli2 and perccli2 do not use the process exit code as a reliable success
signal: some failures (e.g. an invalid drive) exit non-zero while still writing
their JSON error payload to stdout, whereas others (invalid controller, invalid
VD) exit zero. cmd.Output() turned the former into a Go error and discarded the
payload, so storcli2.Decode never saw the in-JSON message and callers got a bare
"exit status N".

Return stdout as-is when the command exits non-zero but produced a payload, and
let storcli2.Decode determine success/failure from the JSON. A non-zero exit
with no stdout (or any other exec failure) stays a hard error. Verified against a
live MegaRAID 9660-16i: "/cN/eX/sY show all" on a missing drive exits 255 with a
"Drive not found" JSON payload.

Align the storcli2 package docs (envelope.go, decode.go), which still claimed
the CLI "exits 0 even on failure", with this corrected contract.
Implement ports.PhysicalDrivesGetter for storcli2 / perccli2, mirroring the
ssacli getter and the megaraid v1 physical-drive parsing. PhysicalDrives()
reads the whole inventory from a single "/cN/eall/sall show all" (which already
carries each drive's detailed information); PhysicalDrive() reads one drive via
its "/cN/eX/sY" selector (or "/cN/sY" without an enclosure, as in megaraid v1).
Each "Drives List" entry maps to a PhysicalDrive: vendor/serial/WWN from Drive
Detailed Information, model and type (Med) from Drive Information, size via
utils.ConvertSizeBytes, and status from the State/Status pair.

The State/Status mapping follows the drive-state legend printed by the binary:
unlike storcli1, storcli2 has no "Failed" drive state (failure is reported in
Status), and every base state (UConf, Conf, JBOD, GHS, DHS) comes in suffixed
variants (Shld, Sntz, Dgrd), so states are matched by family rather than
exactly. A "Failed", "Offline" or "Missing" status takes precedence over any
state so a non-functioning drive is never reported as in use or available;
configured, JBOD and hot-spare drives map to used; unconfigured drives are
available unless bad or unsupported. Device paths are only computed for
healthy JBOD drives, so a dead JBOD drive whose device node is gone does not
fail the whole inventory. Verified against a live MegaRAID 9660-16i (24-drive
inventory and single-drive reads).

One implementation serves both binaries through the injected
commandrunner.CommandRunner. Section structs live in the getter file and consume
the shared storcli2.Decode + utils.UnmarshalToSlice.
@ezekiel-alexrod ezekiel-alexrod requested a review from a team as a code owner June 11, 2026 09:54
Cross-checking the implementation against the StorCLI2 User Guide v1.1
surfaced three gaps:

- storcli2 reports an NVMe drive with "SSD" media and the NVMe transport in
  its interface (the guide's sample shows Intf "NVMe", Med "SSD"), so deriving
  the disk type from "Med" alone could never yield NVMe. The type now checks
  "Intf" first.
- "Unusable" is a documented drive status (alongside Good, Bad, Replace,
  Offline, Failed, Online, Missing); an unconfigured drive with that status
  was reported unassigned-good, i.e. selectable for VD creation. It now maps
  to unassigned-bad, like "Bad".
- Per the guide, showing a nonexistent object reports success, so an absent
  "Drives List" section is an empty inventory, not an error (mirroring the
  logical-volume getter's zero-VD contract).

Also pins "Replace" (a drive being copied back) as in-use in the status table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant