libmetal: lib: linux: improve Linux UIO-backed device open and test coverage #365

Open
bentheredonethat wants to merge 6 commits into OpenAMP:main from bentheredonethat:uio-update

@bentheredonethat


This series improves the Linux UIO-backed device-open flow in libmetal and
adds test coverage for the new API and Linux-specific helper paths.

The immediate motivation is to support Linux userspace applications that open
UIO-exposed devices through libmetal while keeping the existing bus/device
contract intact. In the current Linux implementation, the basic UIO path is
already present, but the backend is tightly coupled to bus probing, does not
cleanly separate resolved Linux device identities, does not unregister Linux
IRQ device state on close, and does not correctly retain the raw mapping
needed when a UIO map uses a non-zero offset.

This series addresses those gaps in three steps:

1. Add an explicit public helper,                                              
   metal_device_open_from_bus(), for bus/device based open while keeping       
   metal_device_open() as a compatibility wrapper.                             
                                                                               
2. Refactor the Linux UIO backend so it can populate a libmetal device         
   after resolving either a bus device name or a UIO class name, track the  
   resolved Linux identities more clearly, validate UIO map offsets, retain 
   raw mmap pointers for correct unmap, unregister Linux IRQ device state   
   on close, and tolerate Linux bus probe failure during metal_sys_init().  
                                                                               
3. Add cross-platform and Linux-specific tests for the new device-open         
   helper and the Linux UIO/IRQ bookkeeping helpers.                           

With these changes, downstream Linux host applications can continue to use
libmetal's device-open and IRQ registration model while relying on the
improved Linux UIO device handling in the library.

@bentheredonethat


Hi @arnopo

Thanks for the review. I agree the Linux change is too large as posted, and I can split it into smaller commits before respinning.

The main problem I am trying to solve is narrower than the current series makes it look. Linux userspace libmetal already opens platform/PCI devices through the existing Linux UIO-backed path.
Today applications have to pass the Linux bus device name, for example "ff360000.ipi". That name comes from the platform device/unit-address naming and is not a stable logical name across SoCs or
even across different instances.

For demos and portable userspace applications, we would like to use a logical name such as "demo-ipi". On systems using UIO, that logical name can be exposed by the kernel as the UIO class name,
for example:

  /sys/class/uio/uioX/name = demo-ipi

The Linux libmetal backend can then resolve that UIO class name back to the real bus device:

  demo-ipi
    -> /sys/class/uio/uioX/name
    -> /sys/class/uio/uioX/device
    -> /sys/bus/platform/devices/ff360000.ipi

and continue through the existing Linux bus open/bind/map/IRQ/close flow.
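
The last hop of that chain, from the uioX class directory to the parent bus device, can be sketched in C as follows. This is a minimal illustration and not the code in this PR: uio_parent_bus_name() is a hypothetical helper name, the error codes are choices made for the sketch, and it assumes the caller has already found the uioX directory whose name file matches.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libmetal: given a UIO class directory
 * such as "/sys/class/uio/uio0", follow its "device" symlink and copy the
 * last path component (the native bus device name) into out.
 * Returns 0 on success or a negative errno value.
 */
static int uio_parent_bus_name(const char *uio_dir, char *out, size_t out_len)
{
	char link_path[PATH_MAX];
	char resolved[PATH_MAX];
	const char *base;
	int n;

	assert(uio_dir && out && out_len > 0);

	n = snprintf(link_path, sizeof(link_path), "%s/device", uio_dir);
	if (n < 0 || (size_t)n >= sizeof(link_path))
		return -ENAMETOOLONG;

	/* realpath() resolves every symlink on the way to the parent device. */
	if (!realpath(link_path, resolved))
		return -errno;

	/* The bus device name is the final component of the resolved path. */
	base = strrchr(resolved, '/');
	base = base ? base + 1 : resolved;
	if (strlen(base) >= out_len)
		return -ENAMETOOLONG;

	strcpy(out, base);
	return 0;
}
```

On a system laid out like the example above, passing "/sys/class/uio/uioX" for the matching entry would be expected to yield "ff360000.ipi", which can then go through the existing platform-bus open path.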

So the intended direction is not to make UIO mandatory for OpenAMP generally, and not to introduce a new Linux device model. It is only to let the existing Linux UIO backend accept the UIO class
name as an alias for the native bus device name.

I also agree with your comment about the new public API. The UIO-name use case does not require adding metal_device_open_from_bus(). I can drop that patch and keep the existing system-agnostic
API:

  metal_device_open("platform", "demo-ipi", &dev);

The Linux backend would interpret the device argument as either:

  1. the native bus device name, preserving existing behavior, or
  2. a UIO class-name alias, resolved internally to the native platform/PCI device.

Internally I plan to keep both identities clear, e.g. requested name / resolved bus device name / UIO name. For minimum behavior change, the returned device->name can remain the resolved native
bus device name; the UIO name is just an open-time alias.

For the respin, does the following direction sound acceptable?

  1. Drop the new metal_device_open_from_bus() public API.
  2. Split the Linux changes into smaller commits, likely:
    • UIO mmap offset validation and correct raw-pointer unmap tracking.
    • IRQ bookkeeping cleanup on device close.
    • UIO class-name alias resolution for the existing Linux UIO backend.
    • Tests and documentation updates.
  3. Include Doxygen comments in the same commits that introduce new internal Linux helpers.
  4. Clarify in the commit messages that UIO-name lookup is optional alias resolution for the existing Linux backend, not a requirement for OpenAMP Linux userspace in general.

If this direction makes sense, I will rework the series around that.

@arnopo

arnopo commented May 12, 2026


> The Linux libmetal backend can then resolve that UIO class name back to the real bus device:
>
>   demo-ipi
>     -> /sys/class/uio/uioX/name
>     -> /sys/class/uio/uioX/device
>     -> /sys/bus/platform/devices/ff360000.ipi
>
> and continue through the existing Linux bus open/bind/map/IRQ/close flow.

What about using a symbolic link for that, as proposed by @tnmysh in OpenAMP/openamp-system-reference#101?
That would avoid resolving through /sys/class/uio/uioX/name, provided a /sys/class/uio/<name>/device symbolic link is created by a udev rule.
Would it work in your case?

> For the respin, does the following direction sound acceptable?
>
>   1. Drop the new metal_device_open_from_bus() public API.
>   2. Split the Linux changes into smaller commits, likely:
>     • UIO mmap offset validation and correct raw-pointer unmap tracking.
>     • IRQ bookkeeping cleanup on device close.
>     • UIO class-name alias resolution for the existing Linux UIO backend.
>     • Tests and documentation updates.
>   3. Include Doxygen comments in the same commits that introduce new internal Linux helpers.
>   4. Clarify in the commit messages that UIO-name lookup is optional alias resolution for the existing Linux backend, not a requirement for OpenAMP Linux userspace in general.

Sounds good.

Thanks,
arnaud

@bentheredonethat


Thanks @arnopo, this is a good point.

Using a udev-created symlink such as:

  /sys/class/uio/<logical-name>/device

would work in our use case and is a nice optimization when present, since it avoids scanning uioX/name.

For upstream libmetal, I would prefer to treat this as optional platform integration rather than a hard dependency, because not all deployments guarantee custom udev rules. So my plan is:

  1. keep the generic fallback that resolves via /sys/class/uio/uioX/name,
  2. optionally try the symlink path first when it exists.

That keeps behavior portable out of the box while allowing integrators to use the symlink approach for faster/cleaner lookup.

If you agree, I will document this in the commit message as “optional udev optimization, generic fallback preserved”.

@arnopo

arnopo commented May 12, 2026


I would prefer that we handle this in the same way we manage /dev/rpmsgX or /sys/class/remoteproc/remoteprocX devices, rather than adding it to libmetal.

I propose adding this PR to the agenda for the next OpenAMP meeting so we can discuss it further.

@bentheredonethat


Hi @arnopo @wmamills @tnmysh, I have some updates here:

Update: I pushed the revised libmetal changes to this PR.

The current branch now fleshes out the Linux uio bus support while preserving the pre-existing platform/device open flow. Existing users that open devices through the platform bus continue to use the same bind-and-open path. The new uio bus path is additive and lets userspace open already exposed UIO devices by their /sys/class/uio/uioX/name value, which gives applications a stable logical lookup path without requiring generated platform device names.

I also kept the fallback behavior in place: native platform-bus open remains the primary path for existing users, and the UIO class-name path is only used when callers explicitly request the uio bus. The shared UIO populate logic is reused so both paths get the same mmap offset handling, IRQ setup, DMA-map behavior, and cleanup.

I investigated the symlink-based lookup option as requested. I do not think symlinks are needed for this PR. The kernel already exposes the stable lookup key we need through /sys/class/uio/uioX/name, and resolving that directly avoids adding another filesystem convention that would need to be created, documented, kept in sync with UIO enumeration, and handled across distros/init systems/containers. Using the existing UIO class metadata keeps the implementation self-contained in libmetal and avoids requiring deployment-side symlink management.

So the PR now takes this approach:

  • Add explicit uio bus support for opening devices by UIO class name.
  • Preserve the existing platform bus behavior and fallback path.
  • Reuse the same UIO populate/mapping/IRQ cleanup logic for both paths.
  • Avoid symlink lookup because /sys/class/uio/uioX/name is sufficient and already available.

@arnopo arnopo left a comment


@bentheredonethat
Sorry for the delay; please find some comments below.
I need to spend more time on "lib: linux: add UIO bus open by class name" to understand your work. Adding more comments would help me.


@arnopo arnopo left a comment


Some of the commits are still not easy to review; reverse engineering is required to understand them.
Adding more details to the commit messages about the algorithm you are applying would help.

Comment thread lib/system/linux/device.c Outdated
void *raw, *virt;
int irq_info;

i = 0;

@arnopo

can be initialized when declared

@bentheredonethat

Will fix.

@bentheredonethat


@arnopo I split the UIO rework further: there are more comments, the commits are split for easier review, and the commit messages detail the algorithm used. I also addressed the nits.

Comment thread lib/system/linux/device.c Outdated
goto out;

close_dev:
if (ldrv->dev_close)

@arnopo

You should have for_each_linux_driver(lbus, ldrv) here, no?

@bentheredonethat

NA.

Current metal_linux_dev_open() has a single for_each_linux_driver(lbus, ldrv) loop, and failed opens clean up the same driver that just attempted the open.

There is no post-loop close_dev: block anymore, so there is no need to add another for_each_linux_driver() there.

@arnopo

There is an imbalance between the open and close operations in the function because for_each_linux_driver(lbus, ldrv) handles the open part. Therefore, either remove for_each_linux_driver or close all opened devices if an error occurs.

@bentheredonethat


Thanks @arnopo, I addressed the review comments across the series.

Summary of updates:

  • In lib: linux: preserve device-open errors, metal_linux_dev_open() now closes a failed backend driver and continues trying remaining viable drivers, while preserving the first useful errno
    to return if all drivers fail.

  • In lib: linux: fix UIO mmap offset handling:

    • Removed the redundant !info->map_size check since info->offset >= info->map_size already catches map_size == 0.
    • Replaced the size_t overflow check with an explicit SIZE_MAX comparison and added the needed <stdint.h> include.
    • Switched the physical-address overflow check to the suggested wraparound-style check.
    • Moved the UIO map-read error handling into this commit: missing next map ends the loop, while other map-read errors fail open.
  • In lib: linux: clear UIO IRQ bookkeeping on close, IRQ disable/unregister now happens before unmapping device regions, and close resets the region count after unmapping.

  • In lib: linux: factor common UIO populate path:

    • Parent UIO open error paths after sysfs_open_device() now clean up locally.
    • Cleanup remains idempotent so the common open path can safely call dev_close() again.
    • Added negative-return checks for the touched snprintf() calls.
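
The offset checks listed for the "fix UIO mmap offset handling" commit can be sketched as a small pure function. This is an illustration under stated assumptions, not the PR's code: the struct and function names (uio_map_attrs, uio_map_view, uio_compute_view) are hypothetical, the attributes are modelled as 64-bit values already read from sysfs, and the specific errno values are choices made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for values read from maps/mapN/{addr,size,offset}. */
struct uio_map_attrs {
	uint64_t addr;    /* page-aligned physical address */
	uint64_t size;    /* full mmap length reported by UIO */
	uint64_t offset;  /* start of the usable resource inside the mapping */
};

/* Hypothetical result: what the backend needs to map, expose, and unmap. */
struct uio_map_view {
	size_t   mmap_len;   /* length for mmap() and the later munmap() */
	uint64_t phys_base;  /* usable physical start: addr + offset */
	size_t   io_size;    /* usable bytes after skipping the offset */
};

static int uio_compute_view(const struct uio_map_attrs *in, uint64_t page_size,
			    struct uio_map_view *out)
{
	assert(in && out && page_size > 0);

	if (in->offset >= page_size)
		return -EINVAL;     /* offset must stay inside the first page */
	if (in->offset >= in->size)
		return -EINVAL;     /* also rejects size == 0 */
	if (in->size > SIZE_MAX)
		return -EOVERFLOW;  /* mmap length must fit in size_t */
	if (in->addr + in->offset < in->addr)
		return -EOVERFLOW;  /* physical address wrapped around */

	/* The sysfs size is used as-is for mmap(); the offset is not added. */
	out->mmap_len = (size_t)in->size;
	out->phys_base = in->addr + in->offset;
	out->io_size = (size_t)(in->size - in->offset);
	return 0;
}
```

The usable virtual address would then be the raw mmap() return value plus the offset, while the raw pointer and mmap_len are kept for unmapping on close.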

@bentheredonethat bentheredonethat force-pushed the uio-update branch 2 times, most recently from fba4fa2 to 849155e Compare June 25, 2026 13:36
@arnopo arnopo added this to the Release V2026.10 milestone Jun 26, 2026
Comment thread lib/system/linux/device.c Outdated
Comment on lines +147 to +150
* Translate UIO sysfs map attributes into the values libmetal needs:
* the mmap() length for cleanup, the usable physical start address, and
* the usable I/O region size after skipping the map offset.
*/

@arnopo

Suggested change
* Translate UIO sysfs map attributes into the values libmetal needs:
* the mmap() length for cleanup, the usable physical start address, and
* the usable I/O region size after skipping the map offset.
*/
/**
* @internal
*
* @brief Translate UIO sysfs map attributes into libmetal map information.
*
* This fills in the values required by libmetal:
* - the mmap() length used for cleanup,
* - the usable physical start address,
* - the usable I/O region size after skipping the map offset.
*
* @param info Pointer to the map information structure to populate.
*
* @return 0 on success, or a negative error code on failure.
*/

Comment thread lib/system/linux/irq.c Outdated
* Drop the device pointer associated with a Linux IRQ fd during device close.
* The caller must disable the IRQ first so the dispatch path cannot observe
* an enabled IRQ whose owning device has already been detached.
*/

@arnopo

Redundant comment here, and for the other functions below. The Doxygen documentation in irq.h already documents these functions.

Comment thread lib/system/linux/device.c Outdated
close_list:
sysfs_close_list(dlist);
if (result)
goto fail;

@arnopo

Not really readable. Avoid this kind of label; use labels only for error management.
Here, remove the label and the test on result, keeping just sysfs_close_list(dlist);

Then at the end of the function:

close_list:
	sysfs_close_list(dlist);

fail:
	metal_uio_dev_close(lbus, ldev);
	return result;

The Linux bus open path may try more than one backend driver for a
device. When a backend finds the device but fails while opening it,
the common open loop currently discards that errno and returns
-ENODEV after all drivers have been tried.

Keep the first useful backend open error, preferring non-ENODEV
failures over a plain miss. This preserves the existing not-found
result while letting callers see real failures such as UIO map
population errors.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>

UIO map offsets identify the usable resource start inside the
page-aligned mapping exposed by sysfs. The Linux backend previously
exposed and unmapped the adjusted virtual address directly.

Keep the raw mmap base and length for close, expose the usable
virtual address as raw mapping plus offset, and derive the libmetal
physical base and size from the usable portion of the UIO map.

Use the sysfs map size as the mmap length. For an unaligned resource,
UIO already reports a page-aligned address and a full mmap length, so
adding the offset to that length can over-map the resource and fail.

Reject offsets outside the system page size, reject offsets beyond the
map size, and report overflow before attempting to mmap the region.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>

@bentheredonethat


@arnopo

Addressed as follows:

  • For the UIO mmap offset helpers, I converted the comments for metal_linux_uio_validate_offset() and metal_linux_uio_map_info() to Doxygen format with @internal, @brief, parameter
    documentation, and return value documentation. I also documented struct metal_uio_map_info in the same style since it is part of that helper flow.

  • For the IRQ bookkeeping helpers, I removed the redundant function comments from lib/system/linux/irq.c because the functions are already documented in lib/system/linux/irq.h. I kept only the
    local inline comment that explains the teardown ordering/ownership inside the implementation.

  • For the UIO helper functions introduced in the common populate/class-name/synthetic-bus commits, I converted the function-level comments to Doxygen with @internal. This covers the newly
    introduced helpers in device.c, including the string/sysfs helpers, class-name lookup, common populate path, synthetic UIO bus helper, and synthetic-bus probe/close behavior.

  • For the close_list: readability issue, I reworked the flow so the normal path closes the sysfs list directly and continues, while close_list: is used only as an error cleanup label before
    the shared failure path. This removes the label-plus-result-test pattern and keeps labels focused on error management.

  • For the metal_linux_dev_open() open/close imbalance concern, I changed the retry logic so only clean -ENODEV backend misses continue to later drivers. If a backend finds the device but fails
    with a real error, the attempted backend is cleaned up and that errno is returned instead of continuing through the driver list.

  • I also removed duplicate inline explanation where the new Doxygen comments now carry the same information, especially around the synthetic UIO bus probe/close behavior.
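
The retry rule described for metal_linux_dev_open() in the list above can be modelled as follows. This is a simplified sketch, not the PR's implementation: open_with_retry() is a hypothetical name, and results[] stands in for the return value each backend driver would produce, in driver-list order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * results[i] models the open attempt of backend i: 0 on success,
 * -ENODEV for a clean miss, any other negative errno for a real failure.
 * *tried reports how many backends were attempted before returning.
 */
static int open_with_retry(const int *results, size_t count, size_t *tried)
{
	size_t i;

	assert(tried && (results || count == 0));
	*tried = 0;

	for (i = 0; i < count; i++) {
		*tried = i + 1;
		if (results[i] == 0)
			return 0;          /* this backend opened the device */
		if (results[i] == -ENODEV)
			continue;          /* clean miss: try the next backend */
		/*
		 * Real failure: the attempted backend would be cleaned up
		 * here, and its errno is returned without trying the rest.
		 */
		return results[i];
	}
	return -ENODEV;                    /* every backend reported a miss */
}
```

In this model a real failure is never masked by a later -ENODEV, and no backend after the failing one is opened, so there is nothing further to close.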

A UIO-backed device registers its file descriptor with the Linux IRQ
controller so interrupt handling can find the owning metal device.
Closing the device must clear that association before closing the fd.

Add an internal unregister helper that detaches the device pointer
after the IRQ consumer has disabled the IRQ. Keep IRQ handler and
enable-state teardown owned by the standard IRQ disable and unregister
paths.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>

Split the UIO open flow into two stages. The parent-bus path still
opens the platform or PCI sysfs device, binds it to the selected UIO
driver, finds the child UIO class device, and records the resolved
class and /dev paths.

Move the common stage into metal_uio_populate(). That helper waits for
the /dev/uioX node, opens it, reads each UIO map, maps the full mmap
extent, exposes the usable region after the sysfs offset, and registers
IRQ bookkeeping when the UIO fd supports interrupts.

Keep close-time cleanup unchanged by storing the raw mmap address and
length alongside the adjusted libmetal I/O region. On populate failure,
unmap any regions mapped so far and close the UIO fd locally before the
generic open path releases parent sysfs and driver override state.

Also make local error paths close the temporary UIO child list before
returning.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>

Add the resolver used by the synthetic uio bus. It scans every
/sys/class/uio/uioX/name file, compares the first line against the
requested libmetal device name, and rejects duplicate matches because
they cannot be opened deterministically.

When a unique match is found, fill the same linux_device fields that
the parent-bus UIO path fills: cls_path points at the UIO sysfs class
directory, dev_path points at /dev/uioX, and the UIO name and device
node name are saved for diagnostics and future callers.

The class-name open callback then reuses metal_uio_populate(), so UIO
class opens and parent-bus UIO opens share mmap setup, IRQ registration,
DMA handling, and close-time cleanup.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>
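
The matching rule in the resolver commit above (compare the first line of each name file, reject duplicates) can be sketched without touching sysfs. The names first_line_equals() and find_unique_uio() are hypothetical, contents[] models the text already read from each /sys/class/uio/uioX/name file, and -EEXIST for the ambiguous case is an arbitrary choice for this sketch rather than what the PR returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Compare the requested name with the first line of a name file's text. */
static int first_line_equals(const char *contents, const char *wanted)
{
	size_t len = strcspn(contents, "\n");

	return len == strlen(wanted) && strncmp(contents, wanted, len) == 0;
}

/*
 * contents[i] models the text read from /sys/class/uio/uio<i>/name.
 * Returns the matching index, -ENODEV if nothing matches, or -EEXIST
 * if more than one entry carries the requested name.
 */
static int find_unique_uio(const char *const *contents, size_t count,
			   const char *wanted)
{
	int found = -ENODEV;
	size_t i;

	assert(wanted && (contents || count == 0));

	for (i = 0; i < count; i++) {
		if (!first_line_equals(contents[i], wanted))
			continue;
		if (found >= 0)
			return -EEXIST;  /* ambiguous: no deterministic open */
		found = (int)i;
	}
	return found;
}
```

Comparing the whole first line, rather than a prefix, keeps "demo-ipi" from matching an entry named "demo-ipi-2".
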
Register a synthetic Linux uio bus so callers can use the existing
metal_device_open("uio", name, ...) API shape to open UIO devices by
the value exported in /sys/class/uio/uioX/name.

This bus is not backed by a sysfs bus directory or a probed kernel
driver handle. During Linux bus initialization, register it only when
/sys/class/uio exists, and skip the normal sysfs bus and driver probing
that platform and PCI devices require.

During device open, allow the synthetic uio driver to run its class-name
open callback without an sdrv handle. The callback resolves the UIO class
device and then uses the shared populate path added earlier, so the new
bus preserves the same mmap, IRQ, DMA, and close semantics as existing
UIO-backed platform and PCI opens.

Also make bus close tolerate the missing sysfs bus handle and copy the
requested device name with snprintf() so oversized names fail cleanly.

Signed-off-by: Ben Levinsky <ben.levinsky@amd.com>

@arnopo arnopo left a comment


Looks good to go.
