[improve][functions] Allow the standalone function worker to host the Packages Management Service (FileSystemPackagesStorage) for Oxia / non-ZooKeeper deployments #26082

Description

@lhotari

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

When Pulsar is deployed with Oxia as the metadata store (instead of ZooKeeper), Pulsar Functions cannot use package storage out of the box.

The Packages Management Service is broker-hosted: it is started in PulsarService.startPackagesManagementService() (gated on enablePackagesManagement) and the storage provider is built from the broker ServiceConfiguration (packagesManagementStorageProvider + getProperties()). There are two providers today:

  • BookKeeperPackagesStorageProvider (the default) — relies on DistributedLog/BookKeeper metadata in ZooKeeper, so it does not work with Oxia.
  • FileSystemPackagesStorageProvider — works without ZooKeeper, but FileSystemPackagesStorage is node-local (its STORAGE_PATH is a local directory), so it requires either a single broker or a shared filesystem.
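
The node-local limitation can be illustrated with a small self-contained sketch. It does not use the Pulsar classes: `LocalDirStore` is a hypothetical stand-in for a store that keeps packages under a local directory, and the package name is made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; these are NOT the Pulsar storage classes.
final class NodeLocalStoreDemo {

    /** Keeps each package as a file under a directory that is local to one node. */
    static final class LocalDirStore {
        private final Path root;

        LocalDirStore(Path root) {
            this.root = root;
        }

        void write(String name, byte[] data) {
            try {
                Path target = root.resolve(name);
                Files.createDirectories(target.getParent());
                Files.write(target, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        boolean exists(String name) {
            return Files.exists(root.resolve(name));
        }
    }

    static Path newTempDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uploads through one store, then checks whether another store can see the package. */
    static boolean visibleOnOtherNode(LocalDirStore uploadNode, LocalDirStore readNode) {
        String name = "function/public/default/demo/v1"; // made-up package name
        uploadNode.write(name, "jar-bytes".getBytes(StandardCharsets.UTF_8));
        return readNode.exists(name);
    }

    public static void main(String[] args) {
        // Two brokers, each with its own local directory: the upload is invisible to the other.
        LocalDirStore brokerA = new LocalDirStore(newTempDir("broker-a"));
        LocalDirStore brokerB = new LocalDirStore(newTempDir("broker-b"));
        System.out.println("separate dirs: " + visibleOnOtherNode(brokerA, brokerB));

        // One directory (single node, or a shared filesystem): the upload is visible.
        Path shared = newTempDir("shared");
        System.out.println("same dir: "
                + visibleOnOtherNode(new LocalDirStore(shared), new LocalDirStore(shared)));
    }
}
```

With separate directories a package uploaded through one broker is missing on the other, which is why this provider needs a single host or a shared filesystem.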

In a Kubernetes deployment (the Apache Pulsar Helm chart), a clean design is to split the function worker into its own component with a single replica and let that worker host the Packages Management Service, backed by FileSystemPackagesStorage on a single RWO (ReadWriteOnce) PVC. With a single replica there is no sharing problem, so Functions would work on Oxia without ZooKeeper and without a shared filesystem.

However, the standalone function worker cannot host the Packages Management Service today:

  • The function worker's REST web service (PulsarWorkerService / WorkerServer) does not register the /admin/v3/packages endpoints — those live in org.apache.pulsar.broker.admin.v3.Packages (in pulsar-broker) and depend on pulsar().getPackagesManagement().
  • WorkerConfig has only functionsWorkerEnablePackageManagement (which makes the worker use the service), not enablePackagesManagement / packagesManagementStorageProvider (which would make it host storage).
  • When functionsWorkerEnablePackageManagement=true, the worker delegates package storage to the broker via PulsarAdmin — e.g. ComponentImpl calls worker().getBrokerAdmin().packages().upload(...). When it is false, the worker stores packages directly in BookKeeper/DLog, which again needs ZooKeeper.

The net effect: in a split, Oxia-based deployment there is no component that can host FileSystemPackagesStorage for the worker.
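
The behavior described in the bullets above can be summarized as a small decision table. This is a simplified model of the description in this issue, not code from Pulsar; the class, enum, and method names are hypothetical.

```java
// Hypothetical model of the current behavior as described in this issue.
final class WorkerPackageRouting {

    /** Where an uploaded package ends up, depending on the worker flag. */
    enum Backend {
        BROKER_PACKAGES_SERVICE, // worker delegates to the broker via PulsarAdmin
        WORKER_BOOKKEEPER_DLOG   // worker writes to BookKeeper/DLog directly
    }

    static Backend backendFor(boolean functionsWorkerEnablePackageManagement) {
        return functionsWorkerEnablePackageManagement
                ? Backend.BROKER_PACKAGES_SERVICE
                : Backend.WORKER_BOOKKEEPER_DLOG;
    }

    /**
     * Whether the resulting path can work without ZooKeeper.
     * brokerUsesFileSystemProvider only matters when the worker delegates to the broker.
     */
    static boolean worksWithoutZooKeeper(boolean functionsWorkerEnablePackageManagement,
                                         boolean brokerUsesFileSystemProvider) {
        if (backendFor(functionsWorkerEnablePackageManagement) == Backend.WORKER_BOOKKEEPER_DLOG) {
            return false; // DLog metadata lives in ZooKeeper
        }
        return brokerUsesFileSystemProvider; // the BookKeeper provider also needs ZooKeeper
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean workerFlag : flags) {
            for (boolean fsProvider : flags) {
                System.out.println("workerFlag=" + workerFlag
                        + " brokerFileSystemProvider=" + fsProvider
                        + " -> " + backendFor(workerFlag)
                        + ", worksWithoutZooKeeper=" + worksWithoutZooKeeper(workerFlag, fsProvider));
            }
        }
    }
}
```

Only one combination avoids ZooKeeper, and it still places storage on the broker, where FileSystemPackagesStorage requires a single broker or a shared filesystem.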

Solution

Allow the standalone function worker to host the Packages Management Service itself:

  • Add enablePackagesManagement + packagesManagementStorageProvider (and the STORAGE_PATH / provider properties) support to WorkerConfig and the standalone worker startup (PulsarWorkerService).
  • Register the packages REST resource (org.apache.pulsar.broker.admin.v3.Packages, or a worker-side equivalent) in the function worker's web service, initializing PackagesManagementImpl with the configured PackagesStorage the same way PulsarService.startPackagesManagementService() does.
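
A rough sketch of the configuration side of this proposal follows. Nothing here exists in Pulsar today: `ProposedWorkerConfig` is a hypothetical holder for the fields proposed above, the provider is matched by name instead of being loaded as a class, and the `STORAGE_PATH` key with its `packages-storage` default is taken from the references in this issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of the proposed worker-side settings; not existing Pulsar code.
final class WorkerPackagesHostSketch {

    /** Stand-in for the fields this issue proposes adding to WorkerConfig. */
    static final class ProposedWorkerConfig {
        boolean enablePackagesManagement = false;
        String packagesManagementStorageProvider = "";
        final Properties properties = new Properties();
    }

    /**
     * Returns the directory the worker would host packages in, or empty when
     * hosting is disabled. Only the file system provider is modeled here.
     */
    static Optional<Path> resolveHostedStoragePath(ProposedWorkerConfig config) {
        if (!config.enablePackagesManagement) {
            return Optional.empty();
        }
        String provider = config.packagesManagementStorageProvider;
        if (provider == null || !provider.endsWith("FileSystemPackagesStorageProvider")) {
            throw new IllegalArgumentException(
                    "This sketch only models the file system provider, got: " + provider);
        }
        // Assumed property key and default, as referenced in this issue.
        String storagePath = config.properties.getProperty("STORAGE_PATH", "packages-storage");
        return Optional.of(Paths.get(storagePath));
    }

    public static void main(String[] args) {
        ProposedWorkerConfig config = new ProposedWorkerConfig();
        System.out.println("disabled: " + resolveHostedStoragePath(config));

        config.enablePackagesManagement = true;
        config.packagesManagementStorageProvider = "FileSystemPackagesStorageProvider";
        config.properties.setProperty("STORAGE_PATH", "/pulsar/packages"); // example PVC mount path
        System.out.println("enabled: " + resolveHostedStoragePath(config));
    }
}
```

In the Helm chart scenario, the storage path would point at the mount path of the worker's PVC; the path used in `main` is only an example.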

This would let a single-replica function worker + single RWO PVC + FileSystemPackagesStorageProvider provide package management for Oxia (no-ZooKeeper) deployments without a shared filesystem.

Alternatives considered

  • Keep package management on the broker with FileSystemPackagesStorage. Works, but FileSystem storage is node-local, so it requires a single broker; multiple brokers would each have their own local package directory.
  • Shared filesystem for FileSystemPackagesStorage across replicas. On cloud providers this could be backed by CSI drivers that provide RWX shared filesystems, for example GCP Filestore CSI on GKE, Amazon EFS CSI on EKS, or Azure Files CSI on AKS. This would also let multiple function worker / broker replicas share package storage, and is a good longer-term option. The single-replica-worker + single-RWO-PVC approach is the simplest first step and needs no shared filesystem.
  • External package URLs only (builtin://, http(s)://, file://). Works without any package management service, but does not support uploaded packages (e.g. pulsar-admin functions create --jar ...).
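
To make the last alternative concrete, here is a minimal sketch that separates external references from packages that need upload storage. The helper name is hypothetical, and treating every other input (such as a plain local path passed to --jar) as an upload is a simplifying assumption of this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Pulsar.
final class PackageUrlKind {

    // Schemes named in this issue as working without a package management service.
    private static final Set<String> EXTERNAL_SCHEMES = Set.of("builtin", "http", "https", "file");

    /** True when the reference can be resolved without uploading anything. */
    static boolean isExternalReference(String packageRef) {
        try {
            String scheme = new URI(packageRef).getScheme();
            return scheme != null && EXTERNAL_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // not a URL at all, so treat it as something to upload
        }
    }

    public static void main(String[] args) {
        String[] refs = {
            "builtin://jdbc",
            "https://example.com/fn.jar",
            "file:///pulsar/fn.jar",
            "/local/path/fn.jar" // e.g. a local file given to --jar
        };
        for (String ref : refs) {
            System.out.println(ref + " -> external=" + isExternalReference(ref));
        }
    }
}
```

Anything for which this returns false falls back to uploaded-package storage, which is the case this issue is about.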

Additional context

This is needed by the Apache Pulsar Helm chart (apache/pulsar-helm-chart), which is adding a standalone function-worker component and aims to support Pulsar Functions on Oxia. Until the worker can host package management, the chart can only offer BookKeeper package storage with the broker-embedded worker (requiring ZooKeeper), so a Helm deployment that combines the standalone function worker with Oxia has no working package storage.

Relevant code references:

  • PulsarService.startPackagesManagementService() / getPackagesManagement() ("Package Management Service is not enabled in the broker.")
  • ServiceConfiguration#enablePackagesManagement, ServiceConfiguration#packagesManagementStorageProvider
  • WorkerConfig#functionsWorkerEnablePackageManagement (no host-side equivalent)
  • org.apache.pulsar.functions.worker.rest.api.ComponentImpl upload path (getBrokerAdmin().packages().upload(...))
  • FileSystemPackagesStorage (STORAGE_PATH, default packages-storage) and FileSystemPackagesStorageProvider

Are you willing to submit a PR?

  • I'm willing to submit a PR!
