GitHub - SneaksAndData/arcane-stream-json: JSON stream for Arcane Streaming Service

JSON Stream Plugin for Arcane

This repository contains the implementation of a JSON-Iceberg streaming plugin for Arcane. Use this app to live-stream JSON files to an Iceberg table, backed by Trino as a streaming batch merge consumer and Lakekeeper as a data catalog.

Quickstart

This source continuously ingests files with multiline-JSON content into a target Iceberg table. To configure the stream, you must provide the following:

Desired AVRO schema for the source. Note that this schema should conform to the JSON produced after JSON pointers and array explode have been applied. All fields in the schema must be defined as nullable. You can use this handy tool to generate the schema.
Source S3 path
JSON pointer expression, if the desired data is a subset of the source JSON. For example, given

{
  "colA": "a",
  "colB": {
    "colC": "c",
    "propA": 1,
    "propB": "ABC"
  }
}

and jsonPointerExpression set to /colB, the source will be transformed to:

{
  "colC": "c",
  "propA": 1,
  "propB": "ABC"
}

JSON pointers for array explode, if any. For example, given

{
  "colA": "a",
  "colB": [{
    "colC": "c1",
    "propA": 1,
    "propB": "ABC1"
  },{
    "colC": "c2",
    "propA": 2,
    "propB": "ABC2"
  }]
}

and jsonArrayPointers set to "/colB": {}, the source will be transformed to:

{"colC": "c1", "propA": 1, "propB": "ABC1"}
{"colC": "c2", "propA": 2, "propB": "ABC2"}

emitting 2 rows from 1 source file entry.
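
The two transformations above, together with the nullable-schema requirement, can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this section, not the plugin's implementation: the function names resolve_pointer, explode and all_fields_nullable, the record name ExampleRow and the chosen Avro types are assumptions made for the example, and the {} value attached to each entry in jsonArrayPointers is not interpreted here.

```python
import json
from typing import Any, Iterator


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer such as "/colB" against parsed JSON."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON pointer must start with '/': {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" stands for "/", "~0" stands for "~"
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array elements are addressed by index
        else:
            current = current[token]
    return current


def explode(document: Any, array_pointer: str) -> Iterator[Any]:
    """Yield one row per element of the array found at array_pointer."""
    target = resolve_pointer(document, array_pointer)
    if not isinstance(target, list):
        raise TypeError(f"{array_pointer!r} does not point to an array")
    yield from target


# Hypothetical Avro schema for the rows produced above. Every field is a
# union with "null", which is how Avro expresses a nullable field.
ROW_SCHEMA = {
    "type": "record",
    "name": "ExampleRow",  # illustrative name, not taken from the project
    "fields": [
        {"name": "colC", "type": ["null", "string"], "default": None},
        {"name": "propA", "type": ["null", "long"], "default": None},
        {"name": "propB", "type": ["null", "string"], "default": None},
    ],
}


def all_fields_nullable(schema: dict) -> bool:
    """Return True when every field type is a union that includes "null"."""
    return all(
        isinstance(field["type"], list) and "null" in field["type"]
        for field in schema["fields"]
    )


if __name__ == "__main__":
    nested = {"colA": "a", "colB": {"colC": "c", "propA": 1, "propB": "ABC"}}
    print(json.dumps(resolve_pointer(nested, "/colB")))

    with_array = {
        "colA": "a",
        "colB": [
            {"colC": "c1", "propA": 1, "propB": "ABC1"},
            {"colC": "c2", "propA": 2, "propB": "ABC2"},
        ],
    }
    for row in explode(with_array, "/colB"):
        print(json.dumps(row))  # one output line per array element

    print(all_fields_nullable(ROW_SCHEMA))
```

Running the script prints the transformed object from the first example, then the two exploded rows from the second, then True for the schema check. Note that the schema describes the rows after the pointer and explode steps, so colA does not appear in it.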

Development setup

Tooling

Install the following tools:

mise - for managing tooling versions, environment variables: https://github.com/jdx/mise
just - for orchestrating tasks: https://github.com/casey/just
Docker/Docker compose - for integration testing: https://www.docker.com/products/docker-desktop/

Once the above are installed, run mise install. It will install other necessary tools (e.g. JDK and SBT) at recommended versions for this project only.

Getting access to GitHub Packages registry

To build, test, and run the project, the GITHUB_TOKEN environment variable must be set. It is used to authenticate against the GitHub Maven package registry, specifically for JAR dependencies under https://maven.pkg.github.com/SneaksAndData/arcane-framework-scala.

Create a new personal access token (PAT). For example, a fine-grained token with "Public repositories" access and without explicit permissions.

Export the GITHUB_TOKEN environment variable before running any sbt commands. For example, put the line export GITHUB_TOKEN=github_pat_xxx in your .zshrc/.bashrc file.
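
A missing token typically only shows up later as a dependency resolution failure, so it can help to check for it up front. The following is a minimal sketch of such a check; the helper name require_github_token is hypothetical and is not part of this repository.

```python
import os
from typing import Mapping, Optional


def require_github_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GITHUB_TOKEN from the environment, or fail with a clear message."""
    if env is None:
        env = os.environ
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export it before running any sbt commands"
        )
    return token


if __name__ == "__main__":
    require_github_token()
    print("GITHUB_TOKEN is set")
```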

Common tasks

Building the project (fat JAR): just build
Building Docker image: just docker-build [tag]
Running integration tests: just it
Running the streaming application locally: just stream [--debug], or just backfill [--debug] for backfill mode. Note: dev.env is required, see dev.env.example for an example application configuration.
Cleaning build artifacts: just clean
Code style check: just check

Arcane operator and streams on Kind

A local K8S cluster (e.g. Kind) can be used to verify that the Arcane operator and its dependencies coming from Helm charts are set up correctly.

Furthermore, Arcane is lightweight enough that actual streams can be deployed on the local K8S cluster, for example to try out or test features in a dev setup.

Setting up Kind

Kind itself should already be installed if you ran mise install. Next steps:

Create Kind cluster: kind create cluster --name arcane-json-dev
Create namespace: kubectl create namespace arcane --context kind-arcane-json-dev
Install required CRDs:

helm install arcane-crd oci://ghcr.io/sneaksanddata/helm/arcane-crd \
  --version vX.Y.Z \
  --namespace arcane \
  --kube-context kind-arcane-json-dev

Install Arcane operator:

helm install arcane oci://ghcr.io/sneaksanddata/helm/arcane-operator \
  --version vX.Y.Z \
  --namespace arcane \
  --kube-context kind-arcane-json-dev

Build a Docker image for this project: mise docker-build kind-dev
Load the Docker image to Kind cluster:

kind load docker-image \
    ghcr.io/sneaksanddata/arcane-stream-json:kind-dev \
    --name arcane-json-dev

Install chart from this project:

helm upgrade --install arcane-json ./.helm \
    --kube-context kind-arcane-json-dev \
    --namespace arcane \
    --set image.repository=ghcr.io/sneaksanddata/arcane-stream-json \
    --set image.tag=kind-dev \
    --set image.pullPolicy=IfNotPresent
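
The steps above repeat the cluster name, the kubectl context, the namespace, and the image tag in several places. As a sketch of how they relate, the hypothetical helper below (not part of this repository) assembles the same commands from four parameters and prints them without executing anything. It relies on Kind naming the kubectl context kind-<cluster name>, and it leaves out the Docker image build step. The chart version is kept as the vX.Y.Z placeholder used above.

```python
import shlex


def kind_setup_commands(
    cluster: str, namespace: str, chart_version: str, image_tag: str
) -> list[list[str]]:
    """Build the Kind setup commands from this section as argument lists."""
    context = f"kind-{cluster}"  # Kind prefixes the kubectl context with "kind-"
    image = "ghcr.io/sneaksanddata/arcane-stream-json"
    helm_target = ["--namespace", namespace, "--kube-context", context]
    return [
        ["kind", "create", "cluster", "--name", cluster],
        ["kubectl", "create", "namespace", namespace, "--context", context],
        ["helm", "install", "arcane-crd",
         "oci://ghcr.io/sneaksanddata/helm/arcane-crd",
         "--version", chart_version, *helm_target],
        ["helm", "install", "arcane",
         "oci://ghcr.io/sneaksanddata/helm/arcane-operator",
         "--version", chart_version, *helm_target],
        ["kind", "load", "docker-image", f"{image}:{image_tag}", "--name", cluster],
        ["helm", "upgrade", "--install", "arcane-json", "./.helm", *helm_target,
         "--set", f"image.repository={image}",
         "--set", f"image.tag={image_tag}",
         "--set", "image.pullPolicy=IfNotPresent"],
    ]


if __name__ == "__main__":
    # Print the commands for review; nothing is run against a cluster.
    for command in kind_setup_commands("arcane-json-dev", "arcane", "vX.Y.Z", "kind-dev"):
        print(shlex.join(command))
```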

Running streams in Kind

To be added...

Development

The project uses Scala 3.8.3 and is tested on JDK 25.
