JSONText

A state machine for incremental JSON processing.

Quick Start

The following example uses JSONTextSelectorStream to extract every user's address from a JSON response streamed from DummyJSON.

import { JSONTextSelectorStream } from "jsontext";

const response = await fetch("https://dummyjson.com/users");
const addresses = response.body.pipeThrough(new JSONTextSelectorStream("$.users[*].address"));

for await (const value of addresses) {
  console.log(value.json());
}

Installation

jsontext is an ESM-only package available on both NPM and JSR. The core decoder and encoder run in any modern JavaScript environment; the optional *Stream classes additionally require WHATWG Streams support:

NPM

Install via npm:

npm install jsontext

Note

The *Stream classes may require Node.js 18 or later, where fetch and the WHATWG Streams classes are available as globals.
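
If you are unsure whether a target runtime provides WHATWG Streams, a quick feature check can tell you before you reach for the *Stream classes. The helper name hasWebStreams below is hypothetical and not part of jsontext; it only inspects standard globals.

```javascript
// Hypothetical helper, not part of jsontext: checks for the standard
// WHATWG Streams globals that the optional *Stream classes build on.
function hasWebStreams() {
  return (
    typeof ReadableStream === "function" &&
    typeof WritableStream === "function" &&
    typeof TransformStream === "function"
  );
}

const streamsAvailable = hasWebStreams();
console.log(
  streamsAvailable
    ? "WHATWG Streams found: the *Stream classes should be usable"
    : "WHATWG Streams missing: stick to the core decoder and encoder",
);
```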

Deno

Install via JSR:

deno add jsr:@lcweden/jsontext

APIs

See full reference on JSR.

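| Category  | Exports                                                                                |
| --------- | -------------------------------------------------------------------------------------- |
| Core      | JSONTextDecoder, JSONTextEncoder                                                       |
| Stream    | JSONTextDecoderStream, JSONTextEncoderStream, JSONTextSelectorStream, JSONTextLineStream |
| Component | Token, Value, KIND                                                                     |
| Error     | SyntacticError                                                                         |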

Concepts

JSON.parse is a native, single-pass parser. It will always be faster than jsontext on JSON that fits comfortably in memory.

jsontext makes a deliberate tradeoff: it gives up raw throughput to gain bounded memory, lower time to first result, incremental processing, and the ability to filter data without ever materializing it. This matters when the input is too large to hold, arrives in chunks, or you only care about a small slice of it.

// Two alternatives shown side by side; a response body can only be consumed once.

// JSON.parse: needs the whole string, builds the whole tree
const data = await response.json(); // equivalent to JSON.parse(await response.text())
const allTitles = data.map((item) => item.title);

// jsontext: reads bytes as they arrive, emits values one by one
const titleStream = response.body.pipeThrough(new JSONTextSelectorStream("$..title"));

for await (const value of titleStream) {
  console.log(value.json());
}

Tokens and Values

jsontext represents JSON at two granularities:

  • Tokens: The smallest lexical unit (a scalar like "Alice", true, 123, or a delimiter like {, }, [, ]).

  • Values: A complete unit — a scalar, or an entire object or array including everything nested inside.

A token can never represent a whole object or array; a value always can.

import { JSONTextDecoder, KIND } from "jsontext";

const json = `{"name": "Alice", "tags": ["admin", "user"]}`;
const decoder = new JSONTextDecoder(new TextEncoder().encode(json));
decoder.end(); // signal no more bytes will be pushed

// readToken — one lexical step at a time
decoder.readToken().kind; // KIND.OBJECT_BEGIN  ('{')
decoder.readToken().asString(); // "name"
decoder.readToken().asString(); // "Alice"
decoder.readToken().asString(); // "tags"

// readValue — collapses an entire subtree into one Value
const tags = decoder.readValue();
tags.json(); // ["admin", "user"]

decoder.readToken().kind; // KIND.OBJECT_END    ('}')

Use Value when you need a specific subtree. Call value.json() to materialize it, or decoder.skipValue() to cheaply discard massive branches you don't need without parsing them.
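
To see why discarding a branch can be much cheaper than materializing it, consider the following sketch. It is not how jsontext implements skipValue(); it is a hypothetical helper that works on a string rather than bytes and assumes well-formed input. It finds the end of a value by tracking only nesting depth and string state, so no objects or arrays are ever built.

```javascript
// Hypothetical helper, not part of jsontext. Returns the index just past the
// JSON value that begins at `start`. Assumes `text` is well-formed JSON and
// that `start` points at the first character of a value.
function skipValue(text, start) {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === '"') {
        inString = false;
        if (depth === 0) return i + 1; // a top-level string ends here
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1; // the container we started in has closed
      if (depth < 0) return i; // a bare scalar ended at the parent's closer
    } else if (depth === 0 && (ch === "," || /\s/.test(ch))) {
      return i; // a bare number, true, false, or null ends at a delimiter
    }
  }
  return text.length;
}

const text = '{"big": [1, {"x": "]"}, "a\\"b"], "keep": 42}';
const start = text.indexOf("[");
const end = skipValue(text, start);

const skipped = text.slice(start, end); // the whole array, never parsed
const rest = text.slice(end); // what remains after the skipped branch
```

The scan touches every character once but allocates nothing per element, which is the property that makes skipping attractive for large branches.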

Important

Tokens and values returned from a decoder are views into its internal buffer. The buffer is overwritten the next time you .push() more bytes or read another token/value, so anything you keep around must be copied first with .clone().

const collected = [];
let token;

while ((token = decoder.readToken()) !== undefined) {
  // Unsafe: collected.push(token) would leave every entry pointing at the same bytes.
  collected.push(token.clone()); // safe: independent copy
}
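
The hazard is the same one that exists between a typed-array view and a copy. The sketch below uses only standard Uint8Array methods, with a plain byte array standing in for the decoder's internal buffer (an assumption made for illustration; jsontext's actual buffer management is not shown in this document).

```javascript
// Stand-in for a decoder's reusable internal buffer (illustrative assumption).
const buffer = new TextEncoder().encode('"Alice"');

const view = buffer.subarray(1, 6); // shares memory with `buffer`
const copy = buffer.slice(1, 6); // independent bytes, analogous to .clone()

// Simulate the buffer being reused for the next piece of input.
buffer.set(new TextEncoder().encode('"Bobby"'));

const utf8 = new TextDecoder();
const viewText = utf8.decode(view); // now reflects the overwritten bytes
const copyText = utf8.decode(copy); // still the original bytes

console.log({ viewText, copyText });
```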

Push and pull

Decoding is split into two halves. You push bytes in whenever you have them — from a single buffer, a stream chunk, a socket, anything — and you pull tokens or values out at your own pace. The decoder buffers what it needs and waits for more bytes when a token straddles a chunk boundary.

This decoupling is what makes time to first result low. With JSON.parse you must wait for the entire payload before you can touch any data; if the server takes 5 seconds to stream a 50 MB response, you wait 5 seconds. With jsontext the first token is available as soon as the first few bytes arrive — typically tens of milliseconds.

import { JSONTextDecoder } from "jsontext";

const decoder = new JSONTextDecoder();

for await (const chunk of response.body) {
  decoder.push(chunk); // push: feed bytes as they arrive

  let token;
  while ((token = decoder.readToken()) !== undefined) {
    // pull: drain decodable tokens, then wait for more bytes
    handle(token);
  }
}

decoder.end();
decoder.checkEOF();

readToken returns undefined when the buffer is exhausted mid-token — that's the signal to go fetch more bytes, not an error. end() tells the decoder no more input is coming; checkEOF() then asserts that what arrived was a complete, well-formed document.
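
The same push/pull contract can be sketched without any JSON logic at all. The class below is hypothetical and not part of jsontext: it reads newline-terminated records instead of tokens, but it follows the pattern described above, returning undefined when a record straddles a chunk boundary and releasing the final partial record only after end().

```javascript
// Hypothetical class, not part of jsontext: a push/pull reader for
// newline-terminated records.
class ChunkedLineReader {
  #pending = "";
  #ended = false;
  #utf8 = new TextDecoder();

  // Push: append bytes whenever they arrive.
  push(chunk) {
    this.#pending += this.#utf8.decode(chunk, { stream: true });
  }

  // Signal that no more bytes will be pushed.
  end() {
    this.#pending += this.#utf8.decode();
    this.#ended = true;
  }

  // Pull: return the next complete record, or undefined if more bytes are needed.
  readLine() {
    const newline = this.#pending.indexOf("\n");
    if (newline !== -1) {
      const line = this.#pending.slice(0, newline);
      this.#pending = this.#pending.slice(newline + 1);
      return line;
    }
    if (this.#ended && this.#pending !== "") {
      const line = this.#pending;
      this.#pending = "";
      return line;
    }
    return undefined;
  }
}

const encoder = new TextEncoder();
const reader = new ChunkedLineReader();
const lines = [];

// The first record is split across two chunks on purpose.
for (const chunk of ['{"id"', ':1}\n{"id":2}', '\n{"id":3}']) {
  reader.push(encoder.encode(chunk));
  let line;
  while ((line = reader.readLine()) !== undefined) lines.push(line);
}

reader.end();
let line;
while ((line = reader.readLine()) !== undefined) lines.push(line);
```

Because the caller decides when to pull, the first record is available as soon as its closing newline arrives, regardless of how much input is still in flight.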

Composing with streams

The core JSONTextDecoder and JSONTextEncoder are manual state machines. For common use cases, the package also provides TransformStream wrappers (the *Stream classes) that compose natively with fetch, file streams, and other Web Streams.

import { JSONTextLineStream, JSONTextSelectorStream } from "jsontext";

// Filter a JSON Lines feed: keep only active users, write them back out as JSONL.
// JSONTextLineStream emits one Value per line, preserving the original bytes.
const encoder = new TextEncoder();

await response.body
  .pipeThrough(new JSONTextLineStream())
  .pipeThrough(
    new TransformStream({
      transform(value, controller) {
        const user = value.json();
        if (user.active) controller.enqueue(encoder.encode(value.text() + "\n"));
      },
    }),
  )
  .pipeTo(destination);

// Or extract a single slice from a large document with a JSONPath selector
response.body.pipeThrough(new JSONTextSelectorStream("$.users[*].email"));

Each stream is a thin adapter over the core API, so you can mix hand-driven decoding and stream piping in the same program without giving up either one's guarantees.
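
To illustrate what "thin adapter" can mean in practice, here is a sketch of wrapping push/pull logic in a standard TransformStream. It is not jsontext's implementation: createLineAdapter and collect are hypothetical names, and the adapter uses JSON.parse on each line purely to keep the example self-contained.

```javascript
// Hypothetical adapter, not part of jsontext: buffers incoming bytes,
// emits one parsed value per complete line, and flushes the tail at the end.
function createLineAdapter() {
  const utf8 = new TextDecoder();
  let pending = "";

  const emit = (line, controller) => {
    const trimmed = line.trim();
    if (trimmed !== "") controller.enqueue(JSON.parse(trimmed));
  };

  return new TransformStream({
    transform(chunk, controller) {
      // Push: append the new bytes.
      pending += utf8.decode(chunk, { stream: true });
      // Pull: drain every complete line, keep the remainder for later.
      let newline;
      while ((newline = pending.indexOf("\n")) !== -1) {
        emit(pending.slice(0, newline), controller);
        pending = pending.slice(newline + 1);
      }
    },
    flush(controller) {
      pending += utf8.decode();
      emit(pending, controller);
    },
  });
}

// Hypothetical helper: feeds string chunks through the adapter and collects the output.
async function collect(chunks) {
  const encoder = new TextEncoder();
  const source = new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });

  const results = [];
  const reader = source.pipeThrough(createLineAdapter()).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    results.push(value);
  }
  return results;
}

// The second record is split across two chunks on purpose.
const resultPromise = collect(['{"id":1}\n{"i', 'd":2}\n']);
resultPromise.then((values) => console.log(values));
```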

Locating syntax errors

When input violates RFC 8259, jsontext throws a SyntacticError carrying both the byte offset and the JSON pointer to help pinpoint the exact failure.

import { JSONTextDecoder, SyntacticError } from "jsontext";

try {
  const decoder = new JSONTextDecoder(new TextEncoder().encode(`{"a": 1, "b": }`));
  decoder.end();
  while (decoder.readToken() !== undefined) { /* ... */ }
} catch (error) {
  if (error instanceof SyntacticError) {
    console.error(error.message);
  }
}
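
For readers unfamiliar with JSON Pointer (RFC 6901), the sketch below shows how a path of object keys and array indices maps to a pointer string. The helper toJsonPointer is hypothetical and not part of jsontext, and the exact form in which SyntacticError exposes its pointer is not shown in this document.

```javascript
// Hypothetical helper, not part of jsontext: builds an RFC 6901 JSON Pointer
// from a list of object keys and array indices.
function toJsonPointer(path) {
  return path
    .map((segment) =>
      // Escape "~" first, then "/", as RFC 6901 requires.
      "/" + String(segment).replace(/~/g, "~0").replace(/\//g, "~1"),
    )
    .join("");
}

const pointer = toJsonPointer(["users", 3, "a/b", "m~n"]);
const root = toJsonPointer([]); // the empty pointer refers to the whole document

console.log(pointer, JSON.stringify(root));
```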

Example Pipelines

Replace null with an empty string

Swap every null token for an empty string as the JSON flows through — no parsing the whole document, no intermediate object.

import { JSONTextDecoderStream, JSONTextEncoderStream, KIND, Token } from "jsontext";

stream
  .pipeThrough(new JSONTextDecoderStream()) // decode bytes into tokens
  .pipeThrough(
    new TransformStream({
      transform(token, controller) {
        if (token.kind === KIND.NULL) { // Detect a `null` token
          controller.enqueue(Token.fromString("")); // Emit an empty string token instead
        } else {
          controller.enqueue(token);
        }
      },
    }),
  )
  .pipeThrough(new JSONTextEncoderStream()); // encode tokens back into bytes

Extract and restructure data

Extract specific nested elements using JSONPath, and wrap them into a brand new JSON array structure directly in the stream pipeline.

import { JSONTextEncoderStream, JSONTextSelectorStream, Token } from "jsontext";

stream
  .pipeThrough(new JSONTextSelectorStream("$.todos[*].todo")) // extract all `todo` values from the `todos` array
  .pipeThrough(
    new TransformStream({
      start(controller) {
        controller.enqueue(Token.ARRAY_BEGIN); // emit a `[` to start the output array
      },
      transform(value, controller) {
        for (const token of value.tokens()) {
          controller.enqueue(token);
        }
      },
      flush(controller) {
        controller.enqueue(Token.ARRAY_END); // emit a `]` to end the output array
      },
    }),
  )
  .pipeThrough(new JSONTextEncoderStream()); // encode back to bytes for output

License

This project is licensed under the MIT License.

Acknowledgements

This project is inspired by the encoding/json/jsontext package in Go's standard library.
