Searchable X/Twitter Community Notes Database

Community Notes is a feature of X (formerly Twitter) that lets contributors add context, such as fact-checks, under a post. See https://x.com/i/communitynotes

Unfortunately, X/Twitter does not provide a way to search the notes, either through a web interface or through an API. This project fills that gap by providing a searchable database of the notes.

This project was born from the need to counter online disinformation on X/Twitter efficiently after russia's invasion of 🇺🇦 Ukraine in 2022.

Architecture

Note: the design was influenced by a similar project https://github.com/bpettis/birdwatch-scraper

The application consists of the following components:

PostgreSQL database - Stores the notes and import history
PostgREST - Auto-generates a RESTful API from the database schema
Go API server - Handles scheduled imports, download management, and control endpoints
Nginx - Reverse proxy for PostgREST and the Go API; also serves the static web UI
Web UI - AlpineJS-based search interface with full-text search
Optional: Swagger UI for API docs, Adminer for database management

Two deployment modes are supported:

Docker Compose (development): Separate containers for each service, easier debugging
Single container (production): All services bundled in one image for simple deployment

In both modes, data persists in the Docker volume x-notes-db.

Note: PostgreSQL 18 requires the PGDATA environment variable to be set for data persistence to work correctly. This is already configured in both Dockerfile-dist and compose.yaml.

Version: This is version 0.0.14

Method 1: Docker Compose

In this mode, Docker Compose starts separate containers for the PostgreSQL database, PostgREST, Nginx, and the Go API server.

Requirements

Internet connection
Docker with Compose plug-in

Start the Docker compose stack

  docker compose up -d

This will start the following services:

db: PostgreSQL database
postgrest: RESTful API server
nginx: Reverse proxy

Optional

You may also start the following services:

swagger: Swagger UI for PostgREST API

  docker compose scale swagger=1

adminer: Web-based database management tool

  docker compose scale adminer=1

Run the Loader

To load the notes into the database, trigger the import via API:

  # Full import (all available files)
  curl -X POST http://localhost:8080/api/imports/create

  # Check import status
  curl http://localhost:8080/api/imports/current

  # List import history
  curl http://localhost:8080/api/imports

Or use the web UI at http://localhost:8080 (Admin tab)

Accessing the note search UI

Open the following URL: http://localhost:8080

Other Useful URLs

Service	URL
PostgREST sample query	http://localhost:3000/note?limit=50&summary_ts=fts.Nigeria&select=summary
PostgREST sample query through nginx	http://localhost:8080/api/note?limit=50
Adminer (requires `adminer` container)	http://localhost:8082/?pgsql=db&username=postgres&db=postgres&ns=public&table=note
PostgREST Swagger UI (requires `swagger` container)	http://localhost:8081
Postgres direct connection	localhost:5432

Method 2: Single Docker container

In this mode, a single Docker container runs PostgreSQL, PostgREST, Nginx, and the Go API server with a built-in scheduler.

Build and start the Docker container

  make run

This will start a container named x-notes with all the services running inside it.

Running the loader

To load the notes into the database, trigger the import via API:

  # Full import (all available files)
  curl -X POST http://localhost:8080/api/imports/create

  # Check import status
  curl http://localhost:8080/api/imports/current

Monitoring the loader

While the loader is running, you can monitor the logs of the container:

    docker logs -f x-notes

You can also query the PostgreSQL COPY progress reporting view, pg_stat_progress_copy, to see the progress of the loading:

    docker exec -it x-notes psql -U postgres -d postgres -c "SELECT * FROM pg_stat_progress_copy;"

Also accessible via PostgREST API:

    curl http://localhost:8080/api/pg_stat_progress_copy

Monitor the tuples_processed field to see how many notes have been loaded. To get the total number of notes to load, count the lines of the notes file and subtract 1 for the header:

    docker exec -it x-notes /bin/sh -c "wc -l /home/data/*.tsv"
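
The two figures above can be combined into a rough progress percentage. The following is a minimal sketch and not part of the project: the progress_pct helper is hypothetical, and it assumes a single TSV file, so that exactly one header line has to be subtracted.

```shell
# progress_pct PROCESSED LINES
# Prints the integer percentage of notes loaded so far.
#   PROCESSED: current value of tuples_processed
#   LINES:     line count of the TSV file, header included
progress_pct() {
  total=$(( $2 - 1 ))   # subtract the header line
  if [ "$total" -le 0 ]; then
    echo 0
  else
    echo $(( $1 * 100 / total ))
  fi
}

# Example with fixed numbers (no running container needed):
progress_pct 250000 1000001   # prints 25
```

To feed it real values, PROCESSED could be read with docker exec x-notes psql -U postgres -d postgres -At -c "SELECT tuples_processed FROM pg_stat_progress_copy" and LINES with the wc -l command shown above. Note that the view returns no row when no COPY is running.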

Accessing the notes

Open the following URL: http://localhost:8080

Building and Pushing Multi-Architecture Images

The build_multi.sh script builds and pushes multi-architecture Docker images with automatic versioning.

Versioning Strategy

Images are tagged based on git release tags:

Tagged commit (e.g., git tag v1.0.0): Image is tagged as both ogerardin/x-notes:1.0.0 and ogerardin/x-notes:latest
Untagged commit: Image is tagged as ogerardin/x-notes:latest only (no version-specific tag)
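
The tagging rule above can be sketched in a few lines of shell. This is an illustration only, not the content of build_multi.sh: the function names current_git_tag and image_tags are hypothetical, and stripping a leading "v" from the git tag is inferred from the v1.0.0 / 1.0.0 example.

```shell
IMAGE="ogerardin/x-notes"

# current_git_tag
# Prints the tag pointing exactly at HEAD, or nothing if HEAD is untagged.
current_git_tag() {
  git describe --tags --exact-match 2>/dev/null || true
}

# image_tags GIT_TAG
# Prints one image reference per line for the given git tag (may be empty).
image_tags() {
  if [ -n "$1" ]; then
    # Tagged commit: add a version-specific tag, without the leading "v".
    printf '%s:%s\n' "$IMAGE" "${1#v}"
  fi
  # "latest" is always produced, tagged commit or not.
  printf '%s:latest\n' "$IMAGE"
}
```

For example, image_tags "$(current_git_tag)" would print both references on a tagged commit and only the latest reference otherwise.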

Creating a Version Tag

  # Create and push a semantic version tag
  git tag v1.0.0
  git push origin v1.0.0

  # Then build the image
  ./build_multi.sh

This will build and push images tagged as ogerardin/x-notes:1.0.0 and ogerardin/x-notes:latest.

Building Without a Tag

  # Build image with current commit (untagged)
  ./build_multi.sh

This will build and push the image tagged as ogerardin/x-notes:latest only.

Supported Platforms

The build script targets the following platforms:

linux/amd64
linux/arm64
linux/arm/v7

Note: Building multi-architecture images requires Docker Buildx and a compatible builder. The script will create the builder automatically if it doesn't exist.
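
For readers unfamiliar with Buildx, the builder creation and the multi-platform build could look roughly like the sketch below. It is not taken from build_multi.sh: the builder name x-notes-builder and both function names are assumptions, and the build command assumes a Dockerfile in the current directory. build_command only prints the command, so it can be reviewed before being run.

```shell
PLATFORMS="linux/amd64,linux/arm64,linux/arm/v7"
BUILDER="x-notes-builder"   # hypothetical builder name

# ensure_builder
# Creates the Buildx builder only if it does not exist yet.
ensure_builder() {
  docker buildx inspect "$BUILDER" >/dev/null 2>&1 ||
    docker buildx create --name "$BUILDER" --use
}

# build_command TAG...
# Prints (does not run) a multi-platform build-and-push command
# with one -t option per image tag given as argument.
build_command() {
  printf 'docker buildx build --builder %s --platform %s --push' \
    "$BUILDER" "$PLATFORMS"
  for tag in "$@"; do
    printf ' -t %s' "$tag"
  done
  printf ' .\n'
}
```

Calling build_command ogerardin/x-notes:latest prints a single docker buildx build line targeting the three platforms listed above.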

Technical notes

Fetching the notes data

Community Notes data is made available as downloadable files on this page: https://x.com/i/communitynotes/download-data. As of now, this project handles only the main data file, which contains the notes themselves ("Notes data"). Additional data such as note ratings, note status history, and user enrollment is currently not loaded.

Although it is not documented (and hence could change at any time), the notes data file URLs follow this pattern: https://ton.twimg.com/birdwatch-public-data/%Y/%m/%d/notes/notes-XXXXX.zip where %Y/%m/%d is the date (year, month, day) and XXXXX is a zero-padded file index.

The loader discovers all available files for a given date by requesting notes-00000.zip, notes-00001.zip, and so on until a 404 is returned. The update frequency is not documented either, and updates have been observed to lag by several days. The loader therefore starts with the URL for the current date and, if no file is found, goes back one day at a time until it finds files that exist.
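
The discovery logic described above can be approximated in shell. This is a sketch under stated assumptions, not the Go loader's actual code: all function names are hypothetical, url_exists assumes the server answers HEAD requests, and days_ago relies on GNU date syntax.

```shell
BASE_URL="https://ton.twimg.com/birdwatch-public-data"

# notes_url DATE INDEX  (DATE as YYYY/MM/DD, INDEX as a plain integer)
notes_url() {
  printf '%s/%s/notes/notes-%05d.zip\n' "$BASE_URL" "$1" "$2"
}

# url_exists URL: succeeds only if the server answers with HTTP 200.
url_exists() {
  [ "$(curl -s -o /dev/null -I -w '%{http_code}' "$1")" = "200" ]
}

# discover_files DATE: prints every file URL found for that date,
# stopping at the first index that does not exist.
discover_files() {
  i=0
  while url_exists "$(notes_url "$1" "$i")"; do
    notes_url "$1" "$i"
    i=$((i + 1))
  done
}

# days_ago N: date N days before today (UTC), as YYYY/MM/DD. GNU date only.
days_ago() {
  date -u -d "$1 days ago" +%Y/%m/%d
}

# latest_files MAX_DAYS: walks back from today, one day at a time, and
# prints the files of the first date that has any. Fails if none is found.
latest_files() {
  n=0
  while [ "$n" -le "$1" ]; do
    files=$(discover_files "$(days_ago "$n")")
    if [ -n "$files" ]; then
      printf '%s\n' "$files"
      return 0
    fi
    n=$((n + 1))
  done
  return 1
}
```

For example, latest_files 7 would look back at most one week before giving up.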

Getting the data into PostgreSQL

The structure of the notes data files is described on this page: https://communitynotes.x.com/guide/en/under-the-hood/download-data. Fortunately, the TSV file provided by X/Twitter is already in a format compatible with the PostgreSQL COPY command, provided that the target table has the appropriate structure. COPY is the fastest way to load large amounts of data into PostgreSQL.

The structure of the notes table is defined in sql/notes_ddl.sql, which PostgreSQL executes automatically at startup when its database is empty. Table column names must exactly match the field names in the first row of the TSV file. The table may contain additional columns, but every column in the TSV file must be present in the table.
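
To illustrate why the column names matter, here is one plausible way to turn the TSV header into a COPY statement with an explicit column list. It is a sketch, not the project's loader: the copy_statement helper is hypothetical, quoting the column names assumes the table was created with case-sensitive names, and the HEADER option with the text format requires PostgreSQL 15 or later.

```shell
# copy_statement TABLE HEADER_LINE
# Prints a COPY statement whose column list is taken from the
# tab-separated header line of the TSV file.
copy_statement() {
  cols=$(printf '%s\n' "$2" | awk -F '\t' '{
    for (i = 1; i <= NF; i++)
      printf "%s\"%s\"", (i > 1 ? ", " : ""), $i
  }')
  printf 'COPY %s (%s) FROM STDIN WITH (FORMAT text, HEADER true)\n' \
    "$1" "$cols"
}
```

A header containing noteId and summary would yield COPY note ("noteId", "summary") FROM STDIN WITH (FORMAT text, HEADER true). The statement could then be passed to psql -c with the TSV file on standard input. A column named in the header but missing from the table makes the COPY fail, which is why the table structure has to follow the file.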

If the TSV file structure changes in the future, the table structure will need to be updated accordingly. The easiest way is to delete the Docker volume containing the database files (named x-notes-db), update sql/notes_ddl.sql, and let PostgreSQL recreate the database and table from scratch at the next start.

Enabling full-text search

The table currently contains a single extra column, summary_ts, which enables PostgreSQL full-text search. This column is generated (computed) with the to_tsvector function and stored as a tsvector. A GIN index on this column allows fast full-text search queries that use summary_ts as the search vector. For details, see: https://www.postgresql.org/docs/current/textsearch.html
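
For reference, a generated tsvector column with a GIN index is typically declared as in the sketch below. This is not a copy of sql/notes_ddl.sql: the 'english' text search configuration and the index name are assumptions, and the statements are only printed, since the project's table already has this column.

```shell
# fts_ddl
# Prints example DDL for a stored generated tsvector column and its index.
fts_ddl() {
  cat <<'SQL'
-- The two-argument form of to_tsvector is required here: generated
-- columns only accept immutable expressions.
ALTER TABLE note
  ADD COLUMN summary_ts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', summary)) STORED;

-- GIN index used by full-text search operators such as @@.
CREATE INDEX note_summary_ts_idx ON note USING GIN (summary_ts);
SQL
}
```

The output of fts_ddl could be piped into psql against a scratch database to experiment with the technique.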

When querying the database through PostgREST, we use the PostgREST operator wfts on the summary_ts column; it translates to the PostgreSQL websearch_to_tsquery function, which accepts a web-style, user-friendly search syntax. For example, the search climate change finds notes whose summary contains both "climate" and "change", in any order and with some tolerance for variations (e.g., "climate-change" or "climate's change" would also match), while the search "climate change" (with quotes) finds the exact phrase "climate change" in the note text.
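
Such a query can be issued from the command line with curl, which takes care of URL-encoding spaces and quotes. The helper below is a minimal sketch: the search_notes function and the API_URL variable are hypothetical, and the default base URL assumes the nginx proxy on port 8080 described earlier.

```shell
# search_notes QUERY [LIMIT]
# Runs a web-style full-text search on summary_ts through PostgREST
# and returns the matching summaries as JSON.
search_notes() {
  curl -sG "${API_URL:-http://localhost:8080/api}/note" \
    --data-urlencode "summary_ts=wfts.$1" \
    --data-urlencode "select=summary" \
    --data-urlencode "limit=${2:-10}"
}
```

For example, search_notes 'climate change' searches for both words, while search_notes '"climate change"' 20 searches for the phrase and returns up to 20 rows.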

TODO

manage permissions
