Community Notes is a feature of X (formerly Twitter) that lets contributors add context, such as fact-checks, under a post. See https://x.com/i/communitynotes
Unfortunately X/Twitter does not provide a way to search through the notes, either through a web interface or an API. This project is intended to fill that gap by providing a searchable database of the notes.
This project was born from the need to counter online disinformation on X/Twitter efficiently after russia's invasion of 🇺🇦 Ukraine in 2022.
Note: the design was influenced by a similar project https://github.com/bpettis/birdwatch-scraper
The application consists of the following components:
- PostgreSQL database - Stores the notes and import history
- PostgREST - Auto-generates a RESTful API from the database schema
- Go API server - Handles scheduled imports, download management, and control endpoints
- Nginx - Reverse proxy for PostgREST and the Go API; also serves the static web UI
- Web UI - AlpineJS-based search interface with full-text search
- Optional: Swagger UI for API docs, Adminer for database management
Two deployment modes are supported:
- Docker Compose (development): Separate containers for each service, easier debugging
- Single container (production): All services bundled in one image for simple deployment
In both modes, data persists in the Docker volume x-notes-db.
Note: PostgreSQL 18 requires the PGDATA environment variable to be set for data persistence to work correctly. This is already configured in both the Dockerfile and docker-compose.yml.
Version: This is version 0.0.14
In this version, we use Docker Compose to start separate containers for the PostgreSQL database, PostgREST, Nginx, and the Go API server.
- Internet connection
- Docker with Compose plug-in
docker compose up -d
This will start the following services:
- db: PostgreSQL database
- postgrest: RESTful API server
- nginx: Reverse proxy
You may also start the following services:
- swagger: Swagger UI for PostgREST API
  docker compose scale swagger=1
- adminer: Web-based database management tool
  docker compose scale adminer=1
To load the notes into the database, trigger the import via API:
# Full import (all available files)
curl -X POST http://localhost:8080/api/imports/create
# Check import status
curl http://localhost:8080/api/imports/current
# List import history
curl http://localhost:8080/api/imports
Or use the web UI at http://localhost:8080 (Admin tab).
Open the following URL: http://localhost:8080
| Service | URL |
|---|---|
| PostgREST sample query | http://localhost:3000/note?limit=50&summary_ts=fts.Nigeria&select=summary |
| PostgREST sample query through nginx | http://localhost:8080/api/note?limit=50 |
| Adminer (requires adminer container) | http://localhost:8082/?pgsql=db&username=postgres&db=postgres&ns=public&table=note |
| PostgREST Swagger UI (requires swagger container) | http://localhost:8081 |
| Postgres direct connection | localhost:5432 |
In this version, we use a single Docker container that runs PostgreSQL, PostgREST, Nginx, and the Go API server with its built-in scheduler.
make run
This will start a container named x-notes with all the services running inside it.
To load the notes into the database, trigger the import via API:
# Full import (all available files)
curl -X POST http://localhost:8080/api/imports/create
# Check import status
curl http://localhost:8080/api/imports/current
While the loader is running, you can monitor the logs of the container:
docker logs -f x-notes
You can also query PostgreSQL's COPY progress reporting view to see how far the load has progressed:
docker exec -it x-notes psql -U postgres -d postgres -c "SELECT * FROM pg_stat_progress_copy;"
The same view is also accessible via the PostgREST API:
curl http://localhost:8080/api/pg_stat_progress_copy
Monitor the tuples_processed field to see how many notes have been loaded. The total number of notes to load can be determined by counting the lines of the notes file (minus 1 for the header):
docker exec -it x-notes /bin/sh -c "wc -l /home/data/*.tsv"
Open the following URL: http://localhost:8080
The build_multi.sh script builds and pushes multi-architecture Docker images with automatic versioning.
Images are tagged based on git release tags:
- Tagged commit (e.g., git tag v1.0.0): Image is tagged as both ogerardin/x-notes:1.0.0 and ogerardin/x-notes:latest
- Untagged commit: Image is tagged as ogerardin/x-notes:latest only (no version-specific tag)
# Create and push a semantic version tag
git tag v1.0.0
git push origin v1.0.0
# Then build the image
./build_multi.sh
This will build and push images tagged as ogerardin/x-notes:1.0.0 and ogerardin/x-notes:latest.
# Build image with current commit (untagged)
./build_multi.sh
This will build and push the image tagged as ogerardin/x-notes:latest only.
The build script targets the following platforms:
- linux/amd64
- linux/arm64
- linux/arm/v7
Note: Building multi-architecture images requires Docker Buildx and a compatible builder. The script will create the builder automatically if it doesn't exist.
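The tagging rules above can be sketched as a small shell helper. This is only an illustration, not the contents of build_multi.sh: the function names image_tags and buildx_command are hypothetical, and detecting a release with git describe is an assumption about how the script works. The functions only print the command, so nothing is built or pushed.

```shell
REPO="ogerardin/x-notes"
PLATFORMS="linux/amd64,linux/arm64,linux/arm/v7"

# Print one image reference per line for the given git tag
# (pass an empty string for an untagged commit).
image_tags() {
    case "$1" in
        v?*) printf '%s:%s\n' "$REPO" "${1#v}" ;;   # v1.0.0 -> 1.0.0
    esac
    printf '%s:latest\n' "$REPO"
}

# Print the buildx command that would build and push every tag.
buildx_command() {
    tag_args=""
    for ref in $(image_tags "$1"); do
        tag_args="$tag_args --tag $ref"
    done
    printf 'docker buildx build --platform %s%s --push .\n' "$PLATFORMS" "$tag_args"
}

# Usage (the tag lookup is an assumption, see above):
#   buildx_command "$(git describe --tags --exact-match 2>/dev/null)"
```

Printing the command instead of running it makes it easy to check which tags a given commit would produce before pushing anything.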
Community notes are made available as downloadable files on this page: https://x.com/i/communitynotes/download-data. As of now, this project handles only the main data file containing the notes themselves ("Notes data"). Additional data, such as note ratings, note status history, and user enrollment, is currently not loaded.
While not documented (and hence could change anytime), the pattern of the notes data file URLs is as follows:
https://ton.twimg.com/birdwatch-public-data/%Y/%m/%d/notes/notes-XXXXX.zip
The loader discovers all available files by checking for notes-00000.zip, notes-00001.zip, etc. until a 404 is returned. Since the frequency of updates is not documented either, and has been observed to lag several days in the past, the loader tries to fetch the latest files by trying to access the URL for the current date, and if it fails, going back one day at a time until it finds files that exist.
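The discovery strategy described above can be sketched in shell as follows. This is not the project's loader (which lives in the Go API server), only an illustration of the idea. The function names, the use of a HEAD request, and the 30-day look-back limit are assumptions, and the date arithmetic relies on GNU date.

```shell
BASE="https://ton.twimg.com/birdwatch-public-data"

# URL of the notes file with the given index, for a date given as YYYY/MM/DD.
notes_url() {
    printf '%s/%s/notes/notes-%05d.zip\n' "$BASE" "$1" "$2"
}

# Existence check: HEAD request that succeeds only on a non-error status.
# Redefine this function for dry runs or tests.
url_exists() {
    curl --silent --fail --head --output /dev/null "$1"
}

# Print every notes URL that exists for one date, stopping at the first miss.
list_files_for_date() {
    i=0
    while url_exists "$(notes_url "$1" "$i")"; do
        notes_url "$1" "$i"
        i=$((i + 1))
    done
}

# Walk back from today, one day at a time, until a date has files.
# The optional argument caps the number of days to try (default 30).
find_latest_files() {
    max_days="${1:-30}"
    back=0
    while [ "$back" -lt "$max_days" ]; do
        day=$(date -u -d "$back days ago" +%Y/%m/%d)   # GNU date syntax
        files=$(list_files_for_date "$day")
        if [ -n "$files" ]; then
            printf '%s\n' "$files"
            return 0
        fi
        back=$((back + 1))
    done
    return 1
}

# Usage (performs real network requests):
#   find_latest_files
```

Capping the look-back keeps the loop from running indefinitely if the URL pattern ever changes, which is a real possibility since the pattern is undocumented.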
The structure of the notes data files is described on this page: https://communitynotes.x.com/guide/en/under-the-hood/download-data
Fortunately, the TSV file provided by X/Twitter is already in a format that is compatible with the PostgreSQL COPY command,
provided that the target table has the appropriate structure. This is the fastest way to load large amounts of data
into Postgres.
The structure of the notes table is defined in sql/notes_ddl.sql, which is executed automatically by PostgreSQL
at startup when its database is empty. Table column names must strictly match the field names
in the TSV file (first row). There can be additional columns in the table,
but all columns in the TSV file must be present in the table.
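To illustrate the header-to-column matching described above, the sketch below builds a psql \copy command whose column list is read from the first row of a TSV file. It is not necessarily how the project's loader issues COPY: the function name copy_statement and the choice of FORMAT text with HEADER (accepted for the text format from PostgreSQL 15 onward) are assumptions, and the table name note is taken from the sample queries.

```shell
# Print a psql \copy command whose column list comes from the TSV header row.
# Columns that exist only in the table (such as a generated column) are simply
# left out of the list, so COPY does not try to fill them.
copy_statement() {
    file="$1"
    # The first line holds tab-separated field names; join them with commas.
    columns=$(head -n 1 "$file" | tr -d '\r' | tr '\t' ',')
    printf '%s\n' "\\copy note ($columns) FROM '$file' WITH (FORMAT text, HEADER true)"
}

# Usage (hypothetical file path; requires psql and a reachable database):
#   psql -U postgres -d postgres -c "$(copy_statement /home/data/notes-00000.tsv)"
```

If the TSV gains a field that the table lacks, the generated statement names a column that does not exist and COPY fails immediately, which surfaces a structure change early.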
If the TSV file structure changes in the future, the table structure will need to be updated accordingly. The easiest way
is to delete the Docker volume containing the database files (named x-notes-db), update sql/notes_ddl.sql, and let
PostgreSQL recreate the database and table from scratch at the next start.
The table currently contains a single extra column, summary_ts, which enables PostgreSQL's full-text search
capabilities. This column is generated (computed) using the to_tsvector function and stored as a tsvector.
A GIN index is created on this column to allow for fast full-text search queries, using the summary_ts field as the search
vector.
For details, see: https://www.postgresql.org/docs/current/textsearch.html
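The generated column and GIN index can be tried out on a scratch table. The sketch below is a hypothetical example, not the actual contents of sql/notes_ddl.sql: the temporary table note_demo, its sample rows, and the 'english' text search configuration are all assumptions. The shell function only prints the SQL; piping it into psql runs it in a single session, after which the temporary table disappears.

```shell
# Print a self-contained SQL demo of a stored tsvector column with a GIN index.
demo_sql() {
    cat <<'SQL'
-- Scratch table: summary_ts is computed from summary and stored on disk.
CREATE TEMP TABLE note_demo (
    note_id    bigint PRIMARY KEY,
    summary    text,
    summary_ts tsvector GENERATED ALWAYS AS
        (to_tsvector('english', coalesce(summary, ''))) STORED
);

-- GIN index so that @@ searches on summary_ts can use an index.
CREATE INDEX note_demo_summary_ts_idx ON note_demo USING gin (summary_ts);

INSERT INTO note_demo (note_id, summary) VALUES
    (1, 'This chart about climate change is misleading'),
    (2, 'The photo was taken in 2015, not last week');

-- Web-style search against the stored search vector.
SELECT note_id, summary
FROM note_demo
WHERE summary_ts @@ websearch_to_tsquery('english', 'climate change');
SQL
}

# Usage with the single-container deployment (container name from above):
#   demo_sql | docker exec -i x-notes psql -U postgres -d postgres
```

The two-argument form of to_tsvector is used because a generated column requires an immutable expression, and the one-argument form depends on a session setting.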
When querying the database through PostgREST, we use the PostgREST operator wfts. on the summary_ts column;
this translates to the PostgreSQL websearch_to_tsquery function, which allows for a user-friendly, web-style search
syntax. For example, the search climate change will find notes whose summary contains the words
"climate" and "change", in any order, and with some tolerance for variations (e.g., "climate-change" or "climate's change" would also match), while
the search "climate change" (with quotes) will find notes whose text contains the exact phrase "climate change".
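As a small illustration of the two search styles, the helper below builds PostgREST URLs that apply the wfts operator to summary_ts. It is a sketch under assumptions: the function names encode_query and search_url are hypothetical, the base URL is the nginx address used elsewhere in this document, and the percent-encoding covers only a handful of common characters rather than the full RFC 3986 rules.

```shell
# Base URL of the nginx proxy; override by exporting BASE_URL.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Minimal percent-encoding for characters likely to appear in a search phrase.
# The percent sign is handled first so later replacements are not re-encoded.
encode_query() {
    printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/"/%22/g' \
                           -e 's/&/%26/g' -e 's/+/%2B/g' -e 's/#/%23/g'
}

# Build a PostgREST URL that runs a web-style full-text search on summary_ts.
# $1 = search text, $2 = maximum number of rows (default 50).
search_url() {
    printf '%s/api/note?summary_ts=wfts.%s&select=summary&limit=%s\n' \
        "$BASE_URL" "$(encode_query "$1")" "${2:-50}"
}

# Usage (requires a running instance):
#   curl -s "$(search_url 'climate change')"         # both words, any order
#   curl -s "$(search_url '"climate change"' 10)"    # exact phrase
```

Keeping the quotes inside the search text, rather than stripping them, is what lets websearch_to_tsquery distinguish the phrase search from the plain two-word search.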