AIDAVA-DEV/entity-alignment-public

Entity Alignment for Medical Terminologies

Code for the paper "A Two-Method Framework for Aligning Medical Terminologies", presented at SeWebMeDA-2026, May 10, 2026, Dubrovnik, Croatia.

Ostaszewski S., Kılıç Ö.D., Erol E.E., Dumontier M., Celebi R. (2026). A Two-Method Framework for Aligning Medical Terminologies. SeWebMeDA-2026.

Overview

This framework aligns disparate medical terminologies — CPT (procedures), NDC (medications), and ICD-9 (diagnoses) — to SNOMED CT. It combines expert-curated reference sets with two algorithmic strategies:

  • Setting A — Missing translation imputation (CPT & NDC): Uses medBERT embeddings and a Logistic Regression classifier to predict SNOMED CT matches for unmapped codes based on textual similarity.
  • Setting B — Context-aware ranking (ICD-9): Resolves 1-to-many mappings by ranking SNOMED CT candidates using Node2Vec graph embeddings of the SNOMED CT hierarchy, weighted by the patient's clinical context from their EHR.
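Setting A can be pictured as scoring each candidate SNOMED CT concept against an unmapped CPT or NDC code and keeping the most probable one. The sketch below is a minimal illustration under stated assumptions, not the repository's implementation. The only feature is the cosine similarity between two embeddings (in practice these would come from medBERT; here they are toy vectors), and `weight`, `bias`, `threshold` and the helper names are hypothetical stand-ins for a trained logistic regression.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_probability(source_vec, candidate_vec, weight: float, bias: float) -> float:
    """Logistic regression on a single feature, the cosine similarity.
    weight and bias are HYPOTHETICAL coefficients standing in for a trained model."""
    z = weight * cosine(source_vec, candidate_vec) + bias
    return 1.0 / (1.0 + math.exp(-z))


def impute_translation(source_vec, candidates, weight=8.0, bias=-4.0, threshold=0.5):
    """Return (best SNOMED CT id, probability), or (None, probability) if nothing clears the threshold.
    candidates maps a SNOMED CT identifier to its (hypothetical) text embedding."""
    best_id, best_p = None, 0.0
    for snomed_id, vec in candidates.items():
        p = match_probability(source_vec, vec, weight, bias)
        if p > best_p:
            best_id, best_p = snomed_id, p
    if best_p < threshold:
        return None, best_p
    return best_id, best_p


# Toy usage: candidate "A" points in almost the same direction as the source code.
best, p = impute_translation([1.0, 0.0, 0.2], {"A": [0.9, 0.1, 0.2], "B": [0.0, 1.0, 0.0]})
```

Returning `None` below the threshold leaves the code unmapped rather than forcing a low-confidence translation; whether the actual classifier uses a threshold, or richer pairwise features, is not stated here.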

Each aligned code is assigned a confidence score (κ) that reflects how decisively the patient's context supports the chosen translation.
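A minimal sketch of context-aware ranking, assuming each SNOMED CT candidate and each code in the patient's context already has a graph embedding (Node2Vec in the paper; toy 2-D vectors here). Candidates are scored by a weighted mean cosine similarity to the context. The README does not define how κ is computed, so `confidence_margin` (the gap between the top two scores) is a HYPOTHETICAL proxy that only illustrates the idea of "how decisively the context supports the chosen translation". All function names are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(candidates, context, weights=None):
    """Rank SNOMED CT candidates by weighted mean similarity to the patient's context codes.
    candidates: {snomed_id: embedding}; context: list of embeddings; weights: one per context code."""
    if weights is None:
        weights = [1.0] * len(context)  # assumption: uniform weighting by default
    total = sum(weights)
    scores = {}
    for snomed_id, vec in candidates.items():
        if total == 0:
            scores[snomed_id] = 0.0
        else:
            scores[snomed_id] = sum(w * cosine(vec, c) for w, c in zip(weights, context)) / total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def confidence_margin(ranked) -> float:
    """HYPOTHETICAL kappa proxy: score gap between the best and second-best candidate."""
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return 1.0  # assumption: a lone candidate has no competitor
    return ranked[0][1] - ranked[1][1]


# Toy usage: the patient's context sits close to candidate "X".
ranked = rank_candidates({"X": [1.0, 0.0], "Y": [0.0, 1.0]}, [[1.0, 0.0], [0.9, 0.1]])
kappa = confidence_margin(ranked)
```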

Repository Structure

.
├── app.py                  # FastAPI service exposing /align and /align_graph endpoints
├── RankingModel.py         # Main ranking model (Setting B)
├── ContextExtractor.py     # Extracts patient context codes from an RDF graph
├── embedding_decoder.py    # Loads and queries embeddings
├── CodeTreeNode.py         # SNOMED CT hierarchy utilities
├── models.py               # Pydantic request/response models
├── utils.py                # Shared helpers
├── mappers/
│   ├── cpt_snomed_mapper.py
│   ├── icd_snomed_mapper.py
│   ├── loinc_snomed_mapper.py
│   └── ndc_snomed_mapper.py
└── tests/
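RankingModel.py implements Setting B, which relies on Node2Vec embeddings of the SNOMED CT hierarchy. As an illustration of how such embeddings are typically produced, the sketch below generates one Node2Vec-style second-order random walk over a toy is-a hierarchy treated as an undirected graph. The graph, node names and `node2vec_walk` helper are hypothetical; in the standard Node2Vec recipe, many such walks are then fed to a skip-gram (word2vec) model to learn the embeddings, which is not shown.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one Node2Vec-style biased random walk.

    graph: dict mapping each node to a set of neighbours (undirected, unweighted).
    p: return parameter; larger p makes stepping back to the previous node less likely.
    q: in-out parameter; larger q keeps the walk local (BFS-like), smaller q explores (DFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])  # sorted so a seeded rng gives reproducible walks
        if not neighbours:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == previous:
                weights.append(1.0 / p)  # distance 0 from the previous node
            elif nxt in graph[previous]:
                weights.append(1.0)      # distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # distance 2 from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy hierarchy: root -> A -> {A1, A2}, root -> B -> B1, stored as undirected adjacency.
toy_graph = {
    "root": {"A", "B"},
    "A": {"root", "A1", "A2"},
    "B": {"root", "B1"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}
walk = node2vec_walk(toy_graph, "A1", length=10, p=0.5, q=2.0, rng=random.Random(0))
```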

Installation

Requires Python ≥ 3.10. Dependencies are managed with uv.

uv sync

Running the API

uv run uvicorn app:app --host 0.0.0.0 --port 8000

Or with Docker:

docker build -t entity-alignment .
docker run -p 8000:8000 entity-alignment

Environment variables (set in a .env file or passed as docker run arguments):

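| Variable        | Description                          | Default                              |
|-----------------|--------------------------------------|--------------------------------------|
| SPARQL_ENDPOINT | SPARQL endpoint for patient graphs   | http://localhost:3030/dataset/sparql |
| SPARQL_USERNAME | SPARQL auth username                 |                                      |
| SPARQL_PASSWORD | SPARQL auth password                 |                                      |
| DECODER_PATH    | Path to the trained decoder model    |                                      |
| EMBEDDINGS      | Path to the embeddings pickle file   |                                      |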

API Endpoints

POST /align — Fetch patient graph from SPARQL and return ranked SNOMED CT suggestions.

POST /align_graph — Same, but accepts an RDF graph payload directly.

Both endpoints return a ranked list of SNOMED CT URIs with confidence scores.
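A minimal client sketch using only the standard library. It assumes both endpoints accept a JSON body and return JSON; the request schema lives in models.py and is not documented here, so the payload is left to the caller and any field names you use must be checked against those Pydantic models. `build_request` and `post_json` are hypothetical helpers.

```python
import json
import urllib.request


def build_request(base_url: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for /align or /align_graph (payload schema is an assumption)."""
    if endpoint not in ("/align", "/align_graph"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response (requires the API to be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `post_json(build_request("http://localhost:8000", "/align", payload))` would return the ranked SNOMED CT URIs with their confidence scores.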

Results

| Method                 | Set 1 Hits@1 | Set 1 Hits@5 |
|------------------------|--------------|--------------|
| medBERT text only      | 0.06 – 0.12  | 0.53 – 0.63  |
| Shortest SNOMED path   | 0.44 – 0.52  | 0.82 – 0.88  |
| Node2Vec (ours)        | 0.37 – 0.45  | 0.82 – 0.88  |

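Node2Vec matches the Hits@5 of exact shortest-path computation (0.82 – 0.88) and trails it slightly on Hits@1, while being ~480× faster: about 1,000 rankings per second, versus roughly 8 minutes for the same 1,000 rankings with shortest paths.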
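For readers unfamiliar with the metric, Hits@k is the fraction of queries whose correct SNOMED CT concept appears among the top k ranked candidates. The sketch below computes it on toy data (the query ids and rankings are hypothetical) and reproduces the speed-up arithmetic, assuming the 8 minutes refers to a batch of 1,000 rankings, which is the reading consistent with ~480×.

```python
def hits_at_k(rankings: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    """Fraction of queries whose gold concept appears in the top-k of its ranking."""
    if not rankings:
        return 0.0
    hits = sum(1 for query, ranked in rankings.items() if gold[query] in ranked[:k])
    return hits / len(rankings)


# Toy data: q1 is correct at rank 1, q2 at rank 3, q3 is never retrieved.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"], "q3": ["m", "n", "o"]}
toy_gold = {"q1": "a", "q2": "z", "q3": "p"}

# Speed-up arithmetic under the stated assumption.
SLOW_SECONDS_PER_1000 = 8 * 60       # shortest-path baseline: ~8 minutes per 1,000 rankings
FAST_SECONDS_PER_1000 = 1000 / 1000  # Node2Vec: ~1,000 rankings per second
speedup = SLOW_SECONDS_PER_1000 / FAST_SECONDS_PER_1000  # 480.0
```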

Acknowledgments

This work is supported by the Horizon Europe Framework Program under Grant Agreement No. 101057062 (AIDAVA). SNOMED CT licenses are handled via this project.

License

See the LICENSE file.
