MustaphaU/README.md

Technical Articles & Projects:

1. Building and Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service

Towards Data Science Post: https://towardsdatascience.com/deploying-a-multistage-multimodal-recommender-system-on-amazon-eks-featuring-bloom-filters-feature-caching-and-contextual-recommendations

Code: https://github.com/MustaphaU/Multistage-Multimodal-Recommender-System-on-Amazon-EKS-with-NVIDIA-Merlin

Figure 1: The model serving pipeline

This project presents a multistage, multimodal recommender system built and deployed on Amazon Elastic Kubernetes Service (EKS). It features offline and online feature stores backed by Athena+S3 and Valkey (Redis), respectively. User cold-start is handled via feature masking, context-aware retrieval and ranking, and near-real-time personalization with online feature updates. The system also ingests multimodal item features, which can strengthen the content-based signal and help with item cold-start. Recently interacted items are filtered out using a Valkey (Redis)-backed Bloom filter.
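
As a rough illustration of the filtering step, the sketch below uses a small in-memory Bloom filter to drop candidates a user has recently interacted with. It is not the project's implementation: the `BloomFilter` class, the `filter_seen` helper, the `user_id:item_id` key format, and the sizing parameters are all assumptions made for this example, and the deployed system keeps its filter in Valkey rather than in process memory.

```python
import hashlib
import math


class BloomFilter:
    """In-memory Bloom filter (hypothetical stand-in for a Valkey-backed one)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            8, math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def filter_seen(user_id: str, candidates: list, seen: BloomFilter) -> list:
    """Keep only candidates the filter has not recorded for this user."""
    return [item for item in candidates if f"{user_id}:{item}" not in seen]


seen = BloomFilter(expected_items=1000)
seen.add("u1:item_a")
seen.add("u1:item_b")
remaining = filter_seen("u1", ["item_a", "item_b", "item_c"], seen)
```

Because a Bloom filter can return false positives but never false negatives, an already-seen item is always removed, while an unseen item is occasionally dropped by mistake. That trade-off is usually acceptable for this step, since the candidate pool is much larger than the final recommendation list.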

@article{momoh2026multistage,
title={Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service},
author={Momoh, Mustapha Unubi},
platform={Towards Data Science},
year={2026},
month={May},
url={https://towardsdatascience.com/deploying-a-multistage-multimodal-recommender-system-on-amazon-eks-featuring-bloom-filters-feature-caching-and-contextual-recommendations}
}

The system is operationalized with Kubeflow Pipelines. The first pipeline orchestrates the initial feature setup, model training, and deployment of the NVIDIA Triton Inference Server. The second pipeline manages the periodic incremental fine-tuning of the query tower and the ranker.

Figure 2: MLOps architecture

2. Deploying a ranking-only recommender system based on Deep & Cross Network (DCN) with AUC-based, drift-triggered fine-tuning

Medium Article: https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5

Code: https://github.com/MustaphaU/Merlin-RecSys-MLOps-on-AWS

Figure 1: Ads-ranking MLOps with a monitoring component for drift detection and auto-retraining

In this project, the DCN-based recommendation model is trained on a subset of the Criteo 1TB Click Logs dataset to predict click-through rate (CTR). The system includes a monitoring component that watches for performance drift and triggers an incremental training run once drift is detected. The NVIDIA Triton Inference Server is autoscaled on a custom latency metric using one of two options: Kubernetes HPA with Karpenter, or Kubernetes HPA with Cluster Autoscaler.
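
One way an AUC-based drift check could work is sketched below: compute AUC on a recent window of labeled predictions and compare it with a baseline recorded at deployment time. This is an illustrative assumption, not the monitoring component from the repository; the function names, the `max_drop` threshold of 0.02, and the sample data are all hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation, averaging ranks over ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at index i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def should_retrain(baseline_auc: float, recent_auc: float, max_drop: float = 0.02) -> bool:
    """Flag drift when recent AUC falls more than max_drop below the baseline."""
    return (baseline_auc - recent_auc) > max_drop


# Hypothetical click labels and predicted CTRs from a recent serving window.
recent_labels = [0, 0, 1, 1]
recent_scores = [0.1, 0.4, 0.35, 0.8]
recent_auc = roc_auc(recent_labels, recent_scores)  # 0.75
trigger = should_retrain(baseline_auc=0.80, recent_auc=recent_auc)
```

In a setup like the one described, a `True` result would be the signal that starts the incremental training pipeline. A production check would likely also require a minimum sample size per window so that a noisy AUC estimate does not trigger retraining.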

@article{momoh2026continuous,
 title={Building a single-stage Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin, HugeCTR, NVIDIA Triton Inference Server, and Kubeflow Pipelines},
 author={Momoh, Mustapha Unubi},
 platform={Medium},
 year={2026},
 month={March},
 url={https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5}
}

Contact:

Email: mustaphaunubi@gmail.com.

Pinned

  1. Multistage-Multimodal-Recommender-System-on-Amazon-EKS-with-NVIDIA-Merlin (Public)

    Deploying a Multimodal Recommender System on Kubernetes featuring Cold Start handling, Bloom Filters, and Feature Caching.

    Python 2

  2. Merlin-RecSys-MLOps-on-AWS (Public)

    Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin and Triton Inference Server

    Python 1

  3. RAGwithAmazonTitan (Public)

    A Demo of Retrieval Augmented Generation with Amazon Titan, Bedrock, Kendra, and LangChain

    Python 1

  4. Simplify-Documentation-Review-on-Atlassian-Confluence-with-LLAMA2-and-NVIDIA-TensorRT-LLM (Public)

    A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.

    Python

  5. TensorRT-LLM (Public)

    Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

    C++

  6. Complete-Les-Miserables-Animated-Transition (Public)

    Animated transition between node-link and adjacency matrix views

    HTML 3 1