CodeNinjaSarthak/README.md

AI Engineer focused on production GenAI systems, reliable LLM infrastructure, retrieval systems, and robustness evaluation under real-world constraints.


About Me

class SarthakChauhan:

    role = [
        "AI Engineer",
        "ML Systems Builder",
        "Research Engineer"
    ]

    interests = [
        "Reliable LLM Systems",
        "Distributed Inference",
        "Retrieval-Augmented Generation",
        "Vision Robustness",
        "Temporal Memory Systems",
        "Evaluation Under Distribution Shift"
    ]

    currently_building = [
        "Production-scale GenAI infrastructure",
        "Long-context retrieval and reranking systems",
        "Low-latency async AI systems",
        "Reliable LLM evaluation pipelines"
    ]

What I Work On

⚡ Production AI Systems

  • Built production LLM systems serving 1000+ users
  • Reduced generation latency from 21s → 6s
  • Designed async orchestration using asyncio.gather
  • Built provider fallback routing: Azure → Claude / Gemini
  • Engineered Redis worker pipelines with bounded concurrency
  • Implemented SSE streaming, rate limiting, and circuit breakers
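The orchestration bullets above (asyncio.gather, provider fallback, bounded concurrency) can be illustrated with a minimal sketch. This is not the production code: the provider functions, the 5-second timeout, and the concurrency limit are all hypothetical stand-ins chosen for illustration.

```python
import asyncio

# Hypothetical provider stubs; real SDK calls would replace these.
async def call_primary(prompt: str) -> str:
    raise RuntimeError("primary unavailable")  # simulate an outage

async def call_secondary(prompt: str) -> str:
    await asyncio.sleep(0)  # stand-in for network I/O
    return f"secondary:{prompt}"

PROVIDERS = [call_primary, call_secondary]  # tried in this order

async def generate(prompt: str, limit: asyncio.Semaphore) -> str:
    """Try each provider in order while holding one concurrency slot."""
    async with limit:
        last_error = None
        for provider in PROVIDERS:
            try:
                # Assumed 5 s budget per provider call.
                return await asyncio.wait_for(provider(prompt), timeout=5.0)
            except Exception as exc:  # timeout or provider failure: try the next one
                last_error = exc
        raise RuntimeError("all providers failed") from last_error

async def generate_many(prompts: list[str], max_concurrency: int = 4) -> list[str]:
    """Fan out with asyncio.gather; the semaphore bounds in-flight calls."""
    limit = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(generate(p, limit) for p in prompts))

results = asyncio.run(generate_many(["a", "b", "c"]))
```

asyncio.gather returns results in the order the coroutines were passed, so callers can zip the outputs back to their prompts even though the calls run concurrently.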

🔬 Research & Evaluation

  • 3 IEEE publications (2 first-author)
  • Working on memory systems for temporal reasoning
  • Evaluating robustness under distribution shift
  • Benchmarking calibration across vision architectures
  • Researching retrieval quality and reranking systems
  • Building RL environments for AI safety evaluation

Featured Work

🚀 SafeAct-Env

AI Safety RL Environment. Finalist, Meta × Scaler PyTorch OpenEnv Hackathon (Top 2.6%)

  • Multi-task RL environment for reversible vs irreversible actions
  • Deterministic graders with hidden risk classifier
  • 164 passing tests with reproducible evaluation
  • Built across infra, filesystem, DB, and medical safety tasks

Stack: Python • FastAPI • Docker • RL


🧠 Eidetic Memory

Memory System for Conversational AI

  • Achieved 56.3% on LoCoMo QA
  • +39.3 pp on temporal questions over a RAG baseline
  • Per-speaker memory isolation + neural reranking
  • Averaged only 1.9 LLM calls/query

Focus Areas: Temporal reasoning • retrieval • reranking • memory systems

Stack: FastAPI • Qdrant • Cross-Encoder • LLMs


⚡ StreamMind

Real-Time Semantic Question Clustering

  • Reduced instructor response time by 68%
  • Designed fault-tolerant async processing pipeline
  • Handled 100+ concurrent student questions ("doubts")
  • Semantic deduplication using online clustering
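One common way to do semantic deduplication online is threshold-based incremental clustering: each incoming embedding joins the most similar existing cluster if it clears a similarity threshold, otherwise it starts a new one. The sketch below shows that idea only; the class name, the 0.9 threshold, and the running-mean centroid update are assumptions, not StreamMind's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class OnlineClusterer:
    """Hypothetical helper: assigns each new vector to a cluster as it arrives."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cutoff
        self.centroids = []         # one running-mean vector per cluster
        self.counts = []            # members seen per cluster

    def add(self, vector):
        """Return the cluster id for `vector`, creating a cluster if needed."""
        best, best_sim = None, self.threshold
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vector, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            self.centroids.append(list(vector))
            self.counts.append(1)
            return len(self.centroids) - 1
        n = self.counts[best]
        # Update the centroid as a running mean of its members.
        self.centroids[best] = [
            (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vector)
        ]
        self.counts[best] = n + 1
        return best

clusterer = OnlineClusterer(threshold=0.9)
ids = [clusterer.add(v) for v in ([1.0, 0.0], [0.99, 0.05], [0.0, 1.0])]
```

Questions that land in an existing cluster can be treated as duplicates of that cluster's earlier members, so an instructor answers once per cluster rather than once per question.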

Infra: Redis workers • pgvector • WebSockets • circuit breakers

Stack: FastAPI • Redis • pgvector • Gemini


🏫 Medha AI

Production GenAI System @ Cograd

  • Serving curriculum-aligned generation workflows
  • Reduced lesson-plan latency 3.5×
  • Reduced exam generation latency 2.5×
  • Multi-provider orchestration with graceful degradation
  • Multi-HyDE retrieval + reranking pipeline

Infra: Async orchestration • Redis • Qdrant • Azure OpenAI

Stack: FastAPI • Redis • Qdrant • MongoDB



Selected Research

Vision Robustness & Calibration

Evaluating 12 ImageNet-pretrained architectures across IN-Val, IN-V2, IN-R, IN-A, and IN-C using:

  • ECE
  • AURC
  • selective prediction
  • corruption robustness
  • universal failure analysis
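For readers unfamiliar with ECE (expected calibration error), here is a minimal sketch of the usual binned estimate: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The equal-width bins, the default of 10 bins, and the function name are assumptions; the study's exact binning scheme is not stated here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Four predictions at 0.9 confidence, three of them right: accuracy 0.75 vs confidence 0.9.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A well-calibrated model has ECE near zero: when it reports 90% confidence, it is right about 90% of the time.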

Dense-Fog Highway Dehazing

Benchmarked 10 dehazing architectures and identified a 15–20 dB PSNR gap between synthetic benchmarks and real dense-fog highway conditions.
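To put the PSNR gap in context, PSNR is defined as 10 · log10(MAX² / MSE), so a difference of 15–20 dB corresponds to roughly a 32× to 100× difference in mean squared error. The sketch below computes PSNR over flat pixel lists; the function name and the 8-bit peak value of 255 are assumptions for illustration, not the benchmark's evaluation code.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Every pixel off by 1 on an 8-bit scale: MSE = 1, PSNR = 20 * log10(255).
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```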

Hinglish Abuse Detection

Improved F1 from 0.784 → 0.866 on a 700K-post dataset using:

  • XLM-R transfer learning
  • BiGRU attention fusion
  • multilingual representation learning

Publications

📄 Hinglish Abusive Comment Detection Using Transformer-Based Models

AICAPS 2026, IEEE Kerala Section • First Author

📄 Image and Video Dehazing for Dense-Fog Indian Highway Scenarios

DICCT 2026 • First Author

📄 Deep Learning-Based Brain Tumour Identification

IC3SE 2025, IEEE UP Section • Second Author


Tech Stack

Languages & ML

Python • PyTorch • TensorFlow • FastAPI

LLM & Retrieval

LangChain • LangGraph • Qdrant • pgvector

Systems & Infra

Redis • PostgreSQL • Docker • Azure


Achievements

  • πŸ† Meta Γ— Scaler PyTorch OpenEnv Hackathon β€” Finalist (Top 2.6%)
  • πŸ† Amazon ML Challenge 2024 β€” Top 0.5%
  • πŸ† IIT Bombay Convolve β€” Top 50 / 4189 teams
  • πŸŽ“ Dean’s List β€” Top 10%
  • πŸ“š GPA: 9.42 / 10.0

Connect

Building reliable AI systems, retrieval infrastructure, and evaluation pipelines.



Pinned

  1. abusive-detection (Jupyter Notebook)

    Abusive language detection for code-mixed Hinglish using mBERT, XLM-RoBERTa, and hybrid Transformer-BiLSTM models on 824K ShareChat comments.

  2. safeact-env (Python)

    SafeAct-Env: An OpenEnv environment for training agents to identify and avoid irreversible actions across 5 real-world operational domains

  3. architectai (Python)

    Multi-agent system that interrogates your idea, runs live research, and produces a structured architecture plan you can ship from.

  4. eidetic-memory (Python)

    Per-speaker memory isolation with neural reranking for multi-party LLM agents. 56.3% on LoCoMo (+39.3 pp on temporal over RAG) at 1.9 LLM calls per query.

  5. imagenet-failure-universality (Jupyter Notebook)

    Cross-architectural audit of 12 ImageNet-pretrained vision models across 5 distribution shifts. Finds 7.10% of clean ImageNet is misclassified by every model simultaneously (p<10⁻⁴), climbing to 35…

  6. speakql (Python)

    Agentic NL-to-SQL engine with graph-based JOIN discovery. Query any database in plain English.