AI Engineer / Sydney

Siddhant Karki

I build production ML systems end to end: voice AI pipelines, model training and evaluation, knowledge infrastructure, and tokenizer research. I work below the abstraction layer when needed.

Professional Experience

Work

AI Engineer

CallD.AI
2025 - Present

Owning two major production systems across live-call infrastructure and AI post-call analysis, from ingestion through storage, processing, APIs, and dashboards.

Pipeline & Infrastructure

Owned two generations of the AI analysis pipeline across Twilio, LiveKit, and Asterisk call sources, delivering end-to-end processing through S3, Deepgram STT, LLM analysis, and PostgreSQL persistence

Cut LLM token usage by 75% and improved processing speed 3x by redesigning the speaker re-label system with turn-index maps, then further reduced cost through an OpenAI-to-Bedrock migration and Nova Micro prompt caching

Improved distributed processing reliability with PostgreSQL FOR UPDATE SKIP LOCKED row locking, 10 concurrent workers, jittered retries, thundering herd prevention, and graceful shutdown
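
As a rough illustration of two of the reliability techniques named above, the sketch below pairs a PostgreSQL job-claim query using FOR UPDATE SKIP LOCKED with a full-jitter exponential backoff helper. It is not the production code: the `jobs` table, its `status` and `created_at` columns, and the `base` and `cap` defaults are all assumptions made for the example.

```python
import random
from typing import Optional

# Hypothetical claim query: each worker atomically takes one pending row.
# SKIP LOCKED makes concurrent workers skip rows another worker already holds
# instead of blocking on them. Table and column names are illustrative only.
CLAIM_SQL = """
UPDATE jobs
SET status = 'processing'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a retry delay in seconds using full-jitter exponential backoff.

    The delay is drawn uniformly from [0, min(cap, base * 2**attempt)], so
    workers that fail at the same moment do not all retry at the same moment.
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

Randomising the whole delay, rather than adding a small offset to a fixed delay, spreads retries across the full window, which is one common way to avoid a thundering herd after a shared dependency recovers.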

Analytics & Platform

Implemented per-utterance LLM-based speaker diarization and enforced schema-validated structured outputs with Pydantic, eliminating malformed LLM responses across the analysis pipeline

Built the analytics platform from data model to API to frontend, including materialized-view-backed reporting, 8+ dashboard endpoints, and a configurable widget system

Reduced dashboard load times by 75% through COALESCE removal, targeted PostgreSQL indexes, and SQL restructuring

Research & Open Source

Projects

MCP Knowledge Base with Hybrid Retrieval and Wiki Synthesis

Python / MCP / FastAPI / LanceDB / ONNX Runtime / FlashRank / NetworkX

GitHub
23 MCP Tools / 68% MRR@5 Improvement / 6 Search Signals

Enabled AI agents to research across PDFs, videos, code, and web pages through a 23-tool MCP server that returns metadata-only results first (50 tokens each), letting agents evaluate relevance before committing context window to full-text retrieval

Improved retrieval MRR@5 by 68% over vector-only search with 6-signal hybrid search: vector similarity, BM25, fuzzy entity boost, RRF fusion, cross-encoder reranking, and query intent routing

Automated cross-source wiki generation in which every claim links back to its source chunks, with contradiction detection across sources and recursive gap-filling
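
For readers unfamiliar with RRF fusion, one of the search signals listed above, here is a minimal sketch of reciprocal rank fusion over two ranked result lists. It is a generic illustration, not the project's implementation: the function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a vector index and a BM25 index.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "a"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Because RRF uses only rank positions, it can combine retrievers whose raw scores are on incomparable scales, such as cosine similarity and BM25.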

Nepali Tokenizer Efficiency: Cross-Model Benchmark and Remediation Pipeline

Python / SentencePiece / PyTorch / HuggingFace / Qwen3 / LoRA

Case Study
17 Tokenizers Benchmarked / 52% Phi-4 Token Reduction / 45.6% BPC Improvement

Proved a 2-6x tokenizer cost penalty for Nepali text by benchmarking 17 tokenizers across 9 LLM families on 2,054 Nepali documents with corpus-weighted metrics

Reduced training cost 3.5x with custom TrainableTokenEmbedding wrappers that froze 151K base rows and trained only 15K appended rows, cutting params from 1.4B to 38M

Extended 4 production tokenizers, cutting Phi-4 tokens/word by 52%, then ran LoRA CPT and SFT on Qwen3-4B, improving Nepali BPC from 1.96 to 1.07

Nepali ASR: Fine-tuned Qwen3-ASR with Cross-Dataset Evaluation

Qwen3-ASR-1.7B / PyTorch / HuggingFace / A100

Case Study
8 Models Evaluated / 2/3 Datasets Beat MMS / ~$7 Compute Cost

Beat Meta MMS-1B on spontaneous speech (55.8% vs 62.4% WER on IndicVoices-R) and synthetic speech (31.4% vs 40.5% WER on OpenSLR-43) using only 157 hours of single-language fine-tuning data

Benchmarked 8 ASR models across 3 domains; found and fixed a float16 dtype bug that invalidated prior Whisper evaluations for Nepali

Background

Education

Victoria University

Bachelor of Information Technology

2023 - 2025
GPA 6.4 / 7.0