AI Engineer / Sydney

Siddhant Karki
I build production ML systems end to end: voice AI pipelines, model training and evaluation, knowledge infrastructure, and tokenizer research. I work below the abstraction layer when needed.
Professional Experience
Work
AI Engineer
CallD.AI
Owning two major production systems across live-call infrastructure and AI post-call analysis, from ingestion through storage, processing, APIs, and dashboards.
Pipeline & Infrastructure
Owned two generations of the AI analysis pipeline across Twilio, LiveKit, and Asterisk call sources, delivering end-to-end processing through S3, Deepgram STT, LLM analysis, and PostgreSQL persistence
Cut LLM token usage by 75% and improved processing speed 3x by redesigning the speaker re-labeling system with turn-index maps, then further reduced cost through an OpenAI-to-Bedrock migration and Nova Micro prompt caching
Improved distributed processing reliability with PostgreSQL FOR UPDATE SKIP LOCKED row locking, 10 concurrent workers, jittered retries, thundering herd prevention, and graceful shutdown
Analytics & Platform
Implemented per-utterance LLM-based speaker diarization and enforced schema-validated structured outputs with Pydantic, eliminating malformed LLM responses across the analysis pipeline
Built the analytics platform from data model to API to frontend, including materialized-view-backed reporting, 8+ dashboard endpoints, and a configurable widget system
Reduced dashboard load times by 75% through COALESCE removal, targeted PostgreSQL indexes, and SQL restructuring
Research & Open Source
Projects
MCP Knowledge Base with Hybrid Retrieval and Wiki Synthesis
Python / MCP / FastAPI / LanceDB / ONNX Runtime / FlashRank / NetworkX
Enabled AI agents to research across PDFs, videos, code, and web pages through a 23-tool MCP server that returns metadata-only results first (50 tokens each), letting agents evaluate relevance before committing context window to full-text retrieval
Improved retrieval MRR@5 by 68% over vector-only search with 6-signal hybrid search: vector similarity, BM25, fuzzy entity boost, RRF fusion, cross-encoder reranking, and query intent routing
Automated cross-source wiki generation in which every claim links back to its source chunks, with contradiction detection across sources and recursive gap-filling
Nepali Tokenizer Efficiency: Cross-Model Benchmark and Remediation Pipeline
Python / SentencePiece / PyTorch / HuggingFace / Qwen3 / LoRA
Proved a 2-6x tokenizer cost penalty for Nepali text by benchmarking 17 tokenizers across 9 LLM families on 2,054 Nepali documents with corpus-weighted metrics
Reduced training cost 3.5x with custom TrainableTokenEmbedding wrappers that froze 151K base rows and trained only 15K appended rows, cutting trainable parameters from 1.4B to 38M
Extended 4 production tokenizers, then ran LoRA CPT and SFT on Qwen3-4B, cutting Phi-4 tokens/word by 52% and improving Nepali BPC from 1.96 to 1.07
Nepali ASR: Fine-tuned Qwen3-ASR with Cross-Dataset Evaluation
Qwen3-ASR-1.7B / PyTorch / HuggingFace / A100
Beat Meta MMS-1B on spontaneous speech (55.8% vs 62.4% WER on IndicVoices-R) and synthetic speech (31.4% vs 40.5% WER on OpenSLR-43) using only 157h of single-language fine-tuning
Benchmarked 8 ASR models across 3 domains; found and fixed a float16 dtype bug that invalidated prior Whisper evaluations for Nepali
Background
Education
Victoria University
Bachelor of Information Technology