Teaching machines to understand humans.
I'm Gautam — a data scientist, ML engineer and linguist building human-grade datasets, annotation pipelines and evaluation frameworks for frontier AI labs.
Inside the labs shaping modern AI.
Outlier
Senior AI Trainer & Linguistic Reviewer
Lead reviewer for multilingual RLHF projects. Built rubric systems for code, reasoning and creative-writing tasks, used by 200+ contractors.
DataAnnotation.tech
Senior Annotator — LLM Evaluation
Side-by-side preference judgements, jailbreak analysis and chain-of-thought grading for frontier chat models.
Invisible AI
Data Operations Specialist
Pipeline ownership for vision + language datasets. Reduced annotation cycle time by 38% through taxonomy refactoring.
TransPerfect
Computational Linguist
Localization QA, MT post-editing and terminology management for enterprise clients across EMEA & APAC.
RWS
Linguistic Quality Specialist
Authored style guides and TM curation strategies powering high-volume translation workflows.
A toolkit forged at the seam of language & machines.
Data Annotation
Bounding boxes, NER, intent, sentiment, multi-turn dialog labelling across 14+ languages.
Data Labelling Ops
Designing taxonomies, guidelines and QA loops for 50+ annotator teams.
Linguistic Expertise
Computational morphology, syntax trees, dialectal nuance and code-switching analysis.
LLM Evaluation
Red-teaming, hallucination scoring and side-by-side preference modelling.
ML Engineering
Fine-tuning, embeddings, retrieval pipelines and production-grade inference.
ML Training
Curating instruction datasets, SFT and reward modelling for chat assistants.
Data Collection
Crowd-sourced speech, image and text corpora with consent-first workflows.
Data Science
Experimentation, statistical modelling and decision-grade analytics dashboards.
From messy reality to model-ready signal.
Discover
Design
Deploy
Deliver
Four working languages. One craft.
English
Hindi
Marathi
French
What teams say after we ship.
Gautam rewrote our annotation guidelines in a weekend and our IAA jumped 22 points. Rare combination of rigor and speed.
The only reviewer I trust to find the failure mode I didn't think to look for.
A linguist who actually understands the model. We'd hire him as a full-time researcher tomorrow.
A linguist who fell in love with neural networks.
I started in comparative linguistics — chasing the grammar of dying dialects and the wandering etymologies of trade languages. Then transformers happened. Suddenly the questions I'd been asking about language could be answered, scaled, and broken in entirely new ways.
Today I sit at the intersection of human judgement and model behaviour. I label, I train, I evaluate. I write the guidelines, run the QA, and ship the dataset. When teams need someone who can speak both phonology and PyTorch, I get the call.
When I'm offline you'll find me in a library learning a fifteenth alphabet, or quietly losing at chess.
Let's build something human-grade.
Available for contract roles, annotation audits, dataset reviews and long-term residencies. Replies within 24 hours.