Available for Q3 — 2026

Teaching machines to understand humans.

I'm Gautam — a data scientist, ML engineer and linguist building human-grade datasets, annotation pipelines and evaluation frameworks for frontier AI labs.

Projects shipped: 120+

Languages worked in: 14

Annotations reviewed: 2.4M

Outlier

DataAnnotation.tech

TransPerfect

RWS

Invisible AI

Scale AI

Surge HQ

Labelbox

01 — Selected Work

Inside the labs shaping modern AI.

Outlier

Senior AI Trainer & Linguistic Reviewer

Lead reviewer for multilingual RLHF projects. Built rubric systems for code, reasoning and creative writing tasks, used by 200+ contractors.

DataAnnotation.tech

Senior Annotator — LLM Evaluation

Side-by-side preference judgements, jailbreak analysis and chain-of-thought grading for frontier chat models.

Invisible AI

Data Operations Specialist

Pipeline ownership for vision + language datasets. Reduced annotation cycle time by 38% through taxonomy refactoring.

TransPerfect

Computational Linguist

Localization QA, MT post-editing and terminology management for enterprise clients across EMEA & APAC.

RWS

Linguistic Quality Specialist

Authored style guides and TM curation strategies powering high-volume translation workflows.

02 — Expertise

A toolkit forged at the seam of language & machines.

Data Annotation

Bounding boxes, NER, intent, sentiment and multi-turn dialogue labelling across 14+ languages.

NER · Intent · RLHF

Data Labelling Ops

Designing taxonomies, guidelines and QA loops for 50+ annotator teams.

Taxonomy · QA · IAA

Linguistic Expertise

Computational morphology, syntax trees, dialectal nuance and code-switching analysis.

Morphology · Syntax

LLM Evaluation

Red-teaming, hallucination scoring and side-by-side preference modelling.

RLHF · DPO · Eval

ML Engineering

Fine-tuning, embeddings, retrieval pipelines and production-grade inference.

PyTorch · HF · vLLM

ML Training

Curating instruction datasets, SFT and reward modelling for chat assistants.

SFT · Reward

Data Collection

Crowd-sourced speech, image and text corpora with consent-first workflows.

Speech · Vision

Data Science

Experimentation, statistical modelling and decision-grade analytics dashboards.

Stats · A/B

03 — Process

From messy reality to model-ready signal.

Discover

Map your dataset, model behaviour and downstream metric. We define what 'good' looks like before a single label is drawn.

Design

Author guidelines, edge-case catalogues, taxonomy and inter-annotator agreement tests.

Deploy

Calibrate annotators, run gold-standard pilots and stand up QA dashboards with live drift monitoring.

Deliver

Ship versioned datasets, evaluation reports and the playbook your team needs to scale without me.

04 — Linguistic Range

Four working languages. One craft.

Native

English

100%

Native

Hindi

अ

100%

Native

Marathi

म

95%

Professional

French

Éé

80%

05 — Words

What teams say after we ship.

Gautam rewrote our annotation guidelines in a weekend and our IAA jumped 22 points. Rare combination of rigor and speed.

— Head of Data, Frontier AI Lab

The only reviewer I trust to find the failure mode I didn't think to look for.

— Research Engineer, RLHF Team

A linguist who actually understands the model. We'd hire him as a full-time researcher tomorrow.

— ML Lead, Multilingual NLP

06 — About

A linguist who fell in love with neural networks.

I started in comparative linguistics — chasing the grammar of dying dialects and the wandering etymologies of trade languages. Then transformers happened. Suddenly the questions I'd been asking about language could be answered, scaled, and broken in entirely new ways.

Today I sit at the intersection of human judgement and model behaviour. I label, I train, I evaluate. I write the guidelines, run the QA, and ship the dataset. When teams need someone who can speak both phonology and PyTorch, I get the call.

When I'm offline you'll find me in a library learning a fifteenth alphabet, or quietly losing at chess.

Based: Remote / Asia

Timezone: GMT+5:30

Stack: PyTorch · HF · Label Studio · Prodigy

Reading: Bender & Koller, 'Climbing towards NLU'

Currently: Building eval rubrics for multilingual reasoning

07 — Get in touch

Let's build something human-grade.

Available for contract roles, annotation audits, dataset reviews and long-term residencies. Replies within 24h.

Status: Open to work

Response: Within 24h

Based in: India · Remote