In the last eighteen months, every major Indian newspaper has published a version of the same headline. "India is ready for AI." "AI will transform the Indian economy." "Indian startups embrace AI." The coverage is, with exceptions, enthusiastic, vague, and almost entirely focused on what India will do with AI once AI arrives.
I want to write about a question that is receiving almost no attention, and which I think will turn out to be the politically consequential one. The question is not what India will do with AI. The question is what standards the AI will be trained to, and by whom.
What training actually is, politically
A frontier AI model is not a neutral object. It is a system that has been taught, across billions of examples, what counts as a good answer. What counts as a correct image. What counts as a well-written sentence. What counts as funny, acceptable, tasteful, reasonable. These judgments are not inherent to the model. They are taught by humans, called raters or evaluators or domain experts, whose preferences become the model's preferences.
Who those humans are matters enormously. If the raters are predominantly from one country, one class, one education system, or one set of aesthetic assumptions, the model will inherit those assumptions. It will treat those assumptions as the default. Everything else will look, to the model, like a deviation that might be less desirable.
This is not a theoretical problem. It is, right now, the actual political economy of frontier AI. The vast majority of high-quality reinforcement-learning-from-human-feedback (RLHF) work on the biggest models is done by evaluators in a narrow demographic band. The resulting models are, predictably, better at tasks that band cares about and worse at tasks the band does not.
What India is missing
The current Indian public conversation about AI is centered almost entirely on use cases. Will AI help small businesses? Will it transform healthcare? Will it displace jobs? All of these are real questions. None of them are the most important question.
The most important question is: when a frontier AI model encounters a specifically Indian cultural, aesthetic, linguistic, or social situation, does it have any idea what a good answer looks like? In my direct working experience: often, it does not. Not because the model is stupid, but because nobody in the training pipeline was well-calibrated on the Indian version of that situation.
The model has learned, in great detail, what an American wedding photograph is supposed to look like. It knows much less about what a Kolkata Durga Puja pandal photograph is supposed to look like. Not zero. But less. And the gap between "zero" and "less" is the gap that will, over the next five years, determine whether Indian users feel these models are made for them or merely available to them.
Why this will get worse before it gets better
There are three specific reasons I think the gap is likely to widen in the short term.
First, the economics favor scale over depth. Paying a thousand generalist raters in a low-wage country to produce training data is cheaper than paying a hundred domain experts in India to do it properly. The first option is, at current market rates, more attractive to AI labs than the second. Unless and until this changes, the training pipelines will remain scaled but shallow.
Second, India's institutional response is nascent. The government has started talking about "sovereign AI" in vague terms. A handful of Indian startups are training smaller models on Indian datasets. But there is, as of this writing, no serious public-sector investment in the kind of domain-expert evaluation infrastructure that would let Indian standards be legibly represented in frontier models. The conversation is still at the "we should do something" stage. It needs to be at the "here is what we have done" stage within three years, or the moment will pass.
Third, the framing of the problem is wrong. India is being encouraged, by its own commentariat and by global commentary alike, to think of AI as something to adopt. This is the consumer frame. The more useful frame is to think of AI as something to co-author. Co-authorship requires a seat at the training table. Adoption requires only a credit card.
What would actually help
I will name three specific things, each of which is achievable within the next two years if there is political will.
One. Public-sector funding for a domain-expert rater corps. Paid well. Selected rigorously. Covering visual, linguistic, culinary, architectural, religious, and social judgment. Not a large number. A few hundred people, paid at professional rates, with the brief of contributing evaluation signal to any AI lab that wants a serious model for Indian users. This is not industrial policy in the 1960s sense. It is the kind of public investment that, done well, pays for itself in the quality of the resulting models.
Two. Required provenance standards for AI-generated content in India, along the lines of what the EU is starting to pilot. If content was generated by a model trained to a specific standard, the audience should know. This is a consumer protection issue, not a creative freedom issue.
Three. A serious public university program in AI evaluation as a professional discipline, parallel to the way we think of medical residency or legal training. Domain expertise cannot be improvised. It has to be built. Right now, we are asking people with no training in evaluation to evaluate models that shape what a billion people see every day. This is, on inspection, absurd.
What will probably happen instead
Probably none of the above. Probably, India will continue to be a net consumer of models trained mostly elsewhere, with a small, loud domestic effort that produces sub-frontier alternatives, and with a commentariat that keeps writing about consumption. The window to do this differently is open right now. I am pessimistic about it staying open.
I am writing about this because I think the window being open is a fact that deserves to be stated publicly, even if the action that would follow from stating it does not occur. The question is not rhetorical. Who gets to train the Indian model? Right now, mostly not Indians. This is both a problem and an opportunity. The problem is going to be written about for a decade. The opportunity has about thirty-six months.