
Sampling Methods That Actually Work in Uzbekistan

Probability vs non-probability sampling in the real Uzbek field: the sampling frame, the 2026 census, the mahalla as a cluster, migration, quotas, and weighting.

AISurvey Methodology · 14 min read

Textbook sampling assumes you have a complete list of the population and you draw random numbers from it. In the Uzbek field there usually is no such list and no addresses in the familiar sense, and a large share of the adult men from a Fergana Valley mahalla (the neighborhood community) are working in Russia at the moment you knock. So the question isn't which sampling method is "most correct," but which one survives real conditions, and how to avoid a bias that no amount of later weighting can repair.

Probability vs non-probability: where the line falls

Every method belongs to one of two groups, and which one you are working in decides what you are even entitled to write in the report.

In a probability sample, every person has a known, non-zero chance of being selected. Only such samples let you correctly generalize findings to the whole population, compute confidence intervals, and say "with a margin of error of ±3 percentage points." In a non-probability sample, selection depends on the respondent's availability or the interviewer's choice, so design-based error estimates simply do not apply, however neat the final table looks.

The line matters because in Uzbekistan the temptation to slide into non-probability selection is enormous: there is no sampling frame, distances are long, and reaching Karakalpakstan takes a day. But if you sell a client a "representative national survey" while interviewers actually surveyed whoever happened to be home during the day, that is no longer a probability sample, and an honest report has to admit it.

Sample size affects precision. The selection method affects whether that precision means anything at all. A thousand correctly selected questionnaires beat ten thousand collected however they came.
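To put a number on "precision," here is a minimal sketch of the textbook margin of error for a proportion under simple random sampling, assuming a 95% confidence level (z = 1.96) and the worst case p = 0.5. It is a property of the design, not of the data: the formula says nothing about selection bias, which is exactly why a large but haphazard sample is not rescued by its size.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion.

    Valid only for a simple random sample; it measures sampling noise, not bias.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000):
    # p = 0.5 gives the widest interval, so it is the conservative choice.
    print(n, round(100 * margin_of_error(0.5, n), 1), "percentage points")
```

Under these assumptions n = 1,000 gives about ±3.1 points and n = 10,000 about ±1 point; neither figure shrinks a bias of, say, 10 points that comes from who happened to be home.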

The core problem of the Uzbek field: the sampling frame

A sampling frame is the list you draw respondents from. In countries with a population register this is a solved problem. In Uzbekistan there has historically been no such list available to researchers: address registries are incomplete, registered residence (propiska) does not match where people actually live, and no one hands you a ready list of households.

That is exactly why the key development of recent years is the population and agriculture census conducted from 15 January to 28 February 2026 by the Statistics Agency (formerly the State Statistics Committee). It is the first full census since independence. For researchers its practical meaning is huge: for the first time there is an authoritative sampling frame and benchmark distributions for post-stratification by sex, age, region, and settlement type. The country's population of roughly 37.9 million is now a counted figure rather than an estimate.

One caveat: census results are released gradually, not as a single file on the closing day. So in 2026 the sensible tactic is to treat the census as the new gold standard for benchmarks and weighting, while checking which tables have actually been published before you build quotas on them.

The mahalla as a natural sampling unit

Since there is no full list of people, the country's structure comes to the rescue. Uzbekistan is divided into 9,000–10,000 mahallas — a ready-made grid of primary sampling units (PSUs). A mahalla is compact, has boundaries, a chairman (the raisi), and an office. For cluster sampling it is an almost perfect building block: you select mahallas, and within them households.

But this is also where the main trap hides. The raisi holds a list of the mahalla's households, and it is sorely tempting to take it as a ready sampling frame. Do not take it blindly. That list systematically omits:

  • labour migrants, formally deregistered or simply uncounted;
  • renters and newcomers leasing their housing;
  • unregistered residents and those who "on paper" live elsewhere.

As a result the raisi's list is skewed toward the settled, registered, loyal population. If you are studying exactly that group, it will do. If you need the whole population of adult residents, it will systematically discard the most mobile. How to work with the raisi without turning him into a source of bias is something we cover in detail in our piece on field access and local authorities.

Simple random sampling: elegant in theory

Every member of the population has an equal chance of being selected, like a lottery draw. The method is transparent, gives unbiased estimates, and requires no modelling assumptions. It has one flaw, but a fatal one: it needs a complete sampling frame, the very list that usually does not exist in the Uzbek field.

In practice, a "pure" simple random sample across the country at the household-selection stage is almost never seen here. But it works beautifully at other levels: randomly select mahallas from a regional list, randomly select route starting points, randomly select the respondent within a household. That is, you inject randomness not through one big draw but at every step of a multistage plan.
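Injecting randomness at each stage is easy to make auditable. A minimal sketch, assuming you have a regional list of mahallas (the names below are placeholders) and that route starting points are numbered cells on a map of each mahalla (a hypothetical convention, used here only for illustration): a fixed, recorded seed lets anyone reproduce and check the draw later.

```python
import random

def draw_without_replacement(units, k, rng):
    """Simple random sample of k distinct units, every unit equally likely."""
    return rng.sample(units, k)

# Hypothetical regional list of 120 mahallas; in real work it comes from the frame.
mahallas = [f"mahalla_{i:03d}" for i in range(1, 121)]

rng = random.Random(2026)  # record this seed in the sampling report
selected = draw_without_replacement(mahallas, 8, rng)

# Hypothetical: each mahalla map is divided into 50 numbered cells,
# and the random route for that mahalla starts in a randomly drawn cell.
start_cells = {m: rng.randint(1, 50) for m in selected}

for m in selected:
    print(m, "start cell", start_cells[m])
```

The same generator can drive the later stages too; the point is that the office, not the interviewer, makes every choice.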

Stratified sampling: indispensable in Uzbekistan

The population is first split into homogeneous groups — strata — and then sampled within each. This guarantees that every important group lands in the sample and improves precision for the same sample size. For Uzbekistan stratification is not a luxury but a necessity, because the country is profoundly heterogeneous.

Which strata really matter

  • Tashkent — the most Russified, urban, and digital; a separate world in income and behavior.
  • The Fergana Valley (Andijan, Fergana, Namangan) — densely populated, more traditional, Uzbek-speaking, with high out-migration for work.
  • Karakalpakstan — a distinct language and identity, low density, vast distances, a sensitive context; a proper survey here requires a Karakalpak version of the questionnaire.
  • Samarkand and Bukhara — a significant share of Tajik-speaking respondents who are more comfortable answering in Tajik.

If you do not stratify by region and settlement type (urban/rural — and rural is roughly half the country), any "nationwide" sample quietly tilts toward wherever it was cheaper and easier to reach. Stratification forces you to allocate the workload honestly in advance.

Proportional vs disproportional allocation

There is a subtlety that trips up even experienced teams. If you want estimates for the country as a whole, the sample size per stratum is allocated proportionally to its share of the population: the Fergana Valley gets many interviews, sparsely populated Karakalpakstan few. But if you need reliable estimates within each region separately, the small strata have to be oversampled (taken above their share), or you will end up with some 30 questionnaires for Karakalpakstan, far too few for any regional estimate. The proportions are then restored at the final stage by weighting. The allocation decision is made before the field, driven by what the client needs: a single national figure or breakdowns by region.
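The two allocation choices can be sketched in a few lines. This is an illustration under stated assumptions: the stratum names and population sizes are invented placeholders, not census figures; rounding uses the largest-remainder method; and "oversampling" is modelled as a simple floor on interviews per stratum. The base weight (population divided by interviews) is what restores national proportions after an oversample.

```python
import math

def proportional_allocation(pop, n_total):
    """Split n_total interviews across strata in proportion to population size."""
    total = sum(pop.values())
    exact = {s: n_total * size / total for s, size in pop.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    leftover = n_total - sum(alloc.values())
    # Largest-remainder rounding: the spare interviews go to the strata
    # whose exact share lost the most when rounded down.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def with_floor(alloc, floor):
    """Disproportional allocation: raise every stratum to at least `floor` interviews."""
    return {s: max(n, floor) for s, n in alloc.items()}

def base_weights(pop, alloc):
    """Design weight per interview in a stratum: people represented per interview."""
    return {s: pop[s] / alloc[s] for s in pop}

# Invented stratum sizes, for illustration only.
pop = {"stratum_A": 6_000_000, "stratum_B": 3_000_000, "stratum_C": 500_000}

national = proportional_allocation(pop, 600)  # one national figure
regional = with_floor(national, 150)          # usable estimates per stratum
print(national)
print(regional, "total:", sum(regional.values()))
print({s: round(w) for s, w in base_weights(pop, regional).items()})
```

In the proportional version the small stratum gets 32 interviews, too few for its own estimate; with the floor it gets 150, and each of those interviews then carries a smaller weight so that the national total still adds up.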

Cluster sampling: how the Uzbek field actually works

The population is divided into natural groups — clusters (mahallas, houses) — a subset is selected at random, and you work inside them. This is dramatically cheaper: instead of driving an interviewer to a single address in a distant district, you seat them in a selected mahalla for 15–20 interviews.

In practice almost every national study here is a multistage stratified cluster sample: the country is split into strata (region × urban/rural), within strata mahallas are randomly selected as PSUs, within a mahalla households are selected (by random route or from a list), and within a household a specific respondent is chosen. Major international programmes such as the UNICEF MICS surveys are built exactly this way, and they are a convenient methodological reference point.
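Because every stage is random, each respondent's chance of selection is known, and its inverse is the design weight. A minimal sketch, assuming equal-probability (simple random) selection at each stage: mahallas within a stratum, households within a mahalla, one adult within a household. Many national surveys instead select clusters with probability proportional to size, in which case the first factor changes; a random route also only approximates equal household probabilities. All counts below are hypothetical.

```python
def inclusion_probability(mahallas_drawn, mahallas_in_stratum,
                          households_drawn, households_in_mahalla,
                          adults_in_household):
    """Probability that one particular adult ends up in the sample,
    assuming simple random selection at every stage and one adult per household."""
    p_mahalla = mahallas_drawn / mahallas_in_stratum
    p_household = households_drawn / households_in_mahalla
    p_adult = 1 / adults_in_household
    return p_mahalla * p_household * p_adult

def design_weight(p):
    """Number of population members one respondent stands for."""
    return 1 / p

# Hypothetical stratum: 20 of 400 mahallas, 15 of 500 households, 4 adults in the household.
p = inclusion_probability(20, 400, 15, 500, 4)
print(p, design_weight(p))
# An adult living alone in the same mahalla is 4x more likely to be picked,
# so their design weight is 4x smaller.
print(design_weight(inclusion_probability(20, 400, 15, 500, 1)))
```

The last line shows why within-household selection feeds into the weights: members of large multigenerational households are less likely to be chosen, and the weight compensates for it.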

The price of cheapness is the cluster effect: people within one mahalla resemble each other (income, language, way of life), so each additional interview in the same cluster adds less new information. To reach the same precision as simple random sampling, a cluster sample has to be larger. The practical rule: rather than squeezing many interviews out of one mahalla, take more mahallas with fewer interviews in each.
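The cluster effect is usually summarised by Kish's approximation deff ≈ 1 + (m − 1)ρ, where m is the number of interviews per cluster and ρ is the intra-cluster correlation. A sketch, assuming equal cluster sizes and an illustrative ρ = 0.05 (the real value depends on the indicator and has to be estimated from data): the same 1,000 interviews carry very different amounts of information depending on how they are spread across mahallas.

```python
def design_effect(m: float, rho: float) -> float:
    """Kish's approximation for equal-sized clusters of m interviews each."""
    return 1 + (m - 1) * rho

def effective_sample_size(n: int, m: float, rho: float) -> float:
    """Size of a simple random sample that would give the same precision."""
    return n / design_effect(m, rho)

RHO = 0.05  # illustrative assumption, not a measured value
N = 1_000
for per_mahalla in (20, 10, 5):
    clusters = N // per_mahalla
    print(f"{clusters} mahallas x {per_mahalla}: "
          f"deff={design_effect(per_mahalla, RHO):.2f}, "
          f"effective n={effective_sample_size(N, per_mahalla, RHO):.0f}")
```

Under these assumptions, spreading the same fieldwork across 100 mahallas instead of 50 buys roughly 180 "effective" interviews, paid for with extra travel.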

Quota sampling: the market's favorite tool and its trap

Interviewers recruit respondents so that the shares for given attributes (sex, age, region) match the population. The method is fast and cheap, which is why it is arguably the most common design in Uzbek market research. And that is fine: for many commercial tasks it is enough.

But let us call things by their names: it is a non-probability method. Quotas ensure that the final structure matches on a few visible attributes, but they do not control whom exactly the interviewer picked inside the quota. And naturally the interviewer picks the convenient respondent: the person on the street, the one who opened the door, the one who agreed. The hidden bias lives precisely here.

A classic Uzbek example: the quota "men aged 30–45" formally fills up, but because of labour migration there are almost no such men at home, so the interviewer completes the quota with the atypical "leftovers" (the unemployed, shift workers, those who returned ill). The structure by sex and age will match, while the hidden portrait of the group is skewed. The quota papered over the hole with a number without closing it in substance.

When a quota is defensible and when it isn't

Don't throw quotas out — just apply them where they belong. Quota sampling is sensible when the attribute you study is weakly related to how the interviewer picks a respondent: packaging tests, brand awareness, reactions to advertising in Tashkent retail. Here the hidden "convenience" bias barely affects the result. A quota is dangerous when the measured quantity is directly tied to a person's availability: employment, income, migration plans, political sentiment — everything that distinguishes the one who stayed home from the one who left. A simple rule: the more the topic overlaps with who is even home and willing to talk, the less you should trust a quota and the more you need probability selection.

The migration that breaks any "who's home" sample

This is the Uzbek reality you cannot route around. Millions of working-age Uzbeks, predominantly men, work abroad — mainly in Russia and Kazakhstan, with the heaviest outflow from the Fergana Valley. The consequence for the fieldworker is simple and harsh: at the moment of the visit there are disproportionately many women, the elderly, and the young at home, while working-age adult men are systematically missing.

Any "we survey whoever opened the door" sample collapses because of this. You can fight back with three tools, and ideally all at once:

  1. Stratification and quotas with a sober awareness that men aged 25–45 will have to be deliberately "hunted" — in the evenings, on weekends, through repeat visits.
  2. Within-household selection rules (below), so the interviewer doesn't default to "whoever is convenient."
  3. Weighting at the final stage to the census sex-and-age structure — but as fine-tuning, not absolution.

On top of the migration skew sits a gender skew. In conservative and rural households a male stranger may not be received at all, and a woman may not come out to the survey without the male head present. So reaching adult men and reaching women are two different problems, solved in part by the composition of the field team: on "women's" topics (health, family, children) female interviewers are often irreplaceable. A selection grid not backed by the right gender mix in the crew stays theoretical — formally you must survey the head's wife, but in practice there is no one to send to her.

Selecting the respondent within the household

In Uzbekistan large multigenerational households are common: under one roof a grandmother, parents, married sons, grandchildren. If you do not set an explicit selection rule, the interviewer surveys whoever is convenient — usually the person who opened the door and agreed. This is a quiet but powerful bias.

So a formal selection step is built into the interviewer instructions:

  • The Kish grid — a predefined table that, given the number of adults in the household, unambiguously dictates whom to survey. Objective, but it demands discipline.
  • The last-birthday method — survey the adult whose birthday fell most recently. Easier to explain and to apply in a real courtyard.
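Both rules are mechanical enough to encode in a CAPI form. A minimal sketch with hypothetical names and data: the last-birthday rule picks the adult whose birthday most recently passed on or before the interview date, and the second function is a simplified Kish-style selection (adults listed men first, then women, each from oldest to youngest, as in Kish's original procedure) driven by a random number assigned to each questionnaire in the office. It does not reproduce Kish's printed selection tables, only their idea: the interviewer never chooses.

```python
import random
from dataclasses import dataclass
from datetime import date

@dataclass
class Adult:
    name: str
    sex: str          # "M" or "F"
    age: int
    birth_month: int
    birth_day: int

def last_birthday(adults, interview_date):
    """Adult whose birthday fell most recently on or before the interview date."""
    today = (interview_date.month, interview_date.day)

    def recency(a):
        md = (a.birth_month, a.birth_day)
        # Birthdays already passed this year are more recent than those that
        # last fell in the previous year; within each group a later date wins.
        return (md <= today, md)

    return max(adults, key=recency)  # ties: the first adult listed wins

def kish_style(adults, selection_key):
    """Pick an adult with a random key in [0, 1) fixed before fieldwork."""
    ordered = sorted(adults, key=lambda a: (a.sex != "M", -a.age))
    return ordered[int(selection_key * len(ordered))]

# Hypothetical household roster entered by the interviewer.
household = [
    Adult("Dilnoza", "F", 41, 5, 20),
    Adult("Rustam", "M", 45, 1, 15),
    Adult("Kamola", "F", 19, 3, 1),
]
print(last_birthday(household, date(2026, 3, 10)).name)
key = random.Random(17).random()  # in practice: stored with the CAPI case in advance
print(kish_style(household, key).name)
```

Either function returns a single name that the app can display, so the interviewer's only job is to ask for that person.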

In digital CAPI this rule can be baked straight into the questionnaire: the app itself asks for the household composition and names whom to survey, leaving the interviewer no room for a convenient choice. How to embed such logic in the instrument is something we discuss in our guide to questionnaire design, and verifying that the interviewer actually followed the rule is covered in our piece on field quality control.

Weighting: fine-tuning, not a lifeline

After the field, the sample is weighted: observations are given weights so the final structure matches a benchmark — now the 2026 census distributions by sex, age, region, and settlement type. This is a correct and necessary step. But it has a hard limit.

Weighting fixes known and measured skews. If your sample is short of men aged 25–45, weighting raises the weight of the ones you have. But if those few men are systematically unlike the absent ones (and with migration they are, since it is precisely the atypical ones who stayed), the raised weight only amplifies their bias instead of correcting it. Weighting cannot bring back into the sample those who were never in it.
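The limit is easy to see with invented numbers. The sketch below post-stratifies a sample by sex only, assuming a 50/50 population split and a hypothetical yes/no indicator; the "true" rate for men is made up to represent a group in which those abroad differ sharply from those at home. Weighting fixes the sex ratio, but the estimate for men is still built only from the men who stayed.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight per cell = population share / sample share."""
    n = sum(sample_counts.values())
    return {c: population_shares[c] / (sample_counts[c] / n) for c in sample_counts}

def weighted_mean(cell_means, sample_counts, weights):
    """Overall mean with every respondent in a cell carrying that cell's weight."""
    num = sum(cell_means[c] * sample_counts[c] * weights[c] for c in cell_means)
    den = sum(sample_counts[c] * weights[c] for c in cell_means)
    return num / den

# Invented example: who was home when the interviewer came.
counts = {"women": 300, "men": 100}
observed = {"women": 0.40, "men": 0.50}  # share answering "yes" in the sample
shares = {"women": 0.5, "men": 0.5}      # assumed census benchmark

w = post_stratification_weights(counts, shares)
unweighted = weighted_mean(observed, counts, {c: 1.0 for c in counts})
weighted = weighted_mean(observed, counts, w)

# Hypothetical truth: among all men the rate is 0.77, because the absent majority differs.
true_value = 0.5 * 0.40 + 0.5 * 0.77

print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}, true {true_value:.3f}")
```

Weighting moves the estimate from 0.425 to 0.450; the remaining gap to 0.585 is the bias of who was home, and no cell weight can close it because the missing men are not in any cell.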

Hence the practical conclusion: weighting is the finishing polish on good selection, not a substitute for bad selection. First you honestly build a probability, stratified, cluster design with within-household selection rules, and only then fine-tune the remainder to the census.

How to choose the method for your task

A quick guide through the typical situations of the Uzbek field:

  • A national representative population survey. Multistage stratified cluster sample: region × urban/rural strata, mahallas as PSUs, random-route household selection, Kish/last-birthday within, weighting to the census.
  • A fast market study in Tashkent. Quotas by sex/age/district are acceptable, but be aware of the hidden bias and do not present the result as a probability sample.
  • Studying a specific group (e.g. migrant households in Fergana). Stratification by migration status plus targeted selection; be ready for "who's home" to fail you.
  • A sensitive region such as Karakalpakstan. Budget for a Karakalpak version, long distances, and caution in wording.

Ready to turn this plan into a working instrument? Build a questionnaire with household-selection logic and quotas right in the AISurvey builder, and if you are just starting out, see our step-by-step introduction to the platform. More breakdowns on methodology and the field are collected in our blog.

Frequently asked questions

Can I use the mahalla raisi's household list as a sampling frame?
Only with your eyes open. The raisi's list systematically omits labour migrants, renters, and unregistered residents, so it is skewed toward the settled, registered population. For studying exactly that group it will do, but for a representative sample of all adult residents it discards the most mobile. It is safer to use the mahalla as a cluster and select households by random route.
How does the 2026 census change sampling work in Uzbekistan?
The census (15 January to 28 February 2026, conducted by the Statistics Agency) is the first full census since independence. For the first time it gives researchers an authoritative sampling frame and benchmark distributions by sex, age, region, and settlement type for post-stratification and weighting. Note that results are released gradually, so check which tables have already been published before building quotas on them.
How does labour migration affect the sample?
Because of out-migration for work (especially from the Fergana Valley), at the time of the survey there are disproportionately many women, elderly, and young people at home, while men aged 25–45 are systematically missing. Any 'whoever opened the door' sample is skewed as a result. You compensate with a combination: quotas that deliberately recruit men in evenings and on weekends, within-household selection rules, and weighting to the census.
How does quota sampling differ from stratified sampling?
In stratified sampling, selection within groups is random — a probability method to which confidence intervals apply. In quota sampling, the interviewer decides whom to survey as long as the quotas are met — a non-probability method that does not control hidden bias inside the quota. That makes stratified sampling more statistically sound, even though quota sampling is faster and cheaper.
Will weighting save a bad sample?
No. Weighting corrects known and measured skews (sex, age, region) against the census benchmark, but it cannot return people who were never in the sample at all. If the men who remained in the field are systematically unlike those who left, raising their weight amplifies the bias rather than fixing it. Weighting is the finishing polish on good selection, not a replacement for it.
Tags: sampling in Uzbekistan, representativeness, methodology, mahalla, 2026 census, stratification

About the author


AISurvey Methodology

AISurvey methodologists on sampling, question wording, and data quality in social and market research.