From RUM to Gravity: Using Random Utility to Derive Migration Flows.

This post traces the path from the micro-foundations of the Random Utility Model (RUM) to the macro-level Gravity Model used in migration studies and many other areas. It shows how individual location choices, driven by utility maximization, aggregate into migration flows shaped by origin characteristics, destination attractiveness, and the costs connecting them. McFadden's Random Utility Maximization framework underlies many models of choosing one option from a choice set, so it is instructive to work through it.

1. The Random Utility Model Framework

The RUM is a workhorse for modeling individual choice, including the decision to migrate. The core idea is that an individual $i$ evaluates the utility $U_{ij}$ derived from choosing alternative $j$ (e.g., a location). This utility is assumed to have two components: a systematic part $V_{ij}$, which depends on observable characteristics, and a random, unobserved part $\epsilon_{ij}$.

$$ U_{ij} = V_{ij} + \epsilon_{ij} $$

An individual will choose the alternative $j$ from their available choice set $C_i$ that offers the highest utility.

$$ \forall k \in C_i \quad U_{ik} \le U_{ij} $$

Since the $\epsilon_{ij}$ terms are random, we can only speak of the *probability* that alternative $j$ is chosen. This probability, $P_{ij}$, is the chance that the utility of $j$ exceeds the utility of all other options $k$ in the choice set $C_i$.

$$ P_{ij} = P(V_{ik}+\epsilon_{ik} \le V_{ij} + \epsilon_{ij} \quad \forall k \in C_i, k \ne j) $$

Rearranging this gives:

$$ P_{ij} = P(\epsilon_{ik} - \epsilon_{ij} \le V_{ij} - V_{ik} \quad \forall k \in C_i, k \ne j) $$

A common assumption, leading to the Multinomial Logit (MNL) model, is that the random terms $\epsilon_{ij}$ are independently and identically distributed (IID) following a Gumbel (type I extreme value) distribution. This assumption implies the Independence of Irrelevant Alternatives (IIA) property: the odds of choosing $j$ over $k$, $P_{ij}/P_{ik} = e^{V_{ij} - V_{ik}}$, do not depend on any other alternative in the choice set. Under this assumption, the choice probability takes a convenient closed form:

$$ P_{ij} = \frac{e^{V_{ij}}}{\sum_{k \in C_i} e^{V_{ik}}} $$

(This formula tells us that the probability of choosing location $j$ depends on its own systematic utility relative to the sum of exponentiated utilities of all possible choices, including staying put.)
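To make the closed form less abstract, here is a minimal sketch that compares it with a brute-force simulation of the choice rule. The utility values, the number of draws, and the function names are illustrative assumptions, not part of the derivation above. Standard Gumbel draws are generated as $-\ln(E)$ with $E$ exponentially distributed, which is one common way to do it with the standard library.

```python
import math
import random


def mnl_probabilities(v):
    """Closed-form logit probabilities for a list of systematic utilities."""
    v_max = max(v)  # subtracting the max avoids overflow and leaves the ratio unchanged
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def gumbel_draw(rng):
    """Standard Gumbel draw: -log(E) with E ~ Exp(1)."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against an exact zero
    return -math.log(e)


def simulated_shares(v, n_draws, seed=0):
    """Share of draws in which each alternative has the highest total utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [v_j + gumbel_draw(rng) for v_j in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


# Hypothetical systematic utilities for three locations.
v = [1.0, 0.5, -0.2]
closed_form = mnl_probabilities(v)
simulated = simulated_shares(v, n_draws=100_000)

for j, (p, s) in enumerate(zip(closed_form, simulated)):
    print(f"alternative {j}: closed form {p:.4f}, simulated {s:.4f}")
```

With enough draws the simulated shares should settle close to the closed-form probabilities, which is all the MNL formula claims.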

2. Specifying Migration Utility and Aggregate Flows

Let's apply this to migration. Consider the utility for an individual $i$ currently at origin $o$ choosing destination $j$. A common specification for the systematic utility $V_{ij}$ (or $V_{oj}$ when considering migration from $o$ to $j$) is:

$$ V_{oj} = \alpha_j - \beta c_{oj} \quad (+ \gamma' Z_i) $$

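Here, $\alpha_j$ represents the intrinsic attractiveness of destination $j$ (e.g., wages, amenities), $c_{oj}$ is the cost of migrating from $o$ to $j$ (e.g., distance, cultural differences, with $c_{oo}=0$ for staying), and $\beta$ is a parameter reflecting sensitivity to cost. $Z_i$ could include individual characteristics, but any term that is identical across destinations cancels out of the MNL probability, so such characteristics matter only when they are interacted with location attributes or given destination-specific coefficients. For aggregate models we therefore focus on location attributes.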

To move from individual probabilities to aggregate migration flows, $M_{oj}$, we multiply the number of potential migrants at the origin, $N_o$, by the average probability of choosing destination $j$, $\bar{P}_{oj}$.

$$ M_{oj} = N_o \times \bar{P}_{oj} $$

3. The RUM-Gravity Connection

Substituting the MNL probability formula (assuming homogeneity or averaging over individuals) into the aggregate flow equation yields:

$$ M_{oj} = N_o \times \frac{e^{V_{oj}}}{\sum_{k \in C_o} e^{V_{ok}}} = N_o \times \frac{e^{\alpha_j - \beta c_{oj}}}{\sum_{k \in C_o} e^{\alpha_k - \beta c_{ok}}} $$

We can rewrite this to highlight its structure:

$$ M_{oj} = \frac{(N_o) \times (e^{\alpha_j}) \times (e^{-\beta c_{oj}})}{\sum_{k \in C_o} e^{V_{ok}}} $$

This equation strongly resembles the classic Gravity Model of migration:

$N_o$: Represents the "mass" or push factors of the origin.
$e^{\alpha_j}$: Represents the attractiveness or pull factors of the destination.
$e^{-\beta c_{oj}}$: A deterrence factor that shrinks as the migration cost rises (for $\beta > 0$), often related to distance or friction between locations.
$\sum_{k \in C_o} e^{V_{ok}}$: This denominator term is crucial. It's often called the "Multilateral Resistance" or "Accessibility" term. It captures the influence of *all* possible destinations (including staying put) on the flow between a specific origin-destination pair. It normalizes the probabilities, ensuring that the flows out of $o$ (stayers included) sum to $N_o$.

(Thus, the RUM provides a behavioural micro-foundation for the empirically successful, but often ad-hoc, Gravity Model.)
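The flow equation is easy to compute directly. The sketch below does so for three made-up locations; the populations, attractiveness values, cost matrix, and $\beta$ are all invented for illustration, and `migration_flows` is a hypothetical helper name.

```python
import math


def migration_flows(populations, alphas, costs, beta):
    """M[o][j] = N_o * exp(alpha_j - beta * c_oj) / sum_k exp(alpha_k - beta * c_ok)."""
    flows = []
    for o, n_o in enumerate(populations):
        weights = [math.exp(a - beta * c) for a, c in zip(alphas, costs[o])]
        denom = sum(weights)  # the multilateral resistance / accessibility term
        flows.append([n_o * w / denom for w in weights])
    return flows


# Hypothetical inputs: three locations, with c_oo = 0 on the diagonal (staying).
populations = [1000.0, 500.0, 2000.0]
alphas = [0.2, 1.0, 0.5]
costs = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]

flows = migration_flows(populations, alphas, costs, beta=1.0)
for o, row in enumerate(flows):
    print(f"origin {o}: " + ", ".join(f"{m:8.1f}" for m in row))
```

Each row sums to the origin's population, and the diagonal entries are the stayers. Changing a single $\alpha_k$ or $c_{ok}$ shifts every flow out of $o$ through the denominator, which is the multilateral resistance effect at work.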

4. Developing an Empirical Specification

To estimate this model using data, we can work with the logarithm of the migration flow. Let $M_{odt}$ be the flow from origin $o$ to destination $d$ at time $t$. Taking the log of the RUM-derived flow expression gives (conceptually):

$$ \ln(M_{odt}) = \underbrace{\ln(N_{ot})}_{\text{Origin Push}} + \underbrace{V_{odt}}_{\text{Utility}} - \underbrace{\ln\left(\sum_{k} e^{V_{okt}}\right)}_{\text{Multilat. Resist. } (IV_{ot})} $$
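As a quick numerical check of this decomposition, the sketch below computes the log-sum term for one origin and rebuilds the flows from the three pieces. The utilities and population are made-up numbers, and `log_sum_exp` is a hypothetical helper written out by hand.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of utilities."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical origin with three options (the first one is staying put).
n_o = 1000.0
v_o = [0.0, -0.3, 0.8]

inclusive_value = log_sum_exp(v_o)  # the IV term for this origin
log_flows = [math.log(n_o) + v - inclusive_value for v in v_o]
flows = [math.exp(x) for x in log_flows]

print("inclusive value:", round(inclusive_value, 4))
print("flows:", [round(m, 1) for m in flows])
```

The same $IV$ value is subtracted from every destination's log flow, which is why it behaves like an origin-specific term in the regressions that follow.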

Next, we model the components using observable variables and account for unobservables. Let $\mathbf{Z}_{ot}$ be origin characteristics (push factors), $\mathbf{X}_{dt}$ be destination characteristics (pull factors), and $C_{od}$ be the cost (e.g., distance) between $o$ and $d$.

We can approximate the systematic utility $V_{odt}$ and the log origin size $\ln(N_{ot})$ as linear functions plus unobserved error terms ($u_{dt}$, $w_{ot}$):

$$ \begin{align*} V_{odt} &\approx \mathbf{X}_{dt}\boldsymbol{\beta}_X + \beta_c \ln(C_{od}) + u_{dt} \\ \ln(N_{ot}) &\approx \mathbf{Z}_{ot}\boldsymbol{\beta}_Z + w_{ot} \end{align*} $$

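(Note: Cost $C_{od}$ is included in the utility term here, reflecting its impact on the destination's appeal from a specific origin. Since higher costs lower utility, we expect $\beta_c < 0$; it plays the role of $-\beta$ from the earlier specification, with cost now entering in logs.)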

Substituting these back into the log-flow equation gives:

$$ \ln(M_{odt}) \approx \mathbf{Z}_{ot}\boldsymbol{\beta}_Z + \mathbf{X}_{dt}\boldsymbol{\beta}_X + \beta_c \ln(C_{od}) + \underbrace{[w_{ot} + u_{dt} - IV_{ot}]}_{\text{Unobserved Factors}} $$

(The challenge now is how to handle the combined unobserved factors, including the complex Multilateral Resistance term $IV_{ot}$.)

5. Controlling for Unobservables with Fixed Effects

A powerful way to handle time-invariant unobserved heterogeneity and the Multilateral Resistance term is by using fixed effects in a panel data setting.

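Origin Fixed Effects ($\gamma_o$): Absorb all time-invariant characteristics of the origin, including the time-invariant parts of $w_{ot}$ and, crucially, the time-invariant component of the Multilateral Resistance term $IV_{ot}$, which depends heavily on the origin's relative location and accessibility. Any time-varying part of $IV_{ot}$ stays in the error term; absorbing it fully would require origin-by-time effects, which would also absorb $\mathbf{Z}_{ot}$.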
Destination Fixed Effects ($\gamma_d$): Absorb all time-invariant characteristics of the destination, including the time-invariant parts of $u_{dt}$ (unobserved destination amenities/disamenities).
Time Fixed Effects ($\gamma_t$): Absorb common shocks that affect all migration flows in a given year (e.g., national economic cycles, policy changes).

Including these fixed effects leads to a widely used empirical gravity specification:

$$ \ln(M_{odt}) = \mathbf{Z}_{ot}\boldsymbol{\beta}_Z + \mathbf{X}_{dt}\boldsymbol{\beta}_X + \beta_c \ln(C_{od}) + \gamma_o + \gamma_d + \gamma_t + \epsilon_{odt} $$

Here, $\epsilon_{odt}$ is the remaining idiosyncratic error term. The coefficients $\boldsymbol{\beta}_Z$, $\boldsymbol{\beta}_X$, and $\beta_c$ capture the effects of time-varying origin/destination characteristics and bilateral costs, respectively, while controlling for unobserved heterogeneity via the fixed effects. (This fixed-effects specification allows for robust estimation of the drivers of migration flows by purging estimates of biases from many unobserved confounding factors - or at least that's the idea.)
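To see the fixed-effects logic in action, here is a stripped-down sketch under strong assumptions: a single cross-section (no time dimension), noise-free flows generated exactly from the MNL formula, and cost entering in levels as in section 2 rather than in logs. With a complete origin-by-destination grid, including origin and destination dummies is equivalent to two-way demeaning, so that is what the code does. All numbers and names are invented for illustration.

```python
import math
import random

rng = random.Random(42)
n = 6        # number of locations (hypothetical)
beta = 0.8   # true cost sensitivity used to generate the data

alphas = [rng.uniform(-1.0, 1.0) for _ in range(n)]
pops = [rng.uniform(500.0, 5000.0) for _ in range(n)]

# Symmetric cost matrix with c_oo = 0 for staying.
costs = [[0.0] * n for _ in range(n)]
for o in range(n):
    for d in range(o + 1, n):
        costs[o][d] = costs[d][o] = rng.uniform(0.5, 3.0)

# Log flows implied by the MNL: ln N_o + V_od - IV_o.
log_m = []
for o in range(n):
    v = [alphas[d] - beta * costs[o][d] for d in range(n)]
    iv = math.log(sum(math.exp(x) for x in v))
    log_m.append([math.log(pops[o]) + v[d] - iv for d in range(n)])


def two_way_demean(a):
    """Remove row (origin) and column (destination) means from a full matrix."""
    rows, cols = len(a), len(a[0])
    row_mean = [sum(r) / cols for r in a]
    col_mean = [sum(a[i][j] for i in range(rows)) / rows for j in range(cols)]
    grand = sum(row_mean) / rows
    return [[a[i][j] - row_mean[i] - col_mean[j] + grand for j in range(cols)]
            for i in range(rows)]


y = two_way_demean(log_m)
x = two_way_demean(costs)
s_xy = sum(x[i][j] * y[i][j] for i in range(n) for j in range(n))
s_xx = sum(x[i][j] ** 2 for i in range(n) for j in range(n))
beta_hat = -s_xy / s_xx  # the slope on cost is -beta, so flip the sign

print(f"true beta = {beta}, recovered beta = {beta_hat:.6f}")
```

The origin effects soak up $\ln(N_o) - IV_o$ and the destination effects soak up $\alpha_d$, so the cost coefficient is recovered without ever computing the multilateral resistance term. Real data add noise, zero flows, and time variation, so this is a best-case illustration rather than an estimation recipe.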

6. Conclusion

Going from the Random Utility Model to the empirical Gravity Model demonstrates how micro-level behavioral assumptions about individual choices can be aggregated to derive macro-level models of migration flows. The RUM provides theoretical justification for the gravity structure, while the inclusion of fixed effects in the empirical specification allows us to control for complex unobserved factors like multilateral resistance and location-specific amenities, leading to more reliable estimates of migration determinants. This framework can of course be used to great effect in many more situations, whenever we want to model choosing from a set of options. That being said, it's of course riddled with problems when it comes to actually checking its econometric validity (as is everything).