Finding all enclave county subdivisions in the US

How many Metuchens are there?

If you live in New Jersey and like to look at Google Maps (as one does), you may have noticed that the town of Metuchen is completely surrounded by the neighboring town of Edison.

Map of Edison township and the borough of Metuchen
One wonders why the larger town doesn't simply eat the smaller one.

According to Wikipedia, this means Metuchen is an enclave of Edison:

An enclave is a territory that is entirely surrounded by the territory of only one other state or entity.

This got me thinking — how many more towns like this are there in the US? I wanted to walk through how I found the answer to this question. (If you only care about the answer, jump to Results).

Whenever you have map questions like this, there’s a very good chance that the US Census Bureau has the data to answer it. After poking around on their website a little bit, I found a page called “Cartographic Boundary Files”, which contains a download link for a file with mapping data for all county subdivisions1 in the country as of 2024.

Specifically, the file is a shapefile. I’d never used one before; it turns out this is a commonly used format to store geospatial data like map boundaries (which are represented as polygons) and attributes associated with those boundaries. I found a nice website called mapshaper where you can upload shapefiles and explore the data visually. After uploading the census.gov shapefile, I could see the boundaries of all towns in the US, and see various properties of each one:

Screenshot of mapshaper showing towns for the entire United States. A panel on the left shows the attributes of each town.
There's Metuchen again!

Each town has many attributes, most of which seem to be numerical identifiers used by the census bureau. For our purposes, we only care about name (NAME), county (NAMELSADCO), and state (STUSPS).

So, it seems that our task is essentially to find which polygons in a list are completely contained within other polygons in that list; in other words, a computational geometry problem. After a quick google search, I found two Python libraries that would be helpful for the task:

  • shapely: a library for manipulation and analysis of planar geometric objects. We can use this to check if one polygon is contained within another.
  • geopandas: parses shapefiles into pandas dataframes, with geometry data stored as shapely objects.

I’d never heard of either of these libraries, so I setup a Python environment with uv, then broke out opencode with Claude Opus 4.5 and told it to get to work2. Its initial code found 0 enclaves, which was obviously wrong. After a little bit of back and forth with the model it seemed to get the results I wanted. The issues it ran into were:

  • The census data’s boundaries for towns contains holes; that is, even if polygon A visually appears to fully contain polygon B, a simple shapely.contains_properly wouldn’t pass, because the hole that B occupies in A’s geometry is not considered to be part of A. Claude fixed this by creating a new function that fills in the holes of the outer polygon, and then checking if the inner polygon is contained within the new “filled” polygon.
  • Some towns are made up of multiple disconnected polygons (see for example, Belle Fourch, SD). The code needed to check if all disjoint parts of the inner polygon were contained within some part of the outer polygon, rather than just checking if the entire inner polygon was contained within the entire outer polygon.

Finally, I edited the code a little bit to include the names of the counties and states, as well as the areas of both towns in each pair. This was the final code:

import json
import geopandas as gpd
from shapely import Polygon, contains_properly
from shapely.validation import make_valid


def get_filled_parts(geom):
    """Return a list of polygons with all interior holes filled."""
    if geom.geom_type == "Polygon":
        return [Polygon(geom.exterior)]
    elif geom.geom_type == "MultiPolygon":
        return [Polygon(p.exterior) for p in geom.geoms]
    return [geom]


def get_parts(geom):
    """Return a list of all polygon parts."""
    if geom.geom_type == "Polygon":
        return [geom]
    elif geom.geom_type == "MultiPolygon":
        return list(geom.geoms)
    return [geom]


def has_holes(geom):
    """Check if geometry has interior holes."""
    if geom.geom_type == "Polygon":
        return len(list(geom.interiors)) > 0
    elif geom.geom_type == "MultiPolygon":
        return any(len(list(p.interiors)) > 0 for p in geom.geoms)
    return False


def all_parts_contained(outer_filled_parts, inner_geom):
    """Check if all parts of inner_geom are properly contained by some outer part."""
    inner_parts = get_parts(inner_geom)

    for inner_part in inner_parts:
        # Check if this inner part is properly contained by any outer filled part
        contained = False
        for outer_filled in outer_filled_parts:
            if contains_properly(outer_filled, inner_part):
                contained = True
                break
        if not contained:
            return False
    return True


def main():
    # Load the shapefile
    print("Loading shapefile...")
    gdf = gpd.read_file("cb_2024_us_cousub_500k.shp")
    print(f"Loaded {len(gdf)} county subdivisions")

    # Make geometries valid to avoid topology issues
    print("Validating geometries...")
    gdf["geometry"] = gdf["geometry"].apply(make_valid)

    # Build spatial index for efficient querying
    print("Building spatial index...")
    sindex = gdf.sindex

    # Find polygons with holes (potential containers)
    print("Finding polygons with holes...")
    has_holes_mask = gdf["geometry"].apply(has_holes)
    containers_gdf = gdf[has_holes_mask]
    print(f"Found {len(containers_gdf)} polygons with holes")

    print("Finding containment relationships...")

    results = []

    for idx, outer in containers_gdf.iterrows():
        outer_geom = outer.geometry
        outer_filled_parts = get_filled_parts(outer_geom)

        # Use spatial index to get candidate geometries that intersect
        candidates_idx = list(sindex.query(outer_geom, predicate="intersects"))

        for candidate_idx in candidates_idx:
            if candidate_idx == idx:
                continue

            inner = gdf.iloc[candidate_idx]
            inner_geom = inner.geometry

            # Check if all parts of inner are properly contained by some outer part
            if all_parts_contained(outer_filled_parts, inner_geom):
                results.append(
                    {
                        "container": {
                            "name": outer["NAME"],
                            "county": outer["NAMELSADCO"],
                            "state": outer["STUSPS"],
                            "area": outer_geom.area,
                        },
                        "contained": {
                            "name": inner["NAME"],
                            "county": inner["NAMELSADCO"],
                            "state": inner["STUSPS"],
                            "area": inner_geom.area,
                        },
                    }
                )

    print(f"\nFound {len(results)} containment relationships")

    # Write results to JSON
    with open("containers.json", "w") as f:
        json.dump(results, f, indent=2)

    print("Results written to containers.json")


if __name__ == "__main__":
    main()

Something I liked about the code that Claude generated was the use of the sindex method of the geopandas dataframe; this makes the search for candidate polygons that might contain a given polygon a lot faster. I probably would not have thought to use that method until I ran into a performance bottleneck after writing the more naive O(n^2) algorithm.

Now, I had a big JSON file to explore, but it was easy to answer my original question (and more!) with the built-in functions in nushell. So, here are some interesting bits of information.

Results

There are 1472 towns that are completely contained within another town. You can download the full list of them here3.

Only 23 states have any enclaves at all. Minnesota is the king of enclaves, with 333. Texas, Nevada, Illinois, Arkansas, and Alaska are tied for the least, with only 1 each. Enclaves seem to be most common in the midwest. Here’s the breakdown:

statecountpercentage
MN33322.62%
PA31921.67%
ND20714.06%
WI20513.93%
SD19913.52%
MI513.46%
NE352.38%
KS312.11%
OH251.70%
NJ231.56%
NY140.95%
IA80.54%
MO40.27%
LA30.20%
NM30.20%
PR30.20%
ME20.14%
VT20.14%
AK10.07%
AR10.07%
IL10.07%
NV10.07%
TX10.07%

The title for “county in the US with the most enclaves” is actually tied between two counties: York County and Westmoreland County, both in PA. Each of these counties contains 17 (!) enclaves.

A “parent” town can contain one or more “child” towns. Of the 1472 enclaves found,

  • 1277 towns contain only 1 child town
  • 74 towns contain 2 child towns
  • 10 towns contain 3 child towns
  • 1 town contains 4 child towns
  • 1 town contains 6 child towns
  • 1 town contains 7 child towns (!!)

Hempfield in Westmoreland County, PA has the most enclaves of any town, with 7 other towns contained within it: New Stanton, Youngwood, Arona, Adamsburg, Greensburg, South Greensburg, and Southwest Greensburg.

The largest enclave in terms of absolute area (of both the parent and child town) is Las Vegas, which is contained by Clark, Clark County, NV.

The largest enclave in terms of ratio of child area to parent area, is Windsor Township in Jefferson County, MO, which contains the town of Arnold, MO. Arnold is 68.3% the size of Windsor Township. The smallest enclave by the same measure is Nisland in Butte County, SD, which is contained by West Butte, SD. Nisland is only 0.02% the size of West Butte. The average child-to-parent area ratio across all enclaves is 3%, with a standard deviation of 3.9 percentage points.

My original example of Metuchen, NJ ranks #75 in terms of child-to-parent area ratio, with Metuchen being 9% the size of Edison.

The most common name for the parent town of an enclave is “Union”, which I think is kinda funny in a mathematical way; there 12 of these. The most common name for the child town of an enclave is “Adams”, which is not as funny; there are 3 of these.

The most common name for the county that an enclave occurs in is “Grant County”, which occurs 24 times. 10 of these are in WI, 6 are in SD, 4 are in ND, and 4 are in MN.

Footnotes

  1. County subdivisions are generally what you would think of as a “town” or “city”. Technically, these can be called towns, cities, municipalities, boroughs, townships, unincorporated communities, and more, depending on the state. I’m just gonna call them all “towns” for brevity’s sake.

  2. You can see the full chat session here, which I’ve edited to remove sensitive information.

  3. The area of the towns is given in terms of the coordinate system used in the shapefile. I didn’t bother to convert this into something more interpretable like sq mi, since relative area was more interesting to me.