Finding all enclave county subdivisions in the US
How many Metuchens are there?
If you live in New Jersey and like to look at Google Maps (as one does), you may have noticed that the town of Metuchen is completely surrounded by the neighboring town of Edison.
According to Wikipedia, this means Metuchen is an enclave of Edison:
An enclave is a territory that is entirely surrounded by the territory of only one other state or entity.
This got me thinking — how many more towns like this are there in the US? I wanted to walk through how I found the answer to this question. (If you only care about the answer, jump to Results).
Whenever you have map questions like this, there’s a very good chance that the US Census Bureau has the data to answer it. After poking around on their website a little bit, I found a page called “Cartographic Boundary Files”, which contains a download link for a file with mapping data for all county subdivisions1 in the country as of 2024.
Specifically, the file is a shapefile. I’d never used one before; it turns out this is a commonly used format to store geospatial data like map boundaries (which are represented as polygons) and attributes associated with those boundaries. I found a nice website called mapshaper where you can upload shapefiles and explore the data visually. After uploading the census.gov shapefile, I could see the boundaries of all towns in the US, and see various properties of each one:
Each town has many attributes, most of which seem to be numerical identifiers
used by the census bureau. For our purposes, we only care about name (NAME),
county (NAMELSADCO), and state (STUSPS).
So, it seems that our task is essentially to find which polygons in a list are completely contained within other polygons in that list; in other words, a computational geometry problem. After a quick google search, I found two Python libraries that would be helpful for the task:
shapely: a library for manipulation and analysis of planar geometric objects. We can use this to check if one polygon is contained within another.geopandas: parses shapefiles into pandas dataframes, with geometry data stored asshapelyobjects.
I’d never heard of either of these libraries, so I setup a Python environment with uv, then broke out opencode with Claude Opus 4.5 and told it to get to work2. Its initial code found 0 enclaves, which was obviously wrong. After a little bit of back and forth with the model it seemed to get the results I wanted. The issues it ran into were:
- The census data’s boundaries for towns contains holes; that is, even if
polygon
Avisually appears to fully contain polygonB, a simpleshapely.contains_properlywouldn’t pass, because the hole that B occupies in A’s geometry is not considered to be part of A. Claude fixed this by creating a new function that fills in the holes of the outer polygon, and then checking if the inner polygon is contained within the new “filled” polygon. - Some towns are made up of multiple disconnected polygons (see for example, Belle Fourch, SD). The code needed to check if all disjoint parts of the inner polygon were contained within some part of the outer polygon, rather than just checking if the entire inner polygon was contained within the entire outer polygon.
Finally, I edited the code a little bit to include the names of the counties and states, as well as the areas of both towns in each pair. This was the final code:
import json
import geopandas as gpd
from shapely import Polygon, contains_properly
from shapely.validation import make_valid
def get_filled_parts(geom):
"""Return a list of polygons with all interior holes filled."""
if geom.geom_type == "Polygon":
return [Polygon(geom.exterior)]
elif geom.geom_type == "MultiPolygon":
return [Polygon(p.exterior) for p in geom.geoms]
return [geom]
def get_parts(geom):
"""Return a list of all polygon parts."""
if geom.geom_type == "Polygon":
return [geom]
elif geom.geom_type == "MultiPolygon":
return list(geom.geoms)
return [geom]
def has_holes(geom):
"""Check if geometry has interior holes."""
if geom.geom_type == "Polygon":
return len(list(geom.interiors)) > 0
elif geom.geom_type == "MultiPolygon":
return any(len(list(p.interiors)) > 0 for p in geom.geoms)
return False
def all_parts_contained(outer_filled_parts, inner_geom):
"""Check if all parts of inner_geom are properly contained by some outer part."""
inner_parts = get_parts(inner_geom)
for inner_part in inner_parts:
# Check if this inner part is properly contained by any outer filled part
contained = False
for outer_filled in outer_filled_parts:
if contains_properly(outer_filled, inner_part):
contained = True
break
if not contained:
return False
return True
def main():
# Load the shapefile
print("Loading shapefile...")
gdf = gpd.read_file("cb_2024_us_cousub_500k.shp")
print(f"Loaded {len(gdf)} county subdivisions")
# Make geometries valid to avoid topology issues
print("Validating geometries...")
gdf["geometry"] = gdf["geometry"].apply(make_valid)
# Build spatial index for efficient querying
print("Building spatial index...")
sindex = gdf.sindex
# Find polygons with holes (potential containers)
print("Finding polygons with holes...")
has_holes_mask = gdf["geometry"].apply(has_holes)
containers_gdf = gdf[has_holes_mask]
print(f"Found {len(containers_gdf)} polygons with holes")
print("Finding containment relationships...")
results = []
for idx, outer in containers_gdf.iterrows():
outer_geom = outer.geometry
outer_filled_parts = get_filled_parts(outer_geom)
# Use spatial index to get candidate geometries that intersect
candidates_idx = list(sindex.query(outer_geom, predicate="intersects"))
for candidate_idx in candidates_idx:
if candidate_idx == idx:
continue
inner = gdf.iloc[candidate_idx]
inner_geom = inner.geometry
# Check if all parts of inner are properly contained by some outer part
if all_parts_contained(outer_filled_parts, inner_geom):
results.append(
{
"container": {
"name": outer["NAME"],
"county": outer["NAMELSADCO"],
"state": outer["STUSPS"],
"area": outer_geom.area,
},
"contained": {
"name": inner["NAME"],
"county": inner["NAMELSADCO"],
"state": inner["STUSPS"],
"area": inner_geom.area,
},
}
)
print(f"\nFound {len(results)} containment relationships")
# Write results to JSON
with open("containers.json", "w") as f:
json.dump(results, f, indent=2)
print("Results written to containers.json")
if __name__ == "__main__":
main()
Something I liked about the code that Claude generated was the use of the
sindex
method
of the geopandas dataframe; this makes the search for candidate polygons that
might contain a given polygon a lot faster. I probably would not have thought
to use that method until I ran into a performance bottleneck after writing the
more naive O(n^2) algorithm.
Now, I had a big JSON file to explore, but it was easy to answer my original question (and more!) with the built-in functions in nushell. So, here are some interesting bits of information.
Results
There are 1472 towns that are completely contained within another town. You can download the full list of them here3.
Only 23 states have any enclaves at all. Minnesota is the king of enclaves, with 333. Texas, Nevada, Illinois, Arkansas, and Alaska are tied for the least, with only 1 each. Enclaves seem to be most common in the midwest. Here’s the breakdown:
| state | count | percentage |
|---|---|---|
| MN | 333 | 22.62% |
| PA | 319 | 21.67% |
| ND | 207 | 14.06% |
| WI | 205 | 13.93% |
| SD | 199 | 13.52% |
| MI | 51 | 3.46% |
| NE | 35 | 2.38% |
| KS | 31 | 2.11% |
| OH | 25 | 1.70% |
| NJ | 23 | 1.56% |
| NY | 14 | 0.95% |
| IA | 8 | 0.54% |
| MO | 4 | 0.27% |
| LA | 3 | 0.20% |
| NM | 3 | 0.20% |
| PR | 3 | 0.20% |
| ME | 2 | 0.14% |
| VT | 2 | 0.14% |
| AK | 1 | 0.07% |
| AR | 1 | 0.07% |
| IL | 1 | 0.07% |
| NV | 1 | 0.07% |
| TX | 1 | 0.07% |
The title for “county in the US with the most enclaves” is actually tied between two counties: York County and Westmoreland County, both in PA. Each of these counties contains 17 (!) enclaves.
A “parent” town can contain one or more “child” towns. Of the 1472 enclaves found,
- 1277 towns contain only 1 child town
- 74 towns contain 2 child towns
- 10 towns contain 3 child towns
- 1 town contains 4 child towns
- 1 town contains 6 child towns
- 1 town contains 7 child towns (!!)
Hempfield in Westmoreland County, PA has the most enclaves of any town, with 7 other towns contained within it: New Stanton, Youngwood, Arona, Adamsburg, Greensburg, South Greensburg, and Southwest Greensburg.
The largest enclave in terms of absolute area (of both the parent and child town) is Las Vegas, which is contained by Clark, Clark County, NV.
The largest enclave in terms of ratio of child area to parent area, is Windsor Township in Jefferson County, MO, which contains the town of Arnold, MO. Arnold is 68.3% the size of Windsor Township. The smallest enclave by the same measure is Nisland in Butte County, SD, which is contained by West Butte, SD. Nisland is only 0.02% the size of West Butte. The average child-to-parent area ratio across all enclaves is 3%, with a standard deviation of 3.9 percentage points.
My original example of Metuchen, NJ ranks #75 in terms of child-to-parent area ratio, with Metuchen being 9% the size of Edison.
The most common name for the parent town of an enclave is “Union”, which I think is kinda funny in a mathematical way; there 12 of these. The most common name for the child town of an enclave is “Adams”, which is not as funny; there are 3 of these.
The most common name for the county that an enclave occurs in is “Grant County”, which occurs 24 times. 10 of these are in WI, 6 are in SD, 4 are in ND, and 4 are in MN.
Footnotes
-
County subdivisions are generally what you would think of as a “town” or “city”. Technically, these can be called towns, cities, municipalities, boroughs, townships, unincorporated communities, and more, depending on the state. I’m just gonna call them all “towns” for brevity’s sake. ↩
-
You can see the full chat session here, which I’ve edited to remove sensitive information. ↩
-
The area of the towns is given in terms of the coordinate system used in the shapefile. I didn’t bother to convert this into something more interpretable like sq mi, since relative area was more interesting to me. ↩