“What is this you call property?”, asked Massasoit, the leader of the Native American Wampanoag tribe. “It cannot be the earth, for the land is our mother, nourishing all her children, beasts, birds, fish and all men. The woods, the streams, everything on it belongs to everybody and is for the use of all. How can one man say it belongs only to him?”

Good question, Massasoit. Yet, due to a tragic combination of the pathogenic bacteria Leptospira and aggressive colonists the answer became irrelevant and the concept of land ownership proliferated through the majestic lands of the new world like a virus.

Today, the dust has settled and the iron horse has carried the white man to the west coast, where I currently reside. As I explored the wonderful city of Los Angeles I began to wonder to whom, exactly do I owe the pleasure of my environment? Who “owns” the dirt I stand on? So I did some research.

The “United States” is divided into 3,144 counties and county equivalents. Of these, Los Angeles county is the most populous, with over 10 million residents. The least populous, Loving County, Texas has only 82. Funny story, in 2006 a group of Libertarians attempted to buy up land and seize power in Loving County with the goal of establishing their ideals, but were thwarted by the local sheriff. The group is currently featured on a “Wanted” poster in the county’s sole courthouse.

LA County has an area of 4,751 mi2, divided across 88 cities and 2,379,680 parcels. However, much of the land is “unincorporated”, meaning it does not fall within the jurisdiction of an established city. If you would like to establish your own city in LA County you can apply to the LAFCO for as little as $2,5001. The information regarding parcel owner, location, and “assessed value” for collecting property taxes is maintained by the Assessor’s Office2.

Formats and Tools

Most, if not all, counties use GIS (geographic information systems) to maintain this data3. The LA office uses Microsoft Access for ownership and assessed value information, and the popular Shapefile format for geometry and mapping. ESRI (environmental systems research institute), founded in 1969, dominates land-use consulting with their popular ArcGIS software and Shapefile format developed in the early 1990s. A Shapefile consists of several different files, 3 of which are mandatory:

.shp – feature geometry as a set of either WKT (well known text) of WKB (well known binary) coordinates. Each of these entries can be one of several different simple datatypes such as

POINT (30 10)

LINESTRING (30 10, 10 30, 40 40)

POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))

MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))

.shx – index of positional geometry to allow quickly stepping forward and backward

.dbf – old school simple database format popular in the 1990s, here stores attributes for each shape

There are several optional files, the most important of which though is

.prj – represents the projection information of the coordinates in the shapes. More on this soon.

I performed extensive cleaning and simplification on the assessor’s office data as part of this analysis, the bulk of which was done with PostgreSQL and the fantastic PostGIS extension4.

If you want to follow along, say with an EC2 instance, first grab some dependencies.

sudo apt-get -y install postgresql postgresql-contrib postgis postgresql-9.3-postgis-2.1-scripts

Now let’s create a database for our geospatial data

createdb gis psql -d gis -c 'create extension postgis'

Projections

The earth is not a perfect sphere. We represent it instead as a “geoid”, a mathematical object that, ideally, represents the precise shape of the earth if it were only under the influence of gravitation and rotation. While imperfect, the geoid, combined with satellite data provides a somewhat close approximation to the actual shape of the earth. The geoid works in tandem with different “datum”, which are coordinate systems used by regions to define a coordinate system consistent with the geoid. Today, improvement to the model and coordinate systems has led to the possibility of a single global standard for the globe, WGS84, that is gaining in popularity. Still, datums are typically more precise when defined only for a single region.

The datum used by the LA Assessor’s office is NAD1983. The naming convention originated with the first North American survey in 1901, based on an ellipsoid geoid model developed in 1866. The system was updated in 1927 based on surveys of the entire continent but using the same geoid, and updated again in 1983 using satellite and remote sensing data using GRS 80 as the geoid, the same model originally used by the popular global standard WGS84. If it sounds simple, it is not, but if you’re interested it’s a great reason to learn spherical harmonics.

Just remember a datum is a coordinate system defined on a geoid, which is a model of the earth. Geoids and datums exist for other planets too, like Mars.

I re-projected the assessor’s office data from NAD1983 to WGS84 using QGIS. All projections have a corresponding SRID (spatial reference system identifier). Let’s load the shapefile into a PostGIS table, making sure to tell it the projection WGS84, which has an SRID of 43265. You can download it from me.

wget http://dwur9qzdkvp67.cloudfront.net/la_parcels.tar.xz tar -xvf la_parcels.tar.xz shp2pgsql -I -s 4326 -g geom la_parcels.shp la_parcels | psql -d gis > import.log

Basic Queries

Next let’s get an SQL prompt

psql -d gis

And run a simple query. This should improve performance a bit.

vacuum analyze;

Now let’s have some fun. What are the most expensive pieces of land in LA County?

select land_value, owner_name, address_number, street_name from la_parcels order by land_value desc limit 10;

land_value owner_name address_number street_name 252,479,578 CATALINA MEDIA DEVELOPMENT II 3000 ALAMEDA AVE 183,110,154 MOBIL OIL CORP 3700 190TH ST 154,291,320 BH WILSHIRE INTERNATIONAL LLC 9900 WILSHIRE BLVD 137,295,210 BP WEST COAST PRODUCTS LLC 1801 SEPULVEDA BLVD 136,879,311 WARNER BROS ENTERTAINMENT INC 4000 WARNER BLVD 135,354,000 NEXT CENTURY ASSOCIATES LLC 2025 AVENUE OF THE STARS 132,214,693 UNIVERSAL STUDIOS LLC 3900 LANKERSHIM BLVD 129,077,263 TWENTIETH CENTURY FOX FILM CORP 10201 PICO BLVD 126,062,515 UNIVERSAL STUDIOS LLC 3900 LANKERSHIM BLVD 122,666,461 TISHMAN SPEYER ARCHSTONE SMITH 3600 BARHAM BLVD

Entertainment and oil companies dominate here. That is land only though. I wonder which of these has the most expensive “improvement”, or building? Let’s use a nested query.

select improvement_value, owner_name, address_number, street_name from (select land_value, improvement_value, owner_name, address_number, street_name from la_parcels order by land_value desc limit 10) as lands order by improvement_value desc;

20th Century Fox wins with $265 million. Exxon Mobil’s sprawling refinery is assessed at only $19 million. Someone must be trying hard to keep property taxes low. Okay what about the most expensive properties overall? Combining land and building value?

select (land_value+improvement_value) as total_value, owner_name, address_number, street_name from la_parcels order by total_value desc limit 20;

total_value owner_name address_number street_name 628,970,661 CHILDREN HOSPITAL OF LOS ANGELES 4550 SUNSET BLVD 550,125,487 KAISER FOUNDATION HOSPITALS 9343 IMPERIAL HWY 523,472,588 CEDARS SINAI MEDICAL CENTER 8720 ALDEN DR 521,206,676 TRS OF THE J PAUL GETTY TRUST 199 CHURCH LANE 515,115,546 TRS OF THE J PAUL GETTY TRUST 1200 GETTY CENTER DR 511,156,285 TRS OF THE J PAUL GETTY TRUST 0 511,156,285 TRS OF THE J PAUL GETTY TRUST 0 478,152,504 CENTURY CITY MALL LLC 10250 SANTA MONICA BLVD 477,297,278 HAY,DOROTHY L DECD EST OF AND 121 LA CIENEGA BLVD 475,058,030 ANHEUSER BUSCH INC 15800 ROSCOE BLVD 466,751,222 TRIZEC 333 LA LLC 333 HOPE ST 439,548,987 MARANGI,LEONARD M ETAL TRS LESSR 100 CONGRESS ST 439,000,000 COMMUNITY REDEVELOPMENT AGENCY 350 GRAND AVE 420,500,000 WILSHIRE COURTYARD LP 5700 WILSHIRE BLVD 397,617,220 DISNEY,WALT PRODUCTIONS INC 500 BUENA VISTA ST 396,907,059 CEDARS SINAI MEDICAL CENTER 127 SAN VICENTE BLVD 394,172,461 TWENTIETH CENTURY FOX FILM CORP 10201 PICO BLVD 376,000,000 2121 AVENUE OF THE STARS LLC 2121 AVENUE OF THE STARS 364,457,522 1999 STARS LLC 1999 AVENUE OF THE STARS 361,003,213 UNIVERSAL STUDIOS LLC 3900 LANKERSHIM BLVD

Hospitals monopolize the top spots. Healthcare is expensive. No surprise to see the magnificent Getty Center either. I wonder if the multiple entries are redundant or it’s worth $2 billion. Wouldn’t be surprised either way. Did you know it’s free? Free! Unlike the hospital.

Let’s find another landmark. How about Dodger Stadium? We’ll use a forgiving string compare to make sure we match the street.

select (land_value+improvement_value) as total_value, owner_name, address_number, street_name from la_parcels where address_number = 1000 and street_name ilike 'elysian park%'

total_value owner_name address_number street_name 84,409,139 “REALCO INTERMEDIARY LLC” 1000 “ELYSIAN PARK AVE”

Dodgers stadium must be worth more than $84 million. How do the assessed values compare to real world values? Let’s use One Wilshire as an example. It sold in 2013 for $437.5 million and its assessed value is $297.5 million. Not too far off.

Now let’s use the aggregation function sum() and group by to find the most expensive cities by area in LA County. Since the city column is still a bit messy we’ll use having to eliminate the outliers.

select city, price_per_area from (select sum(area) as area, sum(land_value) as land_value, count(ain) as parcels, city, (sum(land_value) / sum(area)) as price_per_area from la_parcels group by city having count(ain) > 1000 order by price_per_area desc) as cities

city price_per_area total_value MANHATTAN BEACH 110.6230954 12516203385 BEVERLY HILLS 82.13910857 22623901990 HERMOSA BEACH 60.45970815 5267669536 PALOS VERDES EST 47.59576314 4671096274 W HOLLYWOOD 44.57732453 2080465255 SANTA MONICA 43.00346533 26202224215 SAN MARINO 42.27213826 4967172261 LA CANADA FLT 31.25619651 3794539133 EL SEGUNDO 21.44013791 5831487943 BURBANK 21.05596602 17013420158 SOUTH PASADENA 20.63481846 3601284180 MONTROSE 20.56928866 560769462

I still want an answer to my original question. Who owns the most land?

select owner_name, count(ain), sum(land_value)+sum(improvement_value) as holding, sum(area) as lands from la_parcels group by owner_name order by lands desc limit 10

owner_name parcel_count holding_value holding_area U S GOVT 2695 406477416 33837253322 STATE OF CALIF 1387 271712949 2551166380 L A CITY 5498 1123805464 1586934789 SANTA CATALINA ISLAND 79 22222591 1504028479 L A COUNTY 1768 603224059 870795789 TEJON RANCH CO 59 5411765 703066562 L A CITY DEPT OF WATER AND POWER 2340 212220963 627091955 NEWHALL LAND AND FARMING CO 415 199329086 618742553 MOUNTAINS RECREATION AND 622 70849023 487140373 L A CO FLOOD CONTROL DIST 3916 54042362 448175685

By area huge swaths of the county is controlled by the federal and state government, with a few agriculture companies such as Tejon Ranch and Newhall Land and Farming Company sprinkled in.

More Advanced Queries

I wonder what percentage of LA County is not held by one of the 10 entities above or is unincorporated? Here we will use a view, which allows you to treat a query like its own table.

create view top_owners as select owner_name from (select owner_name, count(ain), sum(land_value)+sum(improvement_value) as holding, sum(area) as lands from la_parcels group by owner_name order by lands desc limit 10) as owners

Now in order to select the lands these guys own we’ll use a join statement, specifying where the view and table intersect. Join by default is inner, meaning we’ll only get rows where the field matches. Now we can succinctly find the parcels owned by these entities.

select count(*) from la_parcels join top_owners on la_parcels.owner_name = top_owners.owner_name

And finally use a union operation, which can combine multiple select statements into a single column.

select sum(area) from la_parcels join top_owners on la_parcels.owner_name = top_owners.owner_name union select sum(area) from la_parcels

43234395887 142503798431

About 30%. And finally what % of the land in the city of LA is devoted to public space? I think this is an important metric for any city.

select sum(area) from la_parcels where owner_name ilike 'l a city' or owner_name ilike 'l a city park' union select sum(area) from la_parcels where city ilike 'los angeles'

1621548676 20341577058

8%. Not bad. Griffith park, Elysian park, MacArthur park, Runyon Canyon, Grand park, Vista Hermosa, LA has some fantastic public spaces. The largest green areas are owned by the federal or state government and are outside the city, though not terribly far. Due to lack of Zoning Code standardization it is difficult to get a good picture of what the lands are used for.

Spatial Queries

At last, we unleash PostGIS. Note that since we are using WGS84 our results will be in latitude and longitude rather than meters as above.

First it’s good to know what we’re working with.

select distinct GeometryType(geom) from la_parcels

The geom field is exclusively MULTIPOLYGON. If we want to work with simpler shapes we can unroll them with ST_Dumps() in to the POLYGON type.

Alright I wonder where is the geographic center of LA County? We’ll use ST_Extent() to roll up all of our geometries in to a bounding box, and find the X and Y coordinates of its center with ST_Centroid().

select ST_Y(ST_Centroid(ST_Extent(geom))), ST_X(ST_Centroid(ST_Extent(geom))) from la_parcels

latitude longitude 33.80924996626 -118.29553231211

This result is from a simple box around our shapes. That is not very rigorous. Instead we should roll up all of our shapes together and form a “convex hull”, the minimum geometry that encloses them, and find the centroid of that. Let’s find how far the center of Malibu is from that point. Here we use a geometry constructor. the true in the ST_Distance() function gets us the distance across the geoid rather than straight through.

select ST_Distance(malibu_center, ST_MakePoint(-118.29553231211, 33.80924996626, 1)) from (select ST_Centroid(ST_ConvexHull(ST_Collect(geom))) as malibu_center from la_parcels where city ilike 'malibu') as malibu

0.505358312522112 is our answer. That’s quite a drive. Especially in traffic.

Last but not least, I took the liberty of creating a web interface to this dataset. Moving around the whole dataset would be inefficient, so I use google maps and javascript to ask a minimal flask application, which in turn asks the database which parcels are within the bounding box of the map’s current view. The @ operator finds the geometries inside the envelope I build. The core query is simply:

select distinct ain, land_value + improvement_value as total_value, land_value, improvement_value, owner_name, year_built, address_number, street_name, city, state, zip_code, zoning_code, area, perimeter, ST_AsGeoJSON(ST_MakeValid(geom)), ST_AsGeoJSON(ST_Centroid(ST_MakeValid(geom))) from la_parcels where geom @ ST_MakeEnvelope(%s,%s,%s,%s)

The bounding box of the map is passed to the %s parameters. A few of the geometries are invalid so ST_MakeValid() helps us out there. The query is amazingly fast, around 12ms. Drawing on the map is the slow part. Google Maps has gotten too complicated. But it still works pretty well as long as you don’t zoom out too far and avoid residential areas. The american dream of individual home ownership is slowing down my app. Hopefully I can speed it up but for now you can play around with it here. Click on the geometry to see the value. Some buildings are broken up in to many individual parcels in three dimensions. That is the difference between an apartment and a condominium. In a condo, you own the parcel from the county.

Conclusion

I was disappointed with how difficult it was to obtain this data initially and the poor quality it came in. Governments and constituencies of all sizes stand to benefit enormously from investment in modern software tools and stronger commitments to transparency.

While building I began to dream of having all the parcels of the United States in a single database. That would be a fascinating study, but the data is horribly spread about and fragmented. If you are interested in obtaining the data for your county, or another, and structuring it in to the same schema I would be very grateful, and promise to share the collected information. You can track the completeness of what has been gathered for California here. Please contact me if interested in contributing.

What if we had the data for other nations too? Could we put the entire world in a computer?

References

[1] http://lalafco.org/Forms/Application%20Form12-11%27.pdf

[2] http://assessor.lacounty.gov/extranet/default.aspx

[3] to the pedants that still insist on using “these data” give it a rest, it’s confusing to most people

[4] http://postgis.net/

[5] http://spatialreference.org/ref/epsg/wgs-84/

[6] http://postgis.net/docs/manual-1.3/ch06.html

Do you love databases? Soylent is hiring a Chief Database Architect.