
Buzzards Bay National Estuary Program

Joining Assessors Billing Records to GIS Parcel Data for Watershed Nitrogen Loading

One of the foundations of sound nitrogen management strategies for coastal waters is the development of good estimates of watershed loadings, based on a series of reasonable loading assumptions and loading coefficients that can be applied to summaries of land use or tables of parcel data. The technical basis of these watershed loading coefficients is certainly critical, and we discuss this issue on our N-loading assumptions web page.

The application of these loading coefficients to land use data is equally important, and the assumptions involved are sometimes an underevaluated component of watershed nitrogen loading analyses. Many of the most important sources of nitrogen loading depend upon the successful incorporation of municipal parcel data into Geographic Information System (GIS) data sets. A key element in developing nitrogen loading estimates is therefore to ensure that municipal data, such as sewer and water account information, were effectively and successfully joined to the GIS data. This task is not always straightforward. Assessors or billing department data may lack common parcel identifiers, may be stored in different database systems or software, and may contain inconsistencies or errors that carry over into GIS data sets and are difficult for inexperienced data reviewers to detect. The successful joining of these data sets is especially important for estimating onsite wastewater loading (septic systems), which can contribute 50% or more of controllable nitrogen loadings in any given watershed.

Assessors offices and other town departments often maintain their own databases, and these databases are seldom compared or integrated. Where they have been integrated with one another and with GIS coverages, the joining may be imperfect and create artifacts in the data. This webpage describes the most common GIS dataset errors and how to detect and correct them.

Some of the issues described on this page can be remedied by the adoption of parcel database standards such as those described in detail on the MassGIS website, in their Parcel Level III Standards. While the adoption of such standards can minimize the problems described here, ultimately the organization and completeness of assessor and water department databases will define the extent of database processing required for a nitrogen loading analysis. In the end, all GIS databases developed for the purposes of nitrogen loading analyses must be visually validated by the developer of the GIS database, using the rich mapping capabilities of GIS software, and corroborated by local officials.

Joining Assessors Records to GIS data: The two most significant problems

The most significant errors in parcel loading analysis arise because parcel maps held by assessors (as purportedly recorded at the county deeds office) may not match assessors billing records on a one-to-one basis. This occurs primarily for two reasons.

First, often after downzoning (e.g., increasing minimum lot sizes), lots under common ownership may be functionally combined to create a single buildable lot. If the municipality did not require that adjacent lots be combined at the county deeds office, or if the property owner did not voluntarily merge these lots at the deeds office, they will remain separate lots on Assessor maps.

The second cause of mismatches between the Assessor parcel maps and billing records occurs when multiple billing accounts apply to a single parcel, as when a parcel contains condominiums or time shares.

Coincidentally, both joining one billing record to multiple parcels and joining multiple billing records to a single parcel result in an overestimate of house counts (and septic systems) in a watershed, as illustrated in the figure below, which shows parcels in Acushnet where one house is built on paired lots.

In this figure from the Town of Acushnet, each of the three single family homes shown had one assessors billing record (outlined in red) that listed two parcels (outlined in green), billed as a single entity. The shapefile in this case was populated with data from the assessor's database, resulting in the creation of fictitious building structures (yellow numbers) in the GIS database. The extent to which these problems occur depends on how the Assessor's database is constructed. In some towns the common parcels may have the same account number but contain different use codes (one parcel may be labeled single family (101) and the other labeled unbuildable (132)). The extent of this problem in any GIS dataset should always be evaluated by labeling polygons by a particular field (such as number of buildings in this case) and overlaying the data onto an aerial map.

One Billing Record Joined to Multiple Parcels Replicates Assessors Data

Many Assessors offices do not create a separate record for every parcel. In the case of merged parcels, they may select a specific map-lot number to enter as the primary billing parcel and list the identity of the merged lots in either a comment field or an "extra lot" field. However, these fields may be incomplete or inaccurate, may lack delimiters, or may use hyphens to indicate a range of merged parcels. Thus, the actual relationship between the "master" parcel and the "child" parcels is not completely defined.

Even where towns have a comprehensive geodatabase with good QA controls, artifacts can be created during export of data into shapefiles. For example, the town of Plymouth has an ArcGIS assessor's database (a CAMA file) that contains roughly 27,000 records. However, the GIS parcel coverage (based on Assessors maps) contains more than 32,000 parcels. This geodatabase is properly constructed and ties all the "extra lots" to the proper Assessor's record. However, if this geodatabase is exported into a shapefile that preserves all parcels, all the parcels under common ownership are populated with identical information such as use code, buildings, and other fields. This result is illustrated by the figure below. The problem becomes especially apparent when certain fields are enumerated. For example, the 2009 Plymouth Assessors database indicates there are 27,012 buildings in the town, whereas the GIS shapefile exported from this database incorrectly suggests that there are 32,260 buildings.

To avoid this problem when using shapefile databases for nitrogen loading calculations in ArcGIS, parcels should be dissolved by Assessors account number (CAMA number, Vision ID, etc.). If manipulating the dbf files in Excel or another spreadsheet program, the same result can be achieved by using a pivot table function and summarizing the data by Assessor account number. Depending on how well the GIS shapefile and Assessor's data were initially joined, other database artifacts can also be created with this process.
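The summarize-by-account step can also be sketched outside ArcGIS or Excel. The following is a minimal illustration only, not the workflow used for any town's data: the field names `map_lot`, `account`, and `buildings` and the sample rows are hypothetical, and it assumes that every parcel sharing an account number carries an identical copy of that account's building count.

```python
# Hypothetical rows from an exported shapefile attribute table (dbf).
# Parcels "12-3" and "12-4" share one billing account, so the single
# house on that account was copied onto both parcels during export.
parcels = [
    {"map_lot": "12-3", "account": "A100", "buildings": 1},
    {"map_lot": "12-4", "account": "A100", "buildings": 1},
    {"map_lot": "12-7", "account": "A101", "buildings": 1},
    {"map_lot": "13-1", "account": "A102", "buildings": 0},
]


def buildings_by_account(rows):
    """Keep one building count per account (the pivot-table step)."""
    per_account = {}
    for row in rows:
        # Assumption: duplicated rows repeat the same value, so the first
        # value seen for an account is kept and later copies are ignored.
        per_account.setdefault(row["account"], row["buildings"])
    return per_account


# Summing the raw table counts the shared house twice.
naive_total = sum(row["buildings"] for row in parcels)
# Summing one value per account counts it once.
deduplicated_total = sum(buildings_by_account(parcels).values())

print(naive_total)         # 3
print(deduplicated_total)  # 2
```

If copies of an account's record can disagree with one another, the first-value rule used here would hide the discrepancy, so such accounts should be flagged and reviewed rather than silently collapsed.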

Corollary Problem: Multiple Assessors Records linked to single GIS parcel creates duplicate parcels

The creation of duplicate parcels in a GIS coverage can occur when multiple Assessors records are joined to a single GIS parcel. One way this situation occurs is when parcels that contain condominiums (which may exist in the GIS coverage as building-size parcels) are surrounded by a parcel that is in common ownership. During a join, this common ownership parcel is replicated by the number of co-owners. Parcels may also be duplicated by other editing and database manipulation actions.

Parcel duplication is particularly insidious because duplicates are not apparent unless they are looked for. Because the duplicate parcels are identical, they cannot be found by magnifying the view, and parcel selection by mouse will typically select only the topmost duplicate. They are, however, easy to find by several techniques. One quick test is to calculate the area of the coverage dissolved into a single-part coverage and compare this area to the total area of the original parcels. An excellent example of this problem is the parcel coverage for the Town of Plymouth posted on the MassGIS website (as of 2010). The parcels in this coverage total a remarkable 2,442 square miles in area, despite the fact that the Town of Plymouth covers only 102.8 square miles. The extra area was caused by the duplication of Myles Standish State Forest parcels and other parcels, due to multiple associations for individual parcels in the Assessor's database.
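The area comparison can be illustrated with a small sketch. It is not a substitute for a true dissolve in GIS software: it assumes that the only overlaps in the coverage are exact duplicates with identical vertex lists, so that the area of the distinct geometries stands in for the dissolved area. The function names and the sample polygons are hypothetical.

```python
def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_inflation_ratio(polygons):
    """Summed parcel area divided by the area of distinct geometries."""
    summed = sum(shoelace_area(p) for p in polygons)
    # Assumption: duplicates share an identical vertex list, so a set of
    # vertex tuples approximates the dissolved (single-part) footprint.
    distinct = {tuple(p) for p in polygons}
    footprint = sum(shoelace_area(list(p)) for p in distinct)
    return summed / footprint


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
neighbor = [(10, 0), (20, 0), (20, 10), (10, 10)]
coverage = [square, square, square, neighbor]  # "square" is stacked 3 times

ratio = area_inflation_ratio(coverage)
print(ratio)  # 2.0
```

A ratio near 1 suggests no stacked parcels, while a ratio well above 1 signals duplication. For comparison, the Plymouth figures quoted above (2,442 versus 102.8 square miles) correspond to a ratio of roughly 24.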

There are many ways to find duplicate parcels, and discussions and scripts are even posted on the ESRI.com website. One process is to create a LOC_ID for each parcel based on the x and y coordinates of its centroid (integer values are fine). Create a new field labeled "COUNT" and fill it with a value of '1'. Use the Dissolve toolbox command, dissolving by the LOC_ID field. In the statistics field of this command, select the sum of the field "COUNT". Any parcel where COUNT>1 contains duplicate parcels, and this information can be used to adjust the original shapefile and database.
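The same LOC_ID and COUNT logic can be sketched in plain Python for readers who want to see the steps outside of ArcGIS. This is an illustration under stated assumptions, not an ESRI script: the `x_y` format of the identifier, the function names, and the sample polygons are hypothetical, coordinates are assumed to be in a projected system (feet or meters) where rounding to integers is reasonable, and every polygon is assumed to have nonzero area.

```python
from collections import Counter


def centroid(ring):
    """Area-weighted centroid of a simple polygon [(x, y), ...]."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Standard formula: divide by 6 * area, which is 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def loc_id(ring):
    """Hypothetical LOC_ID built from the integer-rounded centroid."""
    x, y = centroid(ring)
    return f"{round(x)}_{round(y)}"


def duplicate_counts(parcels):
    """Sum a COUNT of 1 per parcel by LOC_ID; keep IDs where COUNT > 1."""
    counts = Counter(loc_id(ring) for ring in parcels)
    return {key: count for key, count in counts.items() if count > 1}


condo_common = [(0, 0), (100, 0), (100, 60), (0, 60)]
house_lot = [(100, 0), (150, 0), (150, 60), (100, 60)]
parcels = [condo_common, condo_common, condo_common, house_lot]

print(duplicate_counts(parcels))  # {'50_30': 3}
```

Two different parcels can share a centroid (for example, a small parcel centered inside a larger one), so any LOC_ID flagged this way should be confirmed visually before records are removed.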

In the figure on the left from Fairhaven's GIS database, the red-outlined parcel occurs 27 times, stacked on top of one another. A face value analysis of the database might suggest that each of the 27 condominium units (contained in the 5 buildings shown) was located on an individual 7.5-acre lot. It is also possible to find coverages like the one on the right, where buildings are digitized for each condo unit (sometimes with multiple owners). In some cases the outer shared parcel is duplicated in the GIS database, sometimes with information identical to the individual units.



The extent of these problems varies from town to town, but in the analysis of large watersheds like the Acushnet River and Wareham River watersheds, thousands of fictitious parcels could be counted if these issues are not addressed.