Bienvenue sur PostGIS.fr

Bienvenue sur PostGIS.fr , le site de la communauté des utilisateurs francophones de PostGIS.

PostGIS ajoute le support d'objets géographique à la base de données PostgreSQL. En effet, PostGIS "spatialise" le serverur PostgreSQL, ce qui permet de l'utiliser comme une base de données SIG.

Maintenu à jour, en fonction de nos disponibilités et des diverses sorties des outils que nous testons, nous vous proposons l'ensemble de nos travaux publiés en langue française.

source: trunk/workshop-foss4g/validity.rst @ 1

Revision 1, 6.7 KB checked in by djay, 13 years ago (diff)

Initial import of the svn tree

Section 20: Validity

In 90% of the cases the answer to the question, "why is my query giving me a 'TopologyException' error" is "one or more of the inputs are invalid". Which begs the question: what does it mean to be invalid, and why should we care?

What is Validity

Validity is most important for polygons, which define bounded areas and require a good deal of structure. Lines are very simple and cannot be invalid, nor can points.

Some of the rules of polygon validity feel obvious, and others feel arbitrary (and in fact, are arbitrary).

  • Polygon rings must close.
  • Rings that define holes should be inside rings that define exterior boundaries.
  • Rings may not self-intersect (they may neither touch nor cross one another).
  • Rings may not touch other rings, except at a point.

The last two rules are in the arbitrary category. There are other ways to define polygons that are equally self-consistent but the rules above are the ones used by the :term:`OGC` :term:`SFSQL` standard that PostGIS conforms to.

System Message: ERROR/3 (<string>, line 20); backlink

Unknown interpreted text role "term".

System Message: ERROR/3 (<string>, line 20); backlink

Unknown interpreted text role "term".

The reason the rules are important is because algorithms for geometry calculations depend on consistent structure in the inputs. It is possible to build algorithms that have no structural assumptions, but those routines tend to be very slow, because the first step in any structure-free routine is to analyze the inputs and build structure into them.

Here's an example of why structure matters. This polygon is invalid:

POLYGON((0 0, 0 1, 2 1, 2 2, 1 2, 1 0, 0 0));

You can see the invalidity a little more clearly in this diagram:

./validity/figure_eight.png

The outer ring is actually a figure-eight, with a self-intersection in the middle. Note that the graphic routines successfully render the polygon fill, so that visually it is appears to be an "area": two one-unit squares, so a total area of two units of area.

Let's see what the database thinks the area of our polygon is:

SELECT ST_Area(ST_GeometryFromText('POLYGON((0 0, 0 1, 1 1, 2 1, 2 2, 1 2, 1 1, 1 0, 0 0))'));
 st_area
---------
       0

What's going on here? The algorithm that calculates area assumes that rings to not self-intersect. A well-behaved ring will always have the area that is bounded (the interior) on one side of the bounding line (it doesn't matter which side, just that it is on one side). However, in our (poorly behaved) figure-eight, the bounded area is to the right of the line for one lobe and to the left for the other. This causes the areas calculated for each lobe to cancel out (one comes out as 1, the other as -1) hence the "zero area" result.

Detecting Validity

In the previous example we had one polygon that we knew was invalid. How do we detect invalidity in a table with millions of geometries? With the :command:`ST_IsValid(geometry)` function. Used against our figure-eight, we get a quick answer:

System Message: ERROR/3 (<string>, line 54); backlink

Unknown interpreted text role "command".
SELECT ST_IsValid(ST_GeometryFromText('POLYGON((0 0, 0 1, 1 1, 2 1, 2 2, 1 2, 1 1, 1 0, 0 0))'));
f

Now we know that the feature is invalid, but we don't know why. We can use the :command:`ST_IsValidReason(geometry)` function to find out the source of the invalidity:

System Message: ERROR/3 (<string>, line 64); backlink

Unknown interpreted text role "command".
SELECT ST_IsValidReason(ST_GeometryFromText('POLYGON((0 0, 0 1, 1 1, 2 1, 2 2, 1 2, 1 1, 1 0, 0 0))'));
Self-intersection[1 1]

Note that in addition to the reason (self-intersection) the location of the invalidity (coordinate (1 1)) is also returned.

We can use the :command:`ST_IsValid(geometry)` function to test our tables too:

System Message: ERROR/3 (<string>, line 76); backlink

Unknown interpreted text role "command".
-- Find all the invalid polygons and what their problem is
SELECT name, boroname, ST_IsValidReason(the_geom)
FROM nyc_neighborhoods
WHERE NOT ST_IsValid(the_geom);
          name           |   boroname    |                     st_isvalidreason
-------------------------+---------------+-----------------------------------------------------------
 Howard Beach            | Queens        | Self-intersection[597264.083368305 4499924.54228856]
 Corona                  | Queens        | Self-intersection[595483.058764138 4513817.95350787]
 Steinway                | Queens        | Self-intersection[593545.572199759 4514735.20870587]
 Red Hook                | Brooklyn      | Self-intersection[584306.820375986 4502360.51774956]

Repairing Invalidity

First the bad news: there is no guaranteed way to fix invalid geometries. The worst case scenario is identifying them with the :command:`ST_IsValid(geometry)` function, moving them to a side table, exporting that table, and repairing them externally.

System Message: ERROR/3 (<string>, line 99); backlink

Unknown interpreted text role "command".

Here's an example of SQL to move invalid geometries out of the main table into a side table suitable for dumping to an external cleaning process.

-- Side table of invalids
CREATE TABLE nyc_neighborhoods_invalid AS
SELECT * FROM nyc_neighborhoods
WHERE NOT ST_IsValid(the_geom);
-- Remove them from the main table
DELETE FROM nyc_neighborhoods
WHERE NOT ST_IsValid(the_geom);

A good tool for visually repairing invalid geometry is OpenJump (http://openjump.org) which includes a validation routine under Tools->QA->Validate Selected Layers.

Now the good news: a large proportion of invalidities can be fixed inside the database using the :command:`ST_Buffer` function.

System Message: ERROR/3 (<string>, line 116); backlink

Unknown interpreted text role "command".

The buffer trick takes advantage of the way buffers are built: a buffered geometry is a brand new geometry, constructed by offsetting lines from the original geometry. If you offset the original lines by nothing (zero) then the new geometry will be structurally identical to the original one, but because it is built using the :term:`OGC` topology rules, it will be valid.

System Message: ERROR/3 (<string>, line 118); backlink

Unknown interpreted text role "term".

For example, here's a classic invalidity -- the "banana polygon" -- a single ring that encloses an area but bends around to touch itself, leaving a "hole" which is not actually a hole.

POLYGON((0 0, 2 0, 1 1, 2 2, 3 1, 2 0, 4 0, 4 4, 0 4, 0 0))
./validity/banana.png

Running the zero-offset buffer on the polygon returns a valid :term:`OGC` polygon, consisting of an outer and inner ring that touch at one point.

System Message: ERROR/3 (<string>, line 128); backlink

Unknown interpreted text role "term".
SELECT ST_AsText(
         ST_Buffer(
           ST_GeometryFromText('POLYGON((0 0, 2 0, 1 1, 2 2, 3 1, 2 0, 4 0, 4 4, 0 4, 0 0))'),
           0.0
         )
       );
POLYGON((0 0,0 4,4 4,4 0,2 0,0 0),(2 0,3 1,2 2,1 1,2 0))

Note

The "banana polygon" (or "inverted shell") is a case where the :term:`OGC` topology model for valid geometry and the model used internally by ESRI differ. The ESRI model considers rings that touch to be invalid, and prefers the banana form for this kind of shape. The OGC model is the reverse.

System Message: ERROR/3 (<string>, line 145); backlink

Unknown interpreted text role "term".
Note: See TracBrowser for help on using the repository browser.