More

Converting huge multipolygon to polygons

Converting huge multipolygon to polygons


I have a shapefile with some huge multipolygons, with 100.000's of parts. What would be the easiest way to split them to singlepart polygons? I'm looking for something like QGIS ”Multipart to singlepart” function, but the file is to large for QGIS to handle. I'm guessing that there is probably already some Python module that can do it for me. Any tips?


Shapefiles have no type MultiPolygon (type = Polygon), but they support them anyway (all rings are stored in one polygon = list of polygons, look at GDAL: ESRI Shapefile)

It is easier with Fiona and Shapely:

import fiona from shapely.geometry import shape, mapping # open the original MultiPolygon file with fiona.open('multipolygons.shp') as source: # create the new file: the driver and crs are the same # for the schema the geometry type is "Polygon" instead output_schema = dict(source.schema) # make an independant copy output_schema['geometry'] = "Polygon" with fiona.open('output.shp', 'w', driver=source.driver, crs=source.crs, schema=output_schema) as output: # read the input file for multi in source: # extract each Polygon feature for poly in shape(multi['geometry']): # write the Polygon feature output.write({ 'properties': multi['properties'], 'geometry': mapping(poly) })

from GDAL mailing list using python

import os from osgeo import ogr def multipoly2poly(in_lyr, out_lyr): for in_feat in in_lyr: geom = in_feat.GetGeometryRef() if geom.GetGeometryName() == 'MULTIPOLYGON': for geom_part in geom: addPolygon(geom_part.ExportToWkb(), out_lyr) else: addPolygon(geom.ExportToWkb(), out_lyr) def addPolygon(simplePolygon, out_lyr): featureDefn = out_lyr.GetLayerDefn() polygon = ogr.CreateGeometryFromWkb(simplePolygon) out_feat = ogr.Feature(featureDefn) out_feat.SetGeometry(polygon) out_lyr.CreateFeature(out_feat) print 'Polygon added.' from osgeo import gdal gdal.UseExceptions() driver = ogr.GetDriverByName('ESRI Shapefile') in_ds = driver.Open('data/multipoly.shp', 0) in_lyr = in_ds.GetLayer() outputshp = 'data/poly.shp' if os.path.exists(outputshp): driver.DeleteDataSource(outputshp) out_ds = driver.CreateDataSource(outputshp) out_lyr = out_ds.CreateLayer('poly', geom_type=ogr.wkbPolygon) multipoly2poly(in_lyr, out_lyr)

Ogr2ogr large gdb file to SQLite warning: "organizePolygons() received a polygon with more than 100 parts"

I’m trying to write a .gdb file (could also be any other format, geojson, gpkg, etc.) to a Spatialite DB but it takes awhile and I am getting a warning.

I know SQLite has low default settings for transactions and memory which is why I added the -gt unlimited and --config OGR_SQLITE_CACHE 1024 options but they dont have any affect on the speed of the command. With those 2 options removed it takes the same amount of time.

Is there a command in ogr2ogr I can use to make this run faster, or maybe I should switch to geopandas and use some shapely function to solve this?

Thanks any help is appreciated!

One Answer

The error means that your source data contains a polygon with more than 100 parts. I believe that in this context parts mean rings, both outer and inner ones. Rather good explanation can be found from https://gdal.org/doxygen/classOGRGeometryFactory.html#a9ce97d39ca2e0dd88f415227e71b7fd5

Organize polygons based on geometries. Analyse a set of rings (passed as simple polygons), and based on a geometric analysis convert them into a polygon with inner rings, (or a MultiPolygon if dealing with more than one polygon) that follow the OGC Simple Feature specification. All the input geometries must be OGRPolygon/OGRCurvePolygon with only a valid exterior ring (at least 4 points) and no interior rings. The passed in geometries become the responsibility of the method, but the papoPolygons "pointer array" remains owned by the caller. For faster computation, a polygon is considered to be inside another one if a single point of its external ring is included into the other one. (unless 'OGR_DEBUG_ORGANIZE_POLYGONS' configuration option is set to TRUE. In that case, a slower algorithm that tests exact topological relationships is used if GEOS is available.) In cases where a big number of polygons is passed to this function, the default processing may be really slow. You can skip the processing by adding METHOD=SKIP to the option list (the result of the function will be a multi-polygon with all polygons as toplevel polygons) or only make it analyze counterclockwise polygons by adding METHOD=ONLY_CCW to the option list if you can assume that the outline of holes is counterclockwise defined (this is the convention for example in shapefiles, Personal Geodatabases or File Geodatabases). For FileGDB, in most cases, but not always, a faster method than ONLY_CCW can be used. It is CCW_INNER_JUST_AFTER_CW_OUTER. When using it, inner rings are assumed to be counterclockwise oriented, and following immediately the outer ring (clockwise oriented) that they belong to. If that assumption is not met, an inner ring could be attached to the wrong outer ring, so this method must be used with care. If the OGR_ORGANIZE_POLYGONS configuration option is defined, its value will override the value of the METHOD option of papszOptions (useful to modify the behavior of the shapefile driver)

I am not sure if I understood right how to control which method is used for analysing the rings but I would try with this command:

Write a mail to gdal-dev list if this command does not make any difference or throws some error.


Every spatial data comes with a Coordinate reference system (CRS in short) that describes how the data are represented on the surface of the earth. An in-depth definition of CRS is beyond the scope of this post, here we will just see how to get CRS information from sf object and how to transform them into new reference system.

CRS are defined in sf by their epsg code, for instance the classical GPS/WGS84 CRS has the 4326 code, let's explore this:

When working with spatial data it is key to make sure that the CRS are adequate, for instance computing areas or distances make more sense in projected CRS rather than in geographic CRS. Also, when doing spatial operations on several spatial objects all the CRS have to be identical.


Simple features

There are a crap ton of packages for R that allow you to interact with shapefiles and spatial data. Here we will focus on a modern package for reading and transforming spatial data in a tidy format. Simple features or simple feature access refers to a formal standard that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.

The standard is widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. A subset of simple features forms the GeoJSON standard.

R has well-supported classes for storing spatial data ( sp ) and interfacing to the above mentioned environments ( rgdal , rgeos ), but has so far lacked a complete implementation of simple features, making conversions at times convoluted, inefficient or incomplete. The sf package tries to fill this gap.


Ytdjtkui

Why is it a bad idea to hire a hitman to eliminate most corrupt politicians?

Finitely generated matrix groups whose eigenvalues are all algebraic

What Exploit Are These User Agents Trying to Use?

Machine learning testing data

How to prevent "they're falling in love" trope

What is required to make GPS signals available indoors?

Does the Idaho Potato Commission associate potato skins with healthy eating?

Convert seconds to minutes

Car headlights in a world without electricity

How to compactly explain secondary and tertiary characters without resorting to stereotypes?

What is the most common color to indicate the input-field is disabled?

Is there a hemisphere-neutral way of specifying a season?

Knowledge-based authentication using Domain-driven Design in C#

Bullying boss launched a smear campaign and made me unemployable

OP Amp not amplifying audio signal

Processor speed limited at 0.4 Ghz

Unlock My Phone! February 2018

How to show a landlord what we have in savings?

What does the same-ish mean?

What is the fastest integer factorization to break RSA?

How do I exit BASH while loop using modulus operator?

Mathematica command that allows it to read my intentions

Do creatures with a speed 0ft., fly 30ft. (hover) ever touch the ground?

What do you call someone who asks many questions?

Spatial overlays : Union between multiline and multipolygon with geopandas

Find pairs of near features with geopandas, fiona, GDAL etcfrom geometrycollection to polygon geometry with pythonDifference between lines and polygons using geopandasgeopandas dissolve versus postGIS postgresql aggregated groupby dissolveKnown Intersecting Polygons returning false for .intersects() in geopandasFilter a GeoPandas dataframe for points within a specific countryBuffer with dissolve - Geopandas - unary_union multipolygonOverlay Union Geopandas improve performanceWhy is Union in ArcMap much faster than other approaches?How to create a shapefile [polygon type] from a Geodataframe, returned from a Oracle Spatial cursor with geometry column type=cx_Oracle.LOB?

I try to overlays (apply union) my multipolygon on the multiline with GeoPandas, but it seem like the GeoPandas overlay function work only with (Multi)polygon. Is there any way to make that with multiline and multipolygon with GeoPandas. Here is my code :

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

Yeah exactly, that what i want to do, to split lines with polygon attributes

Can you hard-code some sample geometries using shapely objects?

Yes, i think i can dp that

I try to overlays (apply union) my multipolygon on the multiline with GeoPandas, but it seem like the GeoPandas overlay function work only with (Multi)polygon. Is there any way to make that with multiline and multipolygon with GeoPandas. Here is my code :

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

Yeah exactly, that what i want to do, to split lines with polygon attributes

Can you hard-code some sample geometries using shapely objects?

Yes, i think i can dp that

I try to overlays (apply union) my multipolygon on the multiline with GeoPandas, but it seem like the GeoPandas overlay function work only with (Multi)polygon. Is there any way to make that with multiline and multipolygon with GeoPandas. Here is my code :

I try to overlays (apply union) my multipolygon on the multiline with GeoPandas, but it seem like the GeoPandas overlay function work only with (Multi)polygon. Is there any way to make that with multiline and multipolygon with GeoPandas. Here is my code :

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

Yeah exactly, that what i want to do, to split lines with polygon attributes

Can you hard-code some sample geometries using shapely objects?

Yes, i think i can dp that

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

Yeah exactly, that what i want to do, to split lines with polygon attributes

Can you hard-code some sample geometries using shapely objects?

Yes, i think i can dp that

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

What output are you expecting overlaying polygons with lines? Split lines with polygon attributes?

Yeah exactly, that what i want to do, to split lines with polygon attributes

Yeah exactly, that what i want to do, to split lines with polygon attributes

Can you hard-code some sample geometries using shapely objects?

Can you hard-code some sample geometries using shapely objects?

Yes, i think i can dp that

Yes, i think i can dp that


3. Working with raster data¶

Unlike vector data (geometric objects: like points, lines, polygons), raster data is a matrix of values of “pixels” (also called “cells”). Each cell represents a small area and contains a value representing some information:

Raster data is like digital image data you look at on your computer, except that now, each pixel represents a spatial region. The “resolution” of a raster is the area that each pixel represents. A 1 meter resolution raster means that each pixel represents a 1 m x 1 m area on the ground. However, when we say “high resolution” we often mean, a low value of resolution for each pixel, i.e., 1 meter resolution is higher than 8 meter resolution as exmplified by the image below:

Like vector data, there are different file formats for storing raster data. The most common is GeoTIFF ( .tif ), which is essentially an image file with georeferencing information embedded within it. Raster data is used for a variety of problems, common examples include satellite imagery and digital elevation models. Those things are gettings a bit outside the scope of this course but let’s briefly look at some raster data below. The core packge for working with raster data in Python is rasterio.

I have a satellite image raster file of part of UBC in my data folder which I downloaded from the Abacus Data Network. Let’s load it in with rasterio :

We can start to investigate things like the width and height (in pixels/cells) of the raster:

Raster data often have “bands” representing different information (for example, a colour image usually has red, green, and blue bands). This particular satellite image has 4 bands (in order: red, blue, green, infrared):

We could import the first band as a numpy array using:

But before importing anymore data, it’s helpful to just see the image. First I’m going to “downsample” my raster (reduce the resolution by increasing the cell size) to reduce the size of the data and speed things up a bit:

As our data is just numpy array(s), we can plot it with the matplotlib function plt.imshow() :

Of course, it looks more realistic using all channels:

rasterio has lots of advanced fucntionality for manipulating and plotting raster data if you find the need to. Check out the documentation.


Test Driven Development (TDD)

Having identified the reference implementation (Turf.js’s pointsWithinPolygon function) and our two inputs to the function, next I wrote a small TypeScript module to run in Node.js and call Turf.js and print out the results. I saved these to geojson files into tests/fixtures/natural-earth/ . Now there are 6 test cases which we will benchmark against later. Simple and Complex polygons, and 3 sizes of Point collections (10, 100 and 1,000).

Here is an excerpt of one of the unit tests in Rust. Notice the pattern match at the end:


Hfrhyu

Is there a kind of relay only consumes power when switching?

Is there hard evidence that the grant peer review system performs significantly better than random?

Can the Great Weapon Master feat's "Power Attack" apply to attacks from the Spiritual Weapon spell?

Did Deadpool rescue all of the X-Force?

How fail-safe is nr as stop bytes?

How can I reduce the gap between left and right of cdot with a macro?

The code below, is it ill-formed NDR or is it well formed?

How do I use the new nonlinear finite element in Mathematica 12 for this equation?

Why does it sometimes sound good to play a grace note as a lead in to a note in a melody?

What is the topology associated with the algebras for the ultrafilter monad?

How often does castling occur in grandmaster games?

Why wasn't DOSKEY integrated with COMMAND.COM?

Why do we need to use the builder design pattern when we can do the same thing with setters?

What's the meaning of "fortified infraction restraint"?

How come Sam didn't become Lord of Horn Hill?

AppleTVs create a chatty alternate WiFi network

Question about debouncing - delay of state change

What would you call this weird metallic apparatus that allows you to lift people?

What is a fractional matching?

Why is it faster to reheat something than it is to cook it?

How to react to hostile behavior from a senior developer?

Can anything be seen from the center of the Boötes void? How dark would it be?

Quickly find all polygons that overlap with two or more polygons in Shapely

Unicorn Meta Zoo #1: Why another podcast?How to find the intersection areas of overlapping buffer zones in single shapefile?Find and list all polygons that overlap with another polygonHow to convert small overlaping polygon features to inner ring in QGISShapely (geos) crashes during unary_unionDissolving polygons based on attributes with Python (shapely, fiona)?Summing attribute values for areas where multiple polygons overlap using QGIS?ArcGIS find all polygons that intersect with more that one polygon from another layerClipping algorithm that can make one layer with no overlapping polygons from N polygonsMultipolygon created from Scratch LayerWhen does Shapely's “polygonize_full” detect a dangle?Cannot find polygons that are inside a big polygon using GeoPandas

I have a set of Polygons (in Shapely) that I wish to find all intersections that are comprised of overlaps of two or more polygons. I have been working based off the code found in this answer. I have a code so far that works in the if 1: statement below, but it takes a very long time to run. The code that is in the if 0: statement, runs quickly, but produces an output with some errors. See below for an example of the error.

Below, the orange represents all the shapes that I would like to check if they are produced (by unary union in Shapely) by two or more overlaps, the cyan represents the output of the if-statement that works (if 1 in above code), and the pink represents the incorrect selected polygons by the if-statement that doesn't work (if 0 in above code). The top left corners of the text boxes in the pictures represent that point being selected, and as you can see the second picture, the pink area only has one polygon from the STEREO_OBSERVATIONS (which is the set of polygons that are being checked for overlap) so it should not be selected. By contrast, the left adjacent cyan square to the pink square highlighted in the third picture has two polygons from STEREO_OBSERVATIONS and as such is correctly selected by the cyan result. The layering of the polygon layers is shown in the first picture.

What is the best way to query the STRtree structure so as to check to intersection or overlap? I have tried to use the centroid in the code, but that hasn't seemed to produce the correct output.

Fig. 1 (Legend/layering)

Fig 2. (Incorrect selection by fast algorithm [if 0 in code] in pink)

Fig 3. (Correct selection by slow algorithm [if 1 in code]


R as GIS for Economists

By spatial join, we mean spatial operations that involve all of the following:

  • overlay one spatial layer (target layer) onto another spatial layer (source layer)
  • for each of the observation in the target layer
    • identify which objects in the source layer it geographically intersects (or a different topological relation) with
    • extract values associated with the intersecting objects in the source layer (and summarize if necessary),
    • assign the extracted value to the object in the target layer

    For economists, this is probably the most common motivation for using GIS software, with the ultimate goal being to include the spatially joined variables as covariates in regression analysis.

    We can classify spatial join into four categories by the type of the underlying spatial objects:

    • vector-vector: vector data (target) against vector data (source)
    • vector-raster: vector data (target) against raster data (source)
    • raster-vector: raster data (target) against vector data (source)
    • raster-raster: raster data (target) against raster data (source)

    Among the four, our focus here is the first case. The second case will be discussed in Chapter 5. We will not cover the third and fourth cases in this course because it is almost always the case that our target data is a vector data (e.g., city or farm fields as points, political boundaries as polygons, etc).

    Category 1 can be further broken down into different sub categories depending on the type of spatial object (point, line, and polygon). Here, we will ignore any spatial joins that involve lines. This is because objects represented by lines are rarely observation units in econometric analysis nor the source data from which we will extract values. 64 Here is the list of the types of spatial joins we will learn.

    1. points (target) against polygons (source)
    2. polygons (target) against points (source)
    3. polygons (target) against polygons (source)

    3.3.1 Case 1: points (target) vs polygons (source)

    Case 1, for each of the observations (points) in the target data, finds which polygon in the source file it intersects, and then assign the value associated with the polygon to the point 65 . In order to achieve this, we can use the st_join() function, whose syntax is as follows:

    Similar to spatial subsetting, the default topological relation is st_intersects() 66 .

    We use the Kansas irrigation well data (points) and Kansas county boundary data (polygons) for a demonstration. Our goal is to assign the county-level corn price information from the Kansas county data to wells. First let me create and add a fake county-level corn price variable to the Kansas county data.

    Here is the map of Kansas counties color-differentiated by fake corn price (Figure 3.16):

    Figure 3.16: Map of county-level fake corn price

    For this particular context, the following code will do the job:

    You can see from Figure 3.17 below that all the wells inside the same county have the same corn price value.

    Figure 3.17: Map of wells color-differentiated by corn price

    3.3.2 Case 2: polygons (target) vs points (source)

    Case 2, for each of the observations (polygons) in the target data, find which observations (points) in the source file it intersects, and then assign the values associated with the points to the polygon. We use the same function: st_join() 67 .

    Suppose you are now interested in county-level analysis and you would like to get county-level total groundwater pumping. The target file is KS_counties , and the source file is KS_wells .

    As you can see in the resulting dataset, all the unique polygon - point intersecting combinations comprise the observations. For each of the polygons, you will have as many observations as the number of wells that intersect with the polygon. Once you join the two layers, you can find statistics by polygon (county here). Since we want groundwater extraction by county, the following does the job.

    Of course, it is just as easy to get other types of statistics by simply modifying the summarize() part.

    However, this two-step process can actually be done in one step using aggregate() , in which you specify how you want to aggregate with the FUN option as follows:

    Notice that the mean() function was applied to all the columns in KS_wells , including site id number. So, you might want to select variables you want to join before you apply the aggregate() function like this:

    3.3.3 Case 3: polygons (target) vs polygons (source)

    For this case, st_join(target_sf, source_sf) will return all the unique intersecting polygon-polygon combinations with the information of the polygon from source_sf attached.

    We will use county-level corn acres in Iowa in 2018 from USDA NASS 68 and Hydrologic Units 69 Our objective here is to find corn acres by HUC units based on the county-level corn acres data 70 .

    We first import the Iowa corn acre data:

    Here is the map of Iowa counties color-differentiated by corn acres (Figure 3.18):

    Figure 3.18: Map of Iowa counties color-differentiated by corn planted acreage

    Now import the HUC units data:

    Here is the map of HUC units (Figure 3.19):

    Figure 3.19: Map of HUC units that intersect with Iowa state boundary

    Here is a map of Iowa counties with HUC units superimposed on top (Figure 3.20):

    Figure 3.20: Map of HUC units superimposed on the counties in Iowas

    Spatial joining will produce the following.

    Each of the intersecting HUC-county combinations becomes an observation with its resulting geometry the same as the geometry of the HUC unit. To see this, let’s take a look at one of the HUC units.

    The HUC unit with HUC_CODE ==10170203 intersects with four County.

    Figure 3.21 shows the map of the four observations.

    Figure 3.21: Map of the HUC unit

    So, all of the four observations have identical geometry, which is the geometry of the HUC unit, meaning that the st_join() did not leave the information about the nature of the intersection of the HUC unit and the four counties. Again, remember that the default option is st_intersects() , which checks whether spatial objects intersect or not, nothing more. If you are just calculating the simple average of corn acres ignoring the degree of spatial overlaps, this is just fine. However, if you would like to calculate area-weighted average, you do not have sufficient information. You will see how to find area-weighted average below.

    Note that we did not extract any attribute values of the railroads in Chapter 1.4. We just calculated the travel length of the railroads, meaning that the geometry of railroads themselves were of interest instead of values associated with the railroads.↩︎

    You can see a practical example of this case in action in Demonstration 1 of Chapter 1.↩︎

    While it is unlikely you will face the need to change the topological relation, you could do so using the join option.↩︎

    You can see a practical example of this case in action in Demonstration 2 of Chapter 1.↩︎

    See Chapter 9.1 for how to download Quick Stats data from within R.↩︎

    See here for an explanation of what they are. You do not really need to know what HUC units are to understand what’s done in this section.↩︎

    Yes, there will be substantial measurement errors as the source polygons (corn acres by county) are large relative to the target polygons (HUC units). But, this serves as a good illustration of a polygon-polygon join.↩︎


    Watch the video: Εισαγωγή - Εξαγωγή σημείων στο Autocad με το DelSurvey