Mathare Building import
From Map Kibera
(→Generate the OSM data to be imported) |
(→Generate the OSM data to be imported) |
||
Line 332: | Line 332: | ||
* integer values strangely starts with empty spaces. | * integer values strangely starts with empty spaces. | ||
Those can be fixed with sed: | Those can be fixed with sed: | ||
+ | $ cat > shapefile.sed << EOF | ||
s/building:t/building:type/g | s/building:t/building:type/g | ||
s/health_fac/health_facility/g | s/health_fac/health_facility/g | ||
s/motor_vehi/motor_vehicle/g | s/motor_vehi/motor_vehicle/g | ||
s/v=" *\([0-9]*\)"/v="\1"/g | s/v=" *\([0-9]*\)"/v="\1"/g | ||
+ | EOF | ||
+ | |||
+ | $ sed -f shapefile.sed mathare_building_import.osm > mathare_building_import_clean.osm |
Revision as of 09:36, 30 May 2011
In this page, I (Sébastien) will describe how to import the data of the building extraction of Mathare that we've got from AAAS. See Primož's blog for the initial analysis.
Changing the SRS
The data was made available as shapefiles in WGS84 Web Mercator Auxiliary Sphere projection, which is known as Web Mercator, spherical Mercator, Google Mercator and is referenced by various ids: EPSG:900913, EPSG:3785 (obsolete), EPSG:3857, ESRI WKID 102100... In QGIS, it's known as Google Mercator, EPSG:900913, ID 3644.
Before working with the data, we need to change the Spatial Reference System (SRS) to be compatible with OSM, that is moving to non-projected WGS84.
For some reason QGIS doesn't recognize the CRS of our shapefile and thus is unable to translate to another CRS. This can be fixed by manually setting the CRS to "Google Mercartor": menu Vector -> Data Management Tools -> Define current projection. Then export to a new projection: menu Vector -> Data Management Tools -> Export to new projection. Then choose the output CRS to be WGS84, EPSG:4326, ID 3452.
The two files that we're going to import are GroundTruthed-wgs84.shp and BuildingsOnTop-wgs84.shp
Setup PostGIS database
PostGIS is relatively easy to install on Ubuntu. I might describe it further later. Anyway, the current environment is PostgreSQL 8.4, PostGIS 1.5.
I found this script from GeoDjango very convenient to create a postgis db template:
$ ./create_template_postgis-debian.sh $ createdb -T template_postgis mathare
Import the shapefiles to the db
Each shapefile will be imported a separate table with shp2pgsql:
shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip Groundtruthed-wgs84.shp | psql mathare shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip BuildingsOnTop-wgs84.shp | psql mathare
Import OSM data to the db
In this case, we're only interested in buildings from OSM.
Get fresh data from OSM, keep only ways tagged with 'building' and their associated ways:
osmosis \ --read-api left=36.8425655 right=36.881876 top=-1.2480512 bottom=-1.2696752999999998 outPipe.0=1 \ --way-key keyList=building inPipe.0=1 outPipe.0=2 \ --used-node inPipe.0=2 outPipe.0=3 \ --write-xml file=mathare_building.osm inPipe.0=3
Import it to the database:
osm2pgsql -c -d mathare -l -s mathare_building.osm
The following tables will be created:
- planet_osm_line
- planet_osm_nodes
- planet_osm_point
- planet_osm_polygon
- planet_osm_rels
- planet_osm_roads
- planet_osm_ways
Tagging scheme
Every geometry imported will be tagged with:
source=AAAS_satellite_extraction_for_Mathare_2011
Shapefile column | OSM tags |
---|---|
id | ignored, since not used in shapefiles |
designated | see table below |
type=* | building:type=* |
name=* | name=* |
Shapefile Designated=* | OSM tags |
---|---|
Toilet | building=yes amenity=toilet |
NGO/CBO | building=office office=ngo |
Administration | building=office office=government |
CBO | building=office office=ngo |
Self-Help Group | building=yes note=Self-Help Group fixme=Find an appriopriate tag for "Self help group" |
Medical Clinic | building=yes amenity=hospital health_facility:type=medical_clinic |
Business | building=commercial |
Petrol Station | building=yes amenity=fuel motor_vehicle=yes |
House | building=residential |
Apartment Building | building=apartments |
Hardware Shop | building=commercial shop=hardware |
Church | building=church amenity=place_of_worship religion=christian |
Bar | building=yes amenity=bar |
Bathroom | building=yes amenity=shower |
Private Toilet | building=yes amenity=toilet access=private fixme=mathare preset scheme |
Garage | building=garage |
Public Toilet | building=yes amenity=toilet access=public |
Market Place |
building=yes |
Cyber Cafe | building=yes amenity=cyber_cafe |
Keys:
- access
- amenity
- building
- building:type
- fixme
- health_facility:type
- motor_vehicle
- name
- note
- office
- religion
- shop
- source
Preparing the database for the new data
DROP TABLE IF EXISTS mathare_building_import; CREATE TABLE mathare_building_import ( shp_gid integer, osm_id integer, layer integer, "access" text, "amenity" text, "building" text, "building:type" text, "fixme" text, "health_facility:type" text, "motor_vehicle" text, "name" text, "note" text, "office" text, "religion" text, "shop" text, "source" text, the_geom geometry, CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2), CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POLYGON'::text OR the_geom IS NULL), CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326) ) WITH ( OIDS=FALSE ); CREATE INDEX build_the_geom_gist ON mathare_building_import USING gist(the_geom);
Processing the data
INSERT INTO mathare_building_import ( shp_gid, "building", "building:type", "access", "amenity", "fixme", "health_facility:type", "motor_vehicle", "name", "note", "office", "religion", "shop", "source", the_geom ) SELECT g.gid, ( CASE WHEN g.designated IN ('NGO/CBO', 'Administration', 'CBO') THEN 'office' WHEN g.designated IN ('Business', 'Hardware Shop' ) THEN 'commercial' WHEN g.designated = 'House' THEN 'residential' WHEN g.designated = 'Apartment Building' THEN 'apartments' WHEN g.designated = 'Church' THEN 'church' WHEN g.designated = 'Garage' THEN 'garage' ELSE 'yes' END ) as building, lower(replace(g.type, ' ', '_')) as building_type, ( CASE WHEN g.designated = 'Public Toilet' THEN 'public' WHEN g.designated = 'Private Toilet' THEN 'private' ELSE NULL END ) as access, ( CASE WHEN g.designated IN ('Toilet', 'Private Toilet', 'Public Toilet') THEN 'toilet' WHEN g.designated = 'Petrol Station' THEN 'fuel' WHEN g.designated = 'Medical Clinic' THEN 'hospital' WHEN g.designated = 'Church' THEN 'place_of_worship' WHEN g.designated = 'Bar' THEN 'bar' WHEN g.designated = 'Bathroom' THEN 'shower' WHEN g.designated = 'Cyber Cafe' THEN 'cyber_cafe' ELSE NULL END ) as amenity, ( CASE WHEN g.designated = 'Self-Help Group' THEN 'Find an appriopriate tag for "Self help group"' WHEN g.designated IN ('Public Toilet', 'Private Toilet', 'Toilet', 'Bathroom') THEN 'MapMathare WATSAN-preset scheme missing' ELSE NULL END ) as fixme, ( CASE WHEN g.designated = 'Medical Clinic' THEN 'medical_clinic' ELSE NULL END ) as health, ( CASE WHEN g.designated = 'Petrol Station' THEN 'yes' ELSE NULL END ) as motor_vehicle, g.name as name, ( CASE WHEN g.designated = 'Self-Help Group' THEN 'Self-Help Group' ELSE NULL END ) as note, ( CASE WHEN g.designated = 'Administration' THEN 'government' WHEN g.designated in ( 'NGO/CBO', 'CBO') THEN 'ngo' ELSE NULL END ) as office, ( CASE WHEN g.designated = 'Church' THEN 'christian' ELSE NULL END ) as religion, ( CASE WHEN g.designated = 'Hardware Shop' THEN 'hardware' ELSE NULL END) as shop, 'AAAS building extraction from satellite imagery "MM128982" for Mathare 2011' as source, g.the_geom FROM "groundtruthed-wgs84" as g;
Find collisions between OSM and the import candidate
Very few buildings have already been mapped in OSM in Mathare (139 as of 2011.05.09). Yet, we should be careful whilst importing the dataset from aerial imagery extraction. While we can discard the geometry of the existing buildings, we shall preserve their attributes.
Basically, we shall check for geometries that overlaps and apply the attributes of the OSM element to the imported one.
Note: I can see two tricky corner cases, which I don't need to worry about right now since this import does not feature them. However, overlooking these issues in another import will likely result in a loss of data:
- nodes of the buildings bear attributes
- multipolygons
We can find the overlapping elements with this query:
mathare#= select g.gid, o.osm_id, from "groundtruthed-wgs84" as g, planet_osm_polygon as o where ST_Intersects(g.the_geom, o.way) ORDER BY g.gid;
This script removes the columns, which are empty:
#!/usr/bin/python import psycopg2 conn = psycopg2.connect("dbname=mathare") cur= conn.cursor() cur2 = conn.cursor() cur3 = conn.cursor() cur.execute("select attname from pg_attribute, pg_type where typname='planet_osm_polygon' and typrelid=attrelid ;") for record in cur: column = record[0] print column query = cur2.mogrify("SELECT count(*) from planet_osm_polygon WHERE \"" + column + "\" IS NOT NULL" ) cur2.execute(query) for c in cur2: if c[0] == 0L: query = "ALTER TABLE planet_osm_polygon DROP COLUMN \"" + column + "\" CASCADE;" print query cur3.execute(query) print repr(cur3.statusmessage) conn.commit() cur.close() cur2.close() cur3.close() conn.close()
It turns out only one of the pre existing buildings that overlap carries information that should be kept:
mathare=# SELECT DISTINCT m.shp_gid, o.osm_id, o.name, o.amenity, o.religion FROM planet_osm_polygon AS o, mathare_building_import AS m WHERE (o.name IS NOT NULL OR o.religion IS NOT NULL or o.amenity IS NOT NULL) AND ST_Intersects(m.the_geom, o.way); shp_gid | osm_id | name | amenity | religion ---------+----------+--------------------------------+------------------+----------- 517 | 96051625 | St. John's & St. Paul's Church | place_of_worship | christian (1 row)
Buildings on top of others
Some buildings sit on top of others, basically tin shacks built on the roof of apartment buildings.
Set the layer of the buildings that are below:
mathare=# UPDATE mathare_building_import SET layer = 0 WHERE shp_gid IN ( SELECT m.shp_gid from mathare_building_import AS m, "buildingsontop-wgs84" AS bot WHERE ST_Intersects(m.the_geom, bot.the_geom) ) RETURNING layer;
then import the buildings that are on top:
mathare=# INSERT INTO mathare_building_import (the_geom, layer, "building:type", name, building ) SELECT the_geom, 1, lower(replace(type, ' ', '_')), name, ( CASE WHEN designated = 'House' THEN 'residential' ELSE 'yes' END ) FROM "buildingsontop-wgs84";
Note: it turns out that the original shapefile contained duplicate. They can be found:
mathare=# SELECT DISTINCT m1.shp_gid, m2.shp_gid FROM mathare_building_import AS m1, mathare_building_import AS m2 WHERE m1.the_geom = m2.the_geom AND m1.shp_gid <> m2.shp_gid;
Generate the OSM data to be imported
To my knowledge, there's no tool to convert data from a postgis base to an OSM file.
So, we'll first export the data to a shapefile
pgsql2shp -g the_geom -k -f mathare_building_import.shp mathare
and then convert it to an OSM file with ogr2osm.py:
ogr2osm.py mathare_building_import.shp
The shapefile format has some limitations:
- the attribute name is limited to 10 characters, which is a problem for keys "building:type", 'health_fac', 'motor_vehi'.
- integer values strangely starts with empty spaces.
Those can be fixed with sed:
$ cat > shapefile.sed << EOF s/building:t/building:type/g s/health_fac/health_facility/g s/motor_vehi/motor_vehicle/g s/v=" *\([0-9]*\)"/v="\1"/g EOF $ sed -f shapefile.sed mathare_building_import.osm > mathare_building_import_clean.osm