Jul 15 2008

R is for Radiohead

Published by perrygeo under Uncategorized

Radiohead realeased their video for House of Cards yesterday. Besides being a big radiohead fan, I was also loving the LIDAR technology behind the video.

If you want to check it out yourself, there are code samples on the site as well as access to the raw data. The csv files have four columns (x, y, z, and intensity). For me the quickest way to visualize the data was through R and it’s OpenGL interface called rgl (which is a wonderful high-level 3D data visualization environment).

Assuming you have R installed, rgl is a simple add on through the CRAN repositories:

install.packages("rgl")

Then you need to load the library, load the csv, scale the intensity values from 0 to 1. Then it’s a simple rgl.points command to get an interactive 3D rendering:

library(rgl)
d < - read.csv("C:/temp/radiohead/22.csv", header=FALSE)

# scale intensity values from 0 to 1
d$int <- d[,4] / 255

# rgl.points(x,y,z,size=__,color=__)
# note y value is inverted
# color is a grayscale rgb based on intensity
rgl.points(d[,1],d[,2]*-1,d[,3], size=3, color=rgb(d$int,d$int,d$int))

That’s all it takes to render Thom Yorke in all his 3D digital glory:

2 responses so far

Jun 12 2008

Geospatial Reddit - 2 weeks later

Published by perrygeo under Uncategorized

So, despite frustrations with getting submitted URLs to appear, Geospatial Reddit is still puttering along. Not exactly a vibrant community yet but there are currently 133 subscribers. If you’re subscribed, take a minute to submit your favorite URLs. If you haven’t subscribed, check it out.

I thought 133 subscribers was a decent number until I found that the Bacon subreddit has over 500. Apparently the world would rather discuss their greasy breakfast food than maps.

5 responses so far

Jun 08 2008

Jabref - Open Source Alternative to EndNote

Published by perrygeo under LaTeX

For those of you that use EndNote to keep track of your bibliographies/references , there is an alternative. JabRef. I find the UI to be very intuitive and it has a range of customizable import/export formats. JabRef uses the BibTex format as it’s native file format so, of course, it integrates very well with LaTeX.

One of the neat features is the ability to create custom bibliographies in HTML, complete with javascript-based search capabilities. Here’s my reference list which I’ll be slowly adding to as I convert all my old text-based and EndNote reference lists over.

One response so far

May 28 2008

Posting to Geospatial Reddit

Published by perrygeo under Uncategorized

Some folks have had trouble submitting links so I figured I should post a bit more detail on that. To get articles to show up on the geospatial reddit (not the main reddit), go to http://reddit.com/r/geospatial/submit or click the “Submit a Link” button on the right - from the geospatial page. When you’re submitting the url, you should see “submit to geospatial” as the page header.

I know at least 2 of us have been successful at posting. If this doesn’t work for you, please let me know and I’ll try and figure it out.

One response so far

May 28 2008

Geospatial Reddit - A democratic solution to geo blog overload?

Published by perrygeo under Uncategorized

All the great GIS news/blog aggregators out there (planetgs, slashgeo, etc) are moderator driven - a few people act as the gatekeepers and inevitably have to decide what information is useful. This is not the ideal way to do things.

There’s a more democratic and distributed way to spread the role - it’s called reddit. More specifically, Geospatial Reddit. For those unfamiliar with reddit (or similar sites like digg), the idea is simple: users submit stories and users vote on stories. The most popular ones rise to the top and, theoretically, the best articles magically appear on the front page. Much like democracy itself, there are flaws in the theory but its the best thing we’ve got.

Geospatial Reddit is public so sign up, submit your favorite stories and vote. Lets see if we can make this work.

14 responses so far

May 25 2008

So you want to learn to learn about kriging …

Published by perrygeo under stats

Guides like Tomislav Hengl’s Practical Guide to Geostatistical Mapping of Environmental Variables and Rossiter’s Introduction to applied geostatistics do an excellent job of providing a grounded, relatively easy to understand, introduction to geostatical prediction and kriging.

But if you’re an experience learner (like me) you don’t absorb the mathematics fully without doing something with the knowledge; Seeing it in action brings the concepts to life. Unfortunately most geostats/kriging software is either too complex for exploratory learning (not enough immediate feedback) or too simplistic (making too many assumptions, disallowing access to the nitty-gritty details). Either way, you’re bound to produce output with fundamental flaws because you’re not aware of the finer details of variogram modelling. I speak from exerience!

Luckily Dennis J. J. Walvoort of the Wageningen University & Research Center saw the same problem and created an nifty learning to to explore varigoram models and spatial predictions using ordinary kriging - EZ-Kriging. No degree in math or statistical theory required. Just drag the points around, play with the parameters and alter the underlying data as a table and see the results immediately.

Its nothing more than a simulation so don’t expect to load your own datasets or produce any meaningful output with it. But it truly excels as a learning tool to understand the core concepts behind kriging and is a great complement to Hengl and Rossiter’s work. With that knowledge you can do the real deal in Surfer, R, ILWIS or your geostats software of choice.

EDIT: One complaint about this EZ-Kriging that I have: it doesn’t show the observed sample variogram cloud overlayed on the variogram model. Oh well still a nice tool.

EDIT2: It’s a windows .exe but it runs smoothly under wine in linux.

4 responses so far

May 14 2008

Ubuntu as a GIS workstation (updated for Hardy Heron)

Published by perrygeo under Linux

As a followup to my previous post on turning Ubuntu Gutsy into a GIS workstation, Here are the revised instructions for Ubuntu 8.04 (The Hardy Heron).

Note that there are a few additonal apps and changes in here:

  • Postgis
  • Mapnik
  • New version of QGIS installed via repository
  • OpenStreetMap tools (JOSM and osm2pgsql)
  • Geotiff utilities
  • Some nice python spatial libs (shapely, owslib, geopy and pyproj)

Run the following as root on your new Hardy installation, answer a few configuration questions and you’ll be ready to go.

echo 'deb http://ppa.launchpad.net/qgis/ubuntu hardy main' >> /etc/apt/sources.list

apt-get update

apt-get -y --force-yes install grass mapserver-bin \
gdal-bin cgi-mapserver python-qt4 python-sip4 python-gdal \
python-mapscript gmt gmt-coastline-data r-recommended gpsbabel \
shapelib qgis qgis-plugin-grass python-setuptools \
python-mapnik mapnik-plugins mapnik-utils osm2pgsql josm postgresql-8.3-postgis \
python-dev build-essential libgdal-dev geotiff-bin sun-java6-jre

easy_install shapely geopy owslib pyproj

EDIT: If you’re looking for more up to date packages for geos, gdal, etc, try adding deb http://les-ejk.cz/ubuntu/ hardy multiverse to your /etc/apt/sources.list

6 responses so far

Apr 21 2008

Hike of Doom #2- OGC KML

Published by perrygeo under Uncategorized

In commemoration of the OGC approval of KML as an open standard to share geographic content over the web, I’d like to share our recent “Hike of Doom #2″ (kml provided by Mark Dotson).


The first weekend to hit 90 degrees, my friends and I travel inland to dive and swim in the Santa Ynez river. It is billed as a “30 minute” hike to our favorite watering hole. It becomes much more than that.

Of course the road leading up to the trailhead is closed due to construction so we have 3 miles of hiking on pavement just to get to the former trailhead- the Red Rocks parking lot.

Then the fun begins. A decent rainy season and some dam releases make for high flows and we’ve got half-dozen major river crossings to contend with. The recent fires added a good deal of organic matter to the river and the algae has bloomed accordingly. It is a wet, hot, rocky and slimy hike.

We make it to the swimming hole and enjoy the day. We dive, laugh, have a few beers.


The sun sets and the fun really begins.

Klaus, the Bavarian cyclist whom we’d met at the swimming hole, met up with us just after my girlfriend, Joselyne, sprained her ankle on a rock. Her ankle hadn’t started to swell yet but I could tell, drawing from my basketball injuries from the past, that she was not putting weight on it any time soon. We fashioned crutches from some driftwood. We met up with some turkey hunters (dressed in more camouflage more effective than most military uniforms) who helped us out by providing us some ankle wrap.

David and Andy began the trek back to the car to get help. The rest of us could either go back via the river bed , a rocky and treacherous endeavor given the setting sun, or head up to the main road and get some help. We decided on the main road and Shaun took off to alert the others to our plans. The main fire road was a trek in the opposite direction - longer, more elevation changes but smooth enough for a bike or truck and more accessible to vehicles.

I carried Jos, over my shoulder fireman-style and/or piggy-back, over the river crossings.

On the flats, Mark and I pushed Jos on Klaus’ bike.

We pushed on up the trail until we reached the main road. Klaus, after drinking the last of our beer, biked up to the dam keeper’s residence at Gibraltar Dam while Christina, Sarah, Mark, Jos and I continued up the trail. A half-hour later, Klaus and the dam keeper arrived in a pickup and drove the rest of us back to the Red Rock “parking lot”.

But the construction and rebar on the causeway meant there was no way to cross with a normal vehicle so we went by foot. Jos got back on Klaus’ bike and we pushed.

Luckily the slight downhill grade allowed her to glide back for a good portion, graciously sparing Mark and I from permanent back injury.

Meanwhile the away team had gotten some semblance of cellular reception and attempted to call the authorities. The goal was to get a ranger truck to drive out to get us or at least unlock the gate to meet us half way at the Red Rock parking lot. The authorities response was fantastic if not a bit overzealous. By the time we had gotten within a 1/4 mile of our car, we spotted helicopters. Then a firetruck. Then an ambulance. Joselyne was coasting by on Klaus’ bike and they didn’t even stop for her on the first pass! Apparently expecting to rescue a mangled body from the wilderness, the EMTs were somewhat disappointed at the less challenging situation they faced - a girl, coasting down the road on a bike with a sprained ankle.

We were back in the car, on the road before dark and got home in time for pizza.

So what did we learn from this? Well as a Boy Scout, I am ashamed to say I wasn’t prepared. A well prepped emergency kit would have helped a lot. At least we had an LED headlamp. Some rope would have gone a long way towards making a stretcher. An instant-ice-pack, ankle wrap and some ibuprofen would have been handy. We were wet and the mercury was falling quickly; some emergency shelter and clothing would have assuaged my concerns about the nighttime chill.

But this was offset by the generosity of the many people we met for the first time - The hunters who lent us their medical supplies, the dam keeper who got up from his Sunday dinner to make sure we got back safely, the EMTs who put tremendous resources into organizing a military-scale search party, Klauss who so generously stuck with us and shared with us his bike, his wisdom and his company. Without their help and our group of friends, the story might have a less happy ending.

Never underestimate the power of human kindness, generosity and cooperation! And never believe me when I say it’s a short hike.

No responses yet

Apr 19 2008

A quick Cython introduction

Published by perrygeo under Python, Uncategorized

I love python for its beautiful code and practicality. But it’s not going to win a pure speed race with most languages. Most people think of speed and ease-of-use as polar opposites - probably because they remember the pain of writing C. Cython tries to eliminate that duality and lets you have python syntax with C data types and functions - the best of both worlds. Keeping in mind that I’m by no means an expert at this, here are my notes based on my first real experiment with Cython:

EDIT: Based on some feedback I’ve received there seems to be some confusion - Cython is for generating C extensions to Python not standalone programs. The whole point is to speed up an existing python app one function at a time. No rewriting the whole application in C or Lisp. No writing C extensions by hand. Just an easy way to get C speed and C data types into your slow python functions.


So lets say we want to make this function faster. It is the “great circle” calculation, a quick spherical trig problem to calculate distance along the earth’s surface between two points:

p1.py

import math

def great_circle(lon1,lat1,lon2,lat2):
    radius = 3956 #miles
    x = math.pi/180.0

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = math.acos((math.cos(a)*math.cos(b)) +
                  (math.sin(a)*math.sin(b)*math.cos(theta)))
    return radius*c

Lets try it out and time it over 1/2 million function calls:

import timeit  

lon1, lat1, lon2, lat2 = -72.345, 34.323, -61.823, 54.826
num = 500000

t = timeit.Timer("p1.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                       "import p1")
print "Pure python function", t.timeit(num), "sec"

About 2.2 seconds. Too slow!

Lets try a quick rewrite in Cython and see if that makes a difference:
c1.pyx

import math

def great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = math.acos((math.cos(a)*math.cos(b)) + (math.sin(a)*math.sin(b)*math.cos(theta)))
    return radius*c

Notice that we still import math - cython lets you mix and match python and C data types to some extent. The conversion is handled automatically though not without cost. In this example all we’ve done is define a python function, declare its input parameters to be floats, and declare a static C float data type for all the variables. It still uses the python math module to do the calcs.

Now we need to convert this to C code and compile the python extension. The best way to do this is through a setup.py distutils script. But we’ll do it the manual way for now to see what’s happening:

# this will create a c1.c file - the C source code to build a python extension
cython c1.pyx

# Compile the object file
gcc -c -fPIC -I/usr/include/python2.5/ c1.c

# Link it into a shared library
gcc -shared c1.o -o c1.so

Now you should have a c1.so (or .dll) file which can be imported in python. Lets give it a run:

    t = timeit.Timer("c1.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                     "import c1")
    print "Cython function (still using python math)", t.timeit(num), "sec"

About 1.8 seconds. Not the kind of speedup we were hoping for but its a start. The bottleneck must be in the usage of the python math module. Lets use the C standard library trig functions instead:

c2.pyx

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

def great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = acosf((cosf(a)*cosf(b)) + (sinf(a)*sinf(b)*cosf(theta)))
    return radius*c

Instead of importing the math module, we use cdef extern which uses the C function declarations from the specified include header (in this case math.h from the C standard library). We’ve replaced the calls to some of the expensive python functions and are ready to build the new shared library and re-test:

    t = timeit.Timer("c2.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                     "import c2")
    print "Cython function (using trig function from math.h)", t.timeit(num), "sec"

Now that’s a bit more like it. 0.4 seconds - a 5x speed increase over the pure python function. What else can we do to speed things up? Well c2.great_circle() is still a python function which means that calling it incurs the overhead of the python API, constructing the argument tuple, etc. If we could write it as a pure C function, we might be able to speed things up a bit.

c3.pyx

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

cdef float _great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = acosf((cosf(a)*cosf(b)) + (sinf(a)*sinf(b)*cosf(theta)))
    return radius*c

def great_circle(float lon1,float lat1,float lon2,float lat2,int num):
    cdef int i
    cdef float x
    for i from 0 < = i < num:
        x = _great_circle(lon1,lat1,lon2,lat2)
    return x

Notice that we still have a python function wrapper (def) which takes an extra argument, num. The looping is done inside this function with for i from 0 < = i < num: instead of the more pythonic but slower for i in range(num):. The actual work is done in a C function (cdef) which returns float type. This runs in 0.2 seconds - a 10x speed boost over the original python function.

Just to confirm that we’re doing things optimally, lets write a little app in pure C and time it:

#include <math .h>
#include <stdio .h>
#define NUM 500000

float great_circle(float lon1, float lat1, float lon2, float lat2){
    float radius = 3956.0;
    float pi = 3.14159265;
    float x = pi/180.0;
    float a,b,theta,c;

    a = (90.0-lat1)*(x);
    b = (90.0-lat2)*(x);
    theta = (lon2-lon1)*(x);
    c = acos((cos(a)*cos(b)) + (sin(a)*sin(b)*cos(theta)));
    return radius*c;
}

int main() {
    int i;
    float x;
    for (i=0; i < = NUM; i++)
        x = great_circle(-72.345, 34.323, -61.823, 54.826);
    printf("%f", x);
}

Now compile it with gcc -lm -o ctest ctest.c and test it with time ./ctest… about 0.2 seconds as well. This gives me confidence that my Cython extension is at least as efficient as my C code (which probably isn’t saying much as my C skills are weak).


Some cases will be more or less optimal for cython depending on how much looping, number-crunching and python-function-calling are slowing you down. In some cases people have reported 100 to 1000x speed boosts. For other tasks it might not be so helpful. Before going crazy rewriting your python code in Cython, keep this in mind:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” — Donald Knuth

In other words, write your program in python first and see if it works alright. Most of the time it will… some times it will bog down. Use a profiler to find the slow functions and re-implement them in cython and you should see a quick return on investment.

Links:
WorldMill - a python module by Sean Gillies which uses Cython to provide a fast, clean python interface to the libgdal library for handling vector geospatial data.

Writing Fast Pyrex code (Pyrex is the predecessor of Cython with similar goals and syntax)

17 responses so far

Apr 15 2008

Spatial data in SQLite

Published by perrygeo under SQL, Uncategorized, postgis

Slashgeo pointed me to a very interesting set of projects - SpatiaLite and VirtualShape. They provide a spatial data engine for the sqlite database. Think of it as the PostGIS of SQLite. It looks like this extends sqlite’s spatial capabilities far beyond the sqlite OGR driver.

SpatiaLite provides many of the basic OGC Simple Features functions - transforming geometries between projections, spatial operations of bounding boxes, and some basic functions to disect, analyze and export geometries.

VirtualShape provides the really neat ability to access a shapefile using the SpatiaLite/SQlite interface without having to import a copy - it reads directly off the shapefile by exposing the shapefile and its attributes as a “virtual table”. I can think of a million uses for this. For example, lets say you have a shapefile of US counties and the number of voter in the 2004 election as an attribute in the dbf. You want to find the total voter count in each state:

$ ls -1 counties.*
counties.dbf
counties.prj
counties.shp
counties.shx
$ sqlite3 test.db
sqlite> .load 'SpatiaLite.so'
sqlite> .load 'VirtualShape.so'
sqlite> CREATE virtual table virtual_counties using VirtualShape(counties);
sqlite> select sum(voters) as total_voters, state_name
           from virtual_counties
           group by state_name
           order by total_voters desc;
9830550.0|California
7563055.0|Florida
7346779.0|Texas
...


Now this is fairly straightforward non-spatial SQL but the ability to run it against a shapfile without having to export to an intermediate data format is a very valuable tool.

Links:
When to use SQlite.
A video presentation by Richard Hipp (the author of sqlite).

6 responses so far

Next »