As announced earlier,
— Edzer Pebesma ((???)) January 20, 2017
the first openEO hackaton was held a few weeks ago. A short report follows.
OpenEO is an initiative to define an open API for accessing cloud-based Earth observation data processing engines, explained in this blog post.
The EODC had been so kind to set us up with access to a virtual machine in their data center, and this illustrates the problem with remote sensing data is that there are many and they are big. Part of the archive we looked at was found here:
xxxxxxxx@openeo1:~$ df -h /eodc/products/ Filesystem Size Used Avail Use% Mounted on XX.XXX.XX.XXX:/products 1.4P 1.3P 159T 89% /eodc/products
How big is 1.3 petabyte? I often try to relate to the latest fixed-size digital medium we used, the CD-R:
How many? Mmm.
1.3 * 1024^5 / (700 * 1024^2) ##  1994092
2 Million: a stack of 2.4 km, or 10 km when we put them in a case.
We first talked quite a while about what RESTful APIs are, how they work, their verbs (GET, PUT, POST, etc), resources, what microservices are, and how they can be standardised e.g. following the Open API initiative. Then, we tried to define a simple challenge that uses REST, and uses Earth observation data.
In the challenge, We didn’t want to work with the large amounts of data straight away. Instead, we tried to a challenge as simple as possible, namely adding two bands from a particular Sentinel-2 scene, and returning the result as a GeoTIFF.
We worked in three teams:
- team 2 worked on a Python solution,
- team 3 worked on a pure R solution.
After around 3-4 hours, all teams had managed to get a RESTfull service doing this:
team 1 had most man power, but also most work to do; the solution was in the end the fastest, because the SciDB backend is highly scalable, using 24 cores or so
We discussed quite at length what it will take to realize the OpenEO ambition. Adding two bands is easy, handling large collections of scenes is not. My respect for what the people from Google Earth Engine have realized has grown, their award from ASPRS is more than deserved. Nevertheless, it is isolated, closed source, practically impossible to verify.
We also drafted service calls, discussed coordinate reference systems of tile/scene collections, and looked at covjson. And one of the most attractive ideas (for some of us): to submit your Python or R function to the imagery server, and have it work, in parallel, over the scenes, returning the requested summary.
Certain terms (scene, tile, pixel) reoccurred many times, seemingly stretching across the different used back-ends but there were slight differences - a huge challenge for an API intending to be useful to many. Discussing the API design, we struggled with the user perspective: is the user the analyst, or the developer? Who do we accommodate with the used terms?