In order to better integrate my blog with my website, better manage comment spam, and reduce my dependence on Google, this blog has moved to http://www.deborahfitchett.com/blog/. In order to avoid broken links I won't be deleting content from here, but no new content will be added, so please update your bookmarks and feeds.

Tuesday 2 July 2013

Design patterns for lab #labpatterns; Research cloud - #nzes

A pattern language for organising laboratory knowledge on the web #labpatterns
Cameron McLean, Mark Gahegan & Fabiana Kubke, The University of Auckland
Google Site

Lots of lab data hard to find/reuse - big consequences for efficiency, reproducibility, quality.
Want to help researchers locate, understand, reuse, and design data. Particularly focused on describing semantics of the experiment (rather than semantics of data).

Design pattern concept originated in field of architecture. Describes a solution to a problem in a context. Interested in idea of forces - essential/invariant concepts in a domain.

Kitchen recipe as example of design pattern for lab protocol.
What are recurring features and elements? Forces for a cake-like structure include: structure providers (flour), structure modifier (egg), flavours; aeration and heat transfer.
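A minimal sketch of how such a pattern might be captured as a structured record - the field names and the cake details here are my own illustration, not the speakers' schema:

```python
from dataclasses import dataclass

@dataclass
class DesignPattern:
    """One possible shape for a pattern record: a named solution
    to a problem in a context, plus the forces it resolves."""
    name: str
    context: str
    problem: str
    forces: dict  # force -> the ingredient/step that resolves it
    solution: str

cake = DesignPattern(
    name="Basic sponge cake",
    context="Home kitchen baking",
    problem="Produce a light, stable, flavoured crumb",
    forces={
        "structure provider": "flour",
        "structure modifier": "egg",
        "flavour": "sugar, vanilla",
        "aeration": "creaming butter and sugar",
        "heat transfer": "bake in a moderate oven",
    },
    solution="Cream, fold, bake",
)
```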

Apply this to lab science, in a linked science setting. Take a "Photons alive" design pattern (using light to visualise biological processes in an animal). See example paper. You can take a sentence from the methodology and annotate it, eg "imaging" as a diagnostic procedure. Using current ontologies, this gives you the What but not the Why. Need to tag it with a "Force" concept, eg "immobilisation". That gives a deeper understanding of the process - the role of each step. And you can start thinking about what other methods of immobilisation there might be.
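A rough sketch of what that dual annotation could look like in RDF (using rdflib; the namespace, property names, and the placeholder ontology term ID are my assumptions, not the project's actual vocabulary):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/labpatterns/")   # hypothetical vocabulary
OBO = Namespace("http://purl.obolibrary.org/obo/")

g = Graph()
step = EX["photons-alive/step/1"]                   # the annotated methods step

# The What: link the step to an imaging term from a current ontology
# (the term ID below is a stand-in, not a real identifier).
g.add((step, EX.hasProcedure, OBO["XXXX_0000000"]))

# The Why: tag the step with the force it resolves.
g.add((step, EX.resolvesForce, EX.Immobilisation))
g.add((EX.Immobilisation, RDFS.comment,
       Literal("Keep the animal still so it can be imaged")))

print(g.serialize(format="turtle"))
```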

So how can we make these patterns? Need to use semantic web methods.
A wiki for lab semantics. (Wants to implement this.) A semantic form on the wiki acts as a template. The wiki provides attribution, peer review, publication - and an endpoint to an RDF store.
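Once the wiki's RDF lands in a store, a SPARQL query can pull out every step that resolves a given force - a sketch, reusing the made-up property names from above and a local export file:

```python
from rdflib import Graph

g = Graph()
g.parse("labpatterns-export.ttl", format="turtle")  # e.g. the wiki's RDF export

# Every protocol step tagged with the Immobilisation force,
# whatever imaging method it happens to use.
results = g.query("""
    PREFIX ex: <http://example.org/labpatterns/>
    SELECT ?step ?procedure WHERE {
        ?step ex:resolvesForce ex:Immobilisation ;
              ex:hasProcedure ?procedure .
    }
""")
for step, procedure in results:
    print(step, procedure)
```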

Q: How easy is this to use for a domain expert?
A: Semantic modelling is an iterative process and not easy. But a semantic wiki can hide that complexity from the end user, so the domain expert can just enter data.

Q: We spend lots of time pleading with researchers to fill out webforms. How else can we motivate them, eg to do it during process rather than at end?
A: Certain types of people are motivated to use a wiki. This is a first step, a proof of concept. Need a critical mass before it becomes self-sustaining.

Q: How much use would this actually be for domain experts? Would people without implicit knowledge gain from it?
A: Need to survey this and evaluate. It's valuable as a democratising process.

Q: What about patent/commercial knowledge?
A: Personally taking an open science / linked science approach - this is meant for research that's intended to be maximally shared.

A "Science Distribution Network" - Hadoop/ownCloud syncronised across the Tasman
Guido Aben, AARNet; Martin Feller, The University of Auckland; Andrew Farrell, New Zealand eScience Infrastructure; Sam Russell, REANNZ

Have preferred to do one-to-few applications rather than Google-style one-to-billions. Now changing, because they've been having trouble sending large files themselves. Scraped up their own file transfer system, marketed as cloudstor - though it's not in the cloud and doesn't store things. Expected a couple of hundred users; got 6838 users over the last year. Why linear growth? "Apparently word of mouth is a linear thing..." They seem to be known by everyone who has file-sharing issues.

FAQs:
Can we keep files permanently?
Can I upload multiple files?
Why called cloudstor when it's really for sending?

"cloudstor+ beta" - looks like dropbox so why doing this if already there? They're slow (hosted in Singapore or US). Cloudstor+ 30MB/s cf 0.75MB/s as a maximum for other systems. Pricing models not geared towards large datasets. And subject to PRISM etc.

Built on a stack:
Anycast | AARNet
ownCloud - best OSS they've seen/tested so far - has plugin system and defined APIs
MariaDB
Hadoop - but looking at substituting XtreemFS, which seems to cope better with latency.
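Because ownCloud exposes storage over WebDAV, a client or script can push a file straight to it - a sketch, with a made-up server URL and credentials:

```python
import requests

# Placeholder server and credentials; ownCloud serves files over WebDAV
# at remote.php/webdav, so an HTTP PUT uploads into the user's storage.
BASE = "https://cloudstor.example.edu.au/remote.php/webdav"

with open("dataset.tar.gz", "rb") as f:
    resp = requests.put(
        f"{BASE}/dataset.tar.gz",
        data=f,
        auth=("researcher", "app-password"),
    )
resp.raise_for_status()  # expect 201 Created on success
```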

Distributed architecture - can be extended internationally. Would like one in NZ, Europe, US, then scale up.

Bottleneck is from the desktop to the local node. The only way they can address this is to get as close to the researcher as possible - want to build local nodes on campus.