Dilemmas and Thoughts about Web Services and Cishell

Dilemmas

Centralized vs Decentralized

  1. Should Cishell instances find remote algorithms through a centralized registry, or should each instance maintain its own list of remote algorithms?
  2. If each instance maintains its own list, should other instances be able to use that list (in effect, a local registry) for discovery?

A centralized registry is tempting for obvious reasons, but we will need at least some way for an individual instance to subscribe to web services without one (which amounts to a local registry). Otherwise we create a barrier to entry and shut out groups that are unwilling to install a centralized registry of their own, or to use someone else's.

Bruce and I considered letting registry information propagate between instances using two lists: use-from lists, each entry holding a discovery location and that location's public key, and allow-use-by lists, each entry holding the public key of an instance permitted to use this one. The problem we ran into was that any questionable algorithm inserted at a highly trusted instance would immediately propagate to every instance that trusts any instance downstream of it.
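
To make the concern concrete, here is a minimal sketch of that propagation model, assuming registry contents flow transitively along use-from links and that a link is honored only when the upstream key matches and the upstream's allow-use-by list contains the downstream key. The class and function names are hypothetical, and plain strings stand in for real public keys and signatures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Instance:
    """Hypothetical Cishell instance holding both trust lists."""
    location: str
    public_key: str  # placeholder string standing in for a real public key
    algorithms: Set[str] = field(default_factory=set)
    # use-from: discovery location -> public key expected at that location
    use_from: Dict[str, str] = field(default_factory=dict)
    # allow-use-by: public keys of instances permitted to pull from this one
    allow_use_by: Set[str] = field(default_factory=set)


def trust(downstream: Instance, upstream: Instance) -> None:
    """Create both halves of a trust link between two instances."""
    downstream.use_from[upstream.location] = upstream.public_key
    upstream.allow_use_by.add(downstream.public_key)


def discover(inst: Instance, network: Dict[str, Instance],
             seen: Optional[Set[str]] = None) -> Set[str]:
    """Naive transitive propagation: an instance sees everything its trusted
    instances see, including whatever they picked up further upstream."""
    seen = set() if seen is None else seen
    if inst.location in seen:  # guard against trust cycles
        return set()
    seen.add(inst.location)
    found = set(inst.algorithms)
    for location, expected_key in inst.use_from.items():
        upstream = network[location]
        if upstream.public_key == expected_key and inst.public_key in upstream.allow_use_by:
            found |= discover(upstream, network, seen)
    return found
```

With a chain of three instances (hub, mid, leaf, each trusting the previous one), an algorithm added at hub appears in `discover(leaf, ...)` without anyone at mid or leaf having looked at it, which is exactly the poisoning risk.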

That fear might be unwarranted if every trust chain is short or well monitored at most links, but we can't count on that. My potential fix is to auto-add only algorithms that are listed by directly trusted instances, and to let users optionally add algorithms that those directly trusted instances have in turn picked up from further up the chain. This requires some user intervention to build propagation chains, but only a little, and it essentially prevents the rough equivalent of a DNS poisoning attack.
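
The proposed fix can be sketched as a sync step that separates what is added automatically from what is only offered to the user. This is a minimal sketch, assuming each instance can see three sets from its directly trusted peers: algorithms entered or approved by that peer's own user, algorithms it auto-added, and algorithms it was merely offered. The class, fields, and methods are hypothetical, and keys and signatures are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Registry:
    """Hypothetical local registry for one Cishell instance."""
    local: Set[str] = field(default_factory=set)    # entered or approved by this instance's user
    auto: Set[str] = field(default_factory=set)     # auto-added from directly trusted peers
    offered: Set[str] = field(default_factory=set)  # from further up the chain; needs approval
    peers: List["Registry"] = field(default_factory=list, repr=False)  # directly trusted

    def usable(self) -> Set[str]:
        return self.local | self.auto

    def sync(self) -> None:
        """Recompute auto/offered from directly trusted peers only."""
        auto: Set[str] = set()
        further_up: Set[str] = set()
        for peer in self.peers:
            auto |= peer.local                      # listed directly by a trusted peer
            further_up |= peer.auto | peer.offered  # what the peer got from its own upstream
        self.auto = auto - self.local
        self.offered = further_up - self.local - self.auto

    def approve(self, algorithm: str) -> None:
        """User intervention: promote an offered algorithm to a local listing,
        which in turn lets it auto-propagate exactly one more hop."""
        if algorithm not in self.offered:
            raise KeyError(algorithm)
        self.offered.discard(algorithm)
        self.local.add(algorithm)
```

In a chain top, mid, leaf, an algorithm inserted at top is still auto-added at mid (which trusts top directly), but at leaf it only shows up in `offered` until leaf's user approves it, so a single poisoned instance reaches one hop instead of the whole chain.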

Service Registration vs User Specification

  1. Should services be able to advertise themselves with a registry (perhaps using some sort of authentication), or should they need to be added to the registry directly?

For the most part, adding services to the registry directly is preferable: it is simpler, easier to secure, and inherently more secure than even a well-secured self-registration mechanism. The main open question is whether we will need to support a grid-independent service registration option, or whether it is enough to let instances register algorithms with one or more existing grid registration implementations.

Rough Outline

There are (at least) two separate 'silos' of functionality to consider: service consumption and service provision.

Service Consumption

  Level One: Enter algorithm WSDL locations directly into a simple local registry.
  Level Two: Trust some other registries for finding algorithms.
  Level Three: Plug into some sort of MyProxy system for automagic algorithm propagation?
  Level Four: Consume grid services.

Level Four isn't really a level above the other three: it should be possible to consume grid services far more easily than by setting up a MyProxy-based system, so it is almost a separate area of concern.
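
Level One of consumption needs little more than a map from algorithm names to WSDL locations that only the local user edits. A minimal sketch, assuming entries are keyed by a user-chosen algorithm name and that a WSDL location must be an http(s) URL; the class and method names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse


class LocalRegistry:
    """Hypothetical Level One registry: WSDL locations entered directly by the user."""

    def __init__(self) -> None:
        self._wsdl_by_name: Dict[str, str] = {}

    def add(self, name: str, wsdl_url: str) -> None:
        parsed = urlparse(wsdl_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) WSDL location: {wsdl_url!r}")
        if name in self._wsdl_by_name:
            raise ValueError(f"algorithm {name!r} is already registered")
        self._wsdl_by_name[name] = wsdl_url

    def remove(self, name: str) -> None:
        del self._wsdl_by_name[name]

    def lookup(self, name: str) -> str:
        return self._wsdl_by_name[name]

    def names(self) -> List[str]:
        return sorted(self._wsdl_by_name)
```

There is deliberately no operation through which a service can add itself; in line with the preference for direct addition, only the local user calls `add`.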

Service Provision

  Level One: Publish a simple web service for an algorithm using no authentication or simple username/password authentication.
  Level Two: Publish a simple web service for an algorithm using PKI authentication, likely tied (optionally?) to a MyProxy server.
  Level Three: Register an algorithm as a service with a grid registry.
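
For Level One provision, simple username/password authentication could be as little as HTTP Basic authentication (RFC 7617) in front of the published service. A minimal sketch of the server-side check, assuming Basic auth is the scheme and that credentials are stored as salted PBKDF2 hashes; wiring this into an actual SOAP endpoint is left out, and the function names are hypothetical.

```python
import base64
import binascii
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storing a credential, using PBKDF2-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def check_basic_auth(header: Optional[str], users: Dict[str, Tuple[bytes, bytes]]) -> bool:
    """Validate an 'Authorization: Basic ...' header value against stored hashes."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, sep, password = decoded.partition(":")
    if not sep or username not in users:
        return False
    salt, expected = users[username]
    _, actual = hash_password(password, salt)
    return hmac.compare_digest(actual, expected)  # constant-time comparison
```

Basic auth sends the password with every request, so this only makes sense over HTTPS; the PKI option in Level Two avoids shared secrets altogether.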

Standards and Us

WSDL

Service consumption and publication should both be based on (a subset of) the WSDL standard. Registry-related operations will just be simple web services; their details aren't terribly important. Publishing algorithms, however, will involve intelligently generating WSDL files, and consuming them will involve intelligently parsing them. Getting this right will allow semi-arbitrary web services to be consumed as algorithms.
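
On the consumption side, the first step of parsing a WSDL file is finding its operations and the message parts they take and return. A minimal sketch using only the Python standard library, assuming WSDL 1.1 (namespace http://schemas.xmlsoap.org/wsdl/); the sample document and its PageRank service are hypothetical, and the binding and service sections a real document carries are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def local_name(qname: str) -> str:
    """Strip a namespace prefix such as 'tns:' from a QName attribute value."""
    return qname.split(":", 1)[-1]


def list_operations(wsdl_text: str) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    """Map each portType operation to the (part name, type) pairs of its messages."""
    root = ET.fromstring(wsdl_text)
    messages = {
        msg.get("name"): [(p.get("name"), p.get("type") or p.get("element"))
                          for p in msg.findall("wsdl:part", NS)]
        for msg in root.findall("wsdl:message", NS)
    }
    ops = {}
    for op in root.findall("wsdl:portType/wsdl:operation", NS):
        entry = {}
        for direction in ("input", "output"):
            el = op.find(f"wsdl:{direction}", NS)
            if el is not None:
                entry[direction] = messages.get(local_name(el.get("message")), [])
        ops[op.get("name")] = entry
    return ops


SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:pagerank" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:pagerank" name="PageRank">
  <message name="RunRequest">
    <part name="network" type="xsd:base64Binary"/>
    <part name="damping" type="xsd:double"/>
  </message>
  <message name="RunResponse">
    <part name="ranks" type="xsd:base64Binary"/>
  </message>
  <portType name="PageRankPort">
    <operation name="run">
      <input message="tns:RunRequest"/>
      <output message="tns:RunResponse"/>
    </operation>
  </portType>
</definitions>"""
```

Type names are returned as written (for example `xsd:double`), without resolving prefixes, which is enough to decide whether a service fits the simple subset.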

In practice, we should use a simple subset of the WSDL SOAP bindings, with some conventions for describing input and output file formats. Parameter lists consistent with current algorithm parameters should be allowed, but anything beyond the simplest structures should neither be emitted in generated WSDL nor accepted when consuming a remote algorithm specification.
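
That restriction could be enforced on both sides with one whitelist of simple XSD types. A minimal sketch, assuming algorithm parameters are described as (name, kind) pairs and assuming a hypothetical convention that file inputs and outputs travel as `xsd:base64Binary` parts; neither the kind names nor that convention come from an existing Cishell API, and for brevity the generated element omits the WSDL namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical mapping from algorithm parameter kinds to simple XSD types.
SIMPLE_TYPES = {
    "string": "xsd:string",
    "int": "xsd:int",
    "double": "xsd:double",
    "boolean": "xsd:boolean",
    "file": "xsd:base64Binary",  # assumed convention for file contents
}
ALLOWED_XSD = set(SIMPLE_TYPES.values())


def message_for(name: str, params: List[Tuple[str, str]]) -> str:
    """Publishing side: generate a WSDL 1.1 <message> for a flat parameter
    list, refusing anything that is not one of the simple kinds above."""
    msg = ET.Element("message", name=name)
    for pname, kind in params:
        if kind not in SIMPLE_TYPES:
            raise ValueError(f"parameter {pname!r}: unsupported kind {kind!r}")
        ET.SubElement(msg, "part", name=pname, type=SIMPLE_TYPES[kind])
    return ET.tostring(msg, encoding="unicode")


def acceptable(parts: List[Tuple[str, str]]) -> bool:
    """Consumption side: accept a remote algorithm only if every part is simple."""
    return all(ptype in ALLOWED_XSD for _, ptype in parts)
```

Using the same table in both directions keeps what we generate and what we agree to consume from drifting apart.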

Globus

Supporting Globus will be painful, but it seems necessary. Another grid infrastructure we might support is gLite, mentioned here: http://grid.ncsa.uiuc.edu/myproxy/delegation/

The Log

Logging output is very important. A publish/subscribe system such as WS-Eventing (which appears to be quite elegant) would enable logging across services. Specifically, any request to run a service remotely should also be able to subscribe, via WS-Eventing, to all logs produced by that use of the algorithm. Whether the subscription will be combined with the algorithm request is still uncertain, and the capability will need to be declared somehow in the WSDL. Either way, there must be a practical guarantee of delivery. One option is for an instance to subscribe in advance to all logs generated by any algorithms it later requests, so that no log messages are produced before a subscription exists.
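
The subscribe-in-advance option can be sketched as an in-process publish/subscribe broker on the service side. This is not WS-Eventing itself (no SOAP messages, subscription identifiers, expiry, or thread safety), only the shape of the idea, under the assumption that each remote run is tagged with the ID of the instance that requested it; all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class LogBroker:
    """Hypothetical service-side broker routing run logs to requesting instances."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Tuple[str, str]]] = {}
        self._runs: Dict[str, str] = {}  # run id -> requesting instance id

    def subscribe(self, instance_id: str) -> None:
        """Subscribe in advance to logs from every run this instance requests."""
        self._queues.setdefault(instance_id, deque())

    def start_run(self, run_id: str, instance_id: str) -> None:
        """Refuse runs from unsubscribed instances, so no log line can be lost."""
        if instance_id not in self._queues:
            raise RuntimeError(f"{instance_id!r} must subscribe before requesting runs")
        self._runs[run_id] = instance_id

    def log(self, run_id: str, message: str) -> None:
        """Called by the running algorithm; queues the line for its requester."""
        self._queues[self._runs[run_id]].append((run_id, message))

    def drain(self, instance_id: str) -> List[Tuple[str, str]]:
        """Deliver, and remove, everything queued for a subscriber."""
        queue = self._queues[instance_id]
        items = list(queue)
        queue.clear()
        return items
```

Queuing per subscriber means log lines written before the requester gets around to reading them are held rather than dropped, which is the delivery property the paragraph above asks for.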

Bizarre thoughts

Use AOP to insert thread-stopping code into algorithms at frequently visited points (for and while loops?).
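
Python has no AspectJ-style weaving, so the sketch below does by hand what advice woven into each loop body would do: check a stop flag at the top of every iteration. It assumes cooperative stopping through a `threading.Event` rather than forcibly killing the thread; `AlgorithmStopped`, `checkpoint`, `stoppable`, and the `sum_of_squares` algorithm are hypothetical names.

```python
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class AlgorithmStopped(Exception):
    """Raised at a checkpoint once a stop has been requested."""


def checkpoint(stop: threading.Event) -> None:
    """The check an AOP tool would weave in; call it directly inside while loops."""
    if stop.is_set():
        raise AlgorithmStopped()


def stoppable(items: Iterable[T], stop: threading.Event) -> Iterator[T]:
    """Yield items unchanged, running the checkpoint before each for-loop iteration."""
    for item in items:
        checkpoint(stop)
        yield item


def sum_of_squares(n: int, stop: threading.Event) -> int:
    """Hypothetical long-running algorithm with one loop checkpoint."""
    total = 0
    for i in stoppable(range(n), stop):
        total += i * i
    return total


if __name__ == "__main__":
    stop = threading.Event()
    outcome = {}

    def worker() -> None:
        try:
            outcome["value"] = sum_of_squares(10**9, stop)
        except AlgorithmStopped:
            outcome["value"] = "stopped"

    t = threading.Thread(target=worker)
    t.start()
    stop.set()  # request a stop; the worker exits at its next checkpoint
    t.join()
    print(outcome["value"])
```

A real AOP approach would insert `checkpoint` without touching the algorithm's source, which is the appeal of the idea; the cost is a flag check on every iteration of every instrumented loop.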