RPCA Semantics (20 Jan 2008)

I'm currently writing an RPC layer in Haskell (and also in C, since I expect that I'll need it). I'm using libevent's tagged data structures (which is why you've seen Haskell support for that from me); however, I'm not using evrpc, for a number of reasons. Firstly, it uses HTTP as a transport. What troubles me about using HTTP directly as a transport layer is the in-order limits that it imposes. The server cannot deliver replies out of order, nor can it deliver multiple replies for a single request (at least, not with replies to other requests mixed in), nor can it send any unsolicited messages (e.g. lame-mode messages).

Also, evrpc has no support for lameness, although that's fixable (modulo the HTTP issues). Because of all that I decided to roll my own, called RPCA (because I'm not sufficiently self-centered just to call it Network.RPC :)). I'm including part of the RPCA documentation below for comments.

RPCA is an RPC system, but that's a pretty loose term covering everything from I2C messages to SOAP. So this is the definition of exactly what an RPCA endpoint should do.

RPCA RPCs are request, response pairs. Each request has, at most, one response and every response is generated by a single request. That means, at the moment, no unsolicited messages from a server and no streaming replies.

RPCs are carried over TCP connections and each RPC on a given connection is numbered by the client. Each RPC id must be unique over all RPCs inflight on that TCP connection. (Inflight means that a request has been sent, but the client hasn't processed the reply yet.) A reply must come back over the same TCP connection as the request which prompted it. If a TCP connection fails, all RPCs inflight on that connection also fail.
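To make the inflight rules concrete, here is a minimal sketch of the bookkeeping a client might keep per TCP connection. None of these names come from RPCA itself: `RpcId`, `Conn`, `sendRequest`, `receiveReply` and `failAll` are made up for illustration, and the 32-bit id width is an assumption.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Hypothetical id type; the real width of an RPCA id is not stated here.
type RpcId = Word32

-- Per-connection client state: the next id to try, plus whatever context
-- the caller wants to keep for each RPC that is still inflight.
data Conn a = Conn
  { nextId   :: RpcId
  , inflight :: Map.Map RpcId a
  }

emptyConn :: Conn a
emptyConn = Conn 0 Map.empty

-- Number a new request. The id only has to be unique among inflight RPCs
-- on this connection, so ids of completed RPCs may be reused.
sendRequest :: a -> Conn a -> (RpcId, Conn a)
sendRequest ctx (Conn n m) =
  let rid = until (`Map.notMember` m) (+ 1) n
  in (rid, Conn (rid + 1) (Map.insert rid ctx m))

-- The client has processed a reply: return the stored context and free the
-- id. Nothing means the reply matched no inflight RPC on this connection.
receiveReply :: RpcId -> Conn a -> (Maybe a, Conn a)
receiveReply rid c =
  (Map.lookup rid (inflight c), c { inflight = Map.delete rid (inflight c) })

-- The TCP connection failed: every RPC inflight on it fails too.
failAll :: Conn a -> ([(RpcId, a)], Conn a)
failAll c = (Map.toList (inflight c), c { inflight = Map.empty })
```

The table lives inside the connection value on purpose: a reply arriving on some other connection has nothing to match against, which is the "same TCP connection" rule for free.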

An RPC request or reply is a pair of byte strings. The first is the header, which is specific to RPCA. The only part of the header which applications need be concerned with is the error code in the reply header. The second is the payload (either the arguments in the case of a request, or the result in the case of a reply). This may be in any form of the application's choosing, but it's expected that it'll be a libevent tagged data structure.

An RPC is targeted at a service, method pair. A server can export many services but each must have a unique name on that server. (A server is a TCP host + port number.) Each service can have many methods, the names of which need only be unique within that service.
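As a rough illustration of the naming rules, a server can be thought of as a map from service names to maps of method names. This is only a sketch: the `RpcError` constructors are invented for the example (the real RPCA error codes aren't listed here), and real methods would presumably run in IO rather than being pure functions.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as Map

type ServiceName = String
type MethodName  = String

-- Hypothetical error codes, standing in for the one in the reply header.
data RpcError = NoSuchService | NoSuchMethod
  deriving (Eq, Show)

-- A method turns a request payload into a reply payload.
type Method = B.ByteString -> B.ByteString

-- Service names are unique per server; method names only per service.
type Server = Map.Map ServiceName (Map.Map MethodName Method)

-- Route a request to its service, method pair.
dispatch :: Server -> ServiceName -> MethodName -> B.ByteString
         -> Either RpcError B.ByteString
dispatch server svc meth payload = do
  methods <- maybe (Left NoSuchService) Right (Map.lookup svc server)
  handler <- maybe (Left NoSuchMethod) Right (Map.lookup meth methods)
  Right (handler payload)

-- Two services can both export a method called "get" without clashing.
exampleServer :: Server
exampleServer = Map.fromList
  [ ("echo",    Map.fromList [("get", id)])
  , ("reverse", Map.fromList [("get", B.reverse)])
  ]
```

For example, `dispatch exampleServer "reverse" "get" (B.pack "abc")` runs the reverse service's method, while the same method name sent to "echo" returns the payload untouched.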

A Channel is a client-side abstraction: a way of delivering RPCs to, and getting the replies back from, a given server, service pair. It's distinct from a connection in that a Channel can have many connections (usually only one at a time, though) and that a Channel targets a specific service on a server.

On a given server a service may be up, lame or down. There's no difference between a service which is down and a service which a server doesn't export. Services which are lame are still capable of serving requests, but are requesting that clients stop sending them because, for example, the server is about to shut down. When a service becomes lame it sends special health messages along all inbound connections to the server, so that clients may be asynchronously notified. (Note that health messages aren't RPCs so this doesn't contradict the above assertion that there are no unsolicited RPC replies.)
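The three states can be sketched on the server side as a per-service table. The names here (`Health`, `healthOf`, `goLame`, `HealthMessage`) are illustrative assumptions, not RPCA's real API, and the actual wire form of a health message isn't specified above.

```haskell
import qualified Data.Map.Strict as Map

data Health = Up | Lame | Down
  deriving (Eq, Show)

type ServiceName = String

-- Health is tracked per service, not per server.
type ServerHealth = Map.Map ServiceName Health

-- A service the server doesn't export looks exactly like one that is down.
healthOf :: ServiceName -> ServerHealth -> Health
healthOf = Map.findWithDefault Down

-- Lame services can still serve requests; only down ones cannot.
canServe :: Health -> Bool
canServe h = h /= Down

-- Clients are asked to send new requests only to services that are up.
shouldRoute :: Health -> Bool
shouldRoute h = h == Up

-- Hypothetical health message; this is not an RPC reply.
data HealthMessage = HealthMessage ServiceName Health
  deriving (Eq, Show)

-- Mark an up service as lame and produce one health message for every
-- inbound connection, so that clients are notified asynchronously.
goLame :: ServiceName -> [conn] -> ServerHealth
       -> (ServerHealth, [(conn, HealthMessage)])
goLame svc conns health
  | healthOf svc health == Up =
      ( Map.insert svc Lame health
      , [ (c, HealthMessage svc Lame) | c <- conns ] )
  | otherwise = (health, [])
```

Keeping `canServe` and `shouldRoute` separate reflects the distinction in the text: a lame service still answers requests that arrive, it would just rather not receive new ones.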

If a Channel is targeted at a single server, service pair, then it's free to assume that the service is immediately up. If not, the server will set the error code in the RPC replies accordingly. If a Channel is load-balancing (i.e. it has multiple possible servers that a request could be routed to) it must wait to perform a health check before routing any requests to any server. A load-balancing Channel stops routing requests to any servers which report lameness.
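The routing rule for the two kinds of Channel can be sketched like this. It's an assumption-laden outline: `Channel`, `balanced`, `recordHealth` and `candidates` are hypothetical names, and how a load-balancing Channel chooses among the eligible servers (round robin, random, and so on) is deliberately left out because the text doesn't say.

```haskell
import qualified Data.Map.Strict as Map

data Health = Up | Lame | Down
  deriving (Eq, Show)

-- A server is a TCP host plus port number.
type Server = (String, Int)

-- A Channel targets one service, either on a single server or spread over
-- several. Nothing means no health check result has been seen yet.
data Channel
  = Single Server
  | Balanced (Map.Map Server (Maybe Health))

-- A fresh load-balancing Channel knows nothing about any of its servers.
balanced :: [Server] -> Channel
balanced servers = Balanced (Map.fromList [ (s, Nothing) | s <- servers ])

-- Record a health check result or an asynchronous health message.
-- Reports about servers the Channel doesn't target are ignored.
recordHealth :: Server -> Health -> Channel -> Channel
recordHealth s h (Balanced m) = Balanced (Map.adjust (const (Just h)) s m)
recordHealth _ _ ch           = ch

-- Servers that a new request may be routed to right now.
candidates :: Channel -> [Server]
candidates (Single s)   = [s]  -- assumed up; errors arrive in reply headers
candidates (Balanced m) = [ s | (s, Just Up) <- Map.toList m ]
```

A freshly built `balanced` Channel has no candidates at all until some server has reported `Up`, which is the "wait for a health check" rule; a later `recordHealth s Lame` removes `s` from the list again.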

Note that lameness is a per-service value, so some services on a server may be lame while others are up.