Paradigms for Global Computation - an Overview of the Hippo Project

Richard Connor and Keith Sibson

Department of Computing Science
University of Glasgow

{richard, sibsonk}@dcs.gla.ac.uk

Abstract

The Hippo project is an investigation into the use of high-level programming paradigms in the context of global computation. The intention is to discover abstractions which allow global applications to be written by an average applications programmer, rather than a specialist in networked computer systems. This will be achieved by the combination of the provision of high-level semantic constructs which can describe useful behaviour patterns within the global computer network, and abstraction over unnecessary low-level detail of the communication protocols.

This paper gives only a high-level overview of the project’s aims and objectives. Although at an early stage of research, many more technical issues and partial solutions have been identified which are not described here due to a lack of space.

1. Background

In [Car97] Luca Cardelli states: "Today the standard computing infrastructure consists of locally networked personal computers with poor global connections. Gradually, this infrastructure will be complemented and replaced by network terminals that rely heavily and transparently on global resources. This transformation will be associated with the development of new computation and programming paradigms."

A global programming system is defined as one which allows the construction of applications which rely upon global resources, including remote invocations of other applications. The essence of global computation is that the network includes a wide variety of autonomous nodes and suffers from various forms of temporary and permanent failure. These attributes are considered to be essential and permanent features of the global computer network.

Current programming systems do not provide good support in this context, putting the construction of global applications beyond the skills of the average applications programmer. The aim of the Hippo project is to investigate high-level programming paradigms which allow application writers both to define the total semantics of an application within this domain, and to avoid the distraction of having to provide unnecessary detail whilst working within it.

Global programming is feasible using current mechanisms and methodologies; however, there are a number of stumbling blocks for the programmer which make it difficult to implement such applications:

• Current abstractions over transport protocols are low-level and hard to use, requiring for example explicit memory and cache management schemes to be provided by the programmer [W3C].

• The basic unit of data transfer is the text file, which is not only a large grain abstraction, but also requires significant work for the programmer to flatten and restore data structures shared over a global computation. [Atk78, AM95]

• The computational models of traditional languages do not match the implicit semantics of the global network, where autonomous management decisions, network failures and significant delays figure largely, even in high-quality data collections. [Car97]

The primary aim of the project is to prove the concept that a single unified programming system which overcomes these problems will make the coding of global applications significantly easier than at present. At best the current state of the art relies upon the gluing together of disparate technologies, none of which was primarily designed for the global domain.

It should be stressed that the approach taken is by necessity somewhat of a compromise between the current state of the art and a purely academic solution, as proving the concept requires the implementation of a system which coexists with the current state of the internet. To this end the decision has been made to work within the confines of current http and html definitions, but to supplant any higher-level application building technology with the experimental system.

As a first stage in the technical investigation, the Hippo Core Language (HCL) was designed and implemented. HCL is a data acquisition language which takes the internet as its data domain and allows the automation of queries over it. The computational model is simple, allowing for the description of only a single, non-migratory process, and there is no provision for typed data transfer. However HCL does provide an initial experiment in the automation of information-gathering tasks which are currently performed almost exclusively by human beings interacting with web browsers. The main features of HCL are the ability to program the fetch of internet data, including from dynamically evaluated URLs, and non-deterministic constructs to allow the description of programs which behave differently according to aspects of their dynamic internet environment, including transmission delays and failures. A full description of HCL is given in [CS97a]. The main further areas of investigation of the Hippo project, outlined in this paper, are a computational model which allows multi-process computations and the transfer of typed data between such processes.

1.1. Related work

Cardelli and Davies [CD97] describe language features which allow the explicit inclusion of the concepts of non-determinism and transfer rates within computations, which gives the programmer the ability to explicitly code with respect to the internet semantics of delay and failure. The unit of transfer in this language is assumed to be the text file, and the transfer of typed data is not addressed.

Apart from this interest no published work on global computation paradigms, rather than internet computation, is known of. An example of the latter is a project at Southampton University [deR96] which uses a Lisp-based scripting system to allow a programmer to link together various internet protocols. Such work addresses the first of the problems identified above, but is not intended to address the problems of global application construction.

The language Telescript [Whi94] deals explicitly with locality and mobility of resources, but its agents can not communicate with each other. The Java [GJS96] system models program fragments passing between connected computers. This connectivity is achieved by passing (logical) source forms of program fragments across channels, and the activity is not modelled within the semantics of the language domain. Tool kits from Sun and other vendors address some of the problems, but will also introduce a new wave of even more ambitious applications to face those not addressed. For example javaspaces from Sun [JS97] and MQ from IBM [MQ97] are systems that allow programmers to connect components together whilst avoiding much unnecessary complexity. Such systems help by imposing some discipline and abstracting much low-level detail such as queuing, message management and synchrony. However they still do not give a high-level abstraction with which a programmer can model the organisation of an entire global computation.

The language Obliq [Car95] models distributed computations, and allows threads to migrate around a network whilst maintaining connections to data on other nodes. Obliq however requires a reliable network for a correct implementation, as communication failure is not part of its semantics. It is therefore a model of programming over a reliable sub-domain where engineering requirements can be guaranteed, rather than the global domain where autonomy and therefore partial reliability and are essential features.

Also related is a significant and growing interest in mobile agents [LN97, VT97]. The project described here is related to that interest, but once again has a significantly different viewpoint in that distribution paradigms are modelled within the computational semantics, rather than the opposing view of agents themselves being distributed among a network with a different semantics. It is interesting to note, however, that the Hippo language system, which includes distributed first-class procedures, provides a useful enabling architecture for many mobile agent systems that have been described.

There are many related topics that are not explicitly addressed in this project, such as repudiation, authentication, security, privacy, progress monitoring and resource management. Such issues are seem as secondary and also to some extent orthogonal to the development of global computation paradigms. However the investigators are aware of their importance, and their potential impact will be taken into account where necessary.

2. Project overview

The project contains no aspect of research into language design as such; research into computational models is largely expected to result in models which could be fitted to a wide variety of languages in different paradigms. The Hippo language is as far as possible paradigm-neutral, including a combination of first-class procedures with aggregation and type abstractions which allows the description of any of imperative, functional or object-oriented paradigms. It addresses the three core problem areas identified as follows:

• the inherent complexity of the communications mechanisms are hidden from the programmer by a high-level interface

• a high-level persistent object interface is presented within the language, allowing small-grain typed data to be transferred over the standard protocols

• non-deterministic constructs which match the internet semantics are provided, allowing programmers to specify non-deterministic choice, timeout, alternative computations, etc.

The underlying type system, rather than the language type system, fulfils two purposes. Primarily, it must give a way of describing Hippo typed data, to allow the dynamic inclusion of remote data into a computation without compromising soundness [ABC83, Con90]. More importantly, however, the common type system gives an implicit transfer protocol for typed data, and the intention is to design a language-neutral system which can be specialised for any high-level language. The effect of this is to give data repositories whose description is independent of any single language. It might be noted that this is a far less ambitious goal than a common type system which allows concurrent interworking of different high-level languages, the latter being notoriously hard and believed by many to be unachievable.

2.1. The Hippo Computational Model

A Hippo application consists of an arbitrary number (one, for many applications) of potentially communicating, distributed and autonomous processes. The model is implemented within the confines of commonly available internet protocols and standards, notably http and html, and is designed to fit with all aspects of these, particularly their autonomy and unreliability.

The Hippo universe consists of a set of Hippo systems. A system consists of a set of multi-threaded processes. Each system has a unique identifier, and each process within that system has its own locally unique identifier, giving each process also a unique global identifier. Process local identifiers may be requested by a process at startup time, rather than arbitrarily allocated by the system.

It may be helpful to think of the Hippo universe as equating to the whole internet, a Hippo system as a personal (possibly virtual) workstation, and a Hippo application as a set of processes spanning many systems but sharing the same local identifier. However this guide is not enforced and is intended only to give some idea as to the intended granularity of the concepts.

Each system has a public data space, and each process has a private data space. Each process may write only to its own private space or to its system’s public space, but may read the public data space of any other system. In addition to these data spaces, there is also support for lightweight message passing between processes. Fixed-size messages may be sent to Hippo systems, where they are placed in a message log readable by all processes local to that system. The intention is that such messages will be composed of a local process identifier plus a string message interpretable by that process, but no semantics are enforced at this level. The transfer of larger amounts of data may be achieved by the sending process placing the data in a local public space and then alerting the receiving process to its presence via a fixed-size message.

The data space may be thought of superficially as akin to a standard file system. The public space may be thought of as that part of a file system readable by, for example, an ftp or http server, and the private space as a combination of a processes’ normal private memory allocation and parts of the file system private to that process. The message passing protocol equates to a cgi interface in the http protocol. Again, this model is not enforced and is given as a rough guide to intended usage.

The data stored in both private and public data spaces includes typed, structured data, including first-class functions and procedures. Embedded references may span arbitrary areas. The privacy of the private address space relates only to the roots of the data , and it is quite feasible to create data in the public space which contains references to other data only previously reachable from private roots. These are conceptual and not physical spaces, with the physical storage being automatically managed by the language implementation.

2.2. Persistence

Orthogonal persistence is the property whereby the treatment of data is entirely orthogonal to its lifetime [AM95]. One model of this is to replace traditional file system access primitives by two language meta-constructs, intern and extern. Given a value x, of type t, and a global external namespace ranged over by n, then

extern( n, x )

causes a conceptual link to the value x to be placed in the global namespace, such that the call

intern( n, t )

results in the same value, formerly denoted by x. intern will fail dynamically if the identifier n does not have a binding of type t in the external namespace. The crucial property of this model is that the semantics of the value is unaffected, even in the presence of identity and mutability. The observation that the intern and extern calls can take place in different program invocations gives rise to the sharing of typed data among programs in different contexts, and therefore subsumes much of the activity traditionally associated with untyped or partially-typed file and database management systems.

To unify this language model with the architecture described above, the Hippo language uses two external namespaces. That used for the extern command corresponds to the namespace of the local file system where the program is executing. There are two forms of intern, one of which operates over this namespace also, and one which operates over the namespace of URLs. Hippo supports text as a base type, and if these commands are used with text parameters than they capture traditional file system and internet data transfer.

To fit with the architectural model, writes to a processes’ private data space correspond to calls of extern where the name given corresponds to a file creation within some private directory, and public writes correspond to filenames which resolve to be within the domain of a public server, such as ftp or http. To give a concrete example, the following code creates a new object of class person and stores it in an external namespace within the domain of an http server:

fred = new person( "fred", 31 )
extern( "/local/www/hippo/fredInfo", fred )

This object may subsequently be retrieved by a remote application executing the following code, and the identifier thisFred is typed locally as person.

thisFred = internURL( "http://www.dcs.gla.ac.uk/~hippo/fredInfo", person )

If the data is not accessible, or has some other type, an exception is raised instead.

2.3 Data formats

Typed data is physically represented as text files to ensure compatibility with other text protocols. A mapping is established from each data type to an html representation of instances of that type, making it possible for typed data to be read by other html tools such as browsers. To achieve this, system housekeeping information such as common type representations and compiled procedure code is kept in comment fields, and an html view of the data value is generated for each constructor. A paper on this topic is currently in preparation [CS97b].

This is an inefficient way to store data, and the mapping applies only to data transported over the internet; a different storage architecture is required on each node to achieve reasonable performance. The flattened text file format may be dynamically constructed per client request so long as local object store performance allows. In terms of the data transfer, it is noted that bandwidth costs are expected to continue to reduce dramatically, and eventually become insignificant compared to connection costs. The html format allows multi-object files, through the use of html anchors, and therefore much data can be transported for a single connection.

2.4 Type system

The Hippo type universe provides a fairly standard range of data constructors and first class functions. As well as these, however, the language has an explicitly typed model of mutability, and (seperately) explicit reference type constructors, which are designed to give support to global paradigms. Once again only a brief outline of these types is given here, as the issues still require extensive investigation. We are however convinced that the schemes outlined will eventually form the basis of a sound and useful type system.

The location type is used to control the problem of generalised network update. The system contains a subtyping rule which states that a location of any type is a subtype of that type. Based on the subtype relation, we specify a further relation, called the immutable supertype relation, with the property that if A is an immutable supertype of B, then B is a subype of A, and the type of A contains no location type constructors. This means that statically it is not possible to describe a legal update to any location contained within the run-time value of any denotation typed as A. When remote data is typed at an intern command, the dynamic test for success is that the statically asserted type is an immutable supertype of the type associated with the physical data, thus preventing remote updates from occurring in well-typed code.

The reference type constuctors include explicit types for remote and local references, corresponding to the physical location of values. Other than these types all values are scalar. A type which is a reference of T is a subtype of the corresponding type T; however, any dynamic use of such supertyping in the program text corresponds to a logical fetch of the value referred to. This is intended to allow the programmer to specify the manipulation of referenced values without forcing their fetch, while at the same time avoiding unecessary noise by explicitly causing the fetch. Problems remain, of course, with the referential integrity of both local and remote typed data, and a further area of research in into acceptable exception mechanisms and other failure models.

3. Conclusions

We have outlined the main ideas behind a system we believe will simplify global programming, given the current state of the internet, and make it accessible to applications programmers. Our current aims are to fully implement a system of this nature, and observe its use by programmers, to prove the concept of its utility.

Major implementation challenges are the construction of the distributed typechecker, and the distributed object store. The extent of these challenges should not be underestimated; however the problems are at least well understood. Various solutions are available, and are well documented in, for example, the proceedings of the POS and DBPL series of workshops. The transfer of existing knowledge to the new context still remains a challenge, but a much lesser one that the design of suitable algorithms and methodologies from scratch.

4. References

[ABC83] M.P. Atkinson, P. Bailey, K.J. Chisholm, W.P. Cockshott and R. Morrison An Approach to Persistent Programming The Computer Journal 26, 4 ( 1983 ) pp 360 - 365

[AM95] M.P Atkinson and R Morrison Orthogonally Persistent Object Systems VLDB Journal 4, 3 pp 319 - 401

[Atk78] M.P. Atkinson Programming Languages and Databases Proc. 4th International Conference on Very Large Data Bases, Berlin In S.P. Yao (editor), IEEE (September 1978) pp 408 - 419

[Car95] Luca Cardelli A language with distributed scope Computing Systems 8, 1 pp 27 - 59 MIT Press, 1995

[Car97] Luca Cardelli Mobile Computation In Mobile Object Systems - Towards the Programmable Internet, J. Vitek and C. Tschudin Eds., 3-6, LNCS 1222, Springer-Verlag, 1997.

[CD97] Luca Cardelli and Rowan Davies. Service Combinators for Web Computing. SRC Research Report 148, Digital Equipment Corporation Systems Research Center. 1997.

[Con90] Connor, R. C. H. (1990) Types and Polymorphism in Persistent Programming Systems. Ph.D. Thesis, University of St Andrews.

[Con96] R.C.H Connor Hippo: a Web Programming Language 5th International World-Wide Web Conference, Workshop on Application Programmer Interfaces, http://www.cs.vu.nl/~eliens/WWW5/program.html

[CS97a] Richard Connor and Keith Sibson HCL - a Core Language for Global Computation In preparation

[CS97b] Richard Connor and Keith Sibson HTML representation of high-level data types In preparation

[deR96] David de Roure Programming the Internet http://vim.ecs.soton.ac.uk/lispw3.html

[GJS96] Gosling, J, Joy, B & Steele, G. The Java Language Specification Sun Microsystems.

[Hip97] http://www.dcs.glasgow.ac.uk/~hippo

[JS97] Javaspaces: http://chatsubo.javasoft.com/javaspaces/

[LN97] Proceedings of the 1st int workshop on mobile agents Springer-Verlag LNCS series volume 1219

[MBC94] Morrison, R., Brown, A. L., Connor, R. C. H. et al. (1994) The Napier88 Reference Manual (Release 2.0). University of St Andrews.

[MQ97] http://www.hursley.ibm.com/mqseries/index.html

[W3C] World-Wide Web Consortium, http://www.w3.org/pub/WWW/

[VT97] Mobile Object Systems - Towards the Programmable Internet, J. Vitek and C. Tschudin Eds., LNCS 1222, Springer-Verlag, 1997.

[Whi94] White, J.E. Telescript technology: the foundation for the electronic market-place White paper, General Magic Inc 1994