аятп я рь ь ср с о ь ж кщр жср ч ь ср хж я ь з л ь кспз × р хжс вв

advertisement
Implementation of a Wait-Free Synhronization Primitive
That Solves
n-Proess Consensus
Dror G. Feitelson
Larry Rudolph
Department of Computer Siene
The Hebrew University of Jerusalem
91904 Jerusalem, ISRAEL
Abstrat
Several synhronization primitives were ompared
in a reent paper by Herlihy [7th PODC, 1988, pp.
276{290℄, using their ability to implement waitfree shared data objets as a measure of relative
strength. Wait-free n-proess onsensus was shown
to be a useful generi problem, whose solution for
dierent values of n reates a hierarhy of primitives. Well known primitives suh as test-and-set,
swap, and feth-and-add, an only solve 2-proess
onsensus, and thus annot be used to implement
arbitrary wait-free objets. Stronger primitives that
solve the n-proess version were also suggested in
the paper; these inlude memory-to-memory swap,
ompare-and-swap, and feth-and-ons. However,
it is not lear that these primitives an be implemented in a satisfatory manner. In this note a new
primitive that solves the wait-free n-proess onsensus problem is presented: the feth-and-onditionalswap. This primitive is easily implementable by a
ombining network.
Keywords: onsensus, read-modify-write, waitfree synhronization
1
Introdution
Shared memory allows multiple proesses to interat using few data objets; for example, a broadast an be implemented by having the transmitter
write to a shared ell and then having the reeivers
read from it. The problem with suh a sheme is
that if the transmitter is delayed for some internal
reason, the reeivers must wait for it. The solution
is to try not to use operations suh as broadast,
where many proesses are dependent on the ativities of one other proess; rather, symmetri operations are used in whih all proesses play a similar non-ritial role. In addition, the interations
should be wait-free, meaning that any proess is
guaranteed to omplete its part in the interation
regardless of the progress of the others.
A omparison of some well-known primitives,
based on the wait-free objets that they an implement, was reported reently by Herlihy [2℄. At the
heart of his work lies the problem of wait-free nproess onsensus: n proesses eah start out with
individual preferred values, and they must reah
an agreement on one of these values. A naive solution would be to deompose the problem into two
parts: rst hoose a leader, and then broadast the
leader's preferred value. However, the hosen leader
might go to a party to elebrate its vitory and forget to broadast the value, thus leaving his followers in suspense. Therefore the ations of hoosing
the value and broadasting it must be ombined in
some way into one atomi operation.
Herlihy uses n-proess onsensus in two ways.
First, he uses it to form a hierarhy of operations:
1. Read/write memory ells annot solve waitfree 2-proess onsensus.
2. Any non-trivial read-modify-write operation
an solve wait-free 2-proess onsensus, but
most annot solve 3-proess onsensus. This
inludes test-and-set, swap, feth-and-add, and
FIFO queues.
3. Several primitives do solve the general n-proess
onsensus problem. These inlude memoryto-memory move and swap, ompare-and-swap,
n-register assignment, feth-and-ons, and an
augmented FIFO queue.
Seond, he shows that primitives that solve the
wait-free n-proess onsensus problem are universal, in the sense that they an implement any waitfree objet.
The main problem with Herlihy's results is that
the primitives he suggests are not so primitive; in
Page 1
fat, it seems doubtful whether they an be implemented eÆiently at all. The purpose of this note is
to present a new primitive that is simple to implement, and solves the wait-free n-proess onsensus
problem; this is done in the next setion. The nal
setion evaluates the signiane of this result.
2
Feth-and-Conditional-Swap
The only way to implement a shared objet that
supports highly onurrent aess by multiple proesses is by use of a ombining multi-stage interonnetion network [3℄. Suh networks are indeed
utilized by researh prototypes like the NYU ultraomputer [1℄ and IBM RP3 [4℄. Other methods use
arbitration mehanisms to perform only one of a
set of onurrent requests addressed at the same
objet, and thus fore serial exeution when onurrent aess is attempted.
Combining proeeds as follows. The interonnetion network is a multi-stage paket-swithed
network that onnets proessors to memory modules. Aess requests are taken to be read-modifywrite operations of the form < addr; f >, where
addr is an address in the shared memory and f is
a funtion that speies how to update the ontents of that address. Suh requests return the
old ontents of the address. When two requests
< addr; f > and < addr; g > direted at the same
address arrive at the same network node, they are
ombined and the unied request < addr; g Æ f > is
forwarded to the relevant memory module (where
g Æ f is the omposition of g and f, i.e. g Æ f(x) =
g(f(x))). When the return value v is reeived by
the node, it is propagated bak as a response to the
rst request. Then f(v) is alulated, and sent as
a response to the seond request. The nal eet is
equivalent to a senario in whih the rst request
was served before the seond one [3℄.
Not all types of funtions an be ombined, however. Using the symbols of the above example, we
an identify the following two requirements that
must be fullled for ombining to be pratial:
1. The omposition of funtions g Æ f must be
easy to ompute.
2. The representation of g Æ f should not require
more spae than the representations of f or
of g.
The synhronization primitives suggested by Herlihy are not suitable for ombining:
Memory-to-memory swap: this instrution ad-
dresses two memory loations, and thus undermines the whole basis of a ombining network's operation. These networks an handle multiple requests that propagate from the
proessors to the memory modules, but annot handle diret interation among the proessors or among the memory modules.
Compare-and-swap: eah ompare-and-swap
requires the address ontents to be ompared
with another speied value; when suh requests are ombined all the values must propagate on, and atually all the work must be
done serially at the memory module.
Feth-and-ons: eah feth-and-ons requires
another element to be added to the head of
a list. Therefore when many are ombined,
many new elements must be transferred with
the request.
The proposed primitive, feth-and-onditionalswap, is based on the observation that a restrited
version of ompare-and-swap is suÆient for solving the n-proess onsensus problem. The initial
value of the shared memory loation is ?. All the
proesses hek for this same value, to see if they
are rst; if so, they substitute it with their preferred value. If the value they see is not ?, they
infer that another proess arrived before them; in
this ase they just read its preferred value.
Formally, feth-and-onditional-swap is dened
as a read-modify-write operation suh that the funtion f assoiated with a memory aess request is
x y: if x = ? then y else x;
where x is the ontents of memory and y is the
preferred value of the proess. Two requests with
preferred values y1 and y2 are ombined by arbitrarily hoosing one of them and sending it on; assume
it is y1 . When the return value v arrives, the two
return values are generated as follows: if v = ?,
return ? to the proess that had sent y1 and return y1 to the proess that had sent y2 . If it is not,
return v to both. Obviously, the hardware needed
to implement the ombining is small, and there is
no overhead in the size of the ombined request.
It should be noted that feth-and-onditionalswap is a restrited version of the data-level synhronization primitives suggested by Kruskal, Rudoplph, and Snir [3℄. These are primitives that perform an operation on a shared variable, onditioned
on the state that the variable is in. If the number of
Page 2
possible states is small, as it is in our ase, eÆient
ombining is possible. This also explains why the
full ompare-and-swap annot be ombined: it effetively uses the whole memory-word ontents as a
state variable, and thus has an exponential number
of states.
3
Evaluation
The feth-and-onditional-swap primitive introdued
in this note allows for a wait-free solution to the nproess onsensus problem; this is interesting in its
own right, as onsensus is a basi problem in onurrent systems. In addition, it is a simple and
pratial universal wait-free primitive, using Herlihy's onstrution for implementing feth-and-ons
based on onsensus and then inplementing any waitfree objet by use of feth-and-ons. As Herlihy
notes, however, this onstrution is too ineÆient to
be onsidered useful for pratial purposes. Therefore is seems that the universality of the new primitive is of aademi interest only.
It is natural to ask whether this deieny is
spei to feth-and-onditional-swap, in whih ase
other primitives ought to be sought, or maybe it is
inherent in the onept of universal wait-free primitives. The answer probably lies in between, speifially in the use of feth-and-ons as an intermediate step in the redution of an arbitrary objet
to a wait-free implementation using n-proess onsensus. The problem is that feth-and-ons implies
aess to two shared memory loations at one: one
is the pointer to the head of the list, whih is made
to point at the new element, and the other is the
\next" pointer of the new element, whih is made to
point at the element whih was previously pointed
at by the head. Thus feth-and-ons is similar to a
memory-to-memory swap at a very basi level.
Realization of a pratial universal wait-free primitive therefore requires a solution to one of the following problems:
Referenes
[1℄ A. Gottlieb, R. Grishman, C. P. Kruskal,
K. P. MAulie, L. Rudolph, and M. Snir, \The
NYU Ultraomputer | designing an MIMD
shared memory parallel omputer ". IEEE
Trans. Computers C-32(2), pp. 175{189, Feb
1983.
[2℄ M. P. Herlihy, \Impossibility and universality
results for wait-free synhronization". In Pro.
7th Symp. Priniples of Distributed Computing,
pp. 276{290, 1988.
[3℄ C. P. Kruskal, L. Rudolph, and M. Snir, \Efient synhronization on multiproessors with
shared memory ". ACM Trans. Programming
Languages and Systems 10(4), pp. 579{601,
Ot 1988.
[4℄ G. F. Pster, W. C. Brantley, D. A. George,
S. L. Harvey, W. J. Kleinfelder, K. P. MAulie,
E. A. Melton, V. A. Norton, and J. Weiss,
\The IBM researh parallel proessor prototype
(RP3): introdution and arhiteture ". In Intl.
Conf. Parallel Proessing, pp. 764{771, 1985.
1. Find an eÆient and pratial implementation of memory-to-memory swap.
2. Find a diret method to generate a wait-free
implementation of a onurrent objet, using
n-proess onsensus but avoiding feth-andons.
Page 3
Download