Introduction:
Synchronization in distributed systems is often much more difficult than in uniprocessor or multiprocessor systems. Synchronization is all about doing the right thing at the right time. In process synchronization we make sure that one process waits for another to complete its operation.
A problem in distributed systems, and computer networks in general, is that
there is no notion of a globally shared clock. In other words, processes on
different machines have their own idea of what time it is.
There are various ways to synchronize clocks in a
distributed system, but all methods are essentially based on exchanging clock
values, while taking into account the time it takes to send and receive
messages. Variations in communication delays and the way those variations are
dealt with, largely determine the accuracy of clock synchronization algorithms.
An important class of synchronization algorithms is that
of distributed mutual exclusion. These algorithms ensure that in a distributed
collection of processes, at most one process at a time has access to a shared
resource. Distributed mutual exclusion can easily be achieved if we make use of
a coordinator that keeps track of whose turn it is.
Synchronization between processes often requires that one process acts as a coordinator. In those cases where the coordinator is not fixed, it is necessary that processes in a distributed computation decide on who is going to be that coordinator. Such a decision is taken by means of election algorithms. Election algorithms are primarily used in cases where the coordinator can crash.
Time in a Distributed System:
Time is an important practical issue in distributed
system. For example, we require computers around the world to timestamp
electronic commerce transactions consistently. Time is also an important
theoretical construct in understanding how distributed executions unfold. But
time is problematic in distributed systems.
Local clocks invariably drift and need periodic
resynchronization to support a common notion of time across the entire
distributed system. Each computer
may have its own physical clock, but the clocks typically deviate, and we
cannot synchronize them perfectly. Time is an important and interesting issue
in distributed systems, for several reasons.
First,
time is a quantity we often want to measure accurately. In order to know at
what time of day a particular event occurred at a particular computer it is
necessary to synchronize its clock with an authoritative, external source of
time. For example, an eCommerce transaction involves events at a merchant’s
computer and at a bank’s computer. It is important, for auditing purposes, that
those events are timestamped accurately.
Second,
algorithms that depend upon clock synchronization have been developed for
several problems in distribution [Liskov 1993]. These include maintaining the
consistency of distributed data, checking the authenticity of a request sent to
a server, and eliminating the processing of duplicate updates.
Clock, Event and Process States:
A distributed system consists of a collection P of N processes pi, i = 1, 2, ..., N. Each process pi in P has a state, si, which, in general, it transforms as it executes. The process’s state includes the values of all the variables within it. Its state may also include the values of any objects in its local operating system environment that it affects, such as files.
As each process pi executes it takes a series of actions, each of which is either a message send or receive operation, or an operation that transforms pi’s state – one that changes one or more of the values in si .
We define an event to be the occurrence of a single action that a process carries out as it executes – a communication action or a state-transforming action. The sequence of events within a single process pi can be placed in a single, total ordering, which we denote by the relation →i between the events. That is, e →i e' if and only if the event e occurs before e' at pi.
Now we can define the history of process pi to be the series of events that take place within it, ordered as we have described by the relation →i.
Computer Clocks and Timing Events:
Each computer in a Distributed System has its own
internal clock used by local processes to obtain the value of the current time.
Processes on different computers can timestamp their events. But clocks on
different computers may give different times.
Even if the clocks on all computers in a distributed system are set to the same time, they will eventually vary quite significantly unless corrections are applied. Computer clocks drift from perfect time, and their drift rates differ from one another.
Clock skew is the instantaneous difference between the readings of any two clocks. The crystal-based clocks used in computers are, like any other clocks, subject to clock drift, which means that they count time at different rates and so diverge.
Clock drift rate is the relative amount by which a computer clock differs from a perfect clock. A clock’s drift rate is the change in the offset (difference in reading) between the clock and a nominal perfect reference clock per unit of time, as measured by the reference clock.
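As a rough, hypothetical illustration (the drift rates below are invented, not from the text), the skew that accumulates between two drifting clocks can be sketched as:

```python
def skew_after(seconds, drift_a_ppm, drift_b_ppm):
    """Skew (in seconds) between two clocks after `seconds` of real time,
    given each clock's drift rate in parts per million (ppm)."""
    offset_a = seconds * drift_a_ppm / 1_000_000
    offset_b = seconds * drift_b_ppm / 1_000_000
    return abs(offset_a - offset_b)

# Two quartz clocks drifting at +50 ppm and -30 ppm diverge by
# 86400 * 80e-6 ≈ 6.9 seconds over one day (86 400 s).
print(skew_after(86_400, 50, -30))
```

This is why periodic resynchronization is unavoidable: even small per-second drift compounds into seconds of skew per day.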
Coordinated Universal Time:
All the computers are generally synchronized to a standard time called Coordinated Universal Time, abbreviated UTC, an international standard for timekeeping. UTC is the primary time standard by which the world regulates clocks and time.
The time keeping in UTC is based on atomic clocks. UTC signals are regularly broadcast from satellites as well as many radio stations.
Computer servers and online services with UTC receivers can be synchronized by satellite broadcasts. Many popular synchronization protocols in distributed systems use UTC as a reference time to synchronize clocks of computers.
Clock Synchronization:
Clock synchronization is the mechanism to synchronize the time of all the computers in a distributed environment or system.
Assume that there are three systems present in a distributed environment. To send, receive and manage data between the systems in a consistently time-ordered manner, their clocks must agree; the process of bringing them into agreement is known as clock synchronization.
Synchronization in a distributed system is more complicated than in a centralized system because it must be achieved with distributed algorithms. Each node in a distributed system has its own clock, and the times on these clocks may vary, so all the clocks in the distributed environment need to be synchronized.
Types of Clock Synchronization
· Physical clock synchronization
· Logical clock synchronization
Synchronizing Physical Clocks:
In physical clock
synchronization, all the computers will have their own clocks. The physical
clocks are needed to adjust the time of nodes. All the nodes in the system can
share their local time with all other nodes in the system. The time will be set
based on UTC (Coordinated Universal Time).
Three main problems have been studied in the area of
physical clock synchronization:
External Synchronization:
External synchronization synchronizes each clock in the distributed system with UTC. The goal of external synchronization is to keep the reading of each clock as close to UTC as possible. A time server is a machine that provides accurate time information to be used as a reference by other machines. The NTP (Network Time Protocol) is an external synchronization protocol that runs on the Internet and coordinates a number of time servers. This enables a large number of computers connected to the Internet to synchronize their local clocks to within a few milliseconds of UTC. NTP takes appropriate recovery measures against possible failures of one or more servers, as well as the failure of links connecting the servers.
Internal Synchronization:
Internal synchronization synchronizes the clocks in the
distributed system with one another. The goal of internal synchronization is to
keep the readings of a system of autonomous clocks closely synchronized with
one another, despite the failure or malfunction of one or more clocks. These
clock readings may not have any connection with UTC or GPS time—mutual
consistency is the primary goal.
Phase Synchronization:
Many distributed computations run in phases: in a given phase, all processes execute some actions, which are followed by the next phase. A phase clock is an integer-valued variable that is incremented each time a phase completes. Each process has its own copy of the phase clock.
In the clock phase synchronization problem, we assume a synchronous model where all phase clock variables are incremented in unison, as if all of them are driven by the same clock. Clearly, once all the phase variables are equal, they remain so forever, and synchronization becomes unnecessary. However, due to transient failures, phase clocks may occasionally differ, so that while all the nonfaulty clocks tick as 1,2,3,4,…, the faulty clock might tick as 6,7,8,9,… during the same time.
A clock phase synchronization algorithm guarantees that
starting from an arbitrary configuration, eventually the values of all the
phase clocks become identical.
Algorithms for Physical Clock Synchronization:
Clocks are one of the most important components of computers and other devices. However, due to various factors, these clocks may drift from their standard frequency or degrade, and may gain or lose time with respect to a reference clock; this time difference between the two clocks is called clock skew.
This clock skew may gradually increase and eventually cause de-synchronization of the computer clock from the reference clock, which could affect their normal operation. Therefore the computer clock must be synchronized with the reference clock to minimize clock skew.
Several algorithms and protocols have been proposed for synchronizing physical clocks:
· Cristian Algorithm
· Berkeley Algorithm
· Network Time Protocol (NTP)
Cristian Algorithm:
Cristian [1989] suggested the use of a time server, connected to a device that receives signals from a source of UTC, to synchronize computers externally.
In this method, a client obtains the data from a special host (called the time server) that contains the reference time obtained from some precise external source. Upon request, the server process supplies the time according to its clock.
Cristian observed that while there is no upper bound on message transmission delays in an asynchronous system, the round-trip times for messages exchanged between pairs of processes are often reasonably short – a small fraction of a second. He describes the algorithm as probabilistic: the method achieves synchronization only if the observed round-trip times between client and server are sufficiently short compared with the required accuracy.
Ø Basic idea of Cristian’s algorithm: if a client wants to correct its time to match the server’s time, it makes a request to the time server and adjusts its clock accordingly.
Ø Cristian’s algorithm is based on the client-server model and makes use of RPC (Remote Procedure Call).
Ø It makes use of UTC, i.e., Coordinated Universal Time.
Ø The process on the client issues RPC to the time server at time T0 to obtain the time.
Ø The client process fetches the response from the clock server at time T1 and calculates the new synchronized client clock time by-
TCLIENT = TSERVER + (T1 – T0)/2
Where TCLIENT denotes the synchronized clock time, TSERVER denotes the clock time returned by the server, T0 denotes the time at which the client process sent the request and T1 denotes the time at which the client process received the response.
Synchronized time on the client:
Here,
T0 = 10:25:10
TSERVER = 10:25:13
T1 = 10:25:14
TCLIENT = TSERVER + (T1-T0)/2
= 10:25:13 + (10:25:14-10:25:10)/2
= 10:25:13 + 00:00:04/2
= 10:25:13 + 00:00:02
= 10:25:15
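The calculation above can be sketched in code. This is a minimal illustration of Cristian’s formula only, not a full client implementation; times are represented as seconds past 10:25:00 for simplicity:

```python
def cristian_time(t0, t_server, t1):
    """Client's synchronized time per Cristian's algorithm: the server's
    reported time plus half the measured round-trip delay (t1 - t0)."""
    return t_server + (t1 - t0) / 2

# Worked example from the text, as seconds past 10:25:00:
t0, t_server, t1 = 10, 13, 14           # 10:25:10, 10:25:13, 10:25:14
print(cristian_time(t0, t_server, t1))  # 15.0, i.e. 10:25:15
```

The halved round trip is the key assumption: the algorithm presumes the request and the reply take roughly equal time on the wire.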
Discussion of Cristian’s algorithm:
Cristian’s method suffers from the problem associated with all services implemented by a single server: the single time server might fail and thus render synchronization temporarily impossible. Cristian suggested, for this reason, that time should be provided by a group of synchronized time servers, each with a receiver for UTC time signals. For example, a client could multicast its request to all servers and use only the first reply obtained. The problem of dealing with faulty clocks is partially addressed by the Berkeley algorithm.
Berkeley Algorithm:
The Berkeley algorithm is a physical clock synchronization algorithm used in distributed systems, and a well-known algorithm for internal synchronization.
In the Berkeley algorithm, an individual node is chosen as the master node. This node is the main node in the network: it acts as the master, and the rest of the nodes act as slaves. If the master node fails, any slave in the network can take over.
The master node periodically requests the time from all its slave nodes. When the slave nodes send their responses, the master calculates the average difference between all the clock times received and its own clock time.
This average time difference is added to the current time of the master’s system clock and broadcast over the network. Thus synchronization is achieved.
Pass 1: The master node requests timestamps from all the slave nodes.
Pass 2: The slave nodes reply with their clock times.
Pass 3: The master calculates the average difference between all the clock times received and its own clock time. For example, with clock-time differences of +10, +20, 0 (the master itself) and -10:
= (+10 + 20 + 0 - 10)/4
= 20/4
= 5
Pass 4: This average time difference is added to the current time of the master’s system clock and broadcast over the network.
The Berkeley algorithm eliminates readings from faulty clocks. Such clocks could have a significant adverse effect if an ordinary average was taken so instead the master takes a fault-tolerant average. That is, a subset is chosen of clocks that do not differ from one another by more than a specified amount, and the average is taken of readings from only these clocks.
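A minimal sketch of the master’s side of the Berkeley algorithm, assuming the slaves’ clock readings have already been collected. The node names and the `max_skew` filter are illustrative; the filter stands in for the fault-tolerant average described above:

```python
def berkeley_adjustments(master_time, slave_times, max_skew=None):
    """Berkeley master step: average the clock differences (optionally
    dropping clocks whose difference exceeds max_skew, a crude
    fault-tolerant average) and return the adjustment each node
    should apply to its own clock."""
    diffs = {name: t - master_time for name, t in slave_times.items()}
    diffs["master"] = 0  # the master's own difference is zero
    kept = {n: d for n, d in diffs.items()
            if max_skew is None or abs(d) <= max_skew}
    avg = sum(kept.values()) / len(kept)
    # Each node adds (avg - its own difference) to its clock.
    return {n: avg - d for n, d in diffs.items()}

# Differences matching the text's example: +10, +20, 0 (master), -10.
adj = berkeley_adjustments(100, {"s1": 110, "s2": 120, "s3": 90})
print(adj)  # average difference = (10 + 20 + 0 - 10) / 4 = 5
```

Note that the master sends each node a relative adjustment rather than an absolute time, so message delay does not corrupt the correction.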
Network Time Protocol (NTP):
The NTP algorithm is a physical clock synchronization algorithm used in distributed systems. It is a protocol that helps computer clock times to be synchronized across a network. NTP is an elaborate external synchronization mechanism designed to synchronize clocks on the Internet with UTC.
It is not practical to equip every computer with atomic clocks or GPS satellite receivers. Cost is a major factor. So, these computers use the NTP to synchronize the clocks.
The NTP service is provided by a network of servers located across the Internet. Primary servers are connected directly to a time source such as a radio clock receiving UTC; secondary servers are synchronized, ultimately with primary servers. The servers are connected in a logical hierarchy called a synchronization subnet whose levels are called strata. Primary servers occupy stratum 1: they are at the root. Stratum 2 servers are secondary servers that are synchronized directly with the primary servers; stratum 3 servers are synchronized with stratum 2 servers, and so on. The lowest-level (leaf) servers execute in users’ workstations.
NTP architecture is a tiered structure of clocks, whose accuracy decreases as its level (defined by a stratum number) increases. Stratum 0 comprises the most accurate reference clocks (atomic clocks). The stratum 0 clocks are directly connected to the computers in the next level (i.e., stratum 1). The stratum 1 computers act as time servers for the computers belonging to the next level (i.e., stratum 2). In general, stratum i computers act as time servers for the stratum (i + 1) computers.
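As background not spelled out in the text, NTP estimates a client’s clock offset and the round-trip delay from four timestamps: T1 and T4 are the client’s send and receive times, T2 and T3 the server’s receive and send times. A minimal sketch of that arithmetic:

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP offset/delay estimate: t1/t4 are client send/receive
    times (client clock), t2/t3 are server receive/send times (server
    clock). Returns (estimated_offset, round_trip_delay)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Hypothetical exchange: server clock 5 s ahead, 0.1 s each way:
off, d = ntp_offset_delay(0.0, 5.1, 5.1, 0.2)
print(off, d)  # offset ≈ 5.0, round-trip delay ≈ 0.2
```

The symmetric-delay assumption is the same one Cristian’s algorithm makes; NTP additionally filters many such samples and prefers those with the smallest delay.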
The synchronization subnet can reconfigure as servers become unreachable or failures occur. If, for example, a primary server’s UTC source fails, then it can become a stratum 2 secondary server. If a secondary server’s normal source of synchronization fails or becomes unreachable, then it may synchronize with another server.
Logical Clocks:
In distributed systems there is no global clock; instead, logical clocks are used to order the events in the system. Logical clocks refer to implementing a protocol on all machines within a distributed system, so that the machines are able to maintain a consistent ordering of events within some virtual time span.
A logical clock is a mechanism for capturing chronological and causal relationships in a distributed system. Distributed systems may have no physically synchronous global clock, so a logical clock allows global ordering on events from different processes in such systems.
Logical time in a distributed system is used to maintain a consistent ordering of events. Logical time is based not on physical timing but on the ordering of events.
Logical clocks do not need exact time, so absolute time is not a concern. A logical clock is concerned only with the order in which events (such as message sends and deliveries) occur, not with the real times at which they occurred.
The most common logical clock synchronization algorithm for distributed systems is Lamport’s algorithm. It is used in situations where the ordering of events is important, not the actual time.
In a classic paper, Lamport (1978) showed that although clock synchronization is possible, it need not be absolute. If two processes do not interact, it is not necessary that their clocks be synchronized because the lack of synchronization would not be observable and thus could not cause problems. Furthermore, he pointed out that what usually matters is not that all processes agree on exactly what time it is, but rather that they agree on the order in which events occur.
Lamport’s Logical Clock:
To synchronize logical clocks, Lamport defined a relation called happens-before. The expression a → b is read "a happens before b" and means that all processes agree that first event a occurs, then afterward, event b occurs. The happens-before relation can be observed directly in two situations:
1. If a and b are events in the same process, and a occurs before b, then a → b is true.
2. If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b is also true.
Happens-before is a transitive relation, so if a → b and b → c, then a → c. If two events, x and y, happen in different processes that do not exchange messages (not even indirectly via third parties), then x → y is not true, but neither is y → x. These events are said to be concurrent, which simply means that nothing can be said (or need be said) about when the events happened or which event happened first.
In a distributed system, the happened-before relation (denoted →) is a relation between two events such that if one event should happen before another event, the result must reflect that, even if those events are in reality executed out of order (usually to optimize program flow). It orders events based on the potential causal relationship of pairs of events in a concurrent system, especially asynchronous distributed systems. It was formulated by Leslie Lamport.
The happened-before relation is formally defined as the least strict partial order on events such that:
· If a and b are events in the same process, and a occurs before b, then a → b is true.
· If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b is also true. A message cannot be received before it is sent, or even at the same time it is sent, since it takes a finite, nonzero amount of time to arrive.
If two events happen in different isolated processes (that do not exchange messages directly or indirectly via third-party processes), then the two events are said to be concurrent: neither a → b nor b → a is true.
Vector Clock versus Lamport Clock:
Lamport timestamps and vector clocks are both logical clocks and both provide a total ordering of events consistent with causality. Vector clocks allow us to determine if any two arbitrarily selected events are causally dependent or concurrent. Lamport timestamps cannot do this. Lamport timestamps are more compact.
Both logical clocks allow one to totally order events in a way that is consistent with causality; this is true because every causal dependency results in an increased timestamp. For both clocks, we can assert that if a "happens before" b, then Clock(a) < Clock(b).
Vector clocks take this a step further by allowing one to compare any two events and check whether they are causally dependent or concurrent. Put another way, not only can we show that if a "happens before" b, then VectorClock(a) < VectorClock(b); we can also assert the converse: if VectorClock(a) < VectorClock(b), then a "happened before" b.
Lamport Clocks:
Back in the late 1970s, Leslie Lamport wrote a paper in which he introduced logical clocks. The paper had three main points: firstly, if a distributed system is built up of multiple independent servers, and some of the servers never interact with each other, then there is little theoretical need for the non-interacting servers’ clocks to be synchronised, since any difference would never be observed. Secondly, he showed that in distributed systems, time is less important than agreement across components on the order in which events occur. And finally, he provided algorithms for partial causal ordering and an extension which provides total ordering.
Lamport clocks are event counters that are incremented with every interaction. They are incremented and sent as part of a message by a sending process, and are in turn incremented by a receiving process before the message is passed on to the business logic. The value produced by a Lamport clock is a Lamport timestamp.
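The counter behaviour just described can be sketched as a small class (a minimal illustration; real implementations attach the timestamp to every message):

```python
class LamportClock:
    """Minimal Lamport clock: an event counter incremented on every local
    event, attached to outgoing messages, and advanced past the timestamp
    of every incoming message."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time  # timestamp carried by the outgoing message

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
a = p1.local_event()  # 1
ts = p1.send()        # 2, carried to p2
b = p2.receive(ts)    # max(0, 2) + 1 = 3
print(a, ts, b)       # 1 2 3 -- timestamps increase along the causal chain
```

Note the one-way guarantee: a → b forces increasing timestamps, but two unrelated events may still end up with ordered timestamps by coincidence.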
A Lamport clock offers us a single guarantee: if event a "happens before" event b, then the Lamport timestamp of a is smaller than the Lamport timestamp of b. The converse is not guaranteed – observing that timestamp a < timestamp b does not by itself tell us that a happened before b.
Lamport clocks cannot tell us whether two events were concurrent, and cannot be used to infer causality between events. Vector clocks are a more sophisticated variant which gives us more guarantees, including knowledge of concurrency and causal history.
Vector Clocks:
Vector clocks extend the capabilities of Lamport clocks to allow us to understand the ordering across multiple processes which cross-communicate. They can also be invaluable in understanding the flow of messages in a distributed system.
At the data level, vector clocks are vectors of event counters. The vectors are stored in positionally consistent offsets within the vectors – each node in the system occupies a fixed position within the vector.
Advantages of vector clocks:
· Vector clocks are used in distributed systems to determine whether pairs of events are causally related.
· Using vector clocks, timestamps are created for each event in the system, and their causal relationship is determined by comparing those timestamps.
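A minimal sketch of vector clocks and the causally-before/concurrent tests they enable; the process count and slot assignment are fixed up front, as described above:

```python
class VectorClock:
    """Vector clock for a system of n processes; process i owns slot i."""
    def __init__(self, n, i):
        self.v = [0] * n
        self.i = i

    def tick(self):
        """Local event or send: bump own slot, return a timestamp copy."""
        self.v[self.i] += 1
        return list(self.v)

    def receive(self, other):
        """Merge an incoming timestamp element-wise, then count the event."""
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.i] += 1
        return list(self.v)

def happened_before(u, v):
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    return not happened_before(u, v) and not happened_before(v, u)

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
a = p0.tick()      # [1, 0]
b = p1.tick()      # [0, 1] -- concurrent with a
c = p1.receive(a)  # [1, 2] -- causally after a
print(concurrent(a, b), happened_before(a, c))  # True True
```

Unlike Lamport timestamps, comparing two vectors is decisive: element-wise ≤ (with at least one strict) means causal dependence, and incomparable vectors mean concurrency.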
Global State:
The global state of a distributed system consists of the local states of its component processes. Any computation that needs to compute the global state at a given time has to read the local states of every component process at that time.
The global state of a distributed system is the set of local states of each individual process involved in the system, plus the state of the communication channels.
Global Snapshot = Global State =
Individual state of each process in the distributed system
+
Individual state of each communication channel in the distributed system
To record a global state, we must:
· Capture the instantaneous state of each process
· Capture the instantaneous state of each communication channel, i.e., messages in transit on the channels
It is often desirable to determine
whether a particular property is true of a distributed system as it executes.
We'd like to use logical time to construct a global view of the system state
and determine whether a particular property is true. A few examples are as
follows:
Distributed garbage collection: An object is considered to be garbage if there are no longer any references to it anywhere in the distributed system. The memory taken up by that object can be reclaimed once it is known to be garbage. To check that an object is garbage, we must verify that there are no references to it anywhere in the system.
In the figure above, process p1 has two objects that both have references: one has a reference within p1 itself, and p2 has a reference to the other. Process p2 has one garbage object, with no references to it anywhere in the system. It also has an object for which neither p1 nor p2 has a reference, but there is a reference to it in a message that is in transit between the processes. This shows that when we consider properties of a system, we must include the state of communication channels as well as the state of the processes.
Distributed termination detection: The problem here is how to detect that a distributed algorithm has terminated. Detecting termination is a problem that sounds deceptively easy to solve: it seems at first only necessary to test whether each process has halted. To see that this is not so, consider a distributed algorithm executed by two processes p1 and p2, each of which may request values from the other. A process is either active or passive – a passive process is not engaged in any activity of its own but is prepared to respond with a value requested by the other. Suppose we discover that p1 is passive and that p2 is passive (Figure below). To see that we may not conclude that the algorithm has terminated, consider the following scenario: when we tested p1 for passivity, a message was on its way from p2, which became passive immediately after sending it. On receipt of the message, p1 became active again – after we had found it to be passive. The algorithm had not terminated.
Cut in a Distributed System:
Because physical time
cannot be perfectly synchronized in a distributed system it is not possible to
gather the global state of the system at a particular time. Cuts provide the
ability to "assemble a meaningful global state from local states recorded
at different times".
Chandy–Lamport Algorithm:
Chandy and Lamport [1985] describe a ‘snapshot’ algorithm for determining global states of distributed systems. The goal of the algorithm is to record a set of process and channel states (a ‘snapshot’) for a set of processes pi (i = 1, 2, ..., N) such that, even though the combination of recorded states may never have occurred at the same time, the recorded global state is consistent.
The algorithm records state locally at processes; it does not give a method for gathering the global state at one site.
The algorithm assumes that:
· Neither channels nor processes fail – communication is reliable so that every message sent is eventually received intact, exactly once.
· Channels are unidirectional and provide FIFO-ordered message delivery.
· The graph of processes and channels is strongly connected (there is a path between any two processes).
· Any process may initiate a global snapshot at any time.
The processes may continue their execution and send and receive normal messages while the snapshot takes place.
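As a toy illustration of the marker rule (a hard-coded two-process run, not a general implementation), the following sketch shows a snapshot capturing a message in transit, so that the recorded global state stays consistent:

```python
from collections import deque

def chandy_lamport_demo():
    """Toy two-process snapshot: p initiates while an application message
    is still in transit q->p, and the snapshot must record that message
    as channel state. Channels are FIFO deques, per the assumptions."""
    state = {"p": 100, "q": 50}
    chan = {("p", "q"): deque(), ("q", "p"): deque()}
    snap = {}                    # recorded process states
    chan_rec = {("q", "p"): []}  # p records its incoming channel

    # q sends 10 units to p; the snapshot starts before delivery.
    chan[("q", "p")].append(("MSG", 10))
    state["q"] -= 10

    # p initiates: record own state, send a marker on p->q.
    snap["p"] = state["p"]
    chan[("p", "q")].append(("MARKER", None))

    # q receives the marker first on p->q: it records its own state
    # (channel p->q is recorded as empty) and, FIFO-ordered, sends a
    # marker back on q->p behind the earlier application message.
    kind, _ = chan[("p", "q")].popleft()
    assert kind == "MARKER"
    snap["q"] = state["q"]
    chan[("q", "p")].append(("MARKER", None))

    # p drains q->p: everything before the marker is channel state.
    while True:
        kind, payload = chan[("q", "p")].popleft()
        if kind == "MARKER":
            break
        chan_rec[("q", "p")].append(payload)
        state["p"] += payload  # also delivered to the application

    return snap, chan_rec

snap, chan_rec = chandy_lamport_demo()
print(snap, chan_rec)  # {'p': 100, 'q': 40} {('q', 'p'): [10]}
```

The recorded states (100 + 40) plus the recorded in-transit message (10) sum to the initial total of 150, showing why the snapshot is consistent even though no process ever observed that exact global moment.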