A transaction defines a sequence of server operations that is guaranteed by the server to be atomic in the presence of multiple clients and server crashes.
Transactions originate from database management systems. In that context, a transaction is an execution of a program that accesses a database.
The goal of transactions is to ensure that all of the objects managed by a server remain in a consistent state when they are accessed by multiple transactions and in the presence of server crashes.
A transaction is specified by a client as a set of operations on objects to be performed as an indivisible unit by the servers managing those objects. The servers must guarantee that either the entire transaction is carried out and the results recorded in permanent storage or, in the case that one or more of them crashes, its effects are completely erased.
A client’s transaction is also regarded as indivisible from the point of view of other clients’ transactions in the sense that the operations of one transaction cannot observe the partial effects of the operations of another.
Example:
A client’s banking transaction:
Transaction T:
a.withdraw(100);
b.deposit(100);
c.withdraw(200);
b.deposit(200)
The above example shows a simple client transaction specifying a series of related actions involving the bank accounts A, B and C. The first two actions transfer $100 from A to B and the second two transfer $200 from C to B. A client achieves a transfer operation by doing a withdrawal followed by a deposit.
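The transaction above can be sketched in Python as follows. This is an illustrative sketch only: the `Account` class and the starting balances are invented for the example, and no atomicity machinery is shown yet.

```python
# Hypothetical Account class for the banking transaction above.
# Starting balances are assumed for illustration.

class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount

# Transaction T: transfer $100 from A to B, then $200 from C to B.
a, b, c = Account("A", 300), Account("B", 200), Account("C", 400)
a.withdraw(100)
b.deposit(100)
c.withdraw(200)
b.deposit(200)
```

If the server crashed between the withdrawal and the deposit, atomicity would require all four operations to be undone; the protocols described later in this section are what make that guarantee possible.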
Desirable Properties of Transactions
Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation, and Durability.
Atomicity:
A transaction must be all or nothing: either it completes successfully or it has no effect at all. This property states that a transaction is an atomic unit of processing, performed either in its entirety or not at all; no partial update should exist. The atomicity property also requires that the servers participating in a distributed transaction either all commit it or all abort it.
Consistency:
A transaction takes the system from one consistent state to another consistent state.
Isolation:
Concurrent transactions do not interfere with each other. Each transaction must be performed without interference from other transactions; in other words, the intermediate effects of a transaction must not be visible to other transactions that are running at the same time.
Durability:
After a transaction has completed successfully, all its effects are saved in permanent storage. We use the term ‘permanent storage’ to refer to files held on disk or another permanent medium. Data saved in a file will survive if the server process crashes. Once a transaction commits, the changes are permanent.
Introduction to Distributed Transactions:
Distributed transactions involve more than one server. A distributed transaction is any transaction whose activity involves several different servers. Distributed transactions may be either flat or nested.
When a distributed transaction comes to an end, the atomicity property of transactions requires that either all of the servers involved commit the transaction or all of them abort the transaction. To achieve this, one of the servers takes on a coordinator role, which involves ensuring the same outcome at all of the servers. The manner in which the coordinator achieves this depends on the protocol chosen.
Concurrency control in distributed transactions is based on three methods: locking, timestamp ordering and optimistic concurrency control. Each server is responsible for applying local concurrency control to its own objects, and the servers are jointly responsible for ensuring that transactions are performed in a serially equivalent manner.
Flat and Nested Distributed Transactions:
A client transaction becomes distributed if it invokes operations in several different servers. There are two different ways that distributed transactions can be structured: as flat transactions and as nested transactions.
Flat Transaction:
In a flat transaction, a client makes requests to more than one server. A flat client transaction completes each of its requests before going on to the next one. Therefore, each transaction accesses servers’ objects sequentially. When servers use locking, a transaction can only be waiting for one object at a time.
Fig: Flat Transaction
Nested Transaction:
Nested transactions are formed by structuring transactions from sets of other sub-transactions. Nesting is particularly useful in distributed systems because it allows concurrent execution of sub-transactions in separate servers, and it has the further advantage of allowing independent recovery of parts of a transaction.
In a nested transaction, the top-level transaction can open sub-transactions, and each sub-transaction can open further sub-transactions down to any depth of nesting.
The outermost transaction in a set of nested transactions is called the top-level transaction. Transactions other than the top-level transaction are called sub-transactions.
A sub-transaction appears atomic to its parent with respect to transaction failures and to concurrent access. Each sub-transaction can fail independently of its parent and of the other sub-transactions. When a sub-transaction aborts, the parent transaction can sometimes choose an alternative sub-transaction to complete its task.
1. Sub-transactions at one level (and their descendants) may run concurrently with other sub-transactions at the same level in the hierarchy. This can allow additional concurrency in a transaction. When sub-transactions run in different servers, they can work in parallel.
2. Sub-transactions can commit or abort independently. In comparison with a single transaction, a set of nested sub-transactions is potentially more robust.
Atomic Commit Protocols:
An atomic commit protocol is a cooperative procedure used by a set of servers involved in a distributed transaction. It enables the servers to reach a joint decision as to whether a transaction can be committed or aborted.
Transaction commit protocols were devised in the early 1970s, and the two-phase commit protocol appeared in Gray [1978]. The atomicity property of transactions requires that when a distributed transaction comes to an end, either all of its operations are carried out or none of them.
In the case of a distributed transaction, the client has requested operations at more than one server. A transaction comes to an end when the client requests that it be committed or aborted. A simple way to complete the transaction in an atomic manner is for the coordinator to communicate the commit or abort request to all of the participants in the transaction and to keep on repeating the request until all of them have acknowledged that they have carried it out. This is an example of a one-phase atomic commit protocol.
This simple one-phase atomic commit protocol is inadequate, though, because it does not allow a server to make a unilateral decision to abort a transaction when the client requests a commit. Reasons that prevent a server from being able to commit its part of a transaction generally relate to issues of concurrency control.
For example, if locking is in use, the resolution of a deadlock can lead to the aborting of a transaction without the client being aware unless it makes another request to the server. Also, if optimistic concurrency control is in use, the failure of validation at a server would cause it to decide to abort the transaction. Finally, the coordinator may not know if a server has crashed and been replaced during the progress of a distributed transaction – such a server will need to abort the transaction.
Two-Phase Commit Protocol:
The two-phase commit protocol is the most commonly used atomic commit protocol. It allows the servers to communicate with one another to reach a joint decision as to whether to commit or abort.
The two-phase commit protocol is designed to allow any participant to abort its part of a transaction. Due to the requirement for atomicity, if one part of a transaction is aborted, then the whole transaction must be aborted.
In the first phase of the two-phase commit protocol the coordinator asks all the participants if they are prepared to commit; in the second phase, it tells them to commit (or abort) the transaction. If a participant can commit its part of a transaction, it will agree as soon as it has recorded the changes it has made (to the objects) and its status in permanent storage and is therefore prepared to commit.
Phase 1 (voting phase):
1. The coordinator sends a canCommit? request to each of the participants in the transaction.
2. When a participant receives a canCommit? request, it replies with its vote (Yes or No) to the coordinator. Before voting Yes, it prepares to commit by saving objects in permanent storage. If the vote is No, the participant aborts immediately.
Phase 2 (completion according to outcome of vote):
3. The coordinator collects the votes (including its own).
a. If there are no failures and all the votes are Yes, the coordinator decides to commit the transaction and sends a doCommit request to each of the participants.
b. Otherwise, the coordinator decides to abort the transaction and sends doAbort requests to all participants that voted Yes.
4. Participants that voted Yes are waiting for a doCommit or doAbort request from the coordinator. When a participant receives one of these messages it acts accordingly and, in the case of commit, makes a haveCommitted call as confirmation to the coordinator.
The two-phase commit protocol consists of a voting phase and a completion phase, as shown above. By the end of step 2, the coordinator and all the participants that voted Yes are prepared to commit. By the end of step 3, the transaction is effectively completed. At step 3a the coordinator and the participants are committed, so the coordinator can report a decision to commit to the client. At step 3b the coordinator reports a decision to abort to the client. At step 4 participants confirm that they have committed so that the coordinator knows when the information it has recorded about the transaction is no longer needed.
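The voting and completion phases described above can be sketched as a single-process simulation. This is an illustrative sketch, not a real implementation: the class and method names (`Participant`, `can_commit`, `do_commit`) are invented, and network messages, permanent storage and crash recovery are omitted.

```python
# Minimal sketch of two-phase commit decision logic.
# All names are illustrative; real 2PC involves messages and stable storage.

class Participant:
    def __init__(self, vote_yes=True):
        self.vote_yes = vote_yes
        self.state = "active"

    def can_commit(self):
        # Phase 1: before voting Yes, the participant would save its
        # changes in permanent storage; a No vote aborts immediately.
        if self.vote_yes:
            self.state = "prepared"
            return "Yes"
        self.state = "aborted"
        return "No"

    def do_commit(self):
        self.state = "committed"

    def do_abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.can_commit() for p in participants]      # voting phase
    if all(v == "Yes" for v in votes):                  # completion phase
        for p in participants:
            p.do_commit()
        return "commit"
    for p, v in zip(participants, votes):
        if v == "Yes":                                  # only Yes-voters need doAbort
            p.do_abort()
    return "abort"
```

A single No vote (or failure) forces the whole transaction to abort, which is exactly the atomicity requirement stated above.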
The two-phase commit protocol has a blocking disadvantage: after certain failures, the coordinator or a participating site can be blocked, waiting indefinitely for a message. The three-phase commit protocol was introduced as a remedy; it adds an extra phase to ensure the non-blocking property.
Three Phase Commit Protocol:
The three-phase commit protocol is an atomic commit protocol for distributed systems. It is an extension of the two-phase commit protocol, introduced as a remedy to its blocking disadvantage. The protocol has three phases:
Phase 1 (Voting Phase):
The site at which the transaction originates becomes the coordinator, and it asks the other sites to vote to either commit or abort. The other sites send their votes. If all sites have voted to commit the transaction, the coordinator decides to commit; if even one of the sites has voted to abort, it decides to abort.
Phase 2 (Prepare to commit):
The coordinator tells its decision to all of the sites. If it has decided to commit, an "enter the ready-to-commit state" message is sent.
Phase 3 (Decision Phase):
If the coordinator has decided to commit the transaction, it sends a global_commit to all sites and waits for their acknowledgements; only after receiving the acknowledgements is the transaction finally committed. If the coordinator has decided to abort the transaction, it sends a global_abort to all the sites and the transaction is aborted.
Concurrency Control in Distributed Transactions:
Each server manages a set of objects and is responsible for ensuring that they remain consistent when accessed by concurrent transactions. Therefore, each server is responsible for applying local concurrency control to its own objects. The members of a collection of servers of distributed transactions are jointly responsible for ensuring that they are performed in a serially equivalent manner.
This implies that if transaction T is before transaction U in their conflicting access to objects at one of the servers, then they must be in that order at all of the servers whose objects are accessed in a conflicting manner by both T and U.
All of the concurrency control protocols are based on the criterion of serial equivalence and are derived from rules for conflicts between operations. Concurrency control in distributed transactions is based on three methods:
· Locking: Locks are used to order transactions that access the same objects according to the order of arrival of their operations at the objects.
· Timestamp ordering concurrency control: Timestamp ordering uses timestamps to order transactions that access the same objects according to their starting times.
· Optimistic concurrency control: Optimistic concurrency control allows transactions to proceed until they are ready to commit, whereupon a check is made to see whether they have performed conflicting operations on objects.
Locking:
In a distributed transaction, the locks on an object are held locally (in the same server). The local lock manager can decide whether to grant a lock or make the requesting transaction wait. However, it cannot release any locks until it knows that the transaction has been committed or aborted at all the servers involved in the transaction. When locking is used for concurrency control, the objects remain locked and are unavailable for other transactions during the atomic commit protocol, although an aborted transaction releases its locks after phase 1 of the protocol.
As lock managers in different servers set their locks independently of one another, it is possible that different servers may impose different orderings on transactions.
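The behaviour of such a local lock manager can be sketched as follows. The class and method names are invented for illustration; a real lock manager would also distinguish read and write locks, and `release_all` would be called only once the outcome of the distributed transaction is known at all servers.

```python
# Sketch of a local lock manager: grants or queues lock requests, and
# releases a transaction's locks only after commit/abort is decided.

class LockManager:
    def __init__(self):
        self.locks = {}      # object id -> transaction currently holding it
        self.waiting = {}    # object id -> queue of waiting transactions

    def acquire(self, txn, obj):
        holder = self.locks.get(obj)
        if holder is None or holder == txn:
            self.locks[obj] = txn
            return True                         # lock granted
        self.waiting.setdefault(obj, []).append(txn)
        return False                            # transaction must wait

    def release_all(self, txn):
        # Called only after the transaction has committed or aborted
        # at all the servers involved.
        for obj in [o for o, t in self.locks.items() if t == txn]:
            queue = self.waiting.get(obj, [])
            if queue:
                self.locks[obj] = queue.pop(0)  # grant to next waiter
            else:
                del self.locks[obj]
```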
Timestamp Ordering Concurrency Control:
In a single-server transaction, the coordinator issues a unique timestamp to each transaction when it starts. In distributed transactions, we require that each coordinator issue globally unique timestamps. A globally unique transaction timestamp is issued to the client by the first coordinator accessed by a transaction. The transaction timestamp is passed to the coordinator at each server whose objects perform an operation in the transaction.
The servers of distributed transactions are jointly responsible for ensuring that they are performed in a serially equivalent manner. For example, if the version of an object accessed by transaction U commits after the version accessed by T at one server, then T and U must be committed in that same order at every other server whose objects they both access.
When timestamp ordering is used for concurrency control, conflicts are resolved as each operation is performed. If the resolution of a conflict requires a transaction to be aborted, the coordinator will be informed and it will abort the transaction at all the participants.
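One common way to construct globally unique timestamps (an assumption here, not mandated by the text) is to pair a local counter with the server identifier and compare the pairs lexicographically, so that ties on the counter are broken by the server id.

```python
# Hypothetical coordinator issuing globally unique timestamps as
# (local counter, server id) pairs, compared lexicographically.

class Coordinator:
    def __init__(self, server_id):
        self.server_id = server_id
        self.counter = 0

    def new_timestamp(self):
        self.counter += 1
        return (self.counter, self.server_id)   # globally unique pair

s1, s2 = Coordinator("server1"), Coordinator("server2")
t1 = s1.new_timestamp()   # (1, "server1")
t2 = s2.new_timestamp()   # (1, "server2")
t3 = s1.new_timestamp()   # (2, "server1")
```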
Optimistic Concurrency Control:
With optimistic concurrency control, each transaction is validated before it is allowed to commit. Transaction numbers are assigned at the start of validation and transactions are serialized according to the order of the transaction numbers. A distributed transaction is validated by a collection of independent servers, each of which validates transactions that access its own objects. This validation takes place during the first phase of the two-phase commit protocol.
In distributed optimistic transactions, each server applies a parallel validation protocol. This is an extension of either backward or forward validation to allow multiple transactions to be in the validation phase at the same time.
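As a minimal sketch of one of the schemes mentioned above, backward validation checks that the read set of the validating transaction does not overlap the write set of any transaction that committed while it was executing. The function and data structures below are invented for the example; parallel validation adds further bookkeeping not shown here.

```python
# Sketch of backward validation: the validating transaction is aborted
# if it read an object written by an overlapping committed transaction.

def backward_validate(read_set, overlapping_committed_write_sets):
    for write_set in overlapping_committed_write_sets:
        if read_set & write_set:
            return False        # conflict: abort the validating transaction
    return True                 # no conflict: the transaction may commit
```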
Deadlocks:
A deadlock is a state where a set of processes request resources that are held by other processes in the set. The use of locks can lead to deadlock. Deadlock is a particularly common situation when clients are involved in an interactive program, for a transaction in an interactive program may last for a long period of time. This can result in many objects being locked and remaining so, thus preventing other clients from using them.
Deadlock is a state in which each member of a group of transactions is waiting for some other member to release a lock. A wait-for graph (WFG) can be used to represent the waiting relationships between current transactions. In a wait-for graph the nodes represent transactions and the edges represent wait-for relationships between transactions – there is an edge from node T to node U when transaction T is waiting for transaction U to release a lock. A system is deadlocked if and only if there exists a directed cycle or knot in the WFG.
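The cycle condition above can be checked with a standard depth-first search over the wait-for graph, represented here (as an illustration) as a dict mapping each transaction to the transactions it waits for.

```python
# Detect a deadlock by finding a directed cycle in a wait-for graph.
# wfg: dict mapping transaction -> list of transactions it waits for.

def has_deadlock(wfg):
    visited, on_path = set(), set()

    def dfs(node):
        if node in on_path:
            return True                 # back edge: a cycle exists
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        for nxt in wfg.get(node, []):
            if dfs(nxt):
                return True
        on_path.remove(node)
        return False

    return any(dfs(n) for n in wfg)
```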
There are three strategies for handling deadlocks, viz., deadlock prevention, deadlock avoidance, and deadlock detection.
Deadlock Prevention:
A simple but not very good way to overcome the deadlock problem is to lock all of the objects used by a transaction when it starts. This would need to be done as a single atomic step so as to avoid deadlock at this stage. Such a transaction cannot run into deadlocks with other transactions, but this approach unnecessarily restricts access to shared resources. In addition, it is sometimes impossible to predict at the start of a transaction which objects will be used. This is generally the case in interactive applications, for the user would have to say in advance exactly which objects they were planning to use – this is inconceivable in browsing-style applications, which allow users to find objects they do not know about in advance.
Deadlocks can also be prevented by requesting locks on objects in a predefined order, but this can result in premature locking and a reduction in concurrency.
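The predefined-order idea can be shown in a tiny sketch: if every transaction acquires its locks in the same total order (here, assumed to be lexicographic on object ids), no cycle of waits can form, because a transaction never waits for an object earlier in the order than one it already holds.

```python
# Illustrative helper: sort the objects a transaction needs before locking,
# so all transactions acquire locks in one predefined global order.

def ordered_lock_sequence(objects_needed):
    return sorted(objects_needed)   # lexicographic order, assumed global
```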
Deadlock Detection:
Distributed Deadlocks:
Deadlock is a fundamental problem in distributed systems. Deadlocks can arise within a single server when locking is used for concurrency control. Servers must either prevent deadlocks or detect and resolve them. With deadlock detection schemes, a transaction is aborted only when it is involved in a deadlock. Most deadlock detection schemes operate by finding cycles in the transaction wait-for graph.
The use of locking schemes can lead to distributed deadlocks. In a distributed system involving multiple servers being accessed by multiple transactions, a global wait-for graph can be constructed from the local ones. There can be a cycle in the global wait-for graph that is not in any single local one – that is, there can be a distributed deadlock.
Detection of a distributed deadlock requires a cycle to be found in the global transaction wait-for graph that is distributed among the servers that were involved in the transactions.
The wait-for graph is a directed graph in which nodes represent transactions and objects, and edges represent either an object held by a transaction or a transaction waiting for an object. There is a deadlock if and only if there is a cycle in the wait-for graph.
Handling of deadlock becomes highly complicated in distributed systems because no site has accurate knowledge of the current state of the system and because every inter-site communication involves a finite and unpredictable delay.
Deadlock prevention is commonly achieved either by having a process acquire all the needed resources simultaneously before it begins executing or by preempting a process which holds the needed resource. This approach is highly inefficient and impractical in distributed systems.
In deadlock avoidance approach to distributed systems, a resource is granted to a process if the resulting global system state is safe (note that a global state includes all the processes and resources of the distributed system). However, due to several problems, deadlock avoidance is impractical in distributed systems.
Deadlock detection requires examination of the status of process-resource interactions for presence of cyclic wait. Deadlock detection in distributed systems seems to be the best approach to handle deadlocks in distributed systems.
Deadlock handling using the approach of deadlock detection entails addressing two basic issues: first, detection of existing deadlocks, and second, resolution of detected deadlocks.
Detection of deadlocks involves addressing two issues: Maintenance of the WFG and searching of the WFG for the presence of cycles (or knots).
Phantom Deadlocks:
A deadlock that is ‘detected’ but is not really a deadlock is called a phantom deadlock. In distributed deadlock detection, information about wait-for relationships between transactions is transmitted from one server to another. If there is a deadlock, the necessary information will eventually be collected in one place and a cycle will be detected. As this procedure will take some time, there is a chance that one of the transactions that holds a lock will meanwhile have released it, in which case the deadlock will no longer exist.
Edge Chasing Algorithm:
A distributed approach to deadlock detection uses a technique called edge chasing or path pushing. In this approach, the global wait-for graph is not constructed, but each of the servers involved has knowledge about some of its edges. The servers attempt to find cycles by forwarding messages called probes, which follow the edges of the graph throughout the distributed system. A probe message consists of transaction wait-for relationships representing a path in the global wait-for graph.
Edge-chasing algorithms have three steps: initiation, detection, and resolution.
Initiation:
When a server notes that a transaction T starts waiting for another transaction U, where U is waiting to access an object at another server, it initiates detection by sending a probe containing the edge <T → U> to the server of the object at which transaction U is blocked. If U is sharing a lock, probes are sent to all the holders of the lock. Sometimes further transactions may start sharing the lock later on, in which case probes can be sent to them too.
Detection:
Detection consists of receiving probes and deciding whether a deadlock has occurred and whether to forward the probes.
For example, when a server of an object receives a probe <T → U> (indicating that T is waiting for a transaction U that holds a local object), it checks to see whether U is also waiting. If it is, the transaction it waits for (for example, V) is added to the probe (making it <T → U → V>) and, if the new transaction (V) is waiting for another object elsewhere, the probe is forwarded. In this way, paths through the global wait-for graph are built one edge at a time. Before forwarding a probe, the server checks to see whether the transaction (for example, T) it has just added has caused the probe to contain a cycle (for example, <T → U → V → T>). If this is the case, it has found a cycle in the graph and a deadlock has been detected.
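The probe-handling step just described can be sketched as a small function. This is an illustrative sketch: the probe is modelled as a list of transaction ids, the server's local knowledge as a dict, and the names are invented; a real implementation would forward the probe as a message to the next server.

```python
# Sketch of a server handling one probe in edge chasing.
# probe: path in the global wait-for graph, e.g. ["T", "U"] meaning T -> U.
# local_waits_for: which transaction each local transaction waits for here.

def handle_probe(probe, local_waits_for):
    last = probe[-1]
    nxt = local_waits_for.get(last)
    if nxt is None:
        return probe, False             # last is not waiting: discard probe
    new_probe = probe + [nxt]
    if nxt in probe:
        return new_probe, True          # cycle found: deadlock detected
    return new_probe, False             # would forward probe to nxt's server
```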
Resolution:
When a cycle has been detected, one of the transactions in the cycle is aborted to break the deadlock.