Thursday, September 29, 2011

Review: Megastore

Megastore is a storage system developed to meet the requirements of Google's interactive online services. It combines the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way that provides strong ACID semantics, and it uses Paxos for synchronous wide-area replication. The part of Megastore that I think contributes most to its success is its applicability to a variety of workloads that can be cleanly divided into entity groups, email being the obvious example, where each user's mailbox naturally forms its own group. Partitioning data with more complex structure, such as a social-networking graph, will be harder, since the interactions between individuals are difficult to sort into separate groups.
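To make the entity-group idea concrete, here is a minimal sketch (my own illustration with made-up names, not anything from the paper) of why email partitions cleanly while a social graph does not:

```python
# A minimal sketch of entity-group partitioning, loosely in the spirit of
# Megastore. Names and structure are my own illustration, not the paper's API.

class EntityGroup:
    """All rows rooted at one entity (e.g., one user's mailbox)."""
    def __init__(self, root_key):
        self.root_key = root_key
        self.rows = {}          # child key -> value
        self.version = 0        # a per-group log position / timestamp

    def transact(self, updates):
        # ACID semantics are easy to provide here because every update
        # touches only this group; in Megastore each group has its own
        # replicated write-ahead log agreed on with Paxos.
        self.version += 1
        self.rows.update(updates)
        return self.version

# Email partitions cleanly: one group per user.
inbox = EntityGroup(root_key="user:alice")
inbox.transact({"msg:1": "hello", "msg:2": "re: hello"})

# A social graph does not: an edge (alice, bob) belongs to two groups,
# so updating it atomically needs a slower cross-group mechanism.
```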

Review: GFS

Google File System is a scalable distributed file system for large, distributed, data-intensive applications that has been implemented and used inside Google. The goal of the file system is the same as that of any previous distributed file system: to achieve a good balance between performance, scalability, reliability, and availability. To get there, the designers made different fundamental assumptions. First, component failures are the norm rather than the exception. Second, files are very big (multi-GB files are common). Finally, there are barely any random writes in the system; most writes append new data to existing files. Architecture-wise, the system is very interesting too. There is a single master and a number of chunkservers. The master is the clients' contact point for metadata, such as looking up which chunkservers hold the data they want to read. The paper then goes on to describe how snapshots and other operations are implemented. One thing I wonder is how the assumptions would differ if Google were to build the file system today (e.g., would it still be a mostly append-only system? Would there still be a single master?). In any case, GFS has clearly been very successful: it is used as the storage system within Google, and its design has been very influential in the distributed-systems literature.
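To make the single-master design concrete, here is a simplified sketch of a GFS-style read path, with made-up class and method names rather than the real GFS interfaces: the client asks the master only for metadata and then reads the data directly from a chunkserver.

```python
# A toy GFS-style read path. The master serves metadata only; file data
# flows directly between clients and chunkservers. Names are illustrative.

CHUNK_SIZE = 64 * 1024 * 1024   # GFS uses fixed-size 64 MB chunks

class Master:
    """Maps (filename, chunk index) to a chunk handle and its replica locations."""
    def __init__(self):
        self.chunks = {}

    def lookup(self, filename, offset):
        return self.chunks[(filename, offset // CHUNK_SIZE)]

class Chunkserver:
    """Stores chunk data; here just an in-memory dict instead of local disk."""
    def __init__(self):
        self.data = {}                      # handle -> bytes

    def read_chunk(self, handle, offset, length):
        return self.data[handle][offset:offset + length]

class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers    # address -> Chunkserver

    def read(self, filename, offset, length):
        handle, replicas = self.master.lookup(filename, offset)   # metadata only
        server = self.chunkservers[replicas[0]]                   # nearest replica in real GFS
        return server.read_chunk(handle, offset % CHUNK_SIZE, length)

# A toy read:
master, cs = Master(), Chunkserver()
cs.data["h1"] = b"hello, gfs"
master.chunks[("/logs/a", 0)] = ("h1", ["cs-1"])
client = Client(master, {"cs-1": cs})
print(client.read("/logs/a", 0, 5))         # b'hello'
```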

Review: Paxos & Chubby Lock Service

Paxos is a consensus protocol for a network of unreliable computers; it is used to build fault-tolerant distributed systems by ensuring that a single consistent value is chosen. There are three basic roles for the participating nodes: proposers, acceptors, and learners. Proposers try to get a value chosen across all of the participating nodes, acceptors receive proposals and vote on which ones should be accepted, and learners simply learn the value that has been accepted by a majority of the acceptors.

The following is how the algorithm works in a typical run. First, in the prepare phase, a proposer selects a proposal number n and sends a prepare request with that number to the acceptors. An acceptor rejects the request if it has already responded to a prepare request with a number greater than n; otherwise it promises not to accept any proposal numbered lower than n. In the second phase, if the proposer receives responses from a majority of acceptors, it sends an accept request to each of them with a value v, where v is the value of the highest-numbered proposal among the responses (or a value of the proposer's own choosing if no acceptor reported one). In my opinion, the structure of this paper makes Paxos harder to understand than it actually is, since the author proves each piece of the algorithm before leading up to the overview of the protocol. If the author had given us the overview at the beginning, that would have been very helpful. However, the simplicity of Paxos itself has made it, as we know today, very popular, and many papers have been written to further optimize it. So I'd say this paper is very influential.
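To see the two phases side by side, here is a toy single-decree Paxos sketch, assuming reliable in-process calls instead of real messaging (my own simplification; a real deployment must also handle lost, delayed, and reordered messages):

```python
# A toy single-decree Paxos, with the two phases written out explicitly.
# Everything runs in one process; real deployments exchange messages and
# must tolerate their loss and reordering.

class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest prepare number promised
        self.accepted_n = -1        # number of the proposal accepted, if any
        self.accepted_v = None      # value of that proposal

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n,
        # and report any value already accepted.
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        # Phase 2b: accept unless a higher-numbered prepare was promised.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False

def propose(acceptors, n, value):
    # Phase 1a: send prepare(n) to all acceptors.
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None                              # no majority; retry with a higher n
    # Use the value of the highest-numbered accepted proposal, if any.
    prev = max(granted, key=lambda t: t[0])
    v = prev[1] if prev[1] is not None else value
    # Phase 2a: send accept(n, v) to all acceptors.
    acks = sum(a.accept(n, v) for a in acceptors)
    return v if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, value="A"))   # 'A' is chosen
print(propose(acceptors, n=2, value="B"))   # still 'A': once chosen, the value sticks
```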

The paper "Paxos Made Practical" takes one more step on how to actually implement the algorithm in a real-life situation, how to handle nodes that failed and how to add a state machine to maintain replicas of the data since previous works that claim to use Paxos as a consensus protocol leaves many details unspecified. However, I feel that for this paper, its influential for itself is not that much since it's too specific on one implementation although I guess, it will help people who will like to design a system that utilizes Paxos Protocol.

On the other hand, we have the Chubby lock service, which is intended to provide coarse-grained locking in a loosely coupled distributed system (such as the ones that exist at Google). It is interesting to note that, as the paper explicitly mentions, Chubby is an engineering effort that claims no new algorithms or techniques; it is not research. As shown in Figure 1 of the paper, Chubby runs a distributed consensus protocol (presumably Paxos) on a small cell of five machines to elect a master; the master is the contact point that receives requests from clients until it goes down or its lease expires. The paper then goes on to describe other aspects of Chubby, such as how it handles backups, mirroring, fail-overs, and so on.
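To give a feel for how an application might use such coarse-grained locks for primary election, here is a hypothetical sketch; the client API below is invented for illustration, whereas Chubby's real interface looks like a small file system with handles, acquire/release calls, and events:

```python
# A hypothetical sketch of coarse-grained locking for primary election, in the
# spirit of Chubby's advisory locks. FakeLockService stands in for a Chubby
# cell; a real cell is five replicas that elect their own master via consensus.

class FakeLockService:
    def __init__(self):
        self.holders = {}                    # lock path -> current holder

    def try_acquire(self, path, who):
        if path not in self.holders:
            self.holders[path] = who
            return True
        return False

    def release(self, path, who):
        if self.holders.get(path) == who:
            del self.holders[path]

def elect_primary(cell, me):
    # Coarse-grained: the winner holds the lock for hours or days (renewing
    # its session lease), not per-request as a fine-grained lock would be.
    if cell.try_acquire("/ls/my-service/primary", me):
        print(f"{me} is the primary")
    else:
        print(f"{me} is a replica; it would watch the lock for changes")

cell = FakeLockService()
elect_primary(cell, "server-1")   # server-1 becomes the primary
elect_primary(cell, "server-2")   # server-2 stays a replica
```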

Thursday, September 22, 2011

Review: Cluster-Based Scalable Network Services

This paper identifies three requirements for an Internet service: scalability, availability, and cost-effectiveness. It also observes that some Internet services do not need the strong guarantees of ACID; they may only require high availability of data (i.e., the BASE properties). It then goes on to describe the cluster architecture, which is built from commodity building blocks. As Figure 1 shows, the cluster has a pool of workers, some of which may be caches, that are used whenever jobs are issued. This pool of workers is load-balanced by a manager, which collects load information from the workers and asks the front ends to adjust their current scheduling decisions. The front end here is the interface to the outside world, such as an HTTP server that matches incoming requests with one or more worker nodes.
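As a rough sketch of that division of labor (my own illustration, not code from the paper), a front end can dispatch each incoming request to the least-loaded worker, using the load reports that the manager collects:

```python
# A toy front-end / manager / worker split in the style of the paper's
# cluster architecture. Names and the load metric are illustrative only.

class Worker:
    def __init__(self, name):
        self.name = name
        self.queue = []                 # pending jobs

    def load(self):
        return len(self.queue)          # report queue length as "load"

    def handle(self, request):
        self.queue.append(request)
        return f"{self.name} handling {request}"

class Manager:
    """Collects load reports and advises the front ends."""
    def __init__(self, workers):
        self.workers = workers

    def least_loaded(self):
        return min(self.workers, key=lambda w: w.load())

class FrontEnd:
    """The interface to the outside world (e.g., an HTTP server)."""
    def __init__(self, manager):
        self.manager = manager

    def dispatch(self, request):
        return self.manager.least_loaded().handle(request)

workers = [Worker(f"w{i}") for i in range(4)]
front = FrontEnd(Manager(workers))
for i in range(8):
    print(front.dispatch(f"req-{i}"))   # requests spread across the pool
```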

This leads to an interesting question: do we need ACID or BASE? It is easy to see that if the data does not change very often, as with much of the content served on the web, then the BASE properties are sufficient. However, if the service involves something critical such as stock-market trading, then ACID may be necessary. This paper holds up really well, since many of the cloud providers today use a similar infrastructure.

Review: Brewer's CAP Conjecture

In an invited talk, Professor Eric Brewer brought up a conjecture that it is impossible to guarantee all three of the following: consistency, availability, and partition tolerance. Consistency means that whatever value a variable has on one node, the value of that variable must be the same on all of the other nodes. Availability means that every request received by a non-failing node in the system must result in a response, while partition tolerance means that the network is allowed to lose arbitrarily many messages sent from one node to another.

The proof itself is pretty direct. Since messages can be lost arbitrarily, if you want consistency then you need to sacrifice availability, because a node has to wait for updates to propagate through the other nodes before it can safely answer. If you want availability, then consistency is sacrificed by similar reasoning, since a node must respond immediately with whatever (possibly stale) value it has.
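A toy sketch of the two choices a replica faces during a partition (my own illustration, not from the paper):

```python
# Two toy replicas behaving differently during a network partition:
# a CP replica refuses to answer rather than risk staleness, while an
# AP replica answers immediately with whatever value it last saw.

class Unavailable(Exception):
    pass

class CPReplica:
    def __init__(self):
        self.value, self.partitioned = None, False

    def read(self):
        if self.partitioned:
            raise Unavailable("cannot confirm the latest value right now")
        return self.value

class APReplica:
    def __init__(self):
        self.value, self.partitioned = None, False

    def read(self):
        return self.value                    # may be stale while partitioned

cp, ap = CPReplica(), APReplica()
cp.value = ap.value = "v1"
cp.partitioned = ap.partitioned = True       # the network starts dropping messages
print(ap.read())                             # 'v1', possibly stale
try:
    cp.read()
except Unavailable as e:
    print("CP replica:", e)
```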

I feel that there is no need for a formal proof in this paper because the conjecture is pretty direct. It will definitely affect programmers whenever they design a distributed system. Do they want consistency, in the case of something important like a bank transaction? Or do they want availability, in cases where people do not really care about a perfectly accurate result, like a search engine? An important point to note is that consistency and availability are not discrete choices; they form a whole spectrum, which is why we have terms such as eventual consistency and so on. Thus, programmers need to think harder about the systems they design.

Wednesday, September 14, 2011

Review: The Datacenter Needs an Operating System

As the title states, this paper argues that datacenters need an OS, for several reasons: resource sharing between programs and users, data sharing between programs through abstractions like pipes, programming abstractions that simplify software development for datacenter applications, and debugging and monitoring facilities.

The paper is unusual in that it presents a lot of open-ended questions for cloud computing, such as what standardized interfaces a datacenter OS should expose so that development can happen both on the platforms and among the application developers in the datacenters. The two problems that I think are most important are, first, finding the right abstractions for separating system programs from applications (which is still highly relevant in the "traditional" OS today), and second, the role of virtualization in this whole datacenter business. The benefits of VMs, such as migration, are too tantalizing to miss in distributed systems as humongous as datacenters.

This paper is highly relevant since, as time passes, the cloud is only becoming more and more ubiquitous. Thus, something needs to manage the programs and frameworks running in it, and that is the job of an OS.

Review: Graphic Processing Units (GPUs)

This is a review of Chapter 4 of Hennessy & Patterson, Computer Architecture, Fifth Edition, which presents the internals of and details about GPUs. Recently, GPUs have been put to more and more uses beyond graphics applications alone, since they provide much better performance for data-parallel computation, although there are multiple problems that need to be solved before GPUs can be used in the mainstream by typical programmers. First, the code that runs on a GPU looks very different from what runs on a CPU. OpenCL tries to mediate that, but it has not had much success yet. Relatedly, GPU debugging tools have not matured either, so it is very hard to debug applications on a GPU, unlike the convenience of gdb and the other debuggers that exist for CPUs. Additionally, since we cannot virtualize GPUs yet (VMware has attempted it recently, but I am unclear on the outcome), it is hard to put GPUs into commercial use.

However, I believe that as time goes on, GPUs will be used more and more for applications such as bioinformatics and query processing, and that more GPGPUs will exist in the cloud, where we can specify whether to run our applications on the CPU or the GPU.

Review: Performance Modeling and Analysis of Flash-based Storage Devices

This paper uses a black-box modelling approach to analyze and evaluate SSD performance, feeding in various inputs such as read size, write ratio, and read stride, and measuring throughput and latency. The experiments are run on two different SSDs and one HDD, and their performance is compared.
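As a rough illustration of what a black-box model means here (my own sketch with made-up numbers; the paper's actual model construction may well differ), one can fit measured throughput as a function of workload parameters without knowing anything about the device's internals, and then predict unseen configurations:

```python
# A minimal black-box performance model: fit measured throughput as a
# function of workload parameters, with no knowledge of the device's
# internals (FTL, channels, etc.). Illustrative only; the data below is
# made up and the paper's own modelling machinery may be more sophisticated.

import numpy as np

# Hypothetical training samples: [request size in KB, write ratio, stride in KB]
X = np.array([
    [4,   0.0, 0],
    [4,   0.5, 0],
    [64,  0.0, 0],
    [64,  1.0, 64],
    [256, 0.0, 128],
], dtype=float)
y = np.array([32.0, 18.0, 180.0, 95.0, 240.0])   # made-up throughput in MB/s

# Least-squares linear fit with an intercept term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(size_kb, write_ratio, stride_kb):
    return np.array([size_kb, write_ratio, stride_kb, 1.0]) @ coef

print(round(predict(128, 0.25, 0), 1), "MB/s (predicted)")
```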

The paper is very detailed, presenting a lot of graphs and results across the various parameters it looks at. However, it would be nicer if the authors had also taken a higher-level perspective: instead of just feeding in random reads and writes, they could have looked at how the devices interact with current file system implementations and so on.

One thing that still needs research in this area is how to reduce the cost of SSDs, because at current prices it is hard to deploy them commercially. We also need to consider OS support for SSDs, since they are fundamentally different from HDDs (reads are faster than on an HDD, while writes are sometimes slower). Improving the Flash Translation Layer needs to be looked at too, since we have seen in the paper that a bad FTL can cause quite a significant performance difference between SSDs.

Review: Amdahl's Law in the Multicore Era

This short paper talks about applying Amdahl's Law in the multicore era. According to Amdahl's formula, there are two main ways to speed up a program: first and most obviously, you can run the serial part faster; second, you can parallelize more of your code (increase the f term in the equation). The paper looks at three types of multicore chips: symmetric, asymmetric, and dynamic. Asymmetric and dynamic multicore chips are generally preferred because they offer potential speedups much greater than symmetric multicore chips. The paper also demonstrates that heterogeneous, asymmetric designs are more efficient, both in cost and computationally.
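Here is a small calculation with the paper's cost model, written from my memory of the Hill & Marty formulas, so treat it as a sketch rather than the authoritative equations: a chip has n base-core equivalents (BCEs), a core built from r BCEs has performance perf(r) = sqrt(r), and f is the parallelizable fraction.

```python
# Speedup formulas for the three chip organizations discussed by Hill & Marty,
# written from my memory of the paper's cost model, so treat them as a sketch.
# n = total chip resources in base-core equivalents (BCEs),
# r = BCEs spent on the "big" core, f = parallelizable fraction,
# and a core built from r BCEs is assumed to have perf(r) = sqrt(r).

from math import sqrt

def perf(r):
    return sqrt(r)

def speedup_symmetric(f, n, r):
    # n/r identical cores, each built from r BCEs.
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def speedup_asymmetric(f, n, r):
    # One big r-BCE core plus (n - r) single-BCE cores.
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def speedup_dynamic(f, n, r):
    # The big core runs the serial part; all n BCEs run the parallel part.
    return 1.0 / ((1 - f) / perf(r) + f / n)

f, n, r = 0.975, 256, 16
for name, fn in [("symmetric", speedup_symmetric),
                 ("asymmetric", speedup_asymmetric),
                 ("dynamic", speedup_dynamic)]:
    print(f"{name:10s} {fn(f, n, r):6.1f}x")
```

With these example parameters, the asymmetric and dynamic organizations come out roughly twice as fast as the symmetric one, which is consistent with the paper's point that they offer the greater potential speedups.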

Monday, September 12, 2011

Review: An Introduction to the Design of Warehouse-Scale Machines (part 2)

Chapter 3 of the book talks about the trade-offs between low-end servers and high-end servers. The text argues that we do not need servers that are too high-end, because their price per unit of performance is too high according to the TPC-C benchmark. In addition, slower CPUs are generally more power efficient, so you will not need as much cooling if the datacenter is built from lower-end servers (and thus slower CPUs). However, the text continues and states that we do not want servers that are too low-end either, because a substantial amount of effort may then be needed to parallelize the code enough to meet the requirements. Thus, a middle ground that takes the best of both worlds is needed.

Chapter 4 of the book talks about the power consumption of a datacenter and the characteristics of its cooling systems. Low-voltage power typically enters the datacenter through uninterruptible power supply (UPS) systems, which also condition the incoming power to remove abnormalities such as voltage spikes. The text then covers the various techniques that can be used for cooling. For example, one technique, the in-rack cooler, adds an air-to-water heat exchanger at the back of a rack so that the hot air exiting the servers immediately flows over water-cooled coils. These techniques are very important since datacenters generate a lot of heat.


Chapter 7 of the book talks about failures and repairs in WSCs. Since hardware failures happen all the time (as the book notes, even an MTBF of 30 years per machine, which is roughly 11,000 days, works out to about one failure per day once you have on the order of ten thousand servers), software must be fault-tolerant. The typical causes of failures in WSCs are bad network connectivity, application faults, and system software faults; they are rarely due to faulty firmware or kernels. The book then goes on to argue that cheaper hardware generally breaks more often, and the cost of repairing it may even offset the savings in hardware price.

Sunday, September 11, 2011

Review: "Warehouse-Scale Computing: Entering the Teenage Decade"

This is a review of the talk given by Luiz Andre Barroso comparing current warehouse-scale computers with those of ten years ago. He also talks about various challenges facing current warehouse-scale computing (WSC).

First, power management. It is quite a feat to manage and regulate the power consumption of a WSC, because fluctuations that may cause a dangerous overload are not rare. On the other hand, not utilizing the provisioned power to its fullest capacity is expensive. One solution that Barroso offers is having a UPS per server, so that servers can temporarily draw power above their provisioned budget.

Storage is also a major component of a WSC. Having a decent storage system gives a cloud provider an edge over its competitors by enabling a better user experience in general. A new class of storage devices, flash-based ones, is worth considering here: although flash has very high random-read performance compared to its traditional hard-drive counterparts, sequential writes sometimes perform better on traditional hard drives.

Moreover, the networking side of a WSC is also essential to improve, since having a very high-performance storage system behind subpar network connectivity is of no use. On the other hand, providing high network bandwidth across the whole datacenter is exorbitantly expensive, and thus undesirable.

Review: A Berkeley View of Cloud Computing

This paper aims to clarify the definition of cloud computing, identify its significance, and scrutinize its obstacles. Unlike what many influential people think of the cloud (e.g., Richard Stallman thinks it is just a marketing scheme, and Larry Ellison basically states that the definition is unclear), this paper clarifies the term: it refers both to the applications delivered as services over the Internet and to the hardware and systems software in the datacenters that provide those services.

I believe that this phenomenon (i.e., cloud computing itself) is real, since it brings a lot of advantages: the illusion of infinite computing resources, the elimination of an up-front commitment by cloud users, and the ability to pay for computing resources as they are used. This is really advantageous because new startups will not experience the problem of underprovisioning or overprovisioning servers, either of which can be very detrimental to their future.

This paper also clearly identifies the top ten obstacles, and their possible solutions, that organizations involved in cloud computing are going to face. I think this is one of the most interesting parts of the paper, since the obstacles make sense. For example, availability of service: it is not unreasonable to expect multiple servers to go down at once, and that will impact the user experience a lot.

Review: An Introduction to the Design of Warehouse-Scale Machines (part 1)

Chapter 1 of the book highlights the trend toward Warehouse-Scale Computers (WSCs) and important considerations that need to be taken into account when building one. It begins by introducing the trend toward server-side computing, driven by the advantages it offers such as ease of management and faster application development, which has brought us to the construction of various WSCs. It then gives an architectural overview of WSCs. For example, there are two general ways to implement the storage part of a WSC: the disk drives can be managed by a global distributed file system, or they can be part of Network Attached Storage devices.

Chapter 2 of the book highlights the software stack that runs in a WSC, which the book divides into three parts. First, there is platform-level software, the common operating-system layer that exists on each server. Second, there is cluster-level infrastructure, which can be thought of as the distributed-system manager; this is what handles load balancing, health monitoring, recovery from data corruption, and (eventual) consistency across the servers. Finally, the book talks about the importance of performance debugging tools in cloud computing, so as to have a well-performing infrastructure that meets the requirements.