How does cloud computing provide on-demand functionality?
Cloud computing is often described as a metaphor for the internet. It provides on-demand access to virtualized IT resources that can be shared with others or subscribed to individually. Configurable resources are drawn from a shared pool that consists of networks, servers, storage, applications and services.
What is the difference between scalability and elasticity?
Scalability is the characteristic of cloud computing by which an increasing workload is handled by increasing the resource capacity in proportion. It allows the architecture to provide resources on demand when traffic raises the requirement. Elasticity, by contrast, is the ability to commission and decommission large amounts of resource capacity dynamically. It is measured by the speed at which resources become available on demand and by the usage of those resources.
What are the different layers of cloud computing?
Cloud computing consists of 3 layers in the hierarchy and these are as follows:
1. Infrastructure as a Service (IaaS) provides cloud infrastructure in terms of hardware, such as memory and processor speed.
2. Platform as a Service (PaaS) provides a cloud application platform for developers.
3. Software as a Service (SaaS) provides cloud applications that the user works with directly, without installing anything on the local system. The application stays in the cloud, and work is saved and edited there.
What resources are provided by infrastructure as a service?
Infrastructure as a Service provides the physical and virtual resources that are used to build a cloud. This layer deals with the complexities of deploying and maintaining the services it provides. The infrastructure here means the servers, storage and other hardware systems.
How important is platform as a service?
Platform as a Service is an important layer in cloud architecture. It is built on the infrastructure layer, which provides resources such as computers, storage and network. This layer organizes and operates the resources provided by the layer below. It is also responsible for fully virtualizing the infrastructure layer, so that it looks like a single server and stays hidden from the outside world.
What does software as a service provide?
Software as a Service is another layer of cloud computing, which provides cloud applications. Google, for example, provides Google Docs, which lets users create documents and save them in the cloud. Applications can be used on the fly without adding or installing any extra software component. The layer provides built-in software for creating a wide variety of applications and documents and sharing them with other people online.
What are the different deployment models?
Cloud computing supports many deployment models and they are as follows:
– Private Cloud
Organizations choose to build their own private cloud to keep strategic, operational and other information to themselves, and because they feel more secure doing so. It is a complete, fully functional platform that is owned and operated by, and restricted to, a single organization or industry. Many organizations have moved to private clouds because of security concerns. A virtual private cloud, operated by a hosting company, is also used.
– Public Cloud
These are platforms that are open to the public for use and deployment, for example Google and Amazon. They focus on a few layers, such as cloud applications, infrastructure provision and platform markets.
– Hybrid Clouds
It is a combination of public and private clouds. It is regarded as the most robust approach to implementing cloud architecture, as it includes the features of both worlds. It allows organizations to create their own cloud while handing control of parts of it to someone else.
What are the different datacenters deployed for this?
Cloud computing is made up of various datacenters put together in a grid form. It consists of different datacenters like:
– Containerized Datacenters
These are the traditional datacenters that allow a high level of customization with servers, mainframes and other resources. They require planning, cooling, networking and power in order to operate.
– Low-Density Datacenters
These datacenters are optimized for high performance. In them the space constraint is removed and the density is increased. The drawback is that with high density, heat also becomes an issue. These datacenters are well suited to building cloud infrastructure.
What is the use of APIs in cloud services?
API stands for application programming interface. APIs are very useful in cloud platforms because they make it easy to integrate cloud services into a system. They remove the need to write full-fledged programs, provide the instructions for communication between two or more applications, and make it easy to create applications and link cloud services with other systems.
What are the different modes of software as a service?
Software as a Service provides a cloud application platform on which users can create applications with the tools provided. The modes of software as a service are defined as:
1. Simple multi-tenancy: each user has their own resources, separate from those of other users. It is an inefficient mode, because the user has to spend more time and money adding infrastructure when demand rises faster than it can be delivered.
2. Fine-grained multi-tenancy: the functionality stays the same, but resources are shared among many users. It is more efficient because the resources are shared, while the data and permissions within an application are not.
What are the security aspects provided with cloud?
Security is one of the major concerns with any application or service. Companies and organizations are especially concerned about the security provided with the cloud. Several levels of security have to be provided within a cloud environment, such as:
– Identity management: it ensures that an application service or hardware component is used only by authorized users.
– Access control: permissions have to be granted to users so that they can control the access of other users entering the cloud environment.
– Authorization and authentication: provision should be made so that only authorized and authenticated people can access and change applications and data.
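The split between authentication (who is this?) and authorization (what may they do?) can be sketched in a few lines. This is only an illustration: the `TOKENS` and `PERMISSIONS` tables, the user names and the function names are all hypothetical, and a real cloud platform would delegate these checks to its identity service.

```python
import hmac

# Hypothetical lookup tables; a real system would use an identity provider.
TOKENS = {"alice": "token-a", "bob": "token-b"}
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}

def authenticate(user, token):
    """Authentication: confirm the caller is who they claim to be."""
    expected = TOKENS.get(user)
    # compare_digest avoids leaking information through timing differences.
    return expected is not None and hmac.compare_digest(expected, token)

def authorize(user, action):
    """Authorization: confirm the user is allowed to perform the action."""
    return action in PERMISSIONS.get(user, set())

def can_access(user, token, action):
    """Access is granted only when both checks pass."""
    return authenticate(user, token) and authorize(user, action)

print(can_access("alice", "token-a", "write"))
print(can_access("bob", "token-b", "write"))
```

Keeping the two checks in separate functions mirrors the list above: a valid identity alone is not enough to change data.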
What is the difference between traditional datacenters and cloud?
Cloud computing builds on the concept of the traditional datacenter. The differences between them are as follows:
– The cost of a traditional datacenter is higher because of heating issues and other hardware and software issues, which is not the case with cloud computing infrastructure.
– A cloud platform scales when demand increases. In a traditional datacenter most of the cost goes to maintenance, whereas a cloud platform requires minimal maintenance and no deep expertise to operate.
What are the three cost factors involved in a cloud data center?
A cloud data center doesn’t require experts to operate it, but it does require skilled people to oversee maintenance, manage workloads and keep track of traffic. Labor accounts for 6% of the total cost of operating a cloud data center. Power distribution and cooling account for 20%. Computing is the highest cost, as it is where most of the resources and installation are needed; it accounts for the remaining percentage.
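The arithmetic behind the three cost factors can be made explicit with a short sketch. The 6% and 20% shares come from the answer above; the computing share is derived as the remainder, and the budget figure is purely hypothetical.

```python
# Shares for labor and power/cooling are taken from the text;
# the computing share is whatever remains.
LABOR_SHARE = 0.06
POWER_COOLING_SHARE = 0.20

def cost_breakdown(total_cost):
    """Split a total operating cost into the three cost factors."""
    labor = total_cost * LABOR_SHARE
    power_cooling = total_cost * POWER_COOLING_SHARE
    computing = total_cost - labor - power_cooling
    return {"labor": labor, "power_cooling": power_cooling, "computing": computing}

breakdown = cost_breakdown(1_000_000)  # hypothetical yearly budget
for factor, amount in breakdown.items():
    print(f"{factor}: {amount:,.0f}")
```

Under these shares, computing works out to the remaining 74% of the total.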
How are cloud services measured?
What are the optimizing strategies used in cloud?
To optimize cost and other resources, there is the concept of a three-data-center setup, which provides backups for disaster recovery and keeps all data intact in case of any failure within the system. System management can be done more efficiently by carrying out pre-emptive tasks on the services and processes that are running. Security can be tightened so that only a limited set of users can access the services.
What are different data types used in cloud computing?
Cloud computing now has to deal with many different data types, such as emails, contracts, images and blogs. The amount of data is increasing day by day, and cloud computing requires new and efficient data types to store it. For example, to save video you need a data type suited to video. Latency requirements are also tightening as demand increases, and companies are seeking lower latency for many applications.
What are the security laws which take care of the data in the cloud?
The security laws that are implemented to secure data in the cloud are as follows:
– Input validation: controls the input data that is entered into any system.
– Processing: controls that the data is processed correctly and completely in an application.
– File: controls the data being manipulated in any type of file.
– Output reconciliation: controls the data that has to be reconciled from input to output.
– Backup and recovery: controls the security breach logs and the problems that occurred while creating the backup.
How to secure your data for transport in cloud?
Cloud computing provides good, easy-to-use features to an organization, but it also raises the question of how secure data is while it is transported from one place to another in the cloud. To make sure data remains secure as it moves from point A to point B, check that there is no data leak and that an encryption key is applied to the data you are sending.
What do you understand from VPN?
VPN stands for virtual private network. It is a private network that manages the security of data during transport in the cloud environment. A VPN allows an organization to use a public network as if it were a private network and to transfer files and other resources over it.
What does a VPN consist of?
A VPN (virtual private network) consists of two important things:
1. Firewall: it acts as a barrier between the public network and any private network. It filters the messages exchanged between the networks and protects against malicious activity on the network.
2. Encryption: it protects sensitive data from attackers who actively try to obtain it. Each message is accompanied by a key that you can match against the key provided to you.
Name a few platforms which are used for large scale cloud computing
Many platforms are available for cloud computing, but the platforms for modelling large-scale distributed computing are as follows:
1. MapReduce: software built by Google to support distributed computing. It is a framework that works on large data sets. It uses cloud resources and distributes the data across several computers, which together are known as a cluster. It can deal with both structured and unstructured data.
2. Apache Hadoop: an open source distributed computing platform written in Java. It creates a pool of computers, each with the Hadoop file system. It then clusters the data elements and applies similar hash algorithms to them, and it creates copies of the files that already exist.
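The map, shuffle and reduce steps that these platforms distribute over a cluster can be sketched on a single machine. The word-count task, the function names and the sample documents are illustrative assumptions; this is not Google's or Hadoop's API, only the general shape of the technique.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = []
    # On a real cluster each document would be mapped on a different node.
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the cloud scales", "the cloud stores data"])
print(counts)
```

Because each map call only sees its own document and each reduce call only sees one key, both phases can run in parallel on separate machines.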
What are some examples of large cloud providers and their databases?
Cloud computing has many providers and is supported on a large scale. The providers with their databases are as follows:
– Google Bigtable: a distributed storage system built around one big table that is split into tables and rows. MapReduce is used for modifying and generating the data.
– Amazon SimpleDB: a web service used for indexing and querying data. It allows storing, processing and querying data sets within the cloud platform, and it indexes the data automatically.
– Cloud-based SQL: introduced by Microsoft and based on the SQL database. It provides data storage using the relational model in the cloud. The data can be accessed from the cloud using a client application.
What are some open source cloud computing platform databases?
Cloud computing platforms are supported by various databases. The open source databases developed to support them are as follows:
1. MongoDB: an open source database system that is schema-free and document-oriented. It is written in C++ and provides collections of documents and high storage capacity.
2. CouchDB: an open source database system from the Apache project, used to store data efficiently.
3. LucidDB: a database written in Java/C++ for data warehousing. It provides the features and functionality needed to maintain a data warehouse.
What essential things should a user know before going for a cloud computing platform?
A user should weigh several parameters before choosing cloud computing services. The parameters are as follows:
1. Data integrity: the user should know how data integrity is ensured in the cloud, that is, that the data is accurate, complete and reasonable.
2. Compliance: the user should make sure that proper rules and regulations are followed while implementing the structure.
3. Loss of data: the user should know what provisions exist in case of data loss, so that backup and recovery are possible.
4. Business continuity plans: the user should consider whether the cloud service provides uninterrupted access to data resources.
5. Uptime: the user should know the uptime the cloud computing platform provides and how well it serves the business.
6. Data storage costs: the user should find out the costs to be paid before going for cloud computing.
What are system integrators?
Systems integrators are an important part of the cloud computing platform. They provide the strategy for the complicated process of designing a cloud platform, including a well-defined architecture that identifies the resources and characteristics cloud computing has to include. Integrators plan the implementation of the user’s cloud strategy. They have knowledge of data center creation, which also allows more accurate creation of private and hybrid clouds.
What is the requirement of virtualization platforms in implementing cloud?
Virtualization is the basis of cloud computing, and many platforms are available. VMware, for example, is a technology that makes it possible to create a private cloud and provides a bridge to connect an external cloud with the private cloud. Three key features have to be identified to build a private cloud:
– A cloud operating system.
– Management of service-level policies.
– Virtualization, which keeps user-level and backend-level concepts separate so that a seamless environment can be created between them.
What is the use of eucalyptus in cloud computing environment?
Eucalyptus stands for Elastic Utility Computing Architecture for Linking Your Programs to Useful Systems. It provides an open source software infrastructure for implementing clusters in a cloud computing platform. It is used to build private, public and hybrid clouds. It can also turn your own datacenter into a private cloud and let you extend its functionality to other organizations. Eucalyptus provides APIs, used with web services, to cope with the demand for resources in private clouds.
Explain different layers which define cloud architecture
Cloud computing architecture consists of many layers, which keep it organized and manageable from one place. The layers are as follows:
1. Cloud Controller (CLC) is the topmost level in the hierarchy. It manages virtualized resources such as servers, network and storage through the user APIs.
2. Walrus is used for storage and acts as a storage controller that manages users’ demands. It provides a scalable way to manage virtual machine images and user data.
3. Cluster Controller (CC) controls the execution of all the virtual machines stored on the nodes and manages the virtual networking between virtual machines and external users.
4. Storage Controller (SC) provides block storage that virtual machines attach dynamically.
5. Node Controller (NC) is at the lowest level and provides the functionality of a hypervisor, controlling VM activities such as the execution, management and termination of many instances.
How will the user gain from utility computing?
Utility computing allows users to pay per use, meaning they pay only for what they use. It is a plug-in that the organization manages by deciding what types of services have to be deployed from the cloud. Utility computing lets users plan and implement services according to their needs. Most organizations go for a hybrid strategy that combines internally delivered services with hosted or outsourced services.
Is there any difference between cloud computing and computing for mobiles?
Mobile cloud computing uses the same concept and simply adds a mobile device. Cloud computing comes into action when a task or data is kept on the internet rather than on individual devices. It gives users on-demand access to the data they need to retrieve. Applications run on a remote server and are then delivered to the user, who can store and manage data from the mobile platform.
Cloud Computing Amazon Interview Questions and Answers
What are the different components used in AWS?
The components that are used in AWS are:
1. Amazon S3: it is used to retrieve the input data sets involved in building a cloud architecture and to store the output data sets that result from the input.
2. Amazon SQS: it is used for buffering the requests received by the Amazon controller. It is the component used for communication between different controllers.
3. Amazon SimpleDB: it is used to store intermediate status logs and the tasks performed by the user.
4. Amazon EC2: it is used to run large distributed processing on a Hadoop cluster. It provides automatic parallelization and job scheduling.
What are the uses of Amazon web services?
Amazon Web Services includes a component called Amazon S3 that acts as both an input and an output data store. It is used to check the input and produce output accordingly. The input consists of web data stored on Amazon S3 as objects, which is updated frequently to reflect changes across the whole architecture. S3 is needed because the data set grows on demand and requires persistent storage.
How to use Amazon SQS?
Amazon SQS is a message-passing mechanism used for communication between different connectors that are connected with each other. It also acts as a communicator between various Amazon components and keeps the different functional components together. This helps the components stay loosely coupled and makes the architecture more resilient to failure.
How is a buffer used in Amazon web services?
A buffer makes the system more resilient to bursts of traffic or load by synchronizing different components. Without it, components receive and process requests in an unbalanced way. The buffer keeps the balance between components and lets them work at the same speed to provide faster services.
What is the need for the isolation feature in Amazon web services?
Isolation provides a way to hide the architecture and gives the user an easy and convenient way to use the services. When a message is passed between two controllers, it is kept in a queue. No controller calls any other controller directly; communication takes place by storing messages in a queue. The queue is a service that provides a uniform way to transfer messages between different application components. This way all the controllers are kept isolated from each other.
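The idea that controllers never call each other directly, and only exchange messages through queues, can be sketched with Python's standard `queue.Queue` as a local stand-in for a hosted queue service. The controller names and the message format are hypothetical, and nothing here uses the real SQS API.

```python
import queue

def processing_controller(inbox, outbox):
    """Take one message, process it, and pass the result to the next queue."""
    message = inbox.get()
    outbox.put(message.upper())  # stand-in for real processing work

def reporting_controller(inbox, results):
    """Consume a processed message; it never sees the first controller."""
    results.append(f"done: {inbox.get()}")

first_queue = queue.Queue()
second_queue = queue.Queue()
results = []

first_queue.put("job-1")
processing_controller(first_queue, second_queue)
reporting_controller(second_queue, results)
print(results)
```

Each controller knows only about its own input and output queues, so either one could be replaced or restarted without the other noticing.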
What is the function of an Amazon controller?
The functions that are involved with an Amazon controller are:
– Controllers control the flow in which messages are passed between the other system components.
– A controller manages the overall structure: it retrieves a message, processes it, executes a function and stores the message in another queue, completely isolated from other controllers.
– It manages and monitors the messages passed between the systems.
What is the function of Amazon Elastic Compute Cloud?
Amazon Elastic Compute Cloud, also known as Amazon EC2, is an Amazon web service that provides scalable resources and makes computing easier for developers. The main functions of Amazon EC2 are:
– It provides easily configurable options and allows the user to configure capacity.
– It provides complete control of computing resources and lets users run the computing environment according to their requirements.
– It provides a fast way to run instances and quickly boot the system, reducing the overall time.
– It scales resources and changes its environment according to the user’s requirements.
– It provides a variety of tools for developers to build failure-resilient applications.
What are the different types of instances used in Amazon EC2?
The instances that can be used in Amazon EC2 are:
1. Standard Instances: these come as small, large and extra large instances, giving configuration options from the low range to the very high range for compute units, memory, processor, etc.
2. Micro Instances: these provide a small amount of consistent resources such as CPU, memory and compute units. They serve applications that consume few compute units.
3. High Memory Instances: these provide large memory sizes for high-end applications, including memory caching applications.
What are cluster compute instances?
Cluster compute instances combine high CPU with high network performance and are suited to high-end applications. They serve network-bound applications and provide extra large computing resources, such as 23 GB of memory and 33.5 EC2 compute units. They can also provide general-purpose graphics processing units for users who need a high-end configuration. They support highly parallelized processing applications, and the user can modify the server accordingly.
How to use SimpleDB with Amazon?
Every architecture relies on a database that is easy to maintain and configure. Amazon offers a database named SimpleDB, which is used in cloud architecture to track the statuses of the components. The components of the system are asynchronous and discrete, so the state of the system has to be captured so that after any failure the user can easily revert to the normal configuration. SimpleDB is a schema-less database, so there is no need to define a structure before creating any data. Every controller defines its own structure in the database and links the data to a job.
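What "schema-less" status tracking means in practice can be sketched with a plain dictionary. This is a local stand-in and not the SimpleDB API: the `status_store` name, the helper functions and the attribute names are all hypothetical.

```python
# A dictionary of dictionaries stands in for a schema-less store:
# each item can carry whatever attributes its controller chooses.
status_store = {}

def put_status(item_id, **attributes):
    """Add or overwrite attributes on an item; no schema is declared first."""
    status_store.setdefault(item_id, {}).update(attributes)

def last_known_state(item_id):
    """Return the recorded state, which is what recovery would start from."""
    return status_store.get(item_id, {}).get("state")

# Two controllers record different attributes against the same job.
put_status("job-1", state="launched", controller="launch")
put_status("job-1", state="running", nodes=4)

print(last_known_state("job-1"))
print(status_store["job-1"])
```

After a failure, a controller could read the last recorded state for its job and resume from there instead of starting over.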
How are component services used for Amazon SimpleDB?
Component services allow the controllers to independently store the states of the virtual machines and of the database in use. This creates asynchronous, highly available services. Active requests are stored according to the unique ID associated with each system. The status of the entire database, which holds different states for different components, is stored in a log database file.
How to upload files in Amazon S3?
Amazon S3 supports uploading large files and retrieving small offsets, which improves end-to-end data transfer rates. A large file is stored as several smaller files. Amazon S3 stores multiple files together in a bundle or in compressed form, for example in .gzip or .gz format, and then converts them into Amazon S3 objects. The files are uploaded to the Amazon server using FTP or another protocol and then retrieved through an HTTP GET request. The request includes defined parameters such as the URL, offset (byte range) and size (length).
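The offset and length parameters of such a GET request map naturally onto the standard HTTP `Range` header. The sketch below only builds the header and plans the chunks; it sends nothing, and the helper names and sizes are assumptions made for illustration.

```python
def byte_range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return {"Range": f"bytes={offset}-{last_byte}"}

def split_into_ranges(total_size, chunk_size):
    """Yield (offset, length) pairs that together cover the whole object."""
    for offset in range(0, total_size, chunk_size):
        yield offset, min(chunk_size, total_size - offset)

print(byte_range_header(0, 1024))
for offset, length in split_into_ranges(2500, 1000):
    print(byte_range_header(offset, length))
```

The second helper shows how one large object could be retrieved as several smaller ranged requests.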
What is the use of multi-threaded fetching in Amazon S3?
– Multi-threaded fetching in Amazon S3 fetches objects concurrently using multiple threads and maps the tasks so that fetching becomes simpler.
– It is not good practice to increase the number of threads for a particular object, as every node on the server has bandwidth constraints.
– It lets users upload files easily, with the upload threads running in parallel.
– It provides high data transfer speed and easy maintenance of the server as well.
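Fetching several objects concurrently, with one task per object and a bounded number of threads, might look like the sketch below. The in-memory `FAKE_BUCKET` and the `fetch_object` placeholder are assumptions standing in for real GET requests.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "bucket" standing in for remote storage.
FAKE_BUCKET = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}

def fetch_object(key):
    """Placeholder for a real GET request; returns (key, contents)."""
    return key, FAKE_BUCKET[key]

def fetch_all(keys, max_threads=4):
    # One task per object, with a bounded pool so that a node's
    # bandwidth is not oversubscribed by too many threads.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(pool.map(fetch_object, keys))

objects = fetch_all(["a.txt", "b.txt", "c.txt"])
print(sorted(objects))
```

The `max_threads` cap reflects the bandwidth point in the list above: parallelism is spread across objects rather than piled onto a single one.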
What is the difference between on demand and reserved instances?
– On-demand instances let users pay for computing capacity by the hour according to their use, whereas reserved instances let users pay for each instance they want to reserve.
– On-demand instances give users a free working environment with no need for much planning around complexities, whereas reserved instances give users discounts on the hourly charge of an instance and an easy way to manage the instances.
– On-demand instances take care of hardware maintenance and transform fixed costs into much smaller variable costs, whereas reserved instances provide an easy way to balance the payments.
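The trade-off between the two pricing models can be illustrated with simple arithmetic. All prices below are hypothetical, and the upfront reservation fee is an assumption made for this sketch; the text above only states that reserved instances receive a discounted hourly charge.

```python
def on_demand_cost(hours, hourly_rate):
    """Pay only for the hours actually used."""
    return hours * hourly_rate

def reserved_cost(hours, upfront_fee, discounted_rate):
    """Pay an assumed upfront fee, then a discounted hourly rate."""
    return upfront_fee + hours * discounted_rate

def break_even_hours(hourly_rate, upfront_fee, discounted_rate):
    """Hours of use beyond which the reserved instance becomes cheaper."""
    return upfront_fee / (hourly_rate - discounted_rate)

# Hypothetical prices, for illustration only.
RATE, UPFRONT, DISCOUNTED = 0.10, 200.0, 0.04

for hours in (1000, 8000):
    print(hours, on_demand_cost(hours, RATE), reserved_cost(hours, UPFRONT, DISCOUNTED))
print(round(break_even_hours(RATE, UPFRONT, DISCOUNTED)))
```

With these made-up numbers, light usage favors on-demand pricing and heavy usage favors a reservation.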
What are the provisions provided by Amazon Virtual Private cloud?
Amazon Virtual Private Cloud makes it possible to create a private, isolated networking infrastructure through which Amazon web services can be delivered easily.
– Virtual network topologies follow the traditional data-center approach of controlling and managing the files from one place.
– It provides complete control over the IP address range, the creation of subnets and the configuration of network gateways and route tables.
– It provides easily customizable network configuration, such as creating a public subnet to access the Internet easily.
– It allows multiple security layers to be created and provides network access control lists with which you can control access to Amazon EC2 instances.
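Carving an IP address range into subnets can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` range and the choice of which subnet is "public" or "private" are hypothetical; this only shows the address arithmetic, not any AWS call.

```python
import ipaddress

# Hypothetical address range for the private network.
vpc_range = ipaddress.ip_network("10.0.0.0/16")

# Split the range into /24 subnets and label the first two.
subnets = list(vpc_range.subnets(new_prefix=24))
public_subnet, private_subnet = subnets[0], subnets[1]

def subnet_for(address):
    """Return which labelled subnet an address belongs to, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in (("public", public_subnet), ("private", private_subnet)):
        if ip in network:
            return name
    return None

print(public_subnet, private_subnet)
print(subnet_for("10.0.1.7"))
```

A route table or access control list would then be attached per subnet, which is why the split into subnets comes first.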
Cloud Computing Architecture Interview Questions and Answers
What is the use of defining cloud architecture?
Cloud architecture is the design of software applications that use on-demand services and access a pool of resources from the cloud. It acts as a platform on which applications are built. It provides the complete computing infrastructure and supplies resources only when they are required. It is used to scale resources up or down elastically according to the job being performed.
How does cloud architecture overcome the difficulties faced by traditional architecture?
Cloud architecture provides a large pool of dynamic resources that can be accessed whenever there is a requirement, which traditional architecture does not offer. In traditional architecture it is not possible to dynamically associate a machine with the rising demand for infrastructure and services. Cloud architecture scales to meet high demand for infrastructure and gives the user on-demand access.
What are the three differences that separate cloud architecture from the traditional one?
The three differences that make cloud architecture in demand are:
1. Cloud architecture provides hardware according to demand. It can run processes whenever there is a requirement for them.
2. Cloud architecture is capable of scaling resources on demand. As demand rises, it can provide infrastructure and services to the users.
3. Cloud architecture can manage and handle dynamic workloads without failure. It can recover a machine from failure and always keeps the load on any particular machine to a minimum.
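Scaling resources up and down as demand changes can be reduced to a small calculation. The capacity figure, the load samples and the function names below are assumptions chosen for illustration, not values from any provider.

```python
import math

def instances_needed(requests_per_second, capacity_per_instance, minimum=1):
    """Smallest number of instances that can absorb the given load."""
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))

def scaling_actions(load_samples, capacity_per_instance):
    """For each load sample, report how many instances to add (+) or remove (-)."""
    current = 0
    actions = []
    for load in load_samples:
        target = instances_needed(load, capacity_per_instance)
        actions.append(target - current)
        current = target
    return actions

# Hypothetical load in requests per second, 100 requests per instance.
print(scaling_actions([50, 400, 120], 100))
```

Positive numbers correspond to commissioning capacity as demand rises, negative numbers to decommissioning it when demand falls.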
What are the advantages of cloud architecture?
Cloud architecture uses simple APIs to provide easily accessible services to the user over the internet. It provides scale on demand to increase industrial strength. It provides transparency between machines, so that users don’t have to worry about their data. Users can simply use the functionality without knowing the complex logic implemented in the cloud architecture. It provides the highest optimization and utilization in the cloud platform.
What are the business benefits involved in cloud architecture?
1. Zero infrastructure investment: cloud architecture lets users build large-scale systems complete with hardware, machines, routers, backup and other components without buying them. This reduces the startup cost of the business.
2. Just-in-time infrastructure: it is very important to scale the infrastructure as demand rises. This can be done by adopting cloud architecture and developing the application in the cloud with dynamic capacity management.
3. More efficient resource utilization: cloud architecture lets users use their hardware and resources more efficiently. This works only when applications request and relinquish resources as needed (on demand).
What are the examples of cloud architectures on which application can run?
There are many examples of applications that use cloud architecture, such as:
1. Processing Pipelines
– Document processing pipelines: convert documents of any form into raw searchable text
– Image processing pipelines: Create thumbnails or low resolution image
– Video transcoding pipelines: Convert video from one form to another online
– Indexing: Create an index of web crawl data
– Data mining: Perform search over millions of records
2. Batch Processing Systems
– Systems that use log management or generate reports
– Automated Unit Testing and Deployment Testing
– Instant Websites: websites for conferences or events
– Promotion websites
What are the different components required by cloud architecture?
There are 5 major components of cloud architecture.
1. Cloud Ingress:
Provides a means of communicating with the outside world. This can be done with the help of communication methods such as:
– Queue based communications
– HTTP communications
– Service Bus
2. Processor Speed:
Processor speed is the major element on which the whole cloud architecture is based. It provides on-demand resources that can be allocated dynamically to the user. It saves a lot of cost and brings many of the benefits of virtualization.
3. Cloud storage services:
Cloud services provide a means of storing data for users’ applications. They offer different types of storage, such as table data and files.
4. Cloud provided services:
Additional services are provided by the cloud, like data services, payment processing services, and search or web functionality services.
5. Intra-Cloud communications:
It provides a way to communicate with other systems that use cloud architecture. Providers usually offer services so that one user on the cloud can communicate easily with another.
What are the different phases involved in cloud architecture?
There are four phases involved in the cloud architecture:
1. Launch phase: it launches the basic services and makes the system ready for communication and for application building.
2. Monitor phase: it monitors the services that have been launched and manages them so that users get what they want on demand.
3. Shutdown phase: it first shuts down the services that are not required and, once all services are shut down, closes the system services.
4. Cleanup phase: it cleans up the leftover processes and services that are broken or did not shut down correctly.
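The four phases can be walked through with a toy lifecycle. The `Service` class, the service names and the simulated shutdown failure are all hypothetical; the sketch only shows the order of the phases and why a cleanup step is needed after shutdown.

```python
class Service:
    """A toy service that is either running or stopped."""
    def __init__(self, name):
        self.name = name
        self.running = False

def launch(names):
    """Launch phase: start the basic services."""
    services = [Service(name) for name in names]
    for service in services:
        service.running = True
    return services

def monitor(services):
    """Monitor phase: report which services are currently running."""
    return [service.name for service in services if service.running]

def shutdown(services, fails_to_stop=()):
    """Shutdown phase: stop services; some may fail to stop (simulated)."""
    for service in services:
        if service.name not in fails_to_stop:
            service.running = False

def cleanup(services):
    """Cleanup phase: force-stop anything shutdown left behind."""
    leftovers = [service.name for service in services if service.running]
    for service in services:
        service.running = False
    return leftovers

services = launch(["queue", "worker", "db"])
print(monitor(services))
shutdown(services, fails_to_stop=("worker",))
leftovers = cleanup(services)
print(leftovers)
```

Anything still running after the shutdown phase is exactly what the cleanup phase is there to catch.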
What is the relationship between SOA and cloud architecture?
Service-oriented architecture (SOA) is an architectural style that supports a service-oriented methodology, and it is included in cloud architecture as a mandatory component. Cloud architecture supports on-demand access to resources and provides many other facilities that are also found in SOA. SOA makes these requirements optional. However, getting the full functionality and better performance requires combining SOA with cloud architecture.
How is quality of service maintained in cloud architecture?
Cloud architecture focuses heavily on quality of service. It acts as a layer that manages and secures the transmission of resources acquired through on-demand access. Quality of service is maintained so as to improve performance, automated management and support services. Cloud architecture provides easy-to-use methods for ensuring quality of service. It is represented by a common cloud management platform that delivers many cloud services on the same foundation.
What are the different roles defined by cloud architecture?
Cloud architecture defines three roles:
– Cloud service consumer: the party that receives different services on demand.
– Cloud service provider: the provider delivers services that meet the user's requirements by monitoring incoming traffic and demand.
– Cloud service creator: the creator builds the services, provides the infrastructure to the user and grants access to the resources.
The roles can be performed by one person or by many people together. More roles can be defined depending on the cloud architecture and the complexity with which it will scale.
What are the major building blocks of cloud architecture?
The major building blocks of cloud architecture are:
1. Reference architecture: used for documentation, communication, design and the definition of various types of models.
2. Technical architecture: defines the structured stack, structures the cloud services and their components, and shows the relationships between the different components, management and security.
3. Deployment operation architecture: used to operate and monitor the processes and services.
What are the different cloud service models in cloud architecture?
There are 4 types of cloud service models available in cloud architecture:
1. Infrastructure as a service:
It provides the consumer with hardware, storage, network and other resources on rent. Using these, the consumer can deploy and run software, including the operating system and the applications associated with it.
2. Platform as a service:
It lets users deploy their software and applications on the cloud infrastructure using the tools that are available with the operating system.
3. Software as a service:
It gives users the ability to run applications on the cloud infrastructure and access them from any client device through an interface such as a web browser.
4. Business Process as a service:
It provides any business process delivered through the cloud service model over the internet, with the resources accessed through a web portal.
What is the difference between vertical scale up and horizontal scale out?
– Vertical scale up provides more resources to a single computational unit, whereas horizontal scale out adds computational units and runs them in parallel.
– Vertical scale up makes it possible to move a workload to another system that has no workload, whereas horizontal scale out splits the workload among various computational units.
– Vertical scale up has no concept of database partitioning, whereas horizontal scale out provides database partitioning.
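The workload splitting that horizontal scale out relies on can be sketched in a few lines. This is only an illustration: the function name `split_workload` and the round-robin policy are assumptions, not something a particular cloud platform prescribes.

```python
def split_workload(items, units):
    """Deal items round-robin across `units` computational units (hypothetical helper)."""
    return [items[i::units] for i in range(units)]


requests = list(range(10))            # ten pending work items
shards = split_workload(requests, 3)  # scale out to three units

for unit, shard in enumerate(shards):
    print(f"unit {unit} handles {shard}")
```

Adding a unit only changes the `units` argument; each unit's share shrinks while the total work stays the same.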
How does cloud architecture provide performance transparency and automation?
Cloud architecture uses many tools to provide performance transparency and automation. These tools allow the user to monitor, report on and manage the cloud architecture. They also allow users to share applications through the cloud architecture. Automation is a key component of cloud architecture because it provides services that raise the level of quality. It brings capacity on demand and allows the user's requirements to be met.
Cloud Computing MapReduce Interview Questions and Answers
What do you understand by MapReduce?
MapReduce is a software framework that was created by Google. Its prime focus is to aid distributed computing, specifically the processing of large data sets on a group of many computers. The framework took its inspiration from the map and reduce functions of functional programming.
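For readers unfamiliar with the functional programming origin, the two ideas look like this in plain Python. This is a single-machine sketch using the built-in `map` and `functools.reduce`, not the distributed framework itself; the sample data is made up.

```python
from functools import reduce

words = ["cloud", "map", "reduce"]

# map: apply one function independently to every element
lengths = list(map(len, words))

# reduce: fold the mapped values into a single result
total = reduce(lambda a, b: a + b, lengths, 0)

print(lengths, total)
```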
Explain how mapreduce works.
The processing can occur on data stored in a file system (unstructured) or in a database (structured). The mapreduce framework works in two main steps:
1. Map step: The master node accepts an input (the problem) and splits it into smaller sub-problems. It then distributes these sub-problems to the worker nodes so that they can solve them.
2. Reduce step: Once a worker node has solved its sub-problem, it returns the solution to the master node. The master node collects the solutions from all the worker nodes and combines them into the solution for the input it was originally given.
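The two steps can be imitated on one machine. The sketch below follows the description above (a master splits, workers solve, the master combines). The problem (a sum of squares) and all function names are assumptions chosen for illustration; a real framework would run `worker_solve` on separate nodes.

```python
def split_problem(numbers, parts):
    """Master: cut the input into roughly equal sub-problems."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division, at least 1
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def worker_solve(chunk):
    """Worker: solve one sub-problem (here, a partial sum of squares)."""
    return sum(x * x for x in chunk)


def master_solve(numbers, workers=3):
    sub_problems = split_problem(numbers, workers)      # map step
    partial = [worker_solve(c) for c in sub_problems]   # would run remotely
    return sum(partial)                                 # reduce step


answer = master_solve(list(range(1, 11)))
print(answer)
```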
What is an input reader in reference to mapreduce?
The input reader, as the name suggests, has two primary functions:
1. Reading the Input
2. Splitting it into sub-parts
The input reader accepts a user-entered problem and splits it into parts, each of which is assigned to a map function. To avoid problems, an input reader always reads data from a stable storage source.
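A minimal sketch of such a reader, assuming the stable storage is an ordinary file on disk and that a split is a fixed number of lines. The name `read_splits` and the split size are hypothetical.

```python
import os
import tempfile


def read_splits(path, lines_per_split):
    """Read a file from disk and yield it in fixed-size groups of lines."""
    split = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            split.append(line.rstrip("\n"))
            if len(split) == lines_per_split:
                yield split
                split = []
    if split:  # last, possibly shorter, split
        yield split


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("a\nb\nc\nd\ne\n")

splits = list(read_splits(tmp.name, 2))
os.remove(tmp.name)
print(splits)
```

Each element of `splits` would then be handed to one map function.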
Define the purpose of the partition function in the mapreduce framework
In the mapreduce framework each map function generates key/value pairs. The partition function accepts a key and returns the index of the reducer that should process it. Generally the key is hashed and the hash is taken modulo the number of reducers.
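The hash-then-modulo rule can be sketched as follows. The function name `partition` is an assumption, and `zlib.crc32` is used only because it gives the same value in every process (Python's built-in `hash` for strings is randomized per run); a real framework may use a different hash.

```python
import zlib


def partition(key, num_reducers):
    """Return the reducer index for a key: stable hash modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


for key in ["cloud", "map", "reduce", "cloud"]:
    print(key, "->", partition(key, 4))
```

Because the index depends only on the key, every pair with the same key lands on the same reducer.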
Combiner code is used to increase the efficiency of a mapreduce process. Combiners help by reducing the amount of data that has to be transferred to the reducers. As a safe practice, mapreduce jobs should never depend on the combiners being executed.
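A small sketch of what a combiner saves, using word counting as an assumed example; the function names are hypothetical. The combiner pre-sums the counts on the mapper's side so that fewer pairs travel to the reducers.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the values per key locally, before any network transfer."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


pairs = map_words("to be or not to be")
combined = combine(pairs)
print(len(pairs), "pairs before,", len(combined), "after:", combined)
```

Since addition gives the same final totals whether or not this step runs, the job stays correct if the framework skips the combiner.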
Explain what you understand by speculative execution
Mapreduce runs on a large number of computers, known as nodes, connected via a network. In a large network there is always a possibility that one system will not perform as quickly as the others, which delays the task. Speculative execution avoids this by running multiple instances of the same map task on different systems and using the result of whichever instance finishes first.
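The idea can be imitated with threads standing in for nodes. Everything here is an assumption made for the demonstration: the node names, the artificial delays and the helper `run_copy` are hypothetical, and a real framework launches backup copies only for tasks it judges to be slow.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_copy(node, delay):
    """Pretend to run the same map task on `node`, taking `delay` seconds."""
    time.sleep(delay)
    return node


with ThreadPoolExecutor(max_workers=2) as pool:
    copies = [
        pool.submit(run_copy, "slow-node", 0.5),   # the straggler
        pool.submit(run_copy, "fast-node", 0.05),  # the speculative copy
    ]
    done, _ = wait(copies, return_when=FIRST_COMPLETED)
    winner = next(iter(done)).result()

print("result taken from", winner)
```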
When do reducers play their role in a mapreduce task?
The reduce function in a mapreduce job does not run before all the map tasks are completed. The reducers copy the intermediate key-value pairs from the mappers and begin processing once the mappers have produced all of their key-value pairs.
How is mapreduce related to cloud computing?
The mapreduce framework contains most of the key architecture principles of cloud computing such as:
– Scale: The framework is able to expand itself in direct proportion to the number of machines available.
– Reliable: The framework is able to compensate for a lost node and restart the task on a different node.
– Affordable: A user can start small and over time can add more hardware.
Due to the above features the mapreduce framework has become the platform of choice for the development of cloud applications.
How does fault tolerance work in mapreduce?
In a mapreduce job the master pings each worker periodically. If a worker does not respond, it is marked as failed. Even the map tasks it had completed are rescheduled, because their output was stored on the local disk of the failed worker. Hence mapreduce can handle large-scale failures simply by restarting tasks. The master node saves its own state at checkpoints and, in case of a failure, restarts from the last checkpoint.
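The rescheduling rule can be sketched with a plain dictionary as the master's task table. The table layout, the state names and the function `reschedule_after_ping` are assumptions for illustration only.

```python
def reschedule_after_ping(tasks, responded):
    """Reset every task owned by a worker that did not answer the ping.

    tasks: {task_id: {"worker": name or None, "state": str}}
    responded: set of worker names that answered.
    Returns the ids of tasks that must be run again.
    """
    for task in tasks.values():
        if task["worker"] is not None and task["worker"] not in responded:
            task["worker"] = None
            task["state"] = "idle"  # completed map output on the dead disk is lost
    return [task_id for task_id, task in tasks.items() if task["state"] == "idle"]


tasks = {
    "map-0": {"worker": "w1", "state": "completed"},
    "map-1": {"worker": "w2", "state": "in_progress"},
    "map-2": {"worker": "w1", "state": "in_progress"},
}
to_rerun = reschedule_after_ping(tasks, responded={"w2"})
print(to_rerun)
```

Worker `w1` missed the ping, so both its in-progress task and its completed task go back to the idle pool.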
In mapreduce, what is a scarce system resource? Explain.
A scarce resource is one that is available to the system only in limited quantities. In mapreduce the network bandwidth is a scarce resource. It is conserved by using the local disks and memory of the cluster machines to store data during tasks. The scheduler takes the location of the input files into account and aims to schedule each task on a system that already holds its input files.
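A locality-aware choice of worker might look like the sketch below. The function `pick_worker` and the worker names are hypothetical; real schedulers also consider machines that are close on the network, which is left out here.

```python
def pick_worker(split_locations, idle_workers):
    """Prefer an idle worker that already stores the input split locally."""
    for worker in idle_workers:
        if worker in split_locations:
            return worker  # no network transfer needed
    # Fall back to any idle worker; the input must then cross the network.
    return idle_workers[0] if idle_workers else None


print(pick_worker({"w2", "w3"}, ["w1", "w3"]))  # local copy available on w3
print(pick_worker({"w9"}, ["w1"]))              # no local copy, falls back
```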
What are the various input and output types supported by mapreduce?
The mapreduce framework provides the user with many different input and output types.
– Ex. In text mode each line is a key/value pair: the key is the offset of the line from the beginning of the file and the value is the contents of the line. The choice of type is up to the user, who can also add functionality to support new input and output types.
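The text-mode example can be reproduced in a few lines. This sketch assumes byte offsets and UTF-8 text; `text_records` is a hypothetical name, not a framework API.

```python
def text_records(data: bytes):
    """Yield (byte offset of the line, line contents) for each line of input."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n").decode("utf-8")
        offset += len(line)  # advance past the line and its terminator


records = list(text_records(b"map\nreduce\ncloud\n"))
print(records)
```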
Explain task granularity
In mapreduce the map phase is subdivided into M pieces and the reduce phase into R pieces. Each worker is assigned a group of tasks; this improves dynamic load balancing and also speeds up recovery when a worker fails.
With the help of two examples, explain the purpose of the map and reduce functions
– Distributed grep: The map function emits a line if it matches a pattern. The reduce function is an identity function that copies the supplied intermediate data to the output.
– Term-vector per host: The map function emits a (hostname, term vector) pair for every input document. The reduce function adds together all the term vectors for a given host and discards any infrequent terms.
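The distributed grep example is small enough to sketch on one machine. The function names and the sample log lines are assumptions; in a real job the map calls would run in parallel on different input splits.

```python
import re


def grep_map(line, pattern):
    """Map: emit the line only if it matches the pattern."""
    return [line] if re.search(pattern, line) else []


def identity_reduce(lines):
    """Reduce: copy the intermediate data to the output unchanged."""
    return list(lines)


log = ["error: disk full", "ok", "error: timeout"]
matched = identity_reduce(
    emitted for line in log for emitted in grep_map(line, r"^error")
)
print(matched)
```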
Explain the general mapreduce algorithm:
The mapreduce algorithm has 4 main phases:
1. Map
2. Combine
3. Shuffle and sort
4. Output phase
Mappers simply execute on unsorted key/value pairs and create the intermediate keys. Once these keys are ready, the combiners group the key/value pairs under the right key. The shuffle and sort is done by the framework; its role is to group the data and transfer it. Once that is complete, processing proceeds to the output phase.
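The four phases can be chained in a single-process sketch. Word counting is an assumed example, all function names are hypothetical, and the output step here also performs the final summation so that the result is complete.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) for every word of every line in one input split."""
    return [(word, 1) for line in lines for word in line.split()]


def combine_phase(pairs):
    """Pre-aggregate the pairs produced by one mapper."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle_and_sort(pair_lists):
    """Group values from all mappers by key and order the keys."""
    groups = defaultdict(list)
    for pairs in pair_lists:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def output_phase(groups):
    """Reduce each group to one value and collect the final output."""
    return {key: sum(values) for key, values in groups}


splits = [["cloud map", "map"], ["reduce map cloud"]]
combined = [combine_phase(map_phase(split)) for split in splits]
result = output_phase(shuffle_and_sort(combined))
print(result)
```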
Write a short note on the disadvantages of Mapreduce
Some of the shortcomings of mapreduce are:
– One-input two-phase data flow is rigid, i.e. it does not allow for multi-step processing of records.
– Being based on a procedural programming model, the framework requires custom code even for simple operations.
– Because the map and reduce functions are opaque, they are not easy to optimize.