Cluster Management using Apache Mesos & Marathon

August 21, 2017 (updated November 6, 2020)


Cluster management using Mesos as the cluster manager and Marathon as its framework.

Before looking at what cluster management and task scheduling are, let’s look at the challenges we faced before using Mesos, and at the difference between static and dynamic partitioning.

Static Partitioning:
[Figure: Cluster management with static partitioning]

In the diagram above, we have racks of servers, and a fixed set of servers is assigned to each application: MySQL, Cassandra, Rails, Hadoop, and memcached.

Now consider a scenario where the Hadoop servers are fully utilized and Hadoop needs more servers. Even if the other servers are under-utilized, we cannot reassign them to Hadoop, because they are dedicated to other applications.

What are the disadvantages here?

  • Low resource utilization, as the existing servers are not fully used
  • Higher cost, as more servers must be added even while existing ones sit idle

Challenges before Mesos

  • How to distribute modern apps? Manually, with the Ops team
  • Static partitioning of the cluster leads to low utilization (around 30%)
  • Developers must build apps that scale elastically and also handle inevitable faults
  • Ops have to manage and scale every app individually
  • NFRs (non-functional requirements) are not managed out of the box

Types of resource sharing:

  • Intra-machine resource sharing: share a single machine’s resources between multiple apps (multi-tenancy)
  • Intra-datacenter resource sharing: share multiple machines’ resources between multiple apps

What is a Cluster Manager?

A cluster manager is software that mediates between hardware resources (machines) and applications or jobs, providing a level of indirection between the two. With a cluster manager in place, humans no longer schedule apps or jobs onto cluster nodes manually.

What is Mesos?

Apache Mesos is an open-source cluster manager that was developed at the University of California, Berkeley. It “provides efficient resource isolation and sharing across distributed applications, or frameworks”. The software enables fine-grained resource sharing, improving cluster utilization.

Mesos, the Cluster Manager:

[Figure: Dynamic partitioning using Mesos for all applications]

The diagram above also shows racks of servers and applications, but with one difference from the previous one: the servers form a common pool that all applications use according to their needs. If Hadoop needs more servers, it can use whichever existing servers are available, which maximizes resource utilization and reduces cost.
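To make the difference concrete, here is a toy Python comparison of the two models. The server counts and demand figures are hypothetical, and the pooled model assumes demand can be spread freely across any server, ignoring how individual tasks pack onto machines.

```python
# Toy comparison of static vs dynamic partitioning.
# Capacity and demand are hypothetical, measured in whole servers.

CAPACITY = {"mysql": 4, "cassandra": 4, "rails": 4, "hadoop": 4, "memcached": 4}
DEMAND = {"mysql": 2, "cassandra": 1, "rails": 2, "hadoop": 8, "memcached": 1}


def static_partitioning(capacity, demand):
    """Each app can only use the servers permanently assigned to it."""
    served = {app: min(demand[app], capacity[app]) for app in demand}
    unmet = sum(demand[app] - served[app] for app in demand)
    utilization = sum(served.values()) / sum(capacity.values())
    return unmet, utilization


def dynamic_partitioning(capacity, demand):
    """All apps draw from one shared pool of servers."""
    pool = sum(capacity.values())
    wanted = sum(demand.values())
    served = min(wanted, pool)
    return wanted - served, served / pool


if __name__ == "__main__":
    for name, fn in [("static", static_partitioning), ("dynamic", dynamic_partitioning)]:
        unmet, util = fn(CAPACITY, DEMAND)
        print(f"{name}: unmet demand = {unmet} servers, utilization = {util:.0%}")
```

With these numbers, static partitioning leaves 4 servers’ worth of Hadoop demand unmet at 50% utilization, while the shared pool serves all demand at 70% utilization.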

Components of Mesos, a modern general-purpose cluster manager:

  • Mesos Master
  • Mesos Slaves (called agents since Mesos 1.0)

Mesos OS philosophy:

  • The kernel handles resource allocation and sharing
  • Frameworks handle task scheduling, execution, and fault tolerance

Mesos is like an OS kernel for the datacenter. It helps with:

  • Scheduling in user space (by the frameworks)
  • Providing a level of abstraction over the machines
  • Building and running distributed systems on shared resources
  • Building and running a PaaS on top of the Mesos kernel
  • Running on physical machines or on IaaS such as EC2 or OpenStack

Solution Architecture (the most important part of this page):

In this architecture, we use Mesos as the cluster manager and Marathon as its task scheduler.

[Figure: Solution architecture for Mesos using Marathon as its scheduler]


  1. Mesos slaves report their available resources to the master.
  2. The Mesos master sends these resources as offers to the Marathon framework.
  3. The framework scheduler replies to the master with the tasks to run on the offered slave, including the CPUs and RAM each task requires.
  4. The Mesos master forwards the task details to the slave whose resources were offered.
  5. The slave allocates the requested resources to the framework and launches the tasks.
  6. When a task completes, its resources are freed and can be offered again.
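The six steps above can be sketched as a small in-process simulation. This is not the Mesos scheduler API: real frameworks talk to the master through the scheduler driver or the HTTP scheduler API, and they can also decline offers. The classes `Task`, `Offer`, `Slave`, and `Master` and the greedy `schedule` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # MB


@dataclass
class Offer:
    slave_id: str
    cpus: float
    mem: int  # MB


@dataclass
class Slave:
    slave_id: str
    free_cpus: float
    free_mem: int
    running: list = field(default_factory=list)

    def launch(self, task):
        # Step 5: the slave allocates resources and launches the task.
        self.free_cpus -= task.cpus
        self.free_mem -= task.mem
        self.running.append(task)

    def finish(self, task):
        # Step 6: resources are freed when the task completes.
        self.running.remove(task)
        self.free_cpus += task.cpus
        self.free_mem += task.mem


class Master:
    def __init__(self, slaves):
        self.slaves = {s.slave_id: s for s in slaves}

    def make_offers(self):
        # Steps 1-2: turn each slave's free resources into an offer.
        return [Offer(s.slave_id, s.free_cpus, s.free_mem) for s in self.slaves.values()]

    def launch_tasks(self, offer, tasks):
        # Step 4: forward the accepted tasks to the slave behind the offer.
        for task in tasks:
            self.slaves[offer.slave_id].launch(task)


def schedule(offers, pending):
    """Step 3 (framework side): greedily pack pending tasks into offers."""
    decisions = []
    for offer in offers:
        cpus, mem, chosen = offer.cpus, offer.mem, []
        for task in list(pending):
            if task.cpus <= cpus and task.mem <= mem:
                chosen.append(task)
                cpus -= task.cpus
                mem -= task.mem
                pending.remove(task)
        decisions.append((offer, chosen))
    return decisions


if __name__ == "__main__":
    master = Master([Slave("s1", 2.0, 2048), Slave("s2", 1.0, 1024)])
    pending = [Task("web", 1.0, 512), Task("worker", 1.0, 1024), Task("batch", 1.0, 512)]
    for offer, tasks in schedule(master.make_offers(), pending):
        master.launch_tasks(offer, tasks)
    for s in master.slaves.values():
        print(s.slave_id, [t.name for t in s.running], s.free_cpus, s.free_mem)
```

Running it places `web` and `worker` on `s1` and `batch` on `s2`. The first-fit packing in `schedule` is only one possible framework policy. Marathon’s actual placement logic is more involved.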

Mesos Functionality for Every Distributed System:

The Mesos master provides common functionality that every distributed system needs:

  • Failure detection
  • Package distribution
  • Resource isolation
  • Task distribution
  • Task starting
  • Task monitoring
  • Task killing
  • Task cleanup

Master — The kernel:

  • Allocating resources to different frameworks
  • Flexibility: accommodating diverse frameworks
  • Scalability: the scheduler scales as the number of machines and apps grows
  • Fairness in allocating resources to users and frameworks
  • Fine-grained resource sharing using resource offers
  • Managing the task lifecycle for frameworks
  • An offer represents some resources available on a slave
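Mesos’s default allocator handles fairness with Dominant Resource Fairness (DRF). A framework’s dominant share is the largest fraction of any single resource it holds, and the next resources go to the framework with the smallest dominant share. The sketch below applies that rule to a single pooled cluster where each framework’s tasks are identical. The real allocator works per slave and per offer and also supports roles, weights, and quotas, none of which are modelled here. The function name `drf_allocate` and the numbers are hypothetical.

```python
def drf_allocate(total, demands):
    """Dominant Resource Fairness over identical per-task demands.

    total:   capacity per resource, e.g. {"cpus": 9, "mem": 18432}
    demands: per-task demand of each framework, e.g. {"A": {"cpus": 1, "mem": 4096}}
    Returns the number of tasks launched for each framework.
    """
    for f, d in demands.items():
        if not any(d[r] > 0 for r in total):
            raise ValueError(f"framework {f} demands no resources")

    used = {r: 0 for r in total}
    alloc = {f: {r: 0 for r in total} for f in demands}
    tasks = {f: 0 for f in demands}

    def dominant_share(f):
        # The largest fraction of any single resource held by framework f.
        return max(alloc[f][r] / total[r] for r in total)

    def fits(f):
        # True if one more task of framework f fits in the remaining capacity.
        return all(used[r] + demands[f][r] <= total[r] for r in total)

    while True:
        candidates = [f for f in demands if fits(f)]
        if not candidates:
            return tasks
        # Give the next task to the framework with the smallest dominant share.
        f = min(candidates, key=dominant_share)
        for r in total:
            alloc[f][r] += demands[f][r]
            used[r] += demands[f][r]
        tasks[f] += 1


if __name__ == "__main__":
    print(drf_allocate(
        {"cpus": 9, "mem": 18432},
        {"A": {"cpus": 1, "mem": 4096}, "B": {"cpus": 3, "mem": 1024}},
    ))
```

Here framework A’s tasks are memory-heavy and B’s are CPU-heavy. The call returns `{'A': 3, 'B': 2}`: A ends up holding 2/3 of the memory and B 2/3 of the CPUs, so their dominant shares are equal.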

Mesos Slave:

  • Host the resources
  • Execute the tasks assigned by frameworks
  • Isolate each task
  • Give each task exactly the resources it requested, no more and no less
  • Resources of slaves are consumed by tasks
  • Slave resources are managed by the master and allocated to frameworks
  • Report their resources to the master along with key-value pairs (attributes) describing the slave
  • Frameworks use the slave attributes in task management
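Slave attributes are plain key-value pairs. For example, a slave started with `--attributes="rack:r1;os:ubuntu"` advertises them alongside its resources. Marathon lets an app declare placement constraints of the form `[field, operator, value]`, with operators such as `UNIQUE`, `CLUSTER`, and `LIKE`. The sketch below checks simplified versions of those three operators against a slave’s attributes. `parse_attributes` and `satisfies` are hypothetical helpers, and edge cases of the real operators (for example `CLUSTER` without a value) are not modelled.

```python
import re


def parse_attributes(spec):
    """Parse a slave attribute string such as "rack:r1;os:ubuntu" into a dict."""
    return dict(pair.split(":", 1) for pair in spec.split(";") if pair)


def satisfies(attributes, constraint, placed):
    """Check one Marathon-style constraint [field, operator, value?] against a slave.

    attributes: the offering slave's attributes (plus its "hostname")
    placed:     attribute dicts of slaves already running this app's instances
    Only a simplified subset of operators is modelled here.
    """
    field, op, *args = constraint
    value = attributes.get(field)
    if op == "UNIQUE":
        # At most one instance per distinct value of the field.
        return value is not None and all(p.get(field) != value for p in placed)
    if op == "CLUSTER":
        # Every instance must land on slaves with this exact value.
        return value == args[0]
    if op == "LIKE":
        # The field must fully match a regular expression.
        return value is not None and re.fullmatch(args[0], value) is not None
    raise ValueError(f"operator not modelled in this sketch: {op}")


if __name__ == "__main__":
    slave = {"hostname": "node-1", **parse_attributes("rack:r1;os:ubuntu")}
    print(satisfies(slave, ["hostname", "UNIQUE"], placed=[{"hostname": "node-2"}]))
    print(satisfies(slave, ["rack", "CLUSTER", "r2"], placed=[]))
```

A framework would run a check like this on each offer and accept only offers from slaves that satisfy all of the app’s constraints.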

Resources in offers:

  • Standard resource types: cpus, mem, disk, ports
  • mem and disk are in MB; ports are expressed as ranges (for example, 31000-32000)
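An offer therefore bundles scalar amounts (cpus, mem, disk) with port ranges. The sketch below models an offer that way and checks whether a task’s request fits, taking the first free ports from the offered ranges. The `Resources` class and the `match` and `pick_ports` helpers are hypothetical, and choosing the first available ports is just one simple policy.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpus: float
    mem: int     # MB
    disk: int    # MB
    ports: list  # inclusive (begin, end) ranges, e.g. [(31000, 32000)]


def pick_ports(ranges, count):
    """Take the first `count` ports from the offered ranges, or None if too few."""
    chosen = []
    for begin, end in ranges:
        for port in range(begin, end + 1):
            if len(chosen) == count:
                return chosen
            chosen.append(port)
    return chosen if len(chosen) == count else None


def match(offer, cpus, mem, disk, num_ports):
    """Return the ports to use if the offer covers the request, else None."""
    if cpus > offer.cpus or mem > offer.mem or disk > offer.disk:
        return None
    return pick_ports(offer.ports, num_ports)


if __name__ == "__main__":
    offer = Resources(cpus=2.0, mem=4096, disk=10240, ports=[(31000, 31001), (31005, 31005)])
    print(match(offer, cpus=0.5, mem=512, disk=0, num_ports=3))
    print(match(offer, cpus=0.5, mem=512, disk=0, num_ports=4))
```

The first request fits and receives ports 31000, 31001, and 31005. The second asks for more ports than the offer contains, so it does not fit.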

Frameworks (Marathon):

  • Distributed apps that run on a Mesos cluster are called frameworks
  • A framework has two components: a scheduler and an executor
  • It runs a number of tasks
  • Tasks consume resources
  • It handles task lifecycle and management functions
  • The scheduler coordinates execution
  • Executors control task execution and can run multiple tasks
  • Mesos provides APIs for communicating with the scheduler and the executor
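Marathon is driven through a REST API: an app is described as a JSON definition and submitted with `POST /v2/apps`. The sketch below only builds that request with Python’s standard library and does not send it. The Marathon host, the app id, and the command are hypothetical placeholders. The fields shown (`id`, `cmd`, `cpus`, `mem`, `instances`, `constraints`) are a minimal subset of a Marathon app definition.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; replace with your own host and port.
MARATHON_URL = "http://marathon.example.com:8080"


def app_definition(app_id, cmd, cpus, mem, instances, constraints=None):
    """Build a minimal Marathon app definition (mem is in MB)."""
    app = {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem, "instances": instances}
    if constraints:
        app["constraints"] = constraints
    return app


def deploy_request(base_url, app):
    """Prepare (but do not send) a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    app = app_definition(
        "/hello-web",
        "python3 -m http.server $PORT0",  # Marathon exposes the assigned port as PORT0
        cpus=0.25,
        mem=128,
        instances=2,
        constraints=[["hostname", "UNIQUE"]],
    )
    req = deploy_request(MARATHON_URL, app)
    print(req.get_method(), req.full_url)
    # To actually deploy: urllib.request.urlopen(req)
```

Calling `urllib.request.urlopen(req)` against a real Marathon instance would submit the app. Marathon would then accept matching Mesos offers and launch the two instances on different hosts, as the `UNIQUE` constraint requires.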

Scheduler:

  • Decides what computation should run
  • Decides where the computation should run
  • Communicates with the master
  • Relies on the master, which is responsible for allocating resources

Executor:

  • Starts the task on the slave it was scheduled on

Mesos OS value add:

  • Provides a datacenter kernel
  • Offers a high-level abstraction for developing apps that treat distributed infrastructure like a single large computer
  • Lets developers focus on app logic instead of infrastructure
  • Helps with resource allocation, deployment, monitoring, and isolation
  • Developers need to know which resources an app needs, not how to obtain them

Apache Mesos Installation:

Please refer to the link mentioned below to install Mesos.

Author Bio: This blog was written by Kailash Verma, an innovative DevOps Consultant at Tavisca Solutions who believes in maximizing his productivity to respond quickly to changing business needs. He is highly passionate about Amazon Web Services and Docker (a container platform), and brings his technological ideas to life through his write-ups.
