Moab Workload Manager®

13.1 Resource Manager Overview

For most installations, the Moab Workload Manager uses the services of a resource manager to obtain information about the state of compute resources (nodes) and workload (jobs). Moab also uses the resource manager to manage jobs, passing instructions regarding when, where, and how to start or otherwise manipulate jobs.

Moab can be configured to manage more than one resource manager simultaneously, even resource managers of different types. Using a local queue, jobs can even be migrated from one resource manager to another. However, there are currently limitations regarding jobs submitted directly to a resource manager (not to the local queue). Such a job is constrained to run only within the bounds of the resource manager to which it was submitted.


13.1.1 Scheduler/Resource Manager Interactions

Moab interacts with all resource managers using a common set of commands and objects. Each resource manager interface translates Moab concepts regarding workload and resources into native resource manager objects, attributes, and commands, and translates the information it obtains from the resource manager back into Moab concepts.

Information on creating a new scheduler resource manager interface can be found in the Adding New Resource Manager Interfaces section.

13.1.1.1 Resource Manager Commands
For many environments, Moab interaction with the resource manager is limited to the following objects and functions:

Object | Function       | Details
Job    | Query          | Collect detailed state, requirement, and utilization information about jobs
Job    | Modify         | Change job state and/or attributes
Job    | Start          | Execute a job on a specified set of resources
Job    | Cancel         | Cancel an existing job
Job    | Preempt/Resume | Suspend, resume, checkpoint, restart, or requeue a job
Node   | Query          | Collect detailed state, configuration, and utilization information about compute resources
Node   | Modify         | Change node state and/or attributes
Queue  | Query          | Collect detailed policy and configuration information from the resource manager

Using these functions, Moab is able to fully manage workload, resources, and cluster policies. More detailed information about the resource-manager-specific capabilities and limitations of each of these functions can be found in the individual resource manager overviews (LL, PBS, LSF, SGE, Condor, BProc, or WIKI).
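The object/function pairs in the table map naturally onto an abstract interface that each resource manager backend implements. The sketch below is a hypothetical illustration only: the class names, method names, and state strings are assumptions for this example and are not Moab's actual internal API. A toy in-memory backend shows how each call becomes a native operation (here, dictionary updates).

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    state: str = "Idle"
    nodes: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    state: str = "Idle"
    attrs: dict = field(default_factory=dict)


class ResourceManagerInterface(ABC):
    """Hypothetical: one abstract method per object/function pair in the table."""

    @abstractmethod
    def query_jobs(self) -> list[Job]: ...
    @abstractmethod
    def modify_job(self, job_id: str, **attrs) -> None: ...
    @abstractmethod
    def start_job(self, job_id: str, nodes: list[str]) -> None: ...
    @abstractmethod
    def cancel_job(self, job_id: str) -> None: ...
    @abstractmethod
    def preempt_job(self, job_id: str, action: str) -> None: ...
    @abstractmethod
    def query_nodes(self) -> list[Node]: ...
    @abstractmethod
    def modify_node(self, name: str, **attrs) -> None: ...
    @abstractmethod
    def query_queues(self) -> dict: ...


class InMemoryRM(ResourceManagerInterface):
    """Toy backend that translates each call into dictionary updates."""

    # Only a subset of the preempt/resume actions is modeled here.
    _PREEMPT_STATES = {"suspend": "Suspended", "resume": "Running", "requeue": "Idle"}

    def __init__(self, node_names, job_ids, queues=None):
        self.nodes = {n: Node(n) for n in node_names}
        self.jobs = {j: Job(j) for j in job_ids}
        self.queues = dict(queues or {})

    def query_jobs(self):
        return list(self.jobs.values())

    def modify_job(self, job_id, **attrs):
        self.jobs[job_id].attrs.update(attrs)

    def start_job(self, job_id, nodes):
        job = self.jobs[job_id]
        job.state, job.nodes = "Running", list(nodes)
        for n in nodes:
            self.nodes[n].state = "Busy"

    def _release(self, job):
        # Return the job's nodes to the idle pool.
        for n in job.nodes:
            self.nodes[n].state = "Idle"
        job.nodes = []

    def cancel_job(self, job_id):
        self._release(self.jobs.pop(job_id))

    def preempt_job(self, job_id, action):
        job = self.jobs[job_id]
        job.state = self._PREEMPT_STATES[action]  # KeyError for unmodeled actions
        if action == "requeue":
            self._release(job)

    def query_nodes(self):
        return list(self.nodes.values())

    def modify_node(self, name, **attrs):
        self.nodes[name].attrs.update(attrs)

    def query_queues(self):
        return dict(self.queues)
```

Keeping the scheduler logic behind one interface is what lets a single scheduler drive several resource managers of different types; only the backend class changes.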

Beyond these base functions, other commands exist to support advanced features such as dynamic job support, provisioning, and cluster-level resource management.

13.1.1.2 Resource Manager Flow

In general, Moab interacts with resource managers in the following sequence of steps during each scheduling iteration:

  1. load global resource information
  2. load node specific information (optional)
  3. load job information
  4. load queue/policy information (optional)
  5. cancel/preempt/modify jobs according to cluster policies
  6. start jobs in accordance with available resources and policy constraints
  7. handle user commands
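The serial version of this flow can be sketched as an ordered list of steps, where the steps marked optional above are skipped when a resource manager does not supply them. This is a minimal illustration under assumed names (`STEPS`, `scheduling_iteration`, and the handler mapping are hypothetical), not Moab's implementation.

```python
from __future__ import annotations

from typing import Callable

# (step name, optional?) in the order listed above
STEPS = [
    ("load global resource information", False),
    ("load node specific information", True),
    ("load job information", False),
    ("load queue/policy information", True),
    ("cancel/preempt/modify jobs", False),
    ("start jobs", False),
    ("handle user commands", False),
]


def scheduling_iteration(handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run one serial iteration; return the names of the steps actually performed."""
    performed = []
    for name, optional in STEPS:
        handler = handlers.get(name)
        if handler is None:
            if optional:
                continue  # this resource manager does not supply the optional step
            raise KeyError(f"required step missing: {name}")
        handler()  # serial model: each step finishes before the next begins
        performed.append(name)
    return performed
```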

Typically, each step completes before the next step is started. However, the size and complexity of current systems call for a more advanced parallel approach, which provides benefits in three areas: reliability, concurrency, and responsiveness.

Reliability

A number of the resource managers that Moab interfaces with were unreliable to some extent. Calls to their APIs sometimes exited or crashed, taking the entire scheduler with them. With a threaded approach, only the calling thread would fail, allowing the master scheduling thread to recover. Additionally, some resource manager calls would hang indefinitely, locking up the scheduler. In a threaded environment, the master scheduling thread could likewise detect these hangs and handle them appropriately.
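The isolation idea can be illustrated by running each resource manager call in a worker thread and having the caller wait with a timeout. This is a minimal sketch with a hypothetical helper name (`guarded_call`); it catches exceptions (including `SystemExit`) and detects hangs, but in Python a hard crash inside a C extension would still take down the whole process, so it only models the failure cases it can actually observe.

```python
from __future__ import annotations

import threading
from typing import Any, Callable


def guarded_call(fn: Callable[[], Any], timeout: float) -> tuple[str, Any]:
    """Run fn in a worker thread so a failure or hang cannot block the caller.

    Returns ("ok", value), ("error", exception) or ("timeout", None).
    """
    result: dict[str, Any] = {}

    def worker() -> None:
        try:
            result["value"] = fn()
        except BaseException as exc:  # a failing call only ends this thread
            result["error"] = exc

    # daemon=True: a call that never returns will not keep the process alive
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "timeout", None  # call is hung; the master thread moves on
    if "error" in result:
        return "error", result["error"]
    return "ok", result.get("value")
```

A hung thread cannot be forcibly killed in Python, so the master simply stops waiting for it; a real scheduler would also need to discard any late result from that call.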

Concurrency

As resource managers grew in size, the duration of each global API query call grew proportionally. In particular, queries that required contacting each node individually became excessively slow as systems grew into the thousands of nodes. A threaded interface allowed the scheduler to issue multiple node queries concurrently, resulting in much shorter aggregate RM query times.
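A pool of worker threads issuing per-node queries concurrently might look like the following sketch. `query_node` is a hypothetical stand-in that sleeps to mimic network latency; it is not a real resource manager call.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def query_node(name: str) -> dict:
    """Hypothetical per-node RM query; the sleep stands in for network latency."""
    time.sleep(0.05)
    return {"name": name, "state": "Idle"}


def query_all_nodes(names: list[str], workers: int = 16) -> dict[str, dict]:
    """Issue per-node queries concurrently from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with names
        return dict(zip(names, pool.map(query_node, names)))
```

With 64 nodes and 16 workers, the simulated queries finish in roughly four rounds of 0.05 s instead of 64 sequential waits, which is the aggregate speedup the paragraph describes.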

Responsiveness

Finally, in the non-threaded serial approach, the user interface was blocked while the scheduler updated its workload, resource, and queue state. In a threaded model, the scheduler could continue to respond to queries and other commands even while fresh resource manager state information was being loaded, resulting in much shorter average response times for user commands.

Under the threaded interface, all resource manager information is loaded and processed while the user interface remains active. Moab tracks the average aggregate resource manager API query time and launches each new RM update early enough that the query completes before the next scheduling iteration is due to start. Where needed, the loading process uses a pool of worker threads to issue large numbers of node-specific information queries concurrently. The master thread continues to respond to user commands until all needed resource manager information is loaded and either a scheduling-relevant event has occurred or the scheduling iteration time has arrived. At that point, the updated information is integrated into Moab's state information and scheduling is performed.
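The timing rule above can be sketched as follows: keep recent query durations, start the next update that much ahead of the iteration start, and let the master thread schedule only once data is loaded and either an event is pending or the iteration time has come. The class and function names, the sliding-window average, and the default duration are all assumptions for illustration; the text does not say how Moab computes its average.

```python
from __future__ import annotations

from collections import deque


class RMUpdatePlanner:
    """Hypothetical helper: decides when to launch the next RM query.

    Keeps a sliding window of recent query durations, in seconds.
    """

    def __init__(self, window: int = 10, default_duration: float = 1.0):
        self.durations: deque[float] = deque(maxlen=window)
        self.default_duration = default_duration  # used before any history exists

    def record(self, duration: float) -> None:
        self.durations.append(duration)

    def average(self) -> float:
        if not self.durations:
            return self.default_duration
        return sum(self.durations) / len(self.durations)

    def launch_time(self, next_iteration_start: float, now: float) -> float:
        # Start early by the average query time, but never in the past.
        return max(now, next_iteration_start - self.average())


def ready_to_schedule(rm_data_loaded: bool, event_pending: bool,
                      now: float, next_iteration_start: float) -> bool:
    """Master thread keeps serving user commands until this returns True."""
    return rm_data_loaded and (event_pending or now >= next_iteration_start)
```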

13.1.2 Resource Manager Specific Details (Limitations/Special Features)

13.1.3 Synchronizing Conflicting Information

Moab does not assume that resource manager information is accurate. Node, job, and policy information is reloaded on each iteration, and discrepancies are detected. Synchronization issues and allocation conflicts are logged and handled where possible. To help sites minimize stale information and conflicts, the following policies and parameters are available:

  • Node State Synchronization Policies (see NODESYNCTIME)
  • Job State Synchronization Policies (see JOBSYNCTIME)
  • Stale Data Purging (see JOBPURGETIME)
  • Thread Management (preventing resource manager failures from affecting scheduler operation)
  • Resource Manager Poll Interval (see RMPOLLINTERVAL)
  • Node Query Refresh Rate (see NODEPOLLFREQUENCY)
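The reconciliation described above, reloading the job list each iteration, logging discrepancies, and eventually dropping records the resource manager has stopped reporting, can be sketched like this. The function name and record layout are hypothetical, and `purge_time` only plays a role loosely analogous to JOBPURGETIME; consult that parameter's documentation for its actual semantics.

```python
from __future__ import annotations


def reconcile_jobs(moab_jobs: dict[str, dict], rm_report: dict[str, str],
                   now: float, purge_time: float) -> list[str]:
    """Merge a fresh RM job report into the scheduler's view; return log lines.

    moab_jobs maps job id -> {"state": str, "last_seen": float};
    rm_report maps job id -> state as reported this iteration.
    purge_time is a hypothetical stand-in for a stale-data purge interval.
    """
    log = []
    for job_id, state in rm_report.items():
        rec = moab_jobs.setdefault(job_id, {"state": state, "last_seen": now})
        if rec["state"] != state:
            log.append(f"{job_id}: state {rec['state']} -> {state} (RM report)")
            rec["state"] = state
        rec["last_seen"] = now
    for job_id in list(moab_jobs):
        if job_id in rm_report:
            continue
        missing_for = now - moab_jobs[job_id]["last_seen"]
        if missing_for > purge_time:
            # Stale data: the RM has not reported this job for too long.
            log.append(f"{job_id}: purged after {missing_for:.0f}s unreported")
            del moab_jobs[job_id]
        else:
            # Tolerate a transient gap, e.g. an RM that briefly drops a job.
            log.append(f"{job_id}: not reported by RM, retaining record")
    return log
```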

13.1.4 Evaluating Resource Manager Availability and Performance

Each resource manager is individually tracked and evaluated by Moab. Using the mdiag -R command, a site can determine how a resource manager is configured, how heavily it is loaded, what failures, if any, have occurred in the recent past, and how responsive it is to requests.
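The kind of per-resource-manager bookkeeping behind such a report can be sketched as a rolling record of request durations and failures. This is a hypothetical illustration of the idea only; it does not reproduce the fields or output format of `mdiag -R`, and the one-hour default window is an assumption.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RMStats:
    """Hypothetical per-RM health record.

    Keeps only events newer than `window` seconds, so old failures age out.
    """
    name: str
    window: float = 3600.0
    events: deque = field(default_factory=deque)  # (timestamp, duration, ok)

    def record(self, duration: float, ok: bool, now: float | None = None) -> None:
        now = time.time() if now is None else now
        self.events.append((now, duration, ok))
        self._expire(now)

    def _expire(self, now: float) -> None:
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def summary(self, now: float | None = None) -> dict:
        now = time.time() if now is None else now
        self._expire(now)
        n = len(self.events)
        failures = sum(1 for _, _, ok in self.events if not ok)
        avg = sum(d for _, d, _ in self.events) / n if n else 0.0
        return {"rm": self.name, "requests": n, "failures": failures,
                "avg_response_s": avg}
```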
