Workload management supports the applications critical to enterprise business objectives and is concerned with the effective management of all types of processing tasks and transactions, as well as the maximum utilization of hardware and software resources. In functional terms, workload-management software schedules, analyzes, and monitors the processing of the enterprise-application environment.
Workload management assumes many of the traditional operating-system responsibilities at the network level. It dynamically allocates user activities across the enterprise to fully harness distributed heterogeneous computing resources.
While related to data and system management, workload management differs in key areas. Whereas data management provides a distributed data platform for all applications, workload management provides a distributed computational platform for all applications. Where system management focuses on hardware and software resources and is used by the operational staff, workload management focuses on supporting the computing workload of enterprise systems, and it is used directly or indirectly by the end users who are entrusted with running the business.
In this sense, workload management picks up where system management disciplines leave off. System management ensures that distributed-computing resources function properly, whereas workload management ensures that those computing resources are matched effectively with the requirements of the business-critical processes at the heart of the modern enterprise. Traditionally, workload management was not considered a distinct segment of the system-software market. With the strong shift toward heterogeneous enterprise computing, however, workload management has become increasingly important, because it makes it possible to harness the full power of distributed heterogeneous hardware and software.
The most important and fundamental function of workload management is the dynamic scheduling of jobs using the best resources available. Depending on job requirements, scheduling can be based on one or more of the following criteria:
Resource availability. A job may require a specific system platform, a certain amount of memory, or a specific software license. Instead of having the user specify which host should process the job, workload scheduling dynamically matches the job requirements with the best-available computing resources. If requisite resources are not available, jobs can be deferred for batch execution.
Policies and priorities. Like other corporate assets, computing resources should be dedicated to the jobs that add value to an enterprise. Thus, the priorities associated with the jobs and the resource-sharing policies of the site must be considered in workload scheduling. User sites specify priorities and policies that are then enforced by the scheduling system.
Calendar-based scheduling. A job may need to be run periodically according to a calendar. This scheduling functionality is needed for repetitive jobs such as nightly database consolidation, weekly manufacturing planning, or biweekly payroll processing. While mainframe systems commonly support single-host job schedulers, job scheduling in distributed heterogeneous environments is done on a network-wide basis, involving the dynamic selection of the most appropriate hosts for the jobs.
Workflow and events. A set of interrelated jobs can form a natural flow of work that accomplishes a business task collectively. Job execution might depend on data availability, completion of other jobs, job-processing errors, and system-error conditions, but it should be independent of the hosts to which individual jobs are assigned.
Because jobs frequently must use a combination of scheduling criteria, the four preceding criteria of workload scheduling must be supported and integrated into a unified set of products.
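To make the combination of criteria concrete, the following is a minimal Python sketch, not a description of any particular product. It combines three of the four criteria: resource matching, priority ordering, and job dependencies (calendar triggers are left out for brevity). The `Host` and `Job` classes, their fields, and the `schedule` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    name: str
    platform: str
    free_memory_mb: int
    licenses: set = field(default_factory=set)
    load: float = 0.0  # illustrative load index; lower means less busy


@dataclass
class Job:
    name: str
    priority: int                  # higher value is scheduled first
    platform: str
    memory_mb: int
    license: Optional[str] = None  # software license the job needs, if any
    depends_on: tuple = ()         # names of jobs that must finish first


def eligible(job, host):
    """Resource availability: does the host satisfy the job's requirements?"""
    return (host.platform == job.platform
            and host.free_memory_mb >= job.memory_mb
            and (job.license is None or job.license in host.licenses))


def schedule(jobs, hosts, completed=frozenset()):
    """Return (placements, deferred) for one scheduling pass.

    Jobs are considered in priority order. A job is deferred when its
    dependencies are not complete or no host meets its requirements.
    Host objects are updated in place to reflect the new placements.
    """
    placements, deferred = {}, []
    for job in sorted(jobs, key=lambda j: -j.priority):
        if not set(job.depends_on) <= set(completed):
            deferred.append(job.name)      # workflow: wait for predecessors
            continue
        candidates = [h for h in hosts if eligible(job, h)]
        if not candidates:
            deferred.append(job.name)      # no suitable resources right now
            continue
        best = min(candidates, key=lambda h: h.load)
        best.free_memory_mb -= job.memory_mb
        best.load += 1.0
        placements[job.name] = best.name
    return placements, deferred
```

In this sketch the user never names a host. The job states what it needs, and the scheduler picks the least-loaded host that qualifies, which is the behavior the resource-availability criterion describes.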
Whereas workload scheduling optimizes workload performance while ensuring reliable workload processing, workload analysis evaluates workload data to assess overall system performance. Workload analysis uses data pertaining to:
Workload analysis can be used for capacity planning, system bottleneck identification and removal, system-performance tuning, system-upgrade planning, and future workload performance and requirement forecasting. As part of workload analysis, charge-back accounting calculates appropriate charges for resource usage. Workload analysis is a powerful means of implementing service-level agreements (SLAs), so IT departments and their users can establish a consensus on how technology supports business processes.
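As an illustration of the charge-back idea, here is a small Python sketch that totals resource charges per department from usage records. The rates, the metric names (`cpu_hours`, `memory_gb_hours`), and the record layout are assumptions made for this example; a real site would define its own.

```python
from collections import defaultdict

# Hypothetical rates per unit of resource consumed.
RATES = {"cpu_hours": 0.50, "memory_gb_hours": 0.02}


def charge_back(usage_records, rates=RATES):
    """Sum the cost of each usage record and group the totals by department."""
    totals = defaultdict(float)
    for record in usage_records:
        cost = sum(record.get(metric, 0.0) * rate
                   for metric, rate in rates.items())
        totals[record["department"]] += cost
    # Round to cents for reporting.
    return {dept: round(total, 2) for dept, total in totals.items()}
```

The same per-job usage records could feed the other analyses mentioned above, such as capacity planning, since they capture who consumed which resources.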
As workload flows through the networked enterprise, and as the installed base of available resources evolves and expands, dynamic-monitoring capabilities are needed to track workload processing, so adjustments and corrective actions can be taken immediately, before a problem causes a major disruption. Similarly, workload-management policies must be configurable and flexible, and the use of system resources must be controlled across the network through a set of administrative tools. Often, such workload monitoring and administration tools can be integrated into system management. Taken collectively, the three core functions of workload management (scheduling, analysis, and monitoring and administration) provide enterprise applications with the availability, performance, reliability, and scalability that advance real-world business objectives.
To accommodate the complexity of distributed heterogeneous systems and enterprise-application demands, effective workload management must satisfy the following needs:
Dynamic scheduling. In a distributed environment, resources and their load conditions change. Rather than depending on manual selection of hosts by users, workload scheduling should be dynamic. Dynamic scheduling exploits the full potential of all resources, scheduling jobs as quickly as possible to deliver optimal performance. To do so, dynamic scheduling reacts to incoming higher-priority jobs by suspending or migrating other jobs to free up resources. At the same time, it achieves a high degree of fault tolerance by redirecting job flow in the event of host failures.
General-purpose applicability. A good workload-management solution should be applicable to all varieties of workload, such as interactive transactions, batch processing, parallel processing, real-time process control, and routine production jobs. This broad applicability ensures that workload management can benefit the entire enterprise. Standard interfaces for third-party software products also must be provided to further extend the wide applicability of workload management.
Automated and transparent operation. Once established, workload management does not require end-user intervention. End users simply submit processing tasks and resource requirements to systems, and workload management automatically handles the job scheduling, analysis, monitoring, and administration. Just like host operating systems, workload management should be transparent to users.
Flexible and policy-based. Enterprise-computing resources should be employed in accordance with continually evolving business policies and priorities. For this reason, workload management must be highly adaptable to enterprise and departmental requirements. Workload management should provide a rich set of mechanisms through which site-specific resource policies can be established and revised.
Enterprise-wide scalability. Workload-management techniques must scale to hundreds of thousands of components spread throughout a global infrastructure. Workload management should be equally applicable to small, medium, and large enterprise environments, and must deliver value on a single server, a cluster of servers, across divisions and departments, and throughout a global enterprise.
Support for all architectures and operating systems. Heterogeneity is here for the long term. Different types of systems are suitable for different workloads. Workload management is the glue that integrates heterogeneous operating systems transparently to fully harness their respective strengths. The end result is a heightened degree of resource abstraction that hides operational complexities from users, allowing them to concentrate on their work.
Robust and fault tolerant. The entire enterprise workload is processed by workload management, just as all voice communication is processed by the telephone network and all data communication is supported by the corporate network. Workload management must be available at all times, since a major failure in the execution of application workload can paralyze the enterprise. Fault tolerance should ensure that partial failures are addressed resiliently, with no meaningful impact on business-critical processes.
Application integration. The processing modes and requirements of applications vary greatly. Thus, workload management must be integrated with applications in ways that meet their specific needs. For example, though SAP applications need to be started in an SAP environment, they also might need to be driven by a job scheduler. This means workload scheduling must be integrated with SAP or other enterprise resource planning (ERP) software.
Open and standards-based. To enable workload management to be integrated with other system software and all types of applications, a general-purpose framework with well-defined APIs is needed for all aspects of workload management. Such a framework would allow software developers to provide a variety of workload-management products, while users could rest assured that all products would work together effectively.
A strong workload management solution that satisfies these criteria is essential to the advancement of the business-driven deployment of heterogeneous distributed computing.
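Two of the behaviors described under dynamic scheduling, making room for a higher-priority job and redirecting jobs after a host failure, can be sketched in a few lines of Python. This is an illustrative sketch under simple assumptions (one slot per job, a numeric priority, a job count as the load measure). The function names `admit` and `redirect` are hypothetical.

```python
def admit(running, capacity, name, priority):
    """Try to start a job on a host with `capacity` job slots.

    `running` maps job name -> priority (higher is more important) and is
    updated in place. Returns (admitted, suspended_job_name_or_None).
    """
    if len(running) < capacity:
        running[name] = priority
        return True, None
    victim = min(running, key=running.get)   # lowest-priority running job
    if running[victim] >= priority:
        return False, None                   # nothing less important to suspend
    del running[victim]                      # suspend it to free a slot
    running[name] = priority
    return True, victim


def redirect(placements, failed_host, host_load):
    """Move every job placed on `failed_host` to the least-loaded survivor.

    `placements` maps job -> host; `host_load` maps host -> job count.
    Returns a new placements dict; the inputs are not modified.
    """
    survivors = {h: n for h, n in host_load.items() if h != failed_host}
    moved = {}
    for job, host in placements.items():
        if host == failed_host:
            target = min(survivors, key=survivors.get)
            survivors[target] += 1
            moved[job] = target
    return {**placements, **moved}
```

A production scheduler would also resume or migrate the suspended job later. The sketch only shows the decision of which job gives way and where displaced jobs go.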
For workload management to deliver its greatest value, it should be integrated with the overall enterprise system environments. The goal is to make workload management an essential part of the system.
Operating Systems.
Just as an effective manager must know the strengths, interests, and personalities of the people she manages, enterprise workload management must be fully integrated with the host operating systems to make the best of their capabilities. Workload management should be integrated with operating-system facilities such as process scheduling, fair-share scheduling (which divides the processing of a compute cluster among users and groups to provide fair access to resources for all jobs in a queue), resource limits, resource partitioning, user authentication, network file system, and job checkpointing. For example, operating-system priority scheduling can be used to control the priorities of multiple concurrent jobs allocated to a host by workload management. The resource-usage data collected by the operating system can be saved in files and analyzed later.
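The fair-share idea mentioned above can be illustrated with a short Python sketch. It is a simplification for illustration, not the algorithm of any specific operating system or scheduler. Each user has an assigned share, and the next free slot goes to the user whose accumulated usage is lowest relative to that share. The names `next_user` and `dispatch` and the unit job cost are assumptions.

```python
def next_user(shares, usage, pending):
    """Pick the user with pending jobs whose usage/share ratio is lowest."""
    candidates = [user for user in pending if pending[user] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda user: usage.get(user, 0.0) / shares[user])


def dispatch(shares, pending, slots, job_cost=1.0):
    """Hand out `slots` job slots one at a time and return the user order.

    `shares` maps user -> relative share; `pending` maps user -> queued jobs.
    """
    usage = {user: 0.0 for user in shares}
    pending = dict(pending)          # work on a copy
    order = []
    for _ in range(slots):
        user = next_user(shares, usage, pending)
        if user is None:
            break                    # nothing left to run
        pending[user] -= 1
        usage[user] += job_cost
        order.append(user)
    return order
```

With shares of 2 and 1, a user with the larger share receives roughly two slots for every one given to the other, as long as both have jobs waiting.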
Workload data can be used to identify troublesome jobs so that corrective action can be taken. For example, system management might request that workload management suspend, kill, or migrate offending jobs. The resource loading and job information maintained by workload management can be integrated into system management through SNMP agents, substantially expanding the range of information available to system management and enabling it to be more sensitive to dynamic aspects of the networked system. Workload management also can play a critical role in scheduling system-management activities. For example, periodic system-checking tasks can be scheduled by a distributed job scheduler, and routine administration tasks can be initiated in parallel on all servers. In the other direction, system-management frameworks can monitor and administer the workload. Scheduling configuration, for instance, can be done through a set of GUIs integrated into system management. What's more, workload-management capabilities, such as workload alarms and events, can be integrated into system-management frameworks through existing interfaces.
Data Management.
The dynamic scheduling of database tasks, such as transaction processing and data mining, requires the inclusion of database-specific load indices in workload scheduling. Instead of balancing load according to the CPU run-queue length and the amount of free memory reported by the operating system, it is probably more effective to consider the number of outstanding transactions and the buffer space on each database server. The emergence of the distributed parallel data server goes hand-in-hand with enterprise workload management. Parallel computing and dynamic scheduling should be integrated with data mining and data warehousing so that these workloads can run on commodity hardware, giving the enterprise a deeper understanding of business dynamics and customer requirements.
Middleware.
As middleware technologies (such as CORBA, DCOM, messaging, and Java) mature, workload management must integrate them to enhance and extend their cross-platform functionality and scalability. For example, the architecture-independent execution of Java applications affords workload management the full flexibility of selecting the most capable resources to process tasks.
Applications.
Perhaps the greatest benefit comes through an application's interface. Users can leverage the power of workload management fully without having to leave the familiar contexts of their application environments. They don't have to match their job requirements with available networked resources: workload management automates the process and does it for them. Just as an operating system must integrate with applications to deliver true value to users, workload management must integrate with distributed enterprise applications to make them network-enabled. Applications can take advantage of workload-management capabilities, such as distributed job scheduling to "farm" tasks across the network, parallel processing to run a large job on many computers simultaneously, and job checkpointing and migration to ensure job completion even if computers and networks fail. Such capabilities deliver unprecedented levels of application availability, flexibility, performance, and reliability.
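The checkpointing capability mentioned above can be sketched at the application level in Python. This is a minimal illustration that assumes the job's state is small enough to save as JSON after each step. The function name, the checkpoint file layout, and the `fail_after` parameter (used only to simulate a host failure) are hypothetical.

```python
import json
import os


def run_with_checkpoint(items, checkpoint_path, fail_after=None):
    """Sum `items`, saving progress after each one.

    If a checkpoint file exists, the job resumes from it instead of
    starting over. `fail_after` simulates a failure after that many
    items have been processed in this run.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)       # resume from the saved position

    processed = 0
    for i in range(state["next_index"], len(items)):
        if fail_after is not None and processed == fail_after:
            raise RuntimeError("simulated host failure")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temporary file, then rename, so a crash during the
        # write does not leave a damaged checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
        processed += 1
    return state["total"]
```

If the checkpoint file sits on storage that other hosts can reach, restarting the same call on a different host is a simple form of migration: the job continues from the last saved step rather than from the beginning.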
Workload management is essential for distributed computing to achieve the robustness and maturity of the legacy mainframe while also delivering the accessibility, cost-effectiveness, and openness of distributed environments. A simple extension of mainframe-based job scheduling to accommodate the network will not meet the requirements of enterprise workload management. The overwhelming paradigm shift to distributed computing necessitates the adoption of new architectures and methodologies that fully address the complexity of distributed computing. While workload management is relatively new to the marketplace, it has proven its ability to deliver value in a variety of distributed-computing environments. In commercial and technical computing, workload management provides a powerful return on investment.