[Remote] IBM Workload Scheduler Administration / Infrastructure Engineer

Remote, USA | Full-time | Posted 2026-06-20

Note: This is a remote position open to candidates in the USA. Kastech Software Solutions Group is seeking a highly skilled IBM Workload Scheduler Administration / Infrastructure Engineer with 3–5+ years of experience. The role involves managing, maintaining, and optimizing enterprise batch scheduling infrastructure to ensure high availability and reliable execution of critical business workloads.

Responsibilities

  • IBM Workload Scheduler Administration
      • Administer Production IBM Workload Scheduler (formerly Tivoli Workload Scheduler) environment:
          • 28,000 unique daily jobs
          • Approximately 350,000 daily job runs
          • 44 servers
          • Three additional change-control environments
      • Install, configure, administer, patch, and upgrade IWS components:
          • Master Domain Manager (MDM)
          • Dynamic Agents
          • Dynamic Pools
          • Dynamic Workload Console (DWC)
  • Change Management & Governance
      • Work closely with Product Owners and communicate workstreams through Jira
      • Manage job promotions using a Workload Application Template-based process
      • Perform safety and stability assessments for all job promotions
      • Manage change control across four separate environments
      • Enforce change management standards, policies, and governance
  • Platform Availability & Operations
      • Maintain and continuously improve Production platform uptime target of 99.17% per month
      • Follow SOPs, DevOps practices, and disciplined change-control processes
      • Coordinate platform-impacting communications to a user community of approximately 500 developers and data engineers
      • Support Production infrastructure consisting of:
          • 44 servers
          • MDM, DWC, and Agent environments
  • Troubleshooting & Support
      • Resolve:
          • Complex job failures
          • Performance bottlenecks
          • Agent-related issues
          • Infrastructure-related issues
      • Provide guidance on complex job scheduling designs to less experienced team members
  • Monitoring, Security & Compliance
      • Monitor scheduler platform health and performance
      • Manage database maintenance activities
      • Perform backup, disaster recovery, and monthly failover testing
      • Define and maintain:
          • Security policies
          • User authorizations
          • Authentication for Dynamic Workload Console (DWC)
      • Respond to:
          • Cybersecurity vulnerability assessments
          • PCI compliance audits
          • Other regulatory audit requests
  • Automation & DevOps
      • Design and implement Ansible-based automation solutions
      • Develop self-healing mechanisms to reduce unplanned outages
      • Coordinate with offshore teams performing SOP activities during non-business hours
      • Develop automation scripts using:
          • Python
          • IWS REST APIs

Skills

  • Ability to modernize, implement, install, configure, upgrade, migrate, develop, or design IBM Workload Scheduler (IWS) / IBM Workload Automation (IWA) solutions
  • Support migration activities across pre-production and production environments
  • Participate in knowledge transfer and documentation to enable team self-sufficiency
  • 3–5+ years of dedicated IBM Workload Scheduler administration experience
  • Responsible for managing, maintaining, and optimizing enterprise batch scheduling infrastructure
  • Primary environment hosted on Red Hat Enterprise Linux (RHEL)
  • Strong expertise in IBM Workload Scheduler (IWS), Linux system administration, and scripting and automation
  • Focus on ensuring high availability and reliable execution of critical business workloads
  • Strong experience with IBM Workload Scheduler architecture (V10.1+), especially Dynamic Workload Broker and high availability of MDMs managing Fault Tolerant Agent and Dynamic Agent architectures
  • Strong conceptual understanding of Master Domain Manager (MDM), Backup MDM (BMDM), Dynamic Workload Console (DWC), Fault Tolerant Agent (FTA), Dynamic Agent (DA)
  • Strong grasp of the conman CLI to monitor and control the production plan and to check job, job stream, and resource status
  • Strong grasp of the composer CLI to define, modify, and extract scheduling objects
  • Strong grasp of the planman CLI to control the pre-production plan and GUI mirroring
  • Strong grasp of the lifecycle of the daily production planning process and the phases of JnextPlan/FINAL
  • Proficiency in navigating the DWC web-based GUI to monitor workloads, manage user access security, and define scheduling objects
  • Experience installing IWS components and applying Fix Packs and Interim Fixes
  • Troubleshooting with logs under TWSDATA/stdlist and adjusting trace levels for netman, batchman, writer, mailman, etc.
  • Strong experience with IBM WebSphere Liberty
  • Strong grasp of reading messages.log, traces.log, FFDC logs
  • Strong grasp of configuring JVM heap sizes
  • Strong grasp of configuring tracing scope, tracing levels, tracing retention
  • Strong experience with Red Hat Enterprise Linux 8+
  • Deep familiarity with bash/shell commands for text processing (for example, grep, awk, sed), file manipulation, and system navigation
  • Ability to manage, start, stop, and troubleshoot systemd services using systemctl and journalctl for IWS agents and MDM
  • Experience managing user accounts, groups, and service accounts, with deep knowledge of Linux file permissions (chmod, chown, ACLs on local filesystems and NFS)
  • Ability to monitor system performance using tools like top, htop, vmstat, iostat, and sar to troubleshoot bottlenecks and platform unresponsiveness
  • Understanding of Logical Volume Manager (LVM) and filesystem usage
  • Checking TCP port availability, firewall rules (firewalld/iptables), and connectivity between MDM and Dynamic Agents using netstat, ss, ping, curl, etc.
  • Managing SSL/TLS certificates, private keystores, public truststores, and working with Certificate Authority
  • Strong experience with scripting (Bash Shell, Python, etc.) for automation
  • Understanding of networking principles
  • Understanding of basic Oracle database administration, enough to troubleshoot with DBAs and determine when an issue lies in Oracle
  • Understanding of basic SQL to query job metadata
  • Understanding of checking database connectivity
  • Understanding of AWS cloud infrastructure
  • Experience using a secrets manager (CyberArk PPM, HashiCorp Vault, or similar)

Company Overview

  • Kastech Software Solutions Group, incorporated in 2007 and headquartered in Richmond, Texas, is a leading global IT services and consulting company delivering technology-driven solutions to organizations across industries. It was founded in 2008 and is headquartered in Houston, Texas, USA, with a workforce of 1,001–5,000 employees. Its website is https://www.kastechssg.com.
Company H1B Sponsorship

  • Kastech Software Solutions Group has a track record of offering H1B sponsorships, with 13 in 2026, 94 in 2025, 65 in 2024, 101 in 2023, 124 in 2022, 171 in 2021, 119 in 2020. Please note that this does not guarantee sponsorship for this specific role.