# Scheduling Processing Graphs of Gang Tasks on Heterogeneous Platforms

Shareef Ahmed, Denver Massey and James H. Anderson



# **Processing Graphs**



- Node = Task
- Edge = Precedence constraint
- Goal:
  - Response time ≤ Deadline

### Processing Graphs of Gang Nodes



Nodes are <u>rigid gang</u> tasks



- Rigid = same number of threads for all jobs of $\tau_i$





- Multiple compute units
- Each node assigned to a compute unit
- Each compute unit has multiple same-speed processors
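The task model above can be written down as a small data structure. This is a minimal sketch, not the authors' implementation: the class names `ComputeUnit`, `GangTask`, and `ProcessingGraph`, and the fields `wcet` and `deadline`, are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComputeUnit:
    name: str
    processors: int  # identical, same-speed processors in this unit


@dataclass(frozen=True)
class GangTask:
    name: str
    wcet: float        # assumed per-job worst-case execution time
    threads: int       # rigid: every job of this task uses this many threads
    unit: ComputeUnit  # each node is assigned to exactly one compute unit


@dataclass
class ProcessingGraph:
    deadline: float
    tasks: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)  # (pred, succ) precedence constraints

    def add_task(self, task):
        # A gang task can only run if all of its threads fit on its unit.
        if not 1 <= task.threads <= task.unit.processors:
            raise ValueError(f"{task.name} does not fit on {task.unit.name}")
        self.tasks[task.name] = task

    def add_edge(self, pred, succ):
        if pred not in self.tasks or succ not in self.tasks:
            raise KeyError("both endpoints must be added as tasks first")
        self.edges.add((pred, succ))

    def meets_deadline(self, response_time_bound):
        # Goal from the slides: response time <= deadline.
        return response_time_bound <= self.deadline
```

For example, a graph with a CPU unit and a GPU unit would create two `ComputeUnit` objects and assign each `GangTask` to one of them.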



Task Model

#### A Use Case

Scheduling processing graphs on multicore+GPU





GPU-accessing task



**GPU** kernel







#### Problem

Determine a response-time bound for DAGs formed by gang tasks

Assumption: Constrained deadline

Scheduling: Work-conserving, Semi work-conserving

Each DAG receives a dedicated number of processors on each compute unit




# Work-Conserving Scheduling






# Semi Work-Conserving Scheduling

**RTAS '25** 



Try until a job does not fit



∃ unscheduled ready job : job's processor requirement > number of idle processors
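The difference between the two policies can be sketched as a single dispatch step. This is an illustrative sketch under stated assumptions: the function name `dispatch` is hypothetical, and the order in which ready jobs are tried is an assumption of the example, not something the slides specify.

```python
def dispatch(ready, idle, semi=False):
    """Decide which ready gang jobs start now on one compute unit.

    ready: list of (name, threads) in the order the scheduler tries them
           (this order is an assumption of the sketch).
    idle:  number of currently idle processors.
    semi:  False = work-conserving, True = semi work-conserving.
    Returns (names of jobs started, idle processors left).
    """
    started = []
    for name, threads in ready:
        if threads <= idle:
            started.append(name)
            idle -= threads
        elif semi:
            # Semi work-conserving: try until a job does not fit, then stop.
            break
        # Work-conserving: skip the job that does not fit and keep trying.
    return started, idle


ready = [("a", 2), ("b", 3), ("c", 1)]
print(dispatch(ready, idle=3))             # (['a', 'c'], 0)
print(dispatch(ready, idle=3, semi=True))  # (['a'], 1)
```

In the semi work-conserving run, job `c` would fit on the remaining idle processor but is not started, because job `b` ahead of it does not fit.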


# Why Semi Work-Conserving Scheduling?

Scheduling on NVIDIA GPUs is semi work-conserving when all GPU work is submitted from the same address space (subject to some additional constraints)

1. Amert et al., RTSS 2017
2. Bakita and Anderson, RTAS 2024






Assumption 1: One compute unit

Assumption 2: Sequential node (one thread per task)

Graham, SIAM J. of Appl. Math., 1969



Interfering workload / m

### Response-Time Bound




Longest path

Graham, SIAM J. of Appl. Math., 1969
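Under the two assumptions above (one compute unit with $m$ processors, sequential nodes), Graham's classical bound for a work-conserving schedule is the longest path length plus the remaining workload divided by $m$. The following is a minimal sketch of that calculation; the function names and the dictionary-based graph encoding are assumptions of this example.

```python
from graphlib import TopologicalSorter


def longest_path(wcet, preds):
    """Length of the longest path in a DAG.

    wcet:  dict node -> execution time
    preds: dict node -> iterable of predecessor nodes
    """
    sorter = TopologicalSorter()
    for node in wcet:
        sorter.add(node, *preds.get(node, ()))
    finish = {}
    for node in sorter.static_order():  # predecessors come first
        start = max((finish[p] for p in preds.get(node, ())), default=0)
        finish[node] = start + wcet[node]
    return max(finish.values())


def graham_bound(wcet, preds, m):
    """Longest path + (total workload - longest path) / m."""
    length = longest_path(wcet, preds)
    volume = sum(wcet.values())
    return length + (volume - length) / m


wcet = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(graham_bound(wcet, preds, m=2))  # 7.5
```

Here the longest path is a, b, d with length 7, the total workload is 8, and the single unit of workload off the longest path contributes 1/2 on two processors.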






Assumption 1: One compute unit

Assumption 2: <del>Sequential node (one thread per task)</del> Gang node (multiple threads)



Step 1: Determine the minimum number of busy processors when  $\tau_i$  is ready but unscheduled
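A coarse counting argument illustrates Step 1. It is a sketch only and not necessarily the bound derived in the paper, which may be tighter. The function names are hypothetical, and the semi work-conserving case assumes that processors idle only when some unscheduled ready job needs more processors than are idle.

```python
def min_busy_work_conserving(m, threads_i):
    """Fewest busy processors while a job needing threads_i is ready but waiting.

    Under work-conserving scheduling the job would start if it fit, so
    idle < threads_i, which gives busy >= m - threads_i + 1.
    """
    if not 1 <= threads_i <= m:
        raise ValueError("thread count must be between 1 and m")
    return m - threads_i + 1


def min_busy_semi_work_conserving(m, ready_thread_counts):
    """Coarse bound when a different ready job may be the one that does not fit.

    ready_thread_counts: thread counts of jobs that could be ready and
    unscheduled at the same time (an input assumed by this sketch).
    Some ready job does not fit, so idle < max thread count.
    """
    worst = max(ready_thread_counts)
    if not 1 <= worst <= m:
        raise ValueError("thread counts must be between 1 and m")
    return m - worst + 1


print(min_busy_work_conserving(8, 3))              # 6
print(min_busy_semi_work_conserving(8, [3, 5, 2]))  # 4
```

The second result is smaller because, under semi work-conserving scheduling, a job needing 5 processors can leave up to 4 processors idle while a smaller job waits behind it.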





Step 2: Upper bound total interference time





Any path can be a critical path

Determine a set of nodes (not necessarily on a path) that upper bounds interference time








Assumption 1: <del>One compute unit</del> Multiple compute units

 $\tau_2$  $\tau_3$  $\tau_4$  $\tau_1$ 

Scheduling in different compute units can be different

Bound under X

Bound under work-conserving

Work-conserving vs. Semi work-conserving scheduling





GPU as a shared resource vs. scheduling platform

**GPU** kernel



With locking

CPU-only DAG response-time bound



Without locking

DAG of gang tasks response-time bound





Locking-based GPU access vs. Default GPU scheduling







Histogram of Oriented Gradients

GPU partitioning using libsmctrl [Bakita and Anderson, RTAS 2023]




### Conclusion & Thank You!












The University of North Carolina at Chapel Hill
