Add resource Id remapping support #773

Merged: 10 commits merged into flux-framework:master on Dec 17, 2020

Conversation

@dongahn dongahn commented Nov 24, 2020

This PR adds basic support for resource Id namespace remapping and solves the hwloc GPU Id remapping problem.

  • Add initial support to aid in remapping resource Ids and ranks efficiently for nested instances (i.e., class resource_namespace_remapper_t)
  • Integrate it into the base resource reader class to make it available for all of the reader classes
  • Fix incorrect GPU numbering of our hwloc reader using this support
  • Modify sched-fluxion-resource to establish correct GPU remapping for the hwloc reader. The hwloc reader of a nested flux instance discovers the GPU resources using hwloc. However, hwloc uses the logical Id space for each discovered GPU resource, which does not work with GPU affinity at execution time (i.e., CUDA_VISIBLE_DEVICES); see the sketch after this list.
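
As a rough illustration of the mismatch (made-up numbers, not project code): if the parent grants physical GPUs 2 and 3 to a nested instance (CUDA_VISIBLE_DEVICES=2,3), the nested instance's hwloc reader still discovers those devices as logical Ids 0 and 1, so a remap table along these lines is what needs to be established:

    #include <cstdint>
    #include <iostream>
    #include <map>

    int main ()
    {
        // Nested-instance (hwloc logical) view -> parent-namespace (physical) Id.
        std::map<uint64_t, uint64_t> gpu_remap = { {0, 2}, {1, 3} };
        for (const auto &kv : gpu_remap)
            std::cout << "logical gpu" << kv.first
                      << " -> physical gpu" << kv.second << std::endl;
        return 0;
    }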

@dongahn dongahn requested review from SteVwonder and milroy November 28, 2020 23:25
@dongahn dongahn changed the title from "[WIP] Add resource Id remapping support" to "Add resource Id remapping support" Nov 28, 2020

dongahn commented Nov 28, 2020

@Mergefyio rebase


dongahn commented Nov 28, 2020

@Mergifyio rebase


mergify bot commented Nov 28, 2020

Command rebase: success

Branch has been successfully rebased


dongahn commented Nov 28, 2020

This is now ready for your review.

While testing this on rzansel with actual GPUs, I found a couple of problems with flux-core. I posted a PR for one problem: flux-framework/flux-core#3376

But I don't think we have any immediate solution for the other: flux-framework/flux-core#3375

@milroy milroy left a comment

This looks great @dongahn, but it may need a few minor changes, which I highlighted.

@@ -0,0 +1,207 @@
/*****************************************************************************\

Member

Nits: typos in commit message:

  "infon (ref_id is remapped to remapped_id) for each"
  -> "info (ref_id is remapped to remapped_id) for each"

  "of remapping info infomration per each."
  -> "of remapping info."

  "which them allows data querying to be performed"
  -> "which then allows data querying to be performed"

  "such a way that we can find the corresponding"
  -> "in such a way that we can find the corresponding"

Member Author

Sorry about this. I will clean up the commit message. Thanks!

Comment on lines 119 to 121
    m_remap[exec_target_range] = std::map<const std::string,
                                          std::map<uint64_t,
                                                   uint64_t>> ();

Member

It looks like you can avoid the subsequent [exec_target_range].find and [exec_target_range][name_type].find lookups by doing something like:

Suggested change (replacing the quoted lines above):

    m_remap[exec_target_range] = std::map<const std::string,
                                          std::map<uint64_t,
                                                   uint64_t>> ();
    m_remap[exec_target_range][name_type] = std::map<uint64_t,
                                                     uint64_t> ();
    m_remap[exec_target_range][name_type][ref_id] = remapped_id;
    goto success;

There may be a cleaner way to accomplish this.
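
For reference, a standalone illustration (not project code) of the related operator[] behavior: std::map::operator[] default-constructs a missing mapped value, which is what lets chained assignments like the suggestion above skip the explicit find () lookups.

    #include <cstdint>
    #include <map>
    #include <string>

    int main ()
    {
        std::map<std::string, std::map<uint64_t, uint64_t>> remap;
        remap["gpu"][0] = 2;   // creates remap["gpu"] on the fly, then inserts 0 -> 2
        remap["gpu"][1] = 3;   // inner map already exists; just inserts 1 -> 3
        return 0;
    }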

Member Author

As we discussed, I picked readability and simplicity over optimization for the initial condition. I think the optimization makes sense in this case, so I will try to add it. But I have a slight reservation about using a goto statement to jump to success, as unconventional control flow can be confusing, so I will try to avoid that.

Member Author

@milroy: Looking at the code, I am actually having second thoughts.

It feels like doing this optimization will make the control flow more complex. It is like the loop hoisting used in hot-loop optimization, but this isn't a hotspot: this initialization happens only once per execution target range (in a typical case, it would be called just once). So I have a slight preference for readability (with simpler control flow) over a minor performance improvement.

Thoughts?

Member

Since it's not a hotspot, it's probably better to lean toward readability.

resource/modules/resource_match.cpp (outdated comment, resolved)
        rc = -1;
        break;
    }
    if (remap_id > std::numeric_limits<int>::max ()) {

Member

This is necessary because hwloc_obj::logical_index is unsigned. Couldn't we use something smaller than uint64_t for remap_id (as originating in resource/readers/resource_namespace_remapper.cpp), or is there another advantage to using the 64-bit type?
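
A standalone sketch of the kind of guard being discussed, using a hypothetical helper name (narrow_remap_id) rather than the project's actual code: remap_id stays uint64_t for generality, so it is range-checked before being narrowed to a signed int.

    #include <cstdint>
    #include <cerrno>
    #include <limits>

    int narrow_remap_id (uint64_t remap_id, int &out)
    {
        if (remap_id > static_cast<uint64_t> (std::numeric_limits<int>::max ())) {
            errno = EOVERFLOW;
            return -1;
        }
        out = static_cast<int> (remap_id);
        return 0;
    }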

Member Author

I wanted to make this class as general as possible. I considered two ways: 1) make this class templated; or 2) use the largest data type. Going with 1) would have some side effects, one of which is having to put the implementation in header files. So for a little class like this, I thought it would be okay to go with 2). From our discussion yesterday, you seem to be okay with that. Unless I hear otherwise, I will keep this as is. Thanks.

@@ -83,6 +83,10 @@ def rpc_find (self, criteria, find_format=None):

    def rpc_status (self):
        return self.f.rpc ("sched-fluxion-resource.status").get ()

    def rpc_namespace_info (self, rank, type_name, Id):

Member

Nit: IMHO Id is too close to id, which is a built-in Python function. This isn't a necessary change.

Member Author

I had no idea id was a built-in. I will go with identity.

@codecov codecov bot deleted a comment from codecov-io Dec 2, 2020

@SteVwonder SteVwonder left a comment

LGTM! Just some comments below, none of them are required. Thanks @dongahn! Huge improvement for GPU users.

    auto m_remap_iter = m_remap.find (exec_target_range);

    if (m_remap_iter == m_remap.end ()) {
        m_remap[exec_target_range] = std::map<const std::string,

Member

Would using m_remap.emplace(exec_target_range, std::map<....>) help here, since "insertion only takes place if no other element in the container has a key equivalent to the one being emplaced"? I don't think it would be any more efficient, but I think it would shrink the find, if, and [ lines down into a single line (for both the inner and outer maps). Just a thought, not a required change.

@@ -0,0 +1,91 @@
#!/bin/sh

Member

Minor typo in commit message: shen -> when

Comment on lines 116 to 126
    auto m_remap_iter = m_remap.find (exec_target_range);

    if (m_remap_iter == m_remap.end ()) {
        auto ret = m_remap.emplace (exec_target_range,
                                    std::map<const std::string,
                                             std::map<uint64_t,
                                                      uint64_t>> ());
        if (!ret.second) {
            errno = ENOMEM;
            goto error;
        }

Member

Since you are using emplace, it is my understanding that you don't need the find or the if:

Suggested change (replacing the quoted lines above):

    auto ret = m_remap.emplace (exec_target_range,
                                std::map<const std::string,
                                         std::map<uint64_t,
                                                  uint64_t>> ());

Also IIUC, insert and emplace only return false for ret.second when the element already existed in the map, not when there is a memory error, so if you keep those checks, I think ENOMEM is a misnomer. EEXIST might be better. I'm basing this off the docs on cplusplus.com, so I very well may be wrong about that. [1] [2]
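
For reference, a small standalone program (not project code) showing the emplace return semantics in question: .second is false only when an equivalent key is already present, and the stored value is left untouched; an out-of-memory condition would surface as a std::bad_alloc exception rather than through the return value.

    #include <cstdint>
    #include <iostream>
    #include <map>

    int main ()
    {
        std::map<uint64_t, uint64_t> m;

        auto r1 = m.emplace (0, 42);
        std::cout << r1.second << std::endl;   // 1: key 0 was inserted

        auto r2 = m.emplace (0, 99);
        std::cout << r2.second << std::endl;   // 0: key 0 already present
        std::cout << m[0] << std::endl;        // 42: existing value untouched
        return 0;
    }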

Member

Sorry, I see now that you are using the result of the find further down. So ignore my comment about not needing the find. Sorry about that.

The bit about ENOMEM still stands though. Assuming that ret.second is only false when the element already exists (and not when out of memory), I don't think you need that check at all (the find already proved the element doesn't exist).

Member Author

I considered EEXIST but that condition will never occur because of the enclosing condition: if (m_remap_iter == m_remap.end ()) {

So I used ENOMEM as a catch-all. Maybe it is better just not to check ret.second since that condition should not occur?

Member

> Maybe it is better just not to check ret.second since that condition should not occur?

Yeah, I would agree with that. Your call though. My previous approval still stands, so feel free to put MWP on this PR when you are happy with it. It'll probably need a rebase too for the new GitHub Actions to take effect (I'm still seeing Travis in the status checks of this).

Problem: The rank and resource Id namespace of a
nested flux instance is different from that of its
parent. Thus, the resource reader of a nested
instance may need to remap the rank and/or certain
resource Ids as it populates its resource graph
store from the resources emitted from its parent.

Add a header and C++ source file that introduce
the resource_namespace_remapper_t class. It is
designed to add and query resource Id or rank
remapping information efficiently in terms of
both performance and memory overhead.

Using the add() method, users should first pass
the resource type (e.g., "core", "gpu") or "rank" as
the "name_type" argument along with the Id remapping
info (ref_id is remapped to remapped_id) for each
unique range of flux ranks (e.g., 0-3).

resource_namespace_remapper_t stores each piece of
remapping data using distinct_range_t as the key to a
std::map member object. In other words, we treat each
distinct rank range that shares the same remapping
info as an equivalence set and keep only one copy
of the remapping info.

Furthermore, the distinct_range_t class implements
"operator<()" so that this mapping info can be kept
in std::map in ascending rank-range order,
which then allows data querying to be performed
in O(logN) time.

However, the method for querying is the query() method,
whose semantics are per individual rank.
To support retrieval of remapping information
for an individual rank, we implement "operator<()"
in such a way that we can find the corresponding
record when the rank intersects a rank-range key.
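
A minimal standalone sketch of the interval-as-key idea
described above, using a hypothetical range_t rather than
the actual distinct_range_t implementation, and assuming
the stored rank ranges are disjoint: two ranges compare
equivalent when they intersect, so a point lookup with a
degenerate range finds the stored range containing it in
O(logN) via std::map::find.

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>

    struct range_t {
        uint64_t lo, hi;                 // inclusive rank range, e.g., 0-3
        bool operator< (const range_t &o) const {
            return hi < o.lo;            // "less than" only if strictly before
        }
    };

    int main ()
    {
        // rank range -> (type name -> (reference Id -> remapped Id))
        std::map<range_t,
                 std::map<std::string, std::map<uint64_t, uint64_t>>> remap;
        remap[{0, 3}]["gpu"][0] = 2;     // for ranks 0-3, gpu0 -> gpu2
        remap[{0, 3}]["gpu"][1] = 3;     // for ranks 0-3, gpu1 -> gpu3

        // Query rank 2: the degenerate range {2, 2} intersects {0, 3}, so
        // neither key compares less than the other and find () returns it.
        auto it = remap.find (range_t{2, 2});
        if (it != remap.end ())
            std::cout << it->second["gpu"][1] << std::endl;   // prints 3
        return 0;
    }
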
Make resource_namespace_remapper available for
all reader classes by integrating it into
the base reader class as a public member object.

Problem: the hwloc reader of a nested flux instance
discovers the GPU resources using hwloc. Hwloc uses
the logical Id space for each discovered GPU
resource, which does not work with GPU affinity
at execution time (i.e., CUDA_VISIBLE_DEVICES).

Solution: if remap information has been established,
the hwloc reader remaps the logical Id of each
self-discovered GPU device to the remapped Id.
sched-fluxion-resource will be modified
to establish this remap using the RV1 execution
key coming from core's resource.acquire interface.

Establish GPU Id remapping by using the RV1 execution
key coming from core's resource.acquire interface.
This is used by the hwloc reader to remap the
self-discovered GPU Ids, which are logical,
into physical Ids.

Problem: it is nearly impossible to test the hwloc
reader's GPU Id remapping on non-GPU systems
using fake hwloc xml files, which makes it
difficult to develop a CI test.

Add an ns-info (namespace info) RPC to
sched-fluxion-resource to support further testing.
This RPC returns the remapping information
corresponding to <rank, resource-type-name, ref-id>.

Add a subcommand (ns-info Rank Type Id) to
flux-ion-resource to help invoke the ns-info
RPC within sched-fluxion-resource.
@dongahn dongahn added the merge-when-passing label (mergify.io - merge PR automatically once CI passes) Dec 17, 2020

dongahn commented Dec 17, 2020

FYI -- CI failed because flux-core picked up the comma-separated list format fix for CUDA_VISIBLE_DEVICES. I will quickly adjust the failing test. flux-framework/flux-core#3376

Use a Sierra hwloc xml which has 4 GPUs.

Get a nested allocation using flux mini alloc
with 2 GPUs and run
  - flux ion-resource ns-info 0 gpu 0
  - flux ion-resource ns-info 0 gpu 1
to check what the self-discovered GPU
Ids 0 and 1 are remapped to.

When we use the high-Id-first match policy,
they should map to 2 and 3 because
the Fluxion scheduler will select GPUs 2 and 3
(i.e., CUDA_VISIBLE_DEVICES=2,3 passed into
the nested instance).

Similarly, when we use the low-Id-first match
policy, they should map to 0 and 1.

codecov bot commented Dec 17, 2020

Codecov Report

Merging #773 (45c4143) into master (a8fe07c) will decrease coverage by 0.0%.
The diff coverage is 69.7%.

@@           Coverage Diff            @@
##           master    #773     +/-   ##
========================================
- Coverage    73.5%   73.4%   -0.1%     
========================================
  Files          79      81      +2     
  Lines        8173    8316    +143     
========================================
+ Hits         6011    6109     +98     
- Misses       2162    2207     +45     
Impacted Files Coverage Δ
resource/readers/resource_reader_base.hpp 100.0% <ø> (ø)
resource/readers/resource_reader_hwloc.cpp 80.2% <40.0%> (-2.5%) ⬇️
resource/readers/resource_namespace_remapper.cpp 66.1% <66.1%> (ø)
resource/modules/resource_match.cpp 74.8% <77.1%> (+<0.1%) ⬆️
resource/readers/resource_namespace_remapper.hpp 100.0% <100.0%> (ø)

@mergify mergify bot merged commit 59210e0 into flux-framework:master Dec 17, 2020

dongahn commented Dec 17, 2020

OK, that did the trick. Thanks @milroy and @SteVwonder for your thorough review!

@SteVwonder

Thanks @dongahn for putting this together!
