
Is it possible to create a local matrix per MPI process #687

Open
Goon83 opened this issue Feb 3, 2020 · 6 comments

@Goon83

Goon83 commented Feb 3, 2020

Hi,
Is it possible to create a local matrix per MPI process? This would be analogous to using MPI_COMM_SELF for file creation, so that each MPI process accesses only its own data.

Thanks,
Bin
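
For reference, the MPI_COMM_SELF pattern mentioned above refers to each process opening its own independent file, as in this minimal MPI-IO sketch (the per-rank file name is just for illustration):

#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank opens its own file on MPI_COMM_SELF instead of a
    // shared file on MPI_COMM_WORLD.
    char filename[64];
    std::snprintf(filename, sizeof(filename), "data.%d.bin", rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_SELF, filename,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    int value = rank;
    MPI_File_write(fh, &value, 1, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}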

@devreal
Member

devreal commented Feb 3, 2020

@Goon83 I have not tried this but maybe something along these lines works:

auto team = dash::Team::All().split(dash::util::Locality::Scope::Unit, dash::size());

This should give each unit a team containing only itself. You can pass this team to the c'tor of the matrix, which should create an independent matrix on each unit.

@Goon83
Author

Goon83 commented Feb 18, 2020

Hi @devreal,
I tried the following code, but it reports the error below. Any hint on how to write it the right way?
Following the docs, I provided a single argument to split, but I am not sure whether that is correct.

Code:
auto team_local = dash::Team::All().split(dash::Team::All().size());

Error:

 error: calling a protected constructor of class 'dash::Team'
  auto team_local = dash::Team::All().split(dash::Team::All().size());

@devreal
Member

devreal commented Feb 18, 2020

Just realized that split returns a Team&. Can you try auto& instead?

auto& team_local = dash::Team::All().split(dash::Team::All().size());
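
As a quick sanity check (a suggestion, assuming the split succeeds), each unit could print the size of its sub-team, which should be 1 after the split:

auto& team_local = dash::Team::All().split(dash::Team::All().size());
std::cout << "unit " << dash::myid()
          << ": local team size = " << team_local.size() << std::endl;
// expected on every unit: local team size = 1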

@Goon83
Author

Goon83 commented Feb 19, 2020

Hi @devreal,
Thanks for the information. I put together a simple test program, but it still fails with an error. See below for details. Did I do something wrong?

Bests,
Bin

Sample code:

#include <libdash.h>
#include <iostream>

using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
    dash::init(&argc, &argv);

    dash::TeamSpec<2> teamspec;
    teamspec.balance_extents();

    dash::global_unit_t myid = dash::myid();

    size_t rows = 8;
    size_t cols = 8;

    auto &team_local = dash::Team::All().split(dash::Team::All().size());
    dash::Matrix<int, 2> matrix_local(dash::SizeSpec<2>(rows, cols), dash::DistributionSpec<2>(), team_local, teamspec);

    if (0 == myid)
    {
        cout << "matrix_local size: " << matrix_local.extent(0)
             << " x " << matrix_local.extent(1)
             << " == " << matrix_local.size()
             << endl;
    }

    dash::Team::All().barrier();

    for (size_t i = 0; i < rows; i++)
    {
        for (size_t k = 0; k < cols; k++)
        {
            matrix_local[i][k] = myid;
        }
    }

    for (size_t i = 0; i < rows; i++)
    {
        for (size_t k = 0; k < cols; k++)
        {
            int value = matrix_local[i][k];
            int expected = myid;
            DASH_ASSERT(expected == value);
        }
    }

    int value = matrix_local[5][5];
    cout << value << " at rank " << myid << "\n";

    dash::Team::All().barrier();

    dash::finalize();

    return 0;
}


Error Info:

mpirun -n 2 ./test-dash-local

matrix_local size: 8 x 8 == 64
Fatal error in MPI_Put: Invalid rank, error stack:
MPI_Put(161): MPI_Put(origin_addr=0x7ffeec1e525c, origin_count=1, MPI_INT, target_rank=1, target_disp=0, target_count=1, MPI_INT, win=0xa0000003) failed
MPI_Put(136): Invalid rank has value 1 but must be nonnegative and less than 1
Fatal error in MPI_Put: Invalid rank, error stack:
MPI_Put(161): MPI_Put(origin_addr=0x7ffee3dec25c, origin_count=1, MPI_INT, target_rank=1, target_disp=0, target_count=1, MPI_INT, win=0xa0000003) failed
MPI_Put(136): Invalid rank has value 1 but must be nonnegative and less than 1

@devreal
Member

devreal commented Feb 22, 2020

Sorry, I didn't see your reply until just now. The problem is that your teamspec uses a different team: dash::Team::All() is the default if you don't pass a team to the TeamSpec constructor. The pattern therefore maps blocks to the units of dash::Team::All(), which don't exist in your one-unit team, hence the MPI_Put to an invalid rank.

This should fix it:

#include <libdash.h>
#include <iostream>

using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
    dash::init(&argc, &argv);

    dash::global_unit_t myid = dash::myid();

    size_t rows = 8;
    size_t cols = 8;

    auto &team_local = dash::Team::All().split(dash::Team::All().size());

    // Build the TeamSpec over the one-unit team instead of the
    // default dash::Team::All().
    dash::TeamSpec<2> teamspec{team_local};
    teamspec.balance_extents();

    dash::Matrix<int, 2> matrix_local(dash::SizeSpec<2>(rows, cols), dash::DistributionSpec<2>(), team_local, teamspec);

    if (0 == myid)
    {
        cout << "matrix_local size: " << matrix_local.extent(0)
             << " x " << matrix_local.extent(1)
             << " == " << matrix_local.size()
             << endl;
    }

    dash::Team::All().barrier();

    for (size_t i = 0; i < rows; i++)
    {
        for (size_t k = 0; k < cols; k++)
        {
            matrix_local[i][k] = myid;
        }
    }

    for (size_t i = 0; i < rows; i++)
    {
        for (size_t k = 0; k < cols; k++)
        {
            int value = matrix_local[i][k];
            int expected = myid;
            DASH_ASSERT(expected == value);
        }
    }

    int value = matrix_local[5][5];
    cout << value << " at rank " << myid << "\n";

    dash::Team::All().barrier();

    dash::finalize();

    return 0;
}

We should think about changing the pattern and matrix interfaces to disallow passing both a team and a teamspec to the c'tor, to prevent this kind of ambiguity.
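
The ambiguity in question, sketched from the code earlier in this thread: the team argument and the teamspec can disagree about how many units exist, and nothing in the interface catches it:

// teamspec defaults to dash::Team::All() (2 units with mpirun -n 2),
// while team_local contains only 1 unit.
dash::TeamSpec<2> teamspec;
auto &team_local = dash::Team::All().split(dash::Team::All().size());

// Compiles fine, but the pattern maps blocks to units of
// dash::Team::All() -> MPI_Put to a rank outside the one-unit window.
dash::Matrix<int, 2> m(dash::SizeSpec<2>(8, 8),
                       dash::DistributionSpec<2>(),
                       team_local,   // 1 unit
                       teamspec);    // 2 units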

@Goon83
Author

Goon83 commented Feb 24, 2020

Hi @devreal,
I copied and pasted the new code, and it still reports the error below. Do you have any idea why this error happens? Thanks.

Bests,
Bin

[    0 ERROR ] [ 92091521.343 ] dart_globmem.c           :432  !!! DART: dart_team_memalloc_aligned_full ! Unknown team -1
[    0 ERROR ] [ 11536 ] AllocationPolicy.h       :176  | GlobalAllocationPolicy.do_global_allocate(nlocal)| cannot allocate global memory segment 256
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
Abort trap: 6
