Commit cf886d5
Import ULFM Fault Tolerance
The historical repository with full history and attribution is available
at https://bitbucket.org/icldistcomp/ulfm2/src/ulfm/.
Squashed commit of the following:
commit 73b6fa48c8af40bfa28e24f6c79176a254c449be
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 19:21:20 2020 -0400
Typo in comment for non-blocking error check
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 3a9fd329e35564af826c81aae18d4df4eebbd275
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 19:19:08 2020 -0400
Do not iface_check in non-blocking and never set MPI_ERROR in single
status functions
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit a9913a4777e0d7d78ff9ead0a51e807316f01d2f
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 18:12:08 2020 -0400
Remove iface_create_check on intercomm creations
commit 99ea1398127c51ada0179ab1737f2134ee0de8ff
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 17:43:41 2020 -0400
Update README to denote supported/unsupported components and default
settings
commit 59110aa35fa465cddf65e2937066928e45a685c0
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 13:22:14 2020 -0400
Do not disable compile-time components
with_ft is on by default at configure time
enable_ft is off by default at runtime
have a --tune file to control the behavior of loaded components
disable runtime loading of MTL and PML components and hcoll when FT is
on.
commit 66566b63f1dd9eae633d57c1f3cca57c78978a22
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed May 13 04:34:50 2020 -0400
Correct error path in comm_spawn
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 9a5cb3cb79ab4321a14425a422f68d336b4681ab
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed May 13 02:39:43 2020 -0400
Remove extra ompi_request_t fields (tag, peer, any_src_pending)
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit b1dda7c8d51c66f10dadcea676d7e5622b549a18
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 12 14:32:39 2020 -0400
Cleanup ftagree (FAILURE_PROB)
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit c05bf3ef14ac8d5b55f936bd2ff7680575a1d019
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 12 12:27:30 2020 -0400
Remove the need to modify every coll component to add agree
Rename coll_agreement to coll_agree (to match existing practice of
matching the MPI name)
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Copyright cleanup in unchanged files
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit cf0461886a9318ac0b87c73f2c2a1868b9481be6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 12 02:30:27 2020 -0400
Copyright cleanup
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 61eb3b3163011769a020d2a714085380e8b6d8b3
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 12 01:46:39 2020 -0400
Round 1 of review comments
commit 64d956017415bf40397a12f039e62211e57c5c56
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon May 11 00:34:23 2020 -0400
Revert changes to version and README for standalone ULFM packaging.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit cd5c5ed41b3dcd4162632c47ac500daf4cc5216f
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sun May 10 23:58:33 2020 -0400
Revert "Restore ulfm specific changes to openib btl cancelled by merge 4ce1669a"
This reverts commit f2b7da5d488f1b1d27c6a8643128a10eadd86f67.
Revert "Revert "platform: Remove "with_verbs" from all the platform files.""
This reverts commit 74d9c41e32e5b0c7fdb720156091a1eb49c03537.
Revert "Revert "README: Remove all references to --with-verbs[*]""
This reverts commit 385dbd0dad512245e9197af98244ac970f3d956e.
Revert "Revert "opal/common: remove stale common components""
This reverts commit 0c3a306c695eb12d489b9fdbfa4ec6262935e7c1.
Revert "Revert "m4: remove all configury related to libibverbs""
This reverts commit f8f1b8537fd929a4fc1432936a71d7f2def41bbd.
Revert "Revert "btl/openib: So long / farewell / it's time to say goodnight""
This reverts commit 4a82cca865ac043e8aab75356ed78786115b52ef.
commit f627b1c53de171dd6551e8b00fb5907715364939
Merge: fb3507a1 9996b9f5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 14 21:20:59 2020 -0400
Merge branch 'master' into ulfm-prrte
commit fb3507a19183fe4293dad1d0d432641a11640a89
Merge: 0823ee3e 0dc23252
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sun May 10 23:07:42 2020 -0400
Merge branch 'master' into ulfm (orte removal)
commit 0823ee3e57d24d11ee1c8ba232c601707645a7a8
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 31 16:26:00 2020 -0500
An error in readme about Agree: it does an AND
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 322684e42d99e28964678c9f54a0de570dd47f39
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 31 15:02:23 2020 -0500
Change verbosity in agree to help track split-decision bugs
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 67ca89e04d4b5452fc5871d823e00ae5f6e247bb
Merge: d4ff45bd cf4398e2
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 31 18:54:20 2020 +0000
Merged in abouteiller/ulfm2/bugfix/era_thread_safe2 (pull request #21)
Thread safe access to era_incomplete_msg and passed_agreement hash-tables
Approved-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit cf4398e2a0431386b2216ae73e4251c0978143bc
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 30 15:38:08 2020 -0500
Thread safe access to era_incomplete_msg and passed_agreement
hash-tables
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit d4ff45bdf4aad071d3f1abddda9ac3576a83741e
Merge: cdd2f6b4 12757660
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Thu Jan 30 18:47:34 2020 -0500
Merge remote-tracking branch 'upstream/master' into ulfm
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Conflicts:
ompi/include/mpif-values.pl
ompi/mca/coll/libnbc/nbc.c
ompi/mca/pml/ob1/pml_ob1.c
ompi/tools/ompi_info/param.c
opal/mca/btl/tcp/btl_tcp_endpoint.c
opal/mca/btl/tcp/btl_tcp_frag.c
opal/mca/hwloc/hwloc2/configure.m4
orte/mca/odls/base/odls_base_default_fns.c
commit cdd2f6b43961857cf4c84c27de608c7462e37919
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 30 14:14:45 2020 -0500
Update VERSION to the new numbering scheme :v4.1.0u1a1: alpha 1 of the
first release of ULFM based on (unreleased, devel) v4.1.0
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit b8da0edf73b446cc2aa59f0f86b48c925d3add37
Merge: e5c6c5e6 c6ade8fa
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 30 04:32:15 2020 +0000
Merged in abouteiller/ulfm2/bugfix/concurrent-tcp-close (pull request #16)
Do not close the socket while the opal_progress loop is adding events to the event base
commit e5c6c5e6f240260514e08e130177e7f86f2246ee
Merge: c2212cb0 227a6779
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 30 04:31:29 2020 +0000
Merged in abouteiller/ulfm2/bugfix/openib-noproc-error (pull request #20)
An error without an errproc is always promoted to fatal, which causes pandemic failures when openIB credits to a dead peer exhaust.
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit c2212cb0fd4ed8a54b36f02f9cb234cd1df2ac69
Merge: 43c1d324 2510df24
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 30 04:30:42 2020 +0000
Merged in abouteiller/ulfm2/bugfix/recursive-era-mark-failed (pull request #19)
Resolve recursive and multithreaded access to the era
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit 2510df24a73ba5a563537e0c44b6249f163679cd
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Jan 22 15:33:48 2020 -0500
Resolve recursive and multithreaded access to the era_parent and
next_child functions causing inconsistent agreements
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 227a67797859e176336a7033b1bf9cb0f94584c7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon May 20 12:08:18 2019 -0400
An error without an errproc is always promoted to fatal, which causes
pandemic failures when openIB credits to a dead peer exhaust.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 43c1d32448e64ff2bd322b206d82b27e75033fd8
Merge: cf8dc43f a36f138a
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Jan 22 20:55:33 2020 +0000
Merged in bugfix/sync-mt-waitall-any-some (pull request #18)
bug fix SYNC_WAIT with threads in WAITALL and friends
commit cf8dc43f907353b40b42aaf7318e05b49e7243a5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sun Nov 17 12:00:54 2019 -0500
Close the detector before removing the bsend system, but after deleting Self attr
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 6e386e4d66288f68e8c12a81ede76b7cceb86471
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sat Nov 16 22:13:05 2019 -0500
Cleanup asserts and add some more debug messages
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 593db6aca8dd89997eb6787cad409778e11ef0b8
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sat Nov 16 22:08:50 2019 -0500
Return a revoke error only when comm is revoked
commit a36f138a911a457fae57366bbbb501eb1efe77ee
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Nov 13 17:36:30 2019 -0500
Fix a case where the SYNC_WAIT would be rearmed while it was unsafe
w.r.t. a progress thread, and cases where the SYNC would be released
before being SIGNALED.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 791214b118570df301c6cbe47ad291a54bc21ab8
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Nov 13 17:25:14 2019 -0500
Be more verbose about having a progress thread in the detector.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 0e249ca1ae5cb27a3f3d907173b65db188380ce5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Nov 12 17:28:57 2019 -0500
Remove the pending event when socket is TCP_FAILED
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit f363e250686cc299631fa26a2cc92e3f2dc9e5d6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Nov 12 17:20:52 2019 -0500
Fix a set of issues with Agree
commit c7473b5d227a74f28a7fa4a6019f498e06d20b34
Merge: 897b87a0 88c18329
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Nov 11 19:54:55 2019 +0000
Merged in abouteiller/ulfm2/sanity/dont-mark-myself-failed (pull request #15)
Do not mark myself as failed, this is never normal
Approved-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 88c18329e525ad7cf5648c10e20a59add0073c11
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 8 16:16:30 2019 -0500
Do not mark myself as failed, this is never normal
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit c6ade8fa34d8545f17afde35eda67ab4ceedc3f2
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Nov 6 14:01:01 2019 -0500
Do not close the socket while the opal_progress loop is adding
events to the event base
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 897b87a0d680c3604756309ef78c368675eb884c
Merge: 94391d9e 82c9b479
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Nov 11 19:20:07 2019 +0000
Merged in abouteiller/ulfm2/bugfix/mt-sync-revoked (pull request #17)
Bugfix/mt sync revoked
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit 82c9b479ed4656696e3a1217405847c68ddc2575
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 8 18:03:26 2019 -0500
Do not add more requests to the matching queue after the comm is revoked
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 4b01a5764869dff4f922903283053784f5a42301
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 8 17:56:37 2019 -0500
Bugfix: we need to check if the request is ok before entering the first
waitsync_mt
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 94391d9e38ad53ce55bc2764ed910b329ef4b92f
Merge: eb275c65 f7b5b637
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Nov 5 14:05:08 2019 +0000
Merged in abouteiller/ulfm2/bugfix/fd-drift (pull request #14)
reduce the sensitivity of the detector to noise and drift
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit f7b5b63763b974cf06372645cff3e044a4a53165
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Nov 4 10:58:06 2019 -0500
reduce the sensitivity of the detector to noise and drift
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit eb275c655dee7ee7d18fe24004a3d37bfd25a8c2
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 18 17:02:09 2019 -0400
Document why an assert may trigger in false-detection scenarios
commit 52c2a5d710f80c0d26bf1cd7c42f7cbd58cc1e24
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 16 15:06:02 2019 -0400
Use the correct option to force internal pmix/event
commit bc69fd1bd1acf3a778b11c13f667ca3b972f1610
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 15 14:44:20 2019 -0400
We have modifications in pmix and libevent, prefer the internal ones
commit 617e2b4c9ce27c24d3c8eb6c8aa539884904a65c
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 15 14:43:16 2019 -0400
Bugfix a case where the FD would keep observing a dead process forever
if reported from inline (rather than by the detector itself)
commit b54585d832588258277ea4d16d519c6a46439260
Author: Nuria Losada <nlosada@icl.utk.edu>
Date: Tue Aug 6 10:18:52 2019 -0400
Avoid cleanup of job_session_dir and orte proc_session_dir upon application process failure
commit f8d536027988500abb87adc22fa147be6d3eda7e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jul 11 14:44:05 2019 -0400
Cleanup rdma_frags and registrations in revoked/error sendreqs
Free up rdma_frag in sendreqs when the request is cancelled in error or
revoked.
Return registrations for cancelled/revoked sendreqs
Remove dead/useless code
commit 6c76e287178d42d7dfd1e50e6be4ba18a86a06a1
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jun 13 04:19:44 2019 -0400
Missing semicolon appears only when Fortran logical needs conversion
commit 92e108f9ae1e4ffb129086ada8d4a7643ee8c708
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jun 13 03:29:28 2019 -0400
A bug in PMIx disables node-local detection, use the OMPI detector
instead
commit 4dcf700e1a49479d1df4693b32cdc5cd187ec056
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri May 24 14:28:15 2019 -0400
Do not send rbcast to known dead processes to avoid paying the
send-detection penalty
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 6f375bff8e2d893343064e51bc01b6806d166d1c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed May 22 13:59:22 2019 -0400
When receiving a wrong heartbeat, ignore it rather than rearming
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 027afa741bf99481e7b1c2ad66579fd611190489
Merge: 08122763 b7806672
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 21 17:32:53 2019 -0400
Merge branch 'master' into ulfm
commit 081227637a652b7b82103697c0b7c353ad58e220
Merge: 6f002936 aa5e5a65
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Apr 23 20:57:35 2019 +0000
Merged in abouteiller/ulfm2/merge/postopenib (pull request #12)
Merge/postopenib
Approved-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit aa5e5a65e4e02931b6239749a1d1671bd407f655
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Apr 3 17:33:32 2019 -0400
Let errors flow through spawn/connect accept in order to make sure we do
not end up in unmatched mpi calls in error cases
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit edf0086d55ad955b26336bd96d131482dbb88ef4
Merge: 0fe172d9 97b7fab8
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Mar 25 11:14:23 2019 -0400
Merge branch 'master' into merge/postopenib
commit 0fe172d9bf5cf7e9f82c951004ce32ffd8cc2955
Merge: f2b7da5d 53cd31ed
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Mar 22 00:46:18 2019 -0400
Merge branch 'master' into merge/postopenib
commit f2b7da5d488f1b1d27c6a8643128a10eadd86f67
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 15:08:06 2019 -0400
Restore ulfm specific changes to openib btl cancelled by merge 4ce1669a
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 74d9c41e32e5b0c7fdb720156091a1eb49c03537
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 14:57:55 2019 -0400
Revert "platform: Remove "with_verbs" from all the platform files."
This reverts commit 99553eb1b9b2a6300525e06114b38c1c091f23e8.
commit 385dbd0dad512245e9197af98244ac970f3d956e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 14:57:47 2019 -0400
Revert "README: Remove all references to --with-verbs[*]"
This reverts commit 48a33ee6db06df1426d3ab9fa4adb2c6d182f8d3.
commit 0c3a306c695eb12d489b9fdbfa4ec6262935e7c1
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 15:25:21 2019 -0400
Revert "opal/common: remove stale common components"
This reverts commit 3f4af8e51ca70f7ca0e46b734f3e11e513b858dc.
commit f8f1b8537fd929a4fc1432936a71d7f2def41bbd
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 14:56:52 2019 -0400
Revert "m4: remove all configury related to libibverbs"
This reverts commit 59c8ab6da4276ff398453a54910c6c0fb67a153c.
commit 4a82cca865ac043e8aab75356ed78786115b52ef
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 14:56:10 2019 -0400
Revert "btl/openib: So long / farewell / it's time to say goodnight"
This reverts commit 8de786f5a40ab96069b9c661d6ea8bb892688cac.
commit 4ce1669a7463280528473eeb69e59dc360f75a31
Merge: 6f002936 01737960
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 13 14:54:24 2019 -0400
Merge branch 'master' into merge/postopenib
commit 6f002936fc1d08dc3d82190c6997a910b655b59d
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Sat Mar 9 10:02:59 2019 -0500
Remove the unnecessary gotos for error cases that cannot happen
issue #40
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit 67ae93928ebac0eafd0948cdd5602854fa2d6f07
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 7 14:36:28 2019 -0500
Resolve deadlock in MT wait-sync rearming post-error
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 804bb69340ca1500828a78f91917a2ea155f256e
Author: Thananon Patinyasakdikul <tpatinya@utk.edu>
Date: Tue Jan 29 13:34:44 2019 -0500
opal/threads: reverted #6199
This commit reverted pr #6199 as it introduced deadlock in some cases.
Also removed the assert as the condition is obsoleted.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
commit b7f8c6ffc361d7753abc9b76093582f6f98b52e3
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 6 14:33:10 2019 -0500
Rename ftbasic to ftagree
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit 8b057449f1950e3ff79fd8592a82db78e533948b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Feb 22 19:10:35 2019 -0500
Simplify generation of PMPI_xxx_f
Fixup ompix_xxx in fortran pmpi interface
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 6979f860d08c27aa6dc6a7c6f1ade171bc0c01bf
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Thu Feb 21 22:03:22 2019 -0500
Fix the warnings in the Fortran API.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit 11deb93207d786488789811f6641cb68003a9e40
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Feb 21 19:56:33 2019 -0500
Erroneous modification in typedef for rdma heartbeats
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit c0f544b690e850ff8ec164ee90ab0dd006f0e941
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Thu Feb 21 19:56:09 2019 -0500
Prevent EPIPE on OSX.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit 96be67d66ff6e7656c879ddf0c2605a86f45cf3c
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Thu Feb 21 19:52:52 2019 -0500
Address a race condition in libevent select.
This is not really a fix for the race condition because I could not
figure out how it happens, but it does address the problem generated by
the race. If we do not remove a bad fd from the select list we keep
getting the same error from select, and we stop making any progress on
the communication side. Thus, we forcefully disable all bad fds as soon
as select fails, and we are back on track: progress is ensured and
everything seems to work as expected (no leftover events in the event
base).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit eab20ba06442936293d21cae78e03c7c68f500b3
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Feb 21 19:33:54 2019 -0500
resolve pedantic warnings in PMPI fortran ulfm bindings
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit eb55ffb189cbb77a52f38943ab44427752f4af39
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Feb 21 19:00:32 2019 -0500
Remove pedantic warnings in ERA agreement
commit eb85245b30f5cb885a87da40b8f671d56cc6236b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Feb 21 17:58:36 2019 -0500
OPAL_ENABLE_MULTI_THREADS does not exist anymore
also fix a number of warnings in enable-picky in detector/propagators
commit 04b0a92b540b2163b37f840bc3f35b2992567de4
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 4 15:44:40 2019 -0500
The order of the attribute creation is important
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit c87d9483ad9799b6d3b7a6d48770ee2fd74b7855
Merge: edf88350 8a18a831
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 4 13:35:19 2019 -0500
Merge remote-tracking branch 'ulfm2/ulfm' into ulfm
commit 8a18a831dab6161e19b64f17c7640b8eb3a03188
Merge: d19c4a82 8ad77b66
Author: Nathan Weeks <weeks@iastate.edu>
Date: Fri Jan 4 18:23:06 2019 +0000
Merged in nathanweeks/ulfm2/issue/use-mpi (pull request #11)
Fix INTENT of flag argument to MPIX_Comm_[i]agree
Approved-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 8ad77b66a9d45dc8c73c25e0a321725d8e8b0689
Author: Nathan Weeks <weeks@iastate.edu>
Date: Fri Jan 4 10:18:36 2019 -0600
Fix INTENT of flag argument to MPIX_Comm_[i]agree
Signed-off-by: Nathan Weeks <weeks@iastate.edu>
commit edf88350a8b46fe92cf40a72266685ecbbeccad3
Merge: d19c4a82 0dc0d77b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 3 13:51:47 2019 -0500
Merge branch 'master' into ulfm
commit d19c4a82df7d79285aa5d39cbb2ea1507898f65f
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 3 12:17:59 2019 -0500
Handle the case where the bridge comm is revoked in get_rprocs
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 383b889df896e5059c2542b439bfb7f6846c4422
Merge: 2c536936 ce61988c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 21:40:37 2018 +0000
Merged in abouteiller/ulfm2/feature/isrevoked (pull request #9)
Adding 'is_revoked' functions for communicators
commit ce61988ca8ed085ae999fa6866b5459d8952c756
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 16:34:05 2018 -0500
Correct F08 and other bindings for is_revoked
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 6c7f413ad17c3232c811b14ffa00ddeb3d2dd1c4
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Mar 26 12:28:30 2018 -0400
Adding 'is_revoked' functions for communicators
commit 2c536936a337d2e7508213a95724bf8f9c9c6239
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 15:26:44 2018 -0500
Rename README to README.ompi
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 9f2d068ee078fa2aaba725010d0cb70b4c5ddb3c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 15:24:32 2018 -0500
More README renaming for Bitbucket
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 9861c014cb5f19b356a67982c22295fd1da7fc8d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 15:14:01 2018 -0500
Move the Open MPI README so the ULFM readme gets rendered from the
bitbucket page
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit e8127fc61c0ed677c1061e3e788623e61299992c
Merge: ec5675fc cc16badc
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 19:57:21 2018 +0000
Merged in abouteiller/ulfm2/topic/usepmpi (pull request #10)
F08 and PMPI for the ftmpi bindings
Approved-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit cc16badc25a81f05c7e9c0dd646d5b1dd1599d8c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 01:47:40 2018 -0500
Add PMPI F08 ftmpi bindings
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit cd2850fdadb1a0c36dc370f7991ea8f86e1c626a
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 01:15:13 2018 -0500
Correct fortran ftmpi bindings w/o weak symbols
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 866d91f2b7cf9a58c2740dcfb3d884451756965d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Dec 21 00:19:37 2018 -0500
Upgrade mpiext ftmpi to the new PMPI generation system:
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit ec5675fc533cc921a4565e8bde28238dcbfdc6ce
Merge: dbcfc7a9 14eec9a3
Author: Nathan T. Weeks <weeks@iastate.edu>
Date: Fri Dec 21 07:10:49 2018 +0000
Merged in nathanweeks/ulfm2/feature/mpi_f08 (pull request #6)
Add mpi_f08 bindings for ULFM routines
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit dbcfc7a986eba5dbc6ce7c590b232697739567b2
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Dec 18 13:54:08 2018 -0500
Upgrade the ftmpi extension to the new naming scheme; restore pcollreq
since it does not cause problems anymore
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 5170d9cb7f12ca882790c22544ef18448ceb3860
Merge: f00c5732 6f5f3110
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Dec 18 11:26:36 2018 -0500
Merge branch 'master' into ulfm
commit f00c5732902e2d8cbd033083248b1b9cca992d5b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Sat Nov 3 11:29:03 2018 -0400
Disable pcoll for the time being; it breaks the Fortran bindings
commit e24ddc24977e91a44fbcf352dd3156cc7eb35e0c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 2 00:44:47 2018 -0400
update version string and changelog
commit 6304043d40daf6759960814975e0f964f3c117bb
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 2 00:43:27 2018 -0400
Set sane default components
commit bbb19203bda985f96ec608b9e24178e74926b540
Merge: 77f9157e 37954b5f
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Nov 1 15:18:45 2018 -0400
Merge branch 'master' into ulfm
commit 77f9157ea7dcb5c2b517455c9e249b6b8068fa5d
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 31 12:51:11 2018 -0400
Resolve a recursive destruct on the iof proct in finalize
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit 3ef11c7d09adaa47d76db72dc58a661b89e571fd
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 24 02:03:24 2018 -0400
Prevent errmgr invocation from crashing in finalize
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 86985a5b61e2ccc60bbe938e81d947684d12c8f2
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Jan 26 15:23:19 2018 -0500
Re-add the Handle error cases in TCP BTL rejected in upstream
When an error is returned by the socket operations, trigger the
appropriate error path in the PML to give an opportunity for
rerouting/error handling.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 33b8fce232b233a3b0ed519802eb15eb7e5995ab
Merge: 6566fc4c a1e85b03
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 30 17:04:11 2018 -0400
Merge branch 'master' into ulfm
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 6566fc4c68ff0d89d68abdfd8382b411104b47d6
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 23 22:42:35 2018 -0400
Correctly propagate the oversubscribe flag to the spawnees
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
commit 07df428c2f82718133d707c5f017f417c07e3bd8
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 22 15:38:31 2018 -0400
The error field of requests needs to be rearmed at start, not at create
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 359f044b4d2cac87fcbb55411c642bb108dcf720
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 22 11:25:01 2018 -0400
Correctly bubble up errors in NBC collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 9579efaeca2ccdfb553cbf122755571e8af970fe
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 22 11:17:00 2018 -0400
Bugfix a debug statement calling pml dump
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 428f3506927497ed09f7ad1d97c0e5fbfb4adf67
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 18 10:56:44 2018 -0400
Disable inband PML error reporting during MPI Finalize as it interferes
with the Finalize process. A better fix is being worked on upstream, but
let's have it work in the meantime.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit ce72ffb4a76e6d33f4e12f8aa4cba93115009c2f
Merge: d9284a60 69f9da91
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 4 12:38:26 2018 -0400
Merge branch 'master' into ulfm
commit d9284a6005c2e2c615d19903a6d819f126d735c7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Sep 26 10:52:29 2018 -0400
A pmix_3x constant was still present.
commit bc26604d3ed16b73ff8f1f756adf965d194272fe
Merge: 908eead4 3f598e9e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Sep 24 17:40:15 2018 -0400
Merge branch 'master' into ulfm
commit 908eead4aedf95a5e565bf4f9af5ac2ccd2494f9
Merge: 70ee1f45 1ca6f38e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Aug 7 13:30:54 2018 -0400
Merge remote-tracking branch 'ulfm2/ulfm' into ulfm
commit 70ee1f452b40f0ac7e2b319cfc478859a3fffe21
Merge: e87f595e ae030146
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 6 14:01:18 2018 -0400
Merge branch 'master' into ulfm
Heavy modifications in nbc error management and coll tags
commit 1ca6f38ea8a3d0d26efd4a7e755c7edc17bc8e47
Merge: e87f595e 4d129617
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 1 14:08:46 2018 +0000
Merged in abouteiller/ulfm2/feature/pubsub (pull request #5)
Do not disable publish/subscribe for no good reason: these are local operations.
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit 14eec9a3d164cc68d92844fc219f0664aa36fd90
Author: Nathan T. Weeks <weeks@iastate.edu>
Date: Tue Feb 27 18:56:56 2018 -0800
Add mpi_f08 bindings for ULFM routines
Signed-off-by: Nathan T. Weeks <weeks@iastate.edu>
commit e87f595e6bf1ab2366c10f05d3aac0217079d68c
Merge: 63e0514d df0ccbee
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 1 11:07:05 2018 +0000
Merged in abouteiller/ulfm2 (pull request #8)
Ulfm
commit df0ccbeee3727663a9ddb1a39ca670343f004bb9
Merge: 63e0514d 9944d63d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 1 05:38:53 2018 -0500
Merge branch 'master' into ulfm
commit 4d12961757171b1aa28b67efc9a40d24266d9998
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Feb 21 19:02:42 2018 -0500
Do not disable publish/subscribe for no good reason: these are local operations.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 63e0514db046de8665f2f3510fab7e739a93a7c2
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Fri Feb 16 01:55:29 2018 -0500
Fix usage of OPAL_ENABLE_FT_MPI.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit cec02d4408489cc24ae5d4dd69476d6e33c5fab9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Feb 14 16:18:57 2018 -0500
bugfix: missing declarations for *ft_register_params
commit 6006795e842354b2bbf9308ee119e2dcaf1848a7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Feb 14 16:18:16 2018 -0500
NBC_Error does not have an int as first param
commit ac6bb3ea190e3f441d025d398a711dbd22e2a4b3
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Feb 13 17:45:58 2018 -0500
Further tuning of the timeout default value for the thread detector
commit 577c61693c4d10dded6c5d4e4f909caf9794bad3
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Feb 12 14:53:15 2018 -0500
Wrong number of params to NBC_DEBUG
commit e6cf7dc044a9f84aaab4c41ebfab27029f12972e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Feb 12 14:52:59 2018 -0500
wrong encoding
commit 228c12add80446de2220f8f9761ff260a3cd2034
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Feb 12 13:11:16 2018 -0500
Expose the FT and detector controls to the enduser in ompi_info
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 713c94e85a141772fad8a4cb2842e643b9f22716
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Sun Feb 11 22:23:38 2018 -0500
Fix ULFM profiling.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit 7a42d912261b62082b9e8d8e6586ba4f3dac8ee9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Feb 1 01:43:02 2018 -0500
Erroneous merge in comm_cid: uninitialized epoch
commit 8e940d2938e4dc236bd4acfae4e3678de9a71810
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Mon Jan 29 13:48:13 2018 -0500
Minor fixes to make clang happy.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit 11e6355b5a4aeacdb19d9b3dd6c4bd7863834cb2
Merge: 17d0158a 5b0df815
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 25 11:03:42 2018 -0500
Merge branch 'master' into ulfm
commit 17d0158a45fb08fcad202a9352729fae829f68d1
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Jan 17 16:35:28 2018 -0500
bugfix: an any-source request that completed while it was being reported as PROC_FAILED_PENDING needs its status rechecked
commit 51bbd220c75ca59f230e9729836dcc33a20313a6
Merge: 199f5f0d f3a096dd
Author: Nathan T. Weeks <weeks@iastate.edu>
Date: Wed Dec 20 00:31:59 2017 +0000
Merged in nathanweeks/ulfm2/issue/comm_failure_get_acked-f90 (pull request #3)
Correct type of MPI_Comm_failure_get_acked failedgrp argument in Fortran USE mpi interface
Approved-by: George Bosilca <bosilca@icl.utk.edu>
commit f3a096dda733cbdd3f91524fd9973af5ba41e7d1
Author: Nathan T. Weeks <weeks@iastate.edu>
Date: Tue Dec 12 19:18:41 2017 -0800
Correct type of MPI_Comm_failure_get_acked failedgrp argument in Fortran USE mpi interface
commit 199f5f0d2d6139460d0461cbf4b374d117dac4f6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Nov 20 15:19:48 2017 -0500
Make sure we mark the proc as WAITPID status in signalled and non-zero exit cases
commit e3006cafe4f9e4e55774679199b94b1e3d24ca5d
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Fri Nov 3 23:47:16 2017 +0000
No accents in the names
commit 2e75c73cc620eceb7396e9aac77a13e235c2a77b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 3 18:59:52 2017 -0400
Tweak default FD and update readme notes
commit f4bd88c98f1936a609e9145cd506b22a5722fa90
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Nov 3 18:59:22 2017 -0400
Pass correct arguments to pmix cb when out of memory
commit 87d50db1d34695a97de094977f7fa9163c35b14e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Nov 2 10:48:10 2017 -0400
Changing the default IB retry timeouts is not a good idea.
We'll need to find another way to speedup credit recovery in failure cases.
commit 2fb5440a589baf8666f6cf30992b3a3bd04a6aca
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Nov 1 10:07:39 2017 -0400
Mark the IB endpoint as failed when invoking an error; this resolves UDCM connection deadlocks
commit 79aca0bb799f90f53c949e161b9f173c1fca2996
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 31 23:20:31 2017 -0400
Make it compile in non-debug builds
commit 04f61d22769f13adcfec822f83bc5ec079501a62
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 31 22:55:51 2017 -0400
bugfix: major: openib send credits returned correctly after a fault for pending frags to dead processes; also tweak the default IB retry timeouts to make this happen faster
commit 942b0ab8bd8fc5f9e0b39312553c3a42228720c4
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 31 22:02:24 2017 -0400
Bugfix: leaking frags after failure in TCP btl
commit 6db29438a0299f779b59972ae6528a035ff56348
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 30 21:34:04 2017 -0400
Copyrights since 1624f1f5
commit 5dd7d6fc35e1398e12338ecc49eadf30aa818a8d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 30 21:21:37 2017 -0400
bugfix: returning ERR_PROC_FAILED from iSend violates ULFM spec.
commit 9bf3923d51dcf876f1c20a01757cd94dbde9022a
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 30 17:01:33 2017 -0400
Bugfix to upstream: do not return ERR_IN_STATUS from collectives
commit 954cd2f53e9c2985a21bbb1fc374b83678df8f8c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 30 16:28:38 2017 -0400
Bugfix: capture cases where ERR_UNREACH is returned instead of PROC_FAILED when the BTL finds the failure first
commit 61c5954fc1aff273a40c213d38e850862e9bf7e7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 27 17:30:23 2017 -0400
Fix error cases in TCP connect_ack
commit 0237a70791b7b9d6f8b657e1a647b3b0dfab935f
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 27 17:27:48 2017 -0400
Various fixes to orte/pmix so that late notifications do not crash during finalize
commit afe72afab6f873a66c9f257ac8d1e36f32627882
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 27 17:24:29 2017 -0400
Turn off ftmpi_enabled after the FD is turned off.
commit 9712330b37fb8d5b7f1f77e79efe0a0f6c695ade
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 27 17:07:55 2017 -0400
Fallback to abort when pml finds an error and ftmpi_enable is false
commit a9ec68580d3fddd436b5df3b31e0621ba5d11f77
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 25 16:46:42 2017 -0400
Bugfix: interrupt operations on localcomm in failed/revoked intercomms
commit 8bacc1491355d4369251d45fa2e9db0e7647d05e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 19 12:58:11 2017 -0400
Adjust init slack
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit 1624f1f521bcd24978370ce614889fb01841ea8c
Merge: 768e6f5c 689f1be9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 19 12:23:44 2017 -0400
Merge branch 'master' into ulfm
commit 768e6f5c563bc4575fc3dd50313d0136958dd863
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 19 12:19:28 2017 -0400
Resolve a case where the detector creates an event with infinite period
commit 252544f8e4493ac5c2478f6d5322757168a67869
Merge: e3fff257 27eb401a
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 16 15:54:06 2017 -0400
Merge branch 'master' into ulfm
commit e3fff257517996f5758cedb7c6f6082f9e18a6da
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 16 13:37:59 2017 -0400
Disable XPmem as it doesn't work with recovery
commit d105a9f951a27bde804f3b9398e1e97acf894763
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Wed Oct 4 19:41:09 2017 -0400
Pass OMPI CFLAGS to libevent.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit 250892aaa815e4f5b2e9692dd51f81fc4f47b733
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 3 19:55:13 2017 -0400
Bugfix: permit detection of multiple failures on the same node
commit 914fcbda90ac1b00d47dea7808e7cdfb48e73bba
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 3 11:16:36 2017 -0400
File had been added by mistake
commit 9540a2c7ccb901119bebbc0be6edc9b0e6b86c76
Merge: 16221bf5 a3ac67be
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 3 10:16:20 2017 -0400
Merge branch 'master' into ulfm
commit 16221bf5d7c312532230b2fabb891791327c5118
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 2 13:33:30 2017 -0400
Bugfix: cleanup half created comms when failures strike in comm_dup and friends
commit d04eb935478fa3afc1975aa7de0119d398e9772d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Sep 29 00:20:58 2017 -0400
Silence too verbose messages in libnbc
commit 3ab5df55dbd087423acb7c87ba34ada99a6752b6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Sep 28 23:40:12 2017 -0400
Interrupt the getnextcid_nb when a failure disrupts it.
commit 2609388abeaadcaf6095130499c60bfc46ba4a00
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Sep 28 23:39:21 2017 -0400
Propagate error codes from NBC to upper layers.
commit f679439e032eb3f03dc9afcdd62c2eae686bdb46
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Sep 27 17:52:30 2017 -0400
Start from known failures rather than acked failures in comm_free agree
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit b63b7c15a1139395ff56f3fb448efea56dc7de91
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Wed Sep 27 01:08:05 2017 -0400
Use the correct header.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit b4535b770b197e6c278340ffbff5891401e294c0
Merge: d888d603 7cb22e1b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Sep 25 23:18:14 2017 -0400
Merged perf/shrink_remembers into ulfm
commit 7cb22e1b6bec7b3fd71aeff0bc7d737a5838dabe
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Sep 25 23:07:46 2017 -0400
Perf: start shrink from known failures
commit fecf5707a2882701f9435b25a487e1cb1aa8be9b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Sep 25 23:07:02 2017 -0400
Bugfix: revoke should not revoke NBCs pertaining to shrink
commit d888d6035f5b9e41ef39b76a1709522b3652f890
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Sep 22 17:50:42 2017 -0400
Perf: decrease fd_finalize duration
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
commit b064faf15c6349ffd5e4bf51b72960a77a7cfbf7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Sep 22 11:56:19 2017 -0400
Bugfix: deadlock in finalize may happen if the fault detector is turned off while the last ERA is ongoing
commit 024b90109cec452a249b0e2abee8b1c947141650
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Sep 22 11:54:46 2017 -0400
Bugfix: thread safety needs to reload and recheck the proc when observer changes
commit 9eb779f8c8e9d3a53c9c159944fa83613be9e0e0
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Fri Sep 22 12:41:08 2017 -0400
Support barriers with 1 proc communicators.
Make sure the barrier supports being called with a
communicator of size 1.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
commit f403bef6c2ec9f757881b13deaaed4c790b6bcf7
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Sep 21 16:15:46 2017 -0400
Bugfix: reset the req_complete field when redoing a wait_sync after a failure (Issue #19)
commit 06bb8ed210288a0554897b872ee9a31c1766464a
Merge: 79efd24f ab68aced
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Sep 21 16:11:41 2017 -0400
Merge remote-tracking branch 'origin/heads/master' into ulfm
commit 79efd24fe8f975f39b0d4bd61ee3e4dc2a99dd6d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Sep 12 20:42:32 2017 -0400
Bugfix: compilation problems --without-ft
commit 88bae3699c36b5e9aec90b36ca313ed9ca6a3f74
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Sep 12 14:05:00 2017 -0400
Bugfix: simplified handling of --with-ft options
commit 9ec76f804313215fe8d43c73579d8e06f501cc20
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Sep 11 18:04:10 2017 -0400
Remove the agreement in finalize.
commit e856ed3b54e93384d756fb791866ea8a55b8c68d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Sep 7 17:22:59 2017 -0400
Removing finalize deadlocks from known problems
commit 9d9aa8808500e3192633887c65f73d4d7e789abb
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Thu Sep 7 21:13:53 2017 +0000
Update the README.
commit ea42a96e2a84814d9d8f35b285ff6479e7a87db9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Sep 6 16:40:43 2017 -0400
Fix: post-failure deadlocks in Finalize, and control FT with --disable-recovery rather than esoteric mca params.
commit d37ac65a2acedb70e55176267c1586a39baf62fd
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Sep 1 19:08:55 2017 -0400
bugfix: finalize detector after all but 1 rank died.
commit 3eb197625d2f49d7da0fe268d044b0a6997e09f9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Aug 31 16:53:50 2017 -0400
cleanup: remove dead code in finalize
commit da229614428d6646ca5da3e91a93ba45f2be45f2
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Aug 31 16:20:39 2017 -0400
bugfix: redo the wait_sync_mt when a global sync interrupts another request
commit 8285f9d3466919f8838609e3f054df229baa16c9
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Aug 31 16:14:36 2017 -0400
Bugfix: prevent updating the failed_grp from multiple threads
commit 5a565247c83a20dfd684876acba1fa7633629ad0
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Aug 31 13:49:20 2017 -0400
Bugfix in detector finalization
commit 6fcd853ff8b45ae599883d7bf76675ac969db52e
Merge: 42a3858d d06b989d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Aug 29 02:37:54 2017 +0000
Merged in abouteiller/ulfm2/feature/README (pull request #2)
Put README.ULFM in markdown and make it a self-contained install/getting started
commit d06b989d277925f98a5575cf629b3c8c53c705ff
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 28 22:33:40 2017 -0400
Put README.ULFM in markdown and make it a self-contained install/getting started
commit 42a3858df24fc3b2047e20b95797a3f2b80fef3b
Merge: 97070faf 1434c0e6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 28 22:12:11 2017 +0000
Merged in abouteiller/ulfm2/feature/README (pull request #1)
Feature/README
commit 1434c0e61793f5b3e543fe6b0151e665c6e525f5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 28 17:02:26 2017 -0400
Update the README
commit 23798cf84e35a99735a237226bab5fd811809bfd
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Aug 10 11:25:04 2017 -0400
Update README
commit d685eba8805da4617b604cdd7a1f72584537c7c4
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Aug 8 13:43:41 2017 -0400
Adding a README that's specific to ULFM
It combines the old NEWS-ulfm from ULFM1
INSTALL from Open MPI applies directly so no need for one
commit 97070faf87190faf6c50ea0a0a8557e94ec51775
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 28 15:47:51 2017 -0400
topo aware FD does not observe same-node sibling
commit 938e0174959a3187037e1ac6356a9f6236fbc8ff
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 14 23:31:36 2017 -0400
Reduce noise and some finalize conditions in comm_detector
commit 08c6f2d6e97ffe36389261edcbdd99f9a4ed38eb
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 14 23:29:27 2017 -0400
Reduce verbosity for events that are "normal" in FT with CMA
commit 4fbd4d36933f2401330862429d420f8b179470ed
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 14 18:05:11 2017 -0400
Fallback to pmix abort if ompi abort cannot be issued
commit baf523d73922b9e00c4c9f44b2de34283e0d2ebb
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 14 17:24:40 2017 -0400
Orte reports ERR_UNREACH or ERR_PROC_ABORTED when it detects local failures, take both into account.
commit f4513c3458e44fcd0aa6db8dbd77c553572bbe2d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Aug 14 14:54:51 2017 -0400
Bug in upstream: cannot call ompi_abort from a pmix cb
commit 8af800522eaf727c7f8ca8726cb7285765019483
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Aug 9 19:29:31 2017 -0400
Re-enable the TOPO graph operations, and trigger an appropriate warning when FT is enabled at the same time
commit 1fc9c039585983eddf0c9cafc9176e253a82a26e
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Aug 9 19:14:41 2017 -0400
Re-enable the RMA OSC operations, and trigger an appropriate warning when FT is enabled at the same time
commit 0214c850587a9dc4c1f18d086c6ae76c9c5fef3d
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Aug 9 18:34:08 2017 -0400
Re-enable files for non-FT runs, and generate an appropriate warning about what happens when using files and failures happen
commit b20bd7c70eee582a93428e810e625bda829e975b
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Aug 9 11:48:48 2017 -0400
Make --with-ft=mpi on by default on this fork
commit 59fca1bc961668069f79f14baffc708c65b80869
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jul 27 21:15:26 2017 -0400
make the sync_wakeup work in multithreaded runs
commit 4f917d9863037e3522637350ebda4109a37a5c46
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jul 27 20:46:31 2017 -0400
Proper cleanup of rdma registrations
commit ad86f26cb16fcd530d7a4f265d60a4f5dedb7f64
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Jul 26 11:50:02 2017 -0400
Restore --with-ft option and enable vader BTL from changes in upstream
commit 6f9abef3d444e05be2f664e4659f7fb4422e8350
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 25 09:03:57 2017 -0400
Move proc_failed checks outside of the conditional check_args block
commit c9783c52ade667362194da90b1132ca5afbb58a5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 25 09:40:29 2017 -0400
Add support for neighboring colls and other MPI 3.1 stuff
cart/graph create
commit 35cb76303963ec83aeb27c2109b374163d57c0f6
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Wed May 24 13:04:00 2017 -0400
Make sure we do not initialize ERA and failure detector if FT is not requested; and fix a number of bugs when FT is not requested.
commit 12de8f950596b9f0d93d4aa301dbdbb0f0179b7c
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 23 13:25:57 2017 -0400
An error introduced during rebase
commit dbb86cb9cd68e7953a92728e2a9ee9fa15df3cd5
Author: Aurelien Bouteiller <bouteill@icl.utk.edu>
Date: Mon May 22 13:31:55 2017 -0400
Remove the opal_array comm_epoch as it is not needed anymore
commit 777b04cd67c8da1bbe95551ddd62b3bc1afd9a18
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Apr 18 15:05:22 2017 -0400
Missing an extern
commit 471f3121ce75a4405d11349789c9c356ebe7b5c5
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Apr 17 23:54:42 2017 -0400
The epoch overflow check must happen after the cid overflow check
commit ec2286eb1463f0486ad062b62ff505904e25a236
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Mar 27 16:35:16 2017 -0400
Reconcile the FT coll components with the new coll initialization (coll. become coll->)
commit 9727e60ec6b9554f52532042fed928f389d6ac3c
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Mar 27 14:58:58 2017 -0400
Update the nobuild list
commit f787b5d78cec90fd73e4fba888297fd936f9ae75
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Feb 24 17:42:56 2017 -0500
Adding a default no-build list for known problematic components.
commit ae557eedf91760e30bfbdd919156a756509c600d
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Feb 15 17:54:33 2017 -0500
coll_base_module has been updated to 2_2_0
commit 5fd144f316023c37562fe4b09e8d266ebab613b0
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 26 14:56:55 2017 -0500
Importing change from ULFM1 94f1fb9 (malloc(0) in ERA)
commit 8d49d0ac9b20d8b519493aece25a61153ff275a2
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 26 14:53:36 2017 -0500
The pmix-errhandler integration is not completely ready for prime yet
commit 022897b5b589011b7c076759ca2a6b2b51c8ec86
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Jan 26 14:52:47 2017 -0500
Convert the agreement in finalize to the new signature and stronger sync before turning off the detector
commit c67edf45b10b6fbae68d4bafc2c95a079932c703
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Nov 10 15:11:05 2016 -0500
Permit interruption of the wait_sync in case of errors
commit c631c5599c2ec87588b2f3d3f059d83bf77b4f35
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Nov 7 15:30:12 2016 -0500
Fix iagree by making the need to update of the failed_group a parameter
commit ebc714d3284aa99b94e65558e62bfa6ba01ac068
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Nov 3 14:42:08 2016 -0400
Restoring the errhandler/errmgr interaction to capture errors
commit 4fa09b9780c94b5766cf0d522fdc974352926da3
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Oct 31 15:37:22 2016 -0400
cid_ft functions are operational again, shrink fixed.
commit 15b5a4d1b13730929c46b23da442f26a4b88cc48
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Oct 20 16:44:54 2016 -0400
Make sure that the rbcast/detector tags are initialized before progressing the engine.
commit da21280dffb6ba0252de0d04e02566e0b96e7000
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Oct 14 19:20:20 2016 -0400
We can save an agreement in finalize if we take care of ignoring stray rbcast at this time
commit 355b4b0b2796bbcb0d2d4b6d07f16980346d4b0b
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 21:19:38 2016 -0400
Make errors detected in NBC collectives complete the operation, and stop COMM_COLL requests
commit 8a88a81a83027f4aee446d77e0c074657f37a4b3
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 21:19:04 2016 -0400
Some more REQUEST_COMPLETE fixes
commit 461d209343f51021557c1f6f11d05911c4134d5a
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 18:34:39 2016 -0400
request_testall/some returns ERR_PROC_FAILED and REVOKED just like request_waitall/some (the mpi layer takes care of setting it to IN_STATUS again later).
commit 7481f709f8d360f07f9082d13fae1c67c7b7219b
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 18:33:45 2016 -0400
use REQUEST_COMPLETE in send_cancel
commit 79fd44e4f550076f57e4b101e7f347a47ac013dc
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 17:42:55 2016 -0400
free_reqs does cancel the requests, so its replacement code should too.
commit f710a951e1c36a9663575533f05cd59b43f85a33
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 12 01:31:09 2016 -0400
Put back epochs in cid allocation
commit 95e86ae7fb3fef16b0e5fdf2eff7b98eb4af28f1
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Oct 5 18:08:16 2016 -0400
gen_cid must set req_mpi_object.comm
commit dfd07582e1c131aacc14ab3b81bde1f5745fce07
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Oct 4 22:44:45 2016 -0400
rebase on master
commit 2da1b6bceb6ec52ac496f5a25101358e13db0892
Author: George Bosilca <bosilca@icl.utk.edu>
Date: Mon May 9 12:25:01 2016 -0400
Add FT to summary.
commit fce79759f7dcb58efa19b4948e6b66ada9807bb1
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri May 6 17:03:20 2016 -0400
This hack has been committed by mistake
commit e507d83a66155df1fe5196228068f90a4132387f
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu May 5 10:37:50 2016 -0400
Do the finalize in abort only if there were actual failures during the run
commit c1eb96a997625129ccc4a690892f2f9e742ac245
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 17:34:58 2016 -0400
Using the new OPAL_ENABLE_THREAD_MULTI where applicable and removing some useless rmb()
Using OPAL_ENABLE_MULTI_THREADS and removing some useless rmb()
commit d4a91a0a9607c10c52ee7a739754e10a16035a47
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 17:34:14 2016 -0400
Fix a bug where the rank of immediate neighbors in the BMG was incorrectly computed
commit 4ed5a3779c8295b501cf589bf520843b8fcdc7c8
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 16:10:05 2016 -0400
reinstate the abort in finalize, as the fix pushed by ralph is not always working
commit b31892772e4518e26603d4289fc3f0a57af2ef5f
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 16:06:44 2016 -0400
We need to synchronize before removing the FD callbacks
commit 9dfe5dfc7200692330b939fcdb8965aac25b50fb
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 16:05:30 2016 -0400
Keep searching for the next hop in the ring of the BMG when it is found dead during a comm
commit 9dad817b402de6c5a725de582cffb20f12f1ae54
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Apr 5 16:55:23 2016 -0400
Various Cray XK fixes
commit 3921fbc38a9a913367f5c8c94682f4198aec6e06
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Apr 1 13:53:32 2016 -0400
Make the revoke ring more reliable
Still not perfect as we do not reemit for failures detected after the initial post
commit de1a5ce9b13f87aa5367ca5305a389ae56f8822b
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Apr 1 13:52:53 2016 -0400
Adding a small injection facility to the interface (non-standard, for testing only)
commit b92d0997b567ef8e14abd4e76124568e049b6589
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 31 14:50:17 2016 -0400
Do not do extra stuff in Finalize when disable_ftmpi
commit 6cd0d7cbe3e1ed2c436c4f98ece4ca57e9242da3
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 31 01:39:37 2016 -0400
More thread safety in error reporting paths
commit 8b8b3c2c8d120e9aa5141338bcc8c95e43d79397
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 29 08:59:19 2016 -0400
various debugging stuff
commit b6e6156d0026bf54897c20477887738462f5fbfd
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Mon Mar 28 15:59:27 2016 -0400
Move back these things in finalize to make sure they happen before we tear down BTL etc.
commit 6b12383734735130b6a31ee2d5af6b63bf8ae6bd
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Fri Mar 25 08:40:12 2016 -0400
fix the FD thread sync variable being optimized out in -O3
commit c76d0e968d2c61a3f630680f615293217f48b015
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 22 17:55:53 2016 -0400
rdma based heartbeat now works
commit a3b35cc4d946cc7cf9a2af7afcfcb934d9a47a35
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 22 09:52:24 2016 -0400
Adding RDMA based heartbeats
commit 7a2603b2fe55f8e69ce80c6dbaace5ac3d37f7b8
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 15 23:59:53 2016 -0400
Adding a thread to the FD. This causes a race in add_procs.
commit 97f59ac7666d7430f2b50c16b411f7455552c3ba
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 15 17:41:36 2016 -0400
Detector is complete w/o progress thread. The timer resolution is a bit too coarse and false suspicions are common...
commit 809fc3d6300b8333a99178290392ab6fe3b96116
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 10 16:28:07 2016 -0500
Adding the fd to this repo. missing the thread and libevent timeout triggers
commit e5932ef24746229ea0e2422e91aca3707bff9f32
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue May 3 11:52:43 2016 -0400
use-mpi extensions should not have a lib.la
commit 93699550538bc8800ffcc1fcddd1f6de9d71839c
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 17 13:45:26 2016 -0400
Fixing some issues in MPI_THREAD_MULTIPLE enabled builds
Reinstate the pmix_fence in finalize
Remove some duplicate debug messages
commit ab485e166f60dccc233a13a4529a3e1012b4f7da
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 10 16:58:52 2016 -0500
Move the initialization/finalization of the revoke/rbcast etc in comm_init
This initialization done elsewhere
commit c1744c01ad087a97cd6f03bf3fe6fa669acb049e
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Thu Mar 10 16:28:51 2016 -0500
Fix the global variable warning with the failed_group
commit 0822fe56dbe9dfcff02d5d784be033067332bf88
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 9 14:57:03 2016 -0500
Silence warnings about failed TCP connections, which is a normal situation w/FT
commit 84596b8aba2972047d8afeef0e1c334df2b02e63
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 9 11:32:30 2016 -0500
Make sure we do not try to cancel completed requests
commit d93e6289cc6f55a88649c77d1d9d4ffd581a6404
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 9 10:59:54 2016 -0500
Make the CID collective tags part of the collective tag namespace
commit 840cc828916574a2ab8b051bc688efeb7d6c27fc
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 9 10:58:14 2016 -0500
Correctly promote ERR_PROC_FAILED_PENDING to PROC_FAILED for blocking operations and complete the request
commit 0b5fcf45acf860bd3bc74eb1503b37d85cc33aff
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 8 17:22:23 2016 -0500
Fix a bug in intercomm_create and enable error returning from low level comms in all cases
commit 71c0c65699f095c5a0aa7cc4982e113c40075769
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 2 09:17:38 2016 -0500
cumulative copyright update
commit b0138ff5141506a3ac1b6b060849bc9ba6b91df4
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Wed Mar 2 01:45:56 2016 -0500
Disable auto-cleanup in orte to better test survivability of MPI layer.
orte finalize is broken.
commit 16dff489177807122a83f1e8b0004bbc7abf8ff5
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 1 23:11:54 2016 -0500
The base logic for shrink_inter is there. As soon as cid_reduce_inter_ft is implemented it should work.
commit db2a955f388916e61321cb9bcf683750d5191a01
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 1 23:04:49 2016 -0500
Fix a bug in shrink where the failed group was used partially uninitialized
commit 60469079c3bc2fdb08129174a5a2188b711a2418
Author: Aurélien Bouteiller <bouteill@icl.utk.edu>
Date: Tue Mar 1 18:46:31 2016 -0500
Cleanup cruft from jjh original prototype
commit 8187b…
1 parent 9996b9f commit cf886d5
File tree
190 files changed, +11503 -201 lines changed
- config
- contrib
- amca-param-sets
- platform
- ompi
- attribute
- communicator
- ft
- dpm
- errhandler
- group
- include
- mca
- coll
- base
- basic
- demo
- ftagree
- libnbc
- io/base
- osc/base
- pml
- base
- ob1
- topo/base
- mpiext/ftmpi
- c
- profile
- mpif-h
- use-mpi-f08
- profile
- use-mpi
- mpi/c
- proc
- request
- runtime
- tools/ompi_info
- opal/mca
- btl
- sm
- tcp
- event/libevent2022/libevent
- threads
- pthreads