Skip to content

Conversation

@matthiasgoergens
Copy link
Contributor

Please review the changes before applying. There's still a bug somewhere. Probably something part the scripts that I overlooked.

Found the bug! It builds now---but only in the `make api-chroot'. I'll merge. There's probably still some problem with v6.hg, trying to make clean on xen-api.hg.

@rokstrnisa
Copy link
Contributor

Apart from the error mentioned by Rob, it looks OK to me. Does it build? Also, don't forget to commit changes to the build.hg.

@matthiasgoergens
Copy link
Contributor Author

Doesn't build. But Rob found a bug. Perhaps that's the last one?

I'll commit to build.hg, once it works. I don't want to get untested stuff into its trunk.

@robhoes
Copy link
Member

robhoes commented Feb 1, 2011

Hmm... the line with the bug and my comment now seem to have disappeared. Is it possible to change the commits in a pull request, rather than adding a new commit to fix a bug?

@matthiasgoergens
Copy link
Contributor Author

I changed to what commit my master was pointing to. I made a new commit that did not have the mistake in.

I don't know whether that's actually I good idea. I feel that we should never change history in our xen-org repository, but I am not sure about the private repos. I am in favour of only pulling in commits that we know to be good into xen-org, though.

`hg archive' to `git archive' needed some careful attention, since
`git archive' works slightly differently.
simonjbeaumont referenced this pull request in simonjbeaumont/xen-api May 13, 2014
The read_records_where function in the database layer (used by the
get_all_records and get_all_records_where APIs) was reading the database
multiple times by calling find_refs_with_filter[1] to get the refs that matched
the query and then calling read_record[2] for each of these refs.

This violates point #2 of the  locking strategy stated at the top of the module
that read only functions must only call get_database once to ensure they
operate on a consistent snapshot.

Since [1] and [2] make get_database calls the get_all_records* functions make
n+1 calls to get_database for a table with n records. Because of this, deleting
a record during a get_all_records_where will result in a DBCache_NotFound
exception with parameter "missing row".

This commits adds internal variants of functions [1] and [2] that take
an actual instance of Database.t rather than a Db_ref.t which is a Database.t
ref ref.

Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
simonjbeaumont referenced this pull request in simonjbeaumont/xen-api May 13, 2014
The read_records_where function in the database layer (used by the
get_all_records and get_all_records_where APIs) was reading the database
multiple times by calling find_refs_with_filter[1] to get the refs that matched
the query and then calling read_record[2] for each of these refs.

This violates point #2 of the  locking strategy stated at the top of the module
that read only functions must only call get_database once to ensure they
operate on a consistent snapshot.

Since [1] and [2] make get_database calls the get_all_records* functions make
n+1 calls to get_database for a table with n records. Because of this, deleting
a record during a get_all_records_where will result in a DBCache_NotFound
exception with parameter "missing row".

This commits adds internal variants of functions [1] and [2] that take
an actual instance of Database.t rather than a Db_ref.t which is a Database.t
ref ref.

Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
simonjbeaumont referenced this pull request in simonjbeaumont/xen-api May 13, 2014
The read_records_where function in the database layer (used by the
get_all_records and get_all_records_where APIs) was reading the database
multiple times by calling find_refs_with_filter[1] to get the refs that matched
the query and then calling read_record[2] for each of these refs.

This violates point #2 of the  locking strategy stated at the top of the module
that read only functions must only call get_database once to ensure they
operate on a consistent snapshot.

Since [1] and [2] make get_database calls the get_all_records* functions make
n+1 calls to get_database for a table with n records. Because of this, deleting
a record during a get_all_records_where will result in a DBCache_NotFound
exception with parameter "missing row".

This commits adds internal variants of functions [1] and [2] that take
an actual instance of Database.t rather than a Db_ref.t which is a Database.t
ref ref.

Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
thomassa pushed a commit to thomassa/xen-api that referenced this pull request May 21, 2014
The read_records_where function in the database layer (used by the
get_all_records and get_all_records_where APIs) was reading the database
multiple times by calling find_refs_with_filter[1] to get the refs that matched
the query and then calling read_record[2] for each of these refs.

This violates point xapi-project#2 of the  locking strategy stated at the top of the module
that read only functions must only call get_database once to ensure they
operate on a consistent snapshot.

Since [1] and [2] make get_database calls the get_all_records* functions make
n+1 calls to get_database for a table with n records. Because of this, deleting
a record during a get_all_records_where will result in a DBCache_NotFound
exception with parameter "missing row".

This commits adds internal variants of functions [1] and [2] that take
an actual instance of Database.t rather than a Db_ref.t which is a Database.t
ref ref.

Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
johnelse added a commit that referenced this pull request Aug 26, 2014
CA-142107 #2: Improve logging around migration script
djs55 added a commit that referenced this pull request Jan 5, 2015
[RFC]: Add a 'floppy' VBD type (take #2)
sharady pushed a commit to sharady/xen-api that referenced this pull request Apr 10, 2015
…-1256-clearwater-lcm

SCTX-1256: Check for BIOS strings on restart and update DB in case of mismatch
sharady pushed a commit to sharady/xen-api that referenced this pull request May 11, 2015
sharady pushed a commit to sharady/xen-api that referenced this pull request May 11, 2015
sharady pushed a commit to sharady/xen-api that referenced this pull request May 11, 2015
[RFC] xen: Add a "floppy" type, alongside cdrom and hard disk (take xapi-project#2)
koushikcgit pushed a commit to koushikcgit/xen-api that referenced this pull request Aug 7, 2015
Fix xenopsd following the changes to xcp-idl
koushikcgit pushed a commit to koushikcgit/xen-api that referenced this pull request Aug 7, 2015
CA-142107 xapi-project#2: Improve logging around migration script
koushikcgit pushed a commit to koushikcgit/xen-api that referenced this pull request Aug 7, 2015
[RFC] xc: support floppy disk images (take xapi-project#2)
thomassa added a commit to thomassa/xen-api that referenced this pull request Sep 8, 2017
…feature/CBT_set_snapshot_cbt_enable_field to feature/CBT

* commit 'ca0792594bbff0b1c1cbd37e566e19320870d3be':
  VDI.snapshot: inherit cbt_enabled field from snapshotted VDI
lippirk referenced this pull request in lippirk/xen-api Jul 20, 2020
Rearrange files and switch to OASIS
kc284 pushed a commit to kc284/xen-api that referenced this pull request Mar 11, 2021
Force the filter to return an array at all times to ensure the look-up using index works properly
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Jul 6, 2021
Add SSL support, simplify the interface
robhoes pushed a commit that referenced this pull request Sep 14, 2021
Add unit tests and travis integration
robhoes pushed a commit that referenced this pull request Sep 14, 2021
robhoes pushed a commit that referenced this pull request Sep 14, 2021
robhoes pushed a commit that referenced this pull request Sep 21, 2021
CP-4405: Pull relevant commits from xen-api-libs
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Sep 21, 2021
CA-120439: Use Unix.LargeFile for 32-bit machines
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Sep 21, 2021
CP-7327 nice and ionice config options for sparse_dd.conf
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Sep 21, 2021
Allow for parallel installs of blktap2.5
robhoes pushed a commit that referenced this pull request Oct 21, 2021
Replace network_test with unit tests from xen-api
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Jan 10, 2022
wsproxy: port to _oasis and ppx
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Jan 10, 2022
Add vdi-similar-content and vdi-compose
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Jan 10, 2022
CA-177418: supply a value for vdi_info.uuid
robhoes pushed a commit that referenced this pull request Jan 10, 2022
Authenticate, Attach and Activate
edwintorok referenced this pull request in edwintorok/xen-api Jan 10, 2022
CA-308072: fix varstored-guard with toolstack restart
robhoes pushed a commit to robhoes/xen-api that referenced this pull request Apr 7, 2022
psafont pushed a commit that referenced this pull request Apr 11, 2023
edwintorok added a commit that referenced this pull request Oct 12, 2023
This enables PAM to be used in multithreaded mode (currently XAPI has a global lock around auth).

Using an off-cpu flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`.
`pam_authenticate` calls `crypt_r` which calls `NSSLOW_Init` which on first use will try to initialize the just `dlopen`-ed library.
If it encounters a race condition it does a `sleep(1)`. This race condition can be quite reliably reproduced when performing a lot of PAM authentications from multiple threads in parallel.

GDB can also be used to confirm this by putting a breakpoint on `sleep`:
```
  #0  __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42
  #1  0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122
  #2  0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131
  #3  NSSLOW_Init () at lowhash_vector.c:148
  #4  0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0",
  #5  0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>,
  #6  0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111
  #7  0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777
  #8  0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178
  #9  0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110
  #10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426
  #11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34
  #12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83
  #13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42
```

`pam_start` and `pam_end` doesn't help here, because on `pam_end` the library is `dlclose`-ed, so on next `pam_authenticate` it will have to go through the initialization code again.
(This initialization code would've belonged into `pam_start`, not `pam_authenticate`, but there are several layers here including a call to `crypt_r`).
Upstream has fixed this problem >5 years ago by switching to libxcrypt instead.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
psafont pushed a commit that referenced this pull request Jan 31, 2024
Add standard files, make initial release
psafont pushed a commit that referenced this pull request Jan 31, 2024
lindig pushed a commit to lindig/xen-api that referenced this pull request Feb 28, 2024
Backport of 3b52b72

This enables PAM to be used in multithreaded mode (currently XAPI has a global lock around auth).

Using an off-cpu flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`.
`pam_authenticate` calls `crypt_r` which calls `NSSLOW_Init` which on first use will try to initialize the just `dlopen`-ed library.
If it encounters a race condition it does a `sleep(1)`. This race condition can be quite reliably reproduced when performing a lot of PAM authentications from multiple threads in parallel.

GDB can also be used to confirm this by putting a breakpoint on `sleep`:
```
  #0  __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42
  xapi-project#1  0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122
  xapi-project#2  0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131
  xapi-project#3  NSSLOW_Init () at lowhash_vector.c:148
  xapi-project#4  0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0",
  xapi-project#5  0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>,
  xapi-project#6  0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111
  xapi-project#7  0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777
  xapi-project#8  0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178
  xapi-project#9  0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110
  xapi-project#10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426
  xapi-project#11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34
  xapi-project#12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83
  xapi-project#13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42
```

`pam_start` and `pam_end` doesn't help here, because on `pam_end` the library is `dlclose`-ed, so on next `pam_authenticate` it will have to go through the initialization code again.
(This initialization code would've belonged into `pam_start`, not `pam_authenticate`, but there are several layers here including a call to `crypt_r`).
Upstream has fixed this problem >5 years ago by switching to libxcrypt instead.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
lindig pushed a commit that referenced this pull request Feb 29, 2024
Backport of 3b52b72

This enables PAM to be used in multithreaded mode (currently XAPI has a global lock around auth).

Using an off-cpu flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`.
`pam_authenticate` calls `crypt_r` which calls `NSSLOW_Init` which on first use will try to initialize the just `dlopen`-ed library.
If it encounters a race condition it does a `sleep(1)`. This race condition can be quite reliably reproduced when performing a lot of PAM authentications from multiple threads in parallel.

GDB can also be used to confirm this by putting a breakpoint on `sleep`:
```
  #0  __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42
  #1  0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122
  #2  0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131
  #3  NSSLOW_Init () at lowhash_vector.c:148
  #4  0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0",
  #5  0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>,
  #6  0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111
  #7  0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777
  #8  0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178
  #9  0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110
  #10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426
  #11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34
  #12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83
  #13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42
```

`pam_start` and `pam_end` doesn't help here, because on `pam_end` the library is `dlclose`-ed, so on next `pam_authenticate` it will have to go through the initialization code again.
(This initialization code would've belonged into `pam_start`, not `pam_authenticate`, but there are several layers here including a call to `crypt_r`).
Upstream has fixed this problem >5 years ago by switching to libxcrypt instead.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants