Fix MsSql flakiness #3062

Merged
merged 1 commit into from
Sep 18, 2024

cbeauchesne
Collaborator

@cbeauchesne cbeauchesne commented Sep 18, 2024

Motivation

mcr.microsoft.com/azure-sql-edge:latest is very flaky at startup. The output is quite painful to read because it contains tons of * characters, for a reason I don't know. Once the output is cleaned, it says:

2024/09/18 05:20:17 [launchpadd] INFO: Extensibility Log Header: <timestamp> <process> <sandboxId> <sessionId> <message>
2024/09/18 05:20:17 [launchpadd] WARNING: Failed to load /var/opt/mssql/mssql.conf ini file with error open /var/opt/mssql/mssql.conf: no such file or directory
2024/09/18 05:20:17 [launchpadd] ERROR: RevoScaleR installation was not found. RevoScaleR is required to be installed in order to use R scripts.
2024/09/18 05:20:17 [launchpadd] ERROR: revoscalepy installation was not found. revoscalepy is required to be installed in order to use Python scripts.
2024/09/18 05:20:17 [launchpadd] INFO: DataDirectories =  /bin:/etc:/lib:/lib32:/lib64:/sbin:/usr/bin:/usr/include:/usr/lib:/usr/lib32:/usr/lib64:/usr/libexec/gcc:/usr/sbin:/usr/share:/var/lib:/opt/microsoft:/opt/mssql-extensibility:/opt/mssql/mlservices:/opt/mssql/lib/zulu-jre-11:/opt/mssql-tools
2024/09/18 05:20:17 Drop permitted effective capabilities.
2024/09/18 05:20:21 [launchpadd] INFO: Polybase remote hadoop bridge disabled
2024/09/18 05:20:21 [launchpadd] INFO: Launchpadd is connecting to mssql on localhost:1431
2024/09/18 05:20:21 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp [::1]:1431: getsockopt: connection refused, will reattempt connection.
2024/09/18 05:20:22 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp [::1]:1431: getsockopt: connection refused, will reattempt connection.
2024/09/18 05:20:23 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp [::1]:1431: getsockopt: connection refused, will reattempt connection.
2024/09/18 05:20:24 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp [::1]:1431: getsockopt: connection refused, will reattempt connection.
2024/09/18 05:20:25 [launchpadd] WARNING: Failed to connect to SQL because: dial tcp [::1]:1431: getsockopt: connection refused, will reattempt connection.
This program has encountered a fatal error and cannot continue running at Wed Sep 18 05:20:25 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 000056385f8525e1 std::__1::bad_function_call::~bad_function_call()+0x324b1
                 000056385f851fa6 std::__1::bad_function_call::~bad_function_call()+0x31e76
                 000056385f85152f std::__1::bad_function_call::~bad_function_call()+0x313ff
                 00007f9cd9aea090 killpg+0x40
                 00007f9cd9aea00b gsignal+0xcb
                 00007f9cd9ac9859 abort+0x12b
                 000056385f7e5276 std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char>>::str() const+0xc936
                 000056385f867434 std::__1::bad_function_call::~bad_function_call()+0x47304
                 000056385f899508 std::__1::bad_function_call::~bad_function_call()+0x793d8
                 000056385f8992ea std::__1::bad_function_call::~bad_function_call()+0x791ba
                 000056385f7eb6ea std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char>>::str() const+0x12daa
                 000056385f7eb360 std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char>>::str() const+0x12a20
        Process: 37 - sqlservr
         Thread: 42 (application thread 0x8)
    Instance Id: ef925511-ab20-4bfd-a37e-70a30b26c5cb
       Crash Id: dc109565-99fd-4288-a3e2-ffdabc6728a1
    Build stamp: 8ec8bf7535ba9e851dd60ffb1ceae46cb19b7b05f6b3e246cedcc89ff8f741aa
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 2
   Total Memory: 8324345856 bytes
      Timestamp: Wed Sep 18 05:20:25 2024
     Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 37
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.9_18_2024_5_20_26.37
/bin/cat: /proc/37/maps: Permission denied
cat: /proc/37/environ: Permission denied
find: '/proc/37/task/37/fdinfo': Permission denied
find: '/proc/37/task/38/fdinfo': Permission denied
find: '/proc/37/task/40/fdinfo': Permission denied
find: '/proc/37/task/41/fdinfo': Permission denied
find: '/proc/37/task/42/fdinfo': Permission denied
find: '/proc/37/task/43/fdinfo': Permission denied
find: '/proc/37/task/44/fdinfo': Permission denied
find: '/proc/37/task/45/fdinfo': Permission denied
find: '/proc/37/task/46/fdinfo': Permission denied
find: '/proc/37/task/47/fdinfo': Permission denied
find: '/proc/37/task/48/fdinfo': Permission denied
find: '/proc/37/task/49/fdinfo': Permission denied
find: '/proc/37/task/50/fdinfo': Permission denied
find: '/proc/37/task/51/fdinfo': Permission denied
find: '/proc/37/task/52/fdinfo': Permission denied
find: '/proc/37/task/53/fdinfo': Permission denied
find: '/proc/37/task/54/fdinfo': Permission denied
find: '/proc/37/task/55/fdinfo': Permission denied
find: '/proc/37/task/56/fdinfo': Permission denied
find: '/proc/37/task/57/fdinfo': Permission denied
find: '/proc/37/map_files': Permission denied
find: '/proc/37/fdinfo': Permission denied
find: '/proc/37/task/37/fdinfo': Permission denied
find: '/proc/37/task/38/fdinfo': Permission denied
find: '/proc/37/task/40/fdinfo': Permission denied
find: '/proc/37/task/41/fdinfo': Permission denied
find: '/proc/37/task/42/fdinfo': Permission denied
find: '/proc/37/task/43/fdinfo': Permission denied
find: '/proc/37/task/44/fdinfo': Permission denied
find: '/proc/37/task/45/fdinfo': Permission denied
find: '/proc/37/task/46/fdinfo': Permission denied
find: '/proc/37/task/47/fdinfo': Permission denied
find: '/proc/37/task/48/fdinfo': Permission denied
find: '/proc/37/task/49/fdinfo': Permission denied
find: '/proc/37/task/50/fdinfo': Permission denied
find: '/proc/37/task/51/fdinfo': Permission denied
find: '/proc/37/task/52/fdinfo': Permission denied
find: '/proc/37/task/53/fdinfo': Permission denied
find: '/proc/37/task/54/fdinfo': Permission denied
find: '/proc/37/task/55/fdinfo': Permission denied
find: '/proc/37/task/56/fdinfo': Permission denied
find: '/proc/37/task/57/fdinfo': Permission denied
find: '/proc/37/map_files': Permission denied
find: '/proc/37/fdinfo': Permission denied
find: '/proc/37/task/37/fdinfo': Permission denied
find: '/proc/37/task/38/fdinfo': Permission denied
find: '/proc/37/task/40/fdinfo': Permission denied
find: '/proc/37/task/41/fdinfo': Permission denied
find: '/proc/37/task/42/fdinfo': Permission denied
find: '/proc/37/task/43/fdinfo': Permission denied
find: '/proc/37/task/44/fdinfo': Permission denied
find: '/proc/37/task/45/fdinfo': Permission denied
find: '/proc/37/task/46/fdinfo': Permission denied
find: '/proc/37/task/47/fdinfo': Permission denied
find: '/proc/37/task/48/fdinfo': Permission denied
find: '/proc/37/task/49/fdinfo': Permission denied
find: '/proc/37/task/50/fdinfo': Permission denied
find: '/proc/37/task/51/fdinfo': Permission denied
find: '/proc/37/task/52/fdinfo': Permission denied
find: '/proc/37/task/53/fdinfo': Permission denied
find: '/proc/37/task/54/fdinfo': Permission denied
find: '/proc/37/task/55/fdinfo': Permission denied
find: '/proc/37/task/56/fdinfo': Permission denied
find: '/proc/37/task/57/fdinfo': Permission denied
find: '/proc/37/map_files': Permission denied
find: '/proc/37/fdinfo': Permission denied
find: '/proc/37/task/37/fdinfo': Permission denied
find: '/proc/37/task/38/fdinfo': Permission denied
find: '/proc/37/task/40/fdinfo': Permission denied
find: '/proc/37/task/41/fdinfo': Permission denied
find: '/proc/37/task/42/fdinfo': Permission denied
find: '/proc/37/task/43/fdinfo': Permission denied
find: '/proc/37/task/44/fdinfo': Permission denied
find: '/proc/37/task/45/fdinfo': Permission denied
find: '/proc/37/task/46/fdinfo': Permission denied
find: '/proc/37/task/47/fdinfo': Permission denied
find: '/proc/37/task/48/fdinfo': Permission denied
find: '/proc/37/task/49/fdinfo': Permission denied
find: '/proc/37/task/50/fdinfo': Permission denied
find: '/proc/37/task/51/fdinfo': Permission denied
find: '/proc/37/task/52/fdinfo': Permission denied
find: '/proc/37/task/53/fdinfo': Permission denied
find: '/proc/37/task/54/fdinfo': Permission denied
find: '/proc/37/task/55/fdinfo': Permission denied
find: '/proc/37/task/56/fdinfo': Permission denied
find: '/proc/37/task/57/fdinfo': Permission denied
find: '/proc/37/map_files': Permission denied
find: '/proc/37/fdinfo': Permission denied
dmesg: read kernel buffer failed: Operation not permitted
timeout: failed to run command 'journalctl': No such file or directory
timeout: failed to run command 'journalctl': No such file or directory

I didn't find anything about that error, though all the docs say the container must be started with cap_add=SYS_PTRACE.

Changes

Add cap_add=SYS_PTRACE for the mssql container.
Also add user=root, as this thread suggests.
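
For illustration only, here is a minimal sketch of what these two options could look like when the container is started through the Docker SDK for Python (docker-py). It is not the code from this PR: the helper names, the container name, the port mapping, and the environment variables are assumptions made for the example. Only `cap_add=SYS_PTRACE` and `user=root` come from the change described above.

```python
from typing import Any


def mssql_run_kwargs(sa_password: str) -> dict[str, Any]:
    """Build keyword arguments for docker-py's ``containers.run``.

    Only ``cap_add`` and ``user`` reflect the change described in this PR;
    every other value is an assumption made for the sake of the example.
    """
    return {
        "image": "mcr.microsoft.com/azure-sql-edge:latest",
        "name": "mssql",  # hypothetical container name
        "cap_add": ["SYS_PTRACE"],  # capability the docs ask for
        "user": "root",  # run the container process as root
        "environment": {  # assumed settings, not taken from the PR
            "ACCEPT_EULA": "1",
            "MSSQL_SA_PASSWORD": sa_password,
        },
        "ports": {"1433/tcp": 1433},  # assumed port mapping
        "detach": True,  # return a Container object immediately
    }


def start_mssql(sa_password: str):
    """Start the container (requires the third-party ``docker`` package)."""
    import docker  # imported lazily so the helper above has no dependency

    client = docker.from_env()
    return client.containers.run(**mssql_run_kwargs(sa_password))
```

Keeping the options in a plain dictionary makes it possible to check them without a running Docker daemon. Whether these two options are enough to remove the flakiness is a separate question.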

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes (if something not related to your task is failing, you can ignore it)
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner. We're working on refining the codeowners file quickly.
    • Framework is modified, or non-obvious usage of it -> get a review from R&P team

🚀 Once your PR is reviewed, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • If PR title starts with [<language>], double-check that only <language> is impacted by the change
  • No system-tests internal is modified. Otherwise, I have the approval from R&P team
  • CI is green, or failing jobs are not related to this change (and you are 100% sure about this statement)
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added (or removed)?

@cbeauchesne cbeauchesne changed the title from "Fix MsSql flakyness" to "Fix MsSql flakiness" Sep 18, 2024
@cbeauchesne cbeauchesne force-pushed the cbeauchesne/fix-mssql branch 2 times, most recently from f946c5d to a2f0e12 Compare September 18, 2024 08:55
@cbeauchesne cbeauchesne marked this pull request as ready for review September 18, 2024 08:55
@cbeauchesne cbeauchesne requested a review from a team as a code owner September 18, 2024 08:55
@cbeauchesne cbeauchesne merged commit ffc8e4b into main Sep 18, 2024
1 check passed
@cbeauchesne cbeauchesne deleted the cbeauchesne/fix-mssql branch September 18, 2024 08:56
@rochdev
Member

rochdev commented Sep 19, 2024

@cbeauchesne
Collaborator Author

Yes, unfortunately 😭

@cbeauchesne
Collaborator Author

I found this issue: microsoft/mssql-docker#868

The OS used is Ubuntu 22.04.5, with kernel 6.8.

@cbeauchesne
Collaborator Author

Next try: #3082
