- 
                Notifications
    
You must be signed in to change notification settings  - Fork 15.1k
 
[CAS] Add OnDiskGraphDB and OnDiskKeyValueDB #114102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CAS] Add OnDiskGraphDB and OnDiskKeyValueDB #114102
Conversation
e980f34    to
    c497eeb      
    Compare
  
    Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement ObjectStore and ActionCache respectively. Those are on-disk persistent storage that build upon OnDiskTrieHashMap and implements key functions that are required by LLVMCAS interfaces. This abstraction layer defines how the objects are hashed and stored on disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB also defines: * How objects of various size are store on disk and are referenced by the trie nodes. * How to store the references from one stored object to another object that is referenced. In addition to basic APIs for ObjectStore and ActionCache, other advances database configuration features can be implemented in this layer without exposing to the users of the LLVMCAS interface. For example, OnDiskGraphDB has a faulty in function to fetch data from an upstream OnDiskGraphDB if the data is missing. Reviewers: Pull Request: #114102
| 
          
 ✅ With the latest revision this PR passed the C/C++ code formatter.  | 
    
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement ObjectStore and ActionCache respectively. Those are on-disk persistent storage that build upon OnDiskTrieHashMap and implements key functions that are required by LLVMCAS interfaces. This abstraction layer defines how the objects are hashed and stored on disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB also defines: * How objects of various size are store on disk and are referenced by the trie nodes. * How to store the references from one stored object to another object that is referenced. In addition to basic APIs for ObjectStore and ActionCache, other advances database configuration features can be implemented in this layer without exposing to the users of the LLVMCAS interface. For example, OnDiskGraphDB has a faulty in function to fetch data from an upstream OnDiskGraphDB if the data is missing. Reviewers: Pull Request: #114102
a2bb4e4    to
    e2d20f8      
    Compare
  
    Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
| return InternalRef(Offset.get()); | ||
| } | ||
| 
               | 
          ||
| friend bool operator==(InternalRef LHS, InternalRef RHS) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity: Can modern C++ = default this since there is only one data member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is c++20? I don't think we are there yet.
        
          
                llvm/lib/CAS/OnDiskGraphDB.cpp
              
                Outdated
          
        
      | 
               | 
          ||
| using StandaloneDataMapTy = StandaloneDataMap<16>; | ||
| 
               | 
          ||
| struct InternalHandle { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment that explains what the "Internal" part of the name means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this class doesn't provide much value in terms of adding readability. It is basically providing some helper function when decoding ObjectHandle. I can just fold everything into ObjectHandle while not exposing the detailed encoding scheme in the header. Removing InternalHandle.
Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
| 
           AArch64 bot failure due to AWS outage. Otherwise, don't see any problem. Merging.  | 
    
| 
           LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/16197 Here is the relevant piece of the build log for the reference | 
    
| 
           This broke the MSAN bot. Please fix forward or revert if you need extra time  | 
    
| 
           LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/13253 Here is the relevant piece of the build log for the reference | 
    
| 
           LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/14/builds/4508 Here is the relevant piece of the build log for the reference | 
    
          
 Sorry didn't see the MSAN bot. The error is weird maybe due to how   | 
    
          
 #164457 landed in https://lab.llvm.org/buildbot/#/builders/164/builds/14720 but the test (  | 
    
          
 Attempt 2 here: #164493 I didn't figure out the real reason of failure last time. It was due to not all the fields in   | 
    
          
 Thanks for the quick fix!  | 
    
| OnDiskKeyValueDBTest.cpp | ||
| OnDiskTrieRawHashMapTest.cpp | ||
| ProgramTest.cpp | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most other places unconditionally include tests for conditional features and then ifdef out all the test code based on what's in config.h. Any reason not to do that here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is cleaner this way than #ifdef the entire file out for those files. I can switch back to old ways if that is preferred.
I guess the other solution is put all of those under one test fixture and GTEST_SKIP() in Setup() if the build configuration is different.
| 
           The test cases introduced here are failing in our CI on Red Hat Enterprise Linux 10, even last night despite all of those corrections merged later. This is the list of the systems on which these tests do .not. fail: 
  | 
    
| 
           The error log on RHEL10:  | 
    
| 
           @pawosm-arm what is your target triple for those systems? What is the file system format? Do you have ways to reproduce it like from a docker image?  | 
    
| 
           We have down stream CI running on Ubuntu and do not observe such errors. Let me know how can I help to fix it (or figure out some limitations).  | 
    
          
 It's the same as in Ubuntu, and as in the previous versions of RHEL:  The filesystem is a standard ext4, and our building method is publicly visible: https://github.com/arm/arm-toolchain/blob/arm-software/arm-software/linux/build.sh (https://github.com/arm/arm-toolchain/blob/arm-software/arm-software/linux/build.sh-HOWTO.md for more details)  | 
    
          
 That is weird. I have trouble to get a RHEL10 container to build llvm. The error looks like some difference in file system memory mapping but it is hard to say without reproducing it. Let me know what you want to do in the next step. This is unit-test so we can disable that for now if we have a good macro to guard it against.  | 
    
| 
           I manage to build inside a container but I am not able to reproduce:  | 
    
          
 That is extremely concerning. This test might have exposed something serious which have a potential to haunt the userspace programs doing specific or otherwise extensive filesystem I/O.  | 
    
          
 I need to obtain some more knowledge from our devops, every single detail that could make any difference even the slightest one that they may consider insignificant that could result in this unexpected difference in file system memory mapping. Also, could you write some higher level explanation what is the purpose of the code (and the test) that's failing? Maybe this could give them a hint where to look at...  | 
    
          
 There is an error message   | 
    
          
 A naïve question: could filesystem size affect that? or lack of free space?  | 
    
| 
           I would expect running out of space will trigger a different error, and this error also happens on the read only side. Highly unlikely but I can't say for sure  | 
    
| 
           Good news is that I've managed to reproduce it manually, outside of our CI, which means the problem isn't of CI-magical class: The bad news, I still don't know what the root cause could be.  | 
    
| 
           Is this deterministically failing? Is there anyway we can figure out a way to reproduce it myself?  | 
    
| 
           Trouble is, there is nothing unusual in the docker image I've tried: https://catalog.redhat.com/en/software/containers/ubi10/ubi/66f2b46b122803e4937d11ae#packages I've also managed to reproduce it again, on a much simpler AArch64 machine, so it's not a case of using a sophisticated hardware. Have you got a smaller piece of code that does the same thing so I could compile and run it in there? Could it be a glibc bug?  | 
    
| 
           Oh, I can reproduce now. It seems only when using  Looks like a corrupted data record somehow.  | 
    
          
 Makes sense, notice that on us it happens as we're testing the bootstrap compiler, which has been built with gcc/g++.  | 
    
| 
           I don't know if it is a   | 
    
| 
           I can't see any obvious logic mistakes in the code. I haven't dig too deep but it seems   | 
    
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement ObjectStore and ActionCache respectively. Those are on-disk persistent storage that build upon OnDiskTrieHashMap and implements key functions that are required by LLVMCAS interfaces. This abstraction layer defines how the objects are hashed and stored on disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB also defines: * How objects of various size are store on disk and are referenced by the trie nodes. * How to store the references from one stored object to another object that is referenced. In addition to basic APIs for ObjectStore and ActionCache, other advances database configuration features can be implemented in this layer without exposing to the users of the LLVMCAS interface. For example, OnDiskGraphDB has a faulty in function to fetch data from an upstream OnDiskGraphDB if the data is missing.
Fix MSAN failure and expensive test failure.
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement ObjectStore and ActionCache respectively. Those are on-disk persistent storage that build upon OnDiskTrieHashMap and implements key functions that are required by LLVMCAS interfaces. This abstraction layer defines how the objects are hashed and stored on disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB also defines: * How objects of various size are store on disk and are referenced by the trie nodes. * How to store the references from one stored object to another object that is referenced. In addition to basic APIs for ObjectStore and ActionCache, other advances database configuration features can be implemented in this layer without exposing to the users of the LLVMCAS interface. For example, OnDiskGraphDB has a faulty in function to fetch data from an upstream OnDiskGraphDB if the data is missing.
Fix MSAN failure and expensive test failure.
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement
ObjectStore and ActionCache respectively. Those are on-disk persistent
storage that build upon OnDiskTrieHashMap and implements key functions
that are required by LLVMCAS interfaces.
This abstraction layer defines how the objects are hashed and stored on
disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB
also defines:
the trie nodes.
that is referenced.
In addition to basic APIs for ObjectStore and ActionCache, other
advances database configuration features can be implemented in this
layer without exposing to the users of the LLVMCAS interface. For
example, OnDiskGraphDB has a faulty in function to fetch data from an
upstream OnDiskGraphDB if the data is missing.