-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30365 Add XREF Sasha service to K8s #18798
base: candidate-9.6.x
Are you sure you want to change the base?
HPCC-30365 Add XREF Sasha service to K8s #18798
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-30365 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jackdelv, it's looking good, I've given it a first pass, and made some comments. Please take a look.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, using isContainerized() vs the #define is preferrable (so that the code is scanned/compiled by developers routinely etc.)
And, if both can be avoided and the code made general (even if the use case is not currently used in bare-metal), that is most preferrable.
This is a case in point, we should cope with this even in the bare-metal case in general utility classes/methods, even though it is not configurable in BM at the moment.
But also, it looks like this exposes a wide problem(!). This function is used not just by XREF, it's also used by dadfs.cpp removePhysicalFiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, just noticed that removePhysicalFiles is not used by anything - we should delete it avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect constructPartFilename should be deprecated (probably removed too).
In XREF's case, it needs a way to get a physical part + endpoint given the meta data it's collected.
The 'lnentry' (CLogicalNameEntry) it has built from meta data, should perhaps have a method to built it, it should know (from the meta data) if the file is striped and if it has dirPerPart.
Btw, dirPerPart (a flag in the meta data) means it has a directory extension that includes the file part number (i.e. each part is in it's own directory)
'striping' is different, and an additional directory not per part, but enumerated over N striped disks. The stripNum is calculated from a hash (@lfnHash) stored in the file meta. Originally calculated and set there by hashing the logicalfilename.
See makePhysicalPartName and it's callers (and uses of lfnHash and calcStripeNumber).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed remotePhysicalFiles and added a constructPartFilename function to CLogicalNameEntry. I had some trouble finding the dirPerPart flag in the metadata. The only examples I was able to find was using queryDirPerPart() on a StoragePlane, but when trying that it seemed the metadata was missing and it was defaulting to the return value of isContainerized().
Added lfnHash to the CLogicalNameEntry and used calcStripeNumber on the part number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old constructPartFilename function had some tests for it, so I couldn't delete the function without also removing the tests. Should there now be tests for the new constructPartFilename if it does get added to CLogicalNameEntry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should remove the old constructPartFilename, that is now only used by tests, and if not covered already, add tests that exercise makePhysicalPartName.
But for now, for this PR, I would just add comments that it is deprecated, but test code uses it - and open a separate JIRA to revisit sometime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the new CLogicalNameEntry::constructPartFilename, I think you can remove the changes here right? i.e.:
#ifdef _CONTAINERIZED
if (partmax>1)
addPathSepChar(fullname.append(partno+1)); // If there are multiple files they will be striped by part number
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the _Containerized changes and added a comment to the old constructPartFilename.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
StringBuffer stripeMask; | ||
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname)); | ||
// Replace part number in stripe directory with mask | ||
mname.replaceString(stripeDir, stripeMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what this is for? (let's discuss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the XREF code is trying to match all the parts of an orphan file and each part is in its own directory, it incorrectly creates separate entries for each part when they should be combined. Adding a part mask to the directory for each part allows them to be matched properly. Sorry for the confusion if I caused any because I was using the term striped incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A typical path on a striped storage that also has dirPerPart=true, will look like:
/var/lib/HPCCSystems/hpcc-data/d8/somescope/otherscope/35/thelfnname._35_of_400
where 'd8' is the stripe sub-directory, and '35' is the dir-per-part sub-directory.
When parsing a filename like this, the dirPerPart format can be deduced if the directory before the filename, is a numeric only (which is invalid for a scope/filename, which must begin with an alpha character), and that it matches the part # of the filename.
The fact that it's striped is less obvious, but could be deduced by the fact that the leading part of the filename '/var/lib/HPCCSystems/hpcc-data' is a prefix of one of the planes, and the plane specs will say numDevices>1
From the code here, it looks like it's dealing with the dir-per-part #, but code referencing 'stripe'.
I suspect this needs looking at closely, and testing with files that are on striped storage, and dirPerPlane, and ones that aren't - would be good to add some CPPUNIT tests here of various paths to make sure it is giving expected results.
I haven't looked in detail how the info is being used by the callers of parseFileName, but I suspect it may need to pass back, the deduced plane, and perhaps the stripe and/or flags like dirPerPart.. not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have restructured how it gets the dir-per-part number and added checks for the stripe number. I was able to test by changing the string before passing into the function, but I am not sure how to add to the CPPUNIT tests.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
@@ -2600,7 +2616,11 @@ class CXRefManager: public CXRefManagerBase | |||
Owned<IGroup> g; | |||
unsigned j; | |||
if (!nclusters) { | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preferable to use isCotainerized()
In fact, I think it would be worth just changing the message to the planes version.
In BM, internally, we generate planes for all clusters, such that methods like getStoragePlane/getPlanesIterator work with clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to planes message.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
for (int i = 0; i < numdirs; i++) | ||
{ | ||
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[i]); | ||
dirs[i] = storagePlane->queryProp("@prefix"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it makes sense to be enumerating the directories into 'dirs' and passing them xrefmanager.process.
In BM, the dirs are not different cluster dirs, but the primary/replicate dirs, which are consistent across all clusters.
I think we should keep it simple in cloud version, and assert that we are only dealing with 1 plane at a time (at least initially), the dirs should contain a single directory of the plane prefix being processes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to only pass in a single directory through dirs.
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This k8s code may work for BM too.. would be good to unify if so.
BM has "data" planes automatically generated from the clusters, so it should in theory..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the k8s code on BM and there were additional storage planes that didn't work. In BM when iterating through the available storage planes there were mythor, myroxie (which both worked normally), mythor_mirror, hthor_hthor, and hthor_myeclagent (which all returned "Not Found" when trying to generate the results). Additionally, there was no storage plane for SuperFiles. Should this still be added manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something, did you manage to consolidate it such that BM works using common code (e.g. getPlanesIterator("data". .) ?
Additionally, there was no storage plane for SuperFiles. Should this still be added manually?
hm not sure - superfiles don't have their own plane, they could even contain files that were on different planes.
Not sure how XREF handles them tbh - will have to look more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the BM code now iterates through the 'clusters' using getPlanesIterator("data", nullptr). It partially worked where the Thor and Roxie clusters showed up properly and could be run, but additional 'clusters' (mythor_mirror, hthor_hthor, etc) were found that failed when trying to generate results.
I have removed the DBGLOG with a containerized message from the generalized code to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mythor_mirror, hthor_hthor, etc
I think the (BM only) mirror planes should probably be ignored here. It would best if they were ignored not by name, but by 'copy', which currently isn't recorded when the plane IPropertyTree is created, but could be.
The hthor_hthor plane .. what was the error? (and can you check where "hthor_hthor" is being constructed?)
Hthor wasn't available as an option before in BM - I don't know why specifically it wasn't supported, but I think we can ignore for now this plane type, with a big fat comment saying we should revisit this functionality.
(but like ignoring the mirror, we should ignore by type, not by name, which may need a flag recording in the IPT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unable to find where the hthor_hthor plane is created. I don't even see it in the list of planes anymore. I'm really not sure what happened with that. I was able to see the error when hthor__hthor did show up. When I clicked generate, the status returned "Not Found".
that's fine for now. We can move it to a separate esp service pod if we really want to. |
Only store a single storage plane in dirs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I left some responses to your comments and questions. Back to you.
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the k8s code on BM and there were additional storage planes that didn't work. In BM when iterating through the available storage planes there were mythor, myroxie (which both worked normally), mythor_mirror, hthor_hthor, and hthor_myeclagent (which all returned "Not Found" when trying to generate the results). Additionally, there was no storage plane for SuperFiles. Should this still be added manually?
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
StringBuffer stripeMask; | ||
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname)); | ||
// Replace part number in stripe directory with mask | ||
mname.replaceString(stripeDir, stripeMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the XREF code is trying to match all the parts of an orphan file and each part is in its own directory, it incorrectly creates separate entries for each part when they should be combined. Adding a part mask to the directory for each part allows them to be matched properly. Sorry for the confusion if I caused any because I was using the term striped incorrectly.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed remotePhysicalFiles and added a constructPartFilename function to CLogicalNameEntry. I had some trouble finding the dirPerPart flag in the metadata. The only examples I was able to find was using queryDirPerPart() on a StoragePlane, but when trying that it seemed the metadata was missing and it was defaulting to the return value of isContainerized().
Added lfnHash to the CLogicalNameEntry and used calcStripeNumber on the part number.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old constructPartFilename function had some tests for it, so I couldn't delete the function without also removing the tests. Should there now be tests for the new constructPartFilename if it does get added to CLogicalNameEntry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv - looks good overall.
Please see comments and questions.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should remove the old constructPartFilename, that is now only used by tests, and if not covered already, add tests that exercise makePhysicalPartName.
But for now, for this PR, I would just add comments that it is deprecated, but test code uses it - and open a separate JIRA to revisit sometime.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the new CLogicalNameEntry::constructPartFilename, I think you can remove the changes here right? i.e.:
#ifdef _CONTAINERIZED
if (partmax>1)
addPathSepChar(fullname.append(partno+1)); // If there are multiple files they will be striped by part number
#endif
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true); | ||
const char * prefix = plane->queryPrefix(); | ||
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices()); | ||
bool dirPerPart = max>1?plane->queryDirPerPart():false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not make any practical difference - but whether a file is dirPerPart is a flag in the logical file meta data.
In theory, a logical file may have this flag set, but the plane doesn't or vice versa. That could only happen if the plane spec. had changed since files were created.
You can determine it from the file from "@flags". See CFileDescriptor::getFlags()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added file to new constructPartFilename parameters to get dirPerPart from the flags.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
} | ||
expandMask(partName, pmask, partNo, max); | ||
|
||
StringBuffer fullname; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial/style: better to co-locate closer to where used ~(i.e. just before makePhysicalPartName) for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
StringBuffer stripeMask; | ||
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname)); | ||
// Replace part number in stripe directory with mask | ||
mname.replaceString(stripeDir, stripeMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A typical path on a striped storage that also has dirPerPart=true, will look like:
/var/lib/HPCCSystems/hpcc-data/d8/somescope/otherscope/35/thelfnname._35_of_400
where 'd8' is the stripe sub-directory, and '35' is the dir-per-part sub-directory.
When parsing a filename like this, the dirPerPart format can be deduced if the directory before the filename, is a numeric only (which is invalid for a scope/filename, which must begin with an alpha character), and that it matches the part # of the filename.
The fact that it's striped is less obvious, but could be deduced by the fact that the leading part of the filename '/var/lib/HPCCSystems/hpcc-data' is a prefix of one of the planes, and the plane specs will say numDevices>1
From the code here, it looks like it's dealing with the dir-per-part #, but code referencing 'stripe'.
I suspect this needs looking at closely, and testing with files that are on striped storage, and dirPerPlane, and ones that aren't - would be good to add some CPPUNIT tests here of various paths to make sure it is giving expected results.
I haven't looked in detail how the info is being used by the callers of parseFileName, but I suspect it may need to pass back, the deduced plane, and perhaps the stripe and/or flags like dirPerPart.. not sure.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
{ | ||
DBGLOG("CONTAINERIZED(runXRef)"); | ||
numdirs = 1; | ||
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think xrefmanger.process() will now (in containerized) handle >1 cluster, processing them in series,
but getting the dir (prefix) from the 1st here, won't be correct if they different.
I think this directory getting code would probably make more sense inside the process() method, where in BM, it can do as it does now and assume same dir for all, but in containerized, it can get the correct dir for each 'cluster' (plane) each time in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it makes more sense this way. I have moved the directory getting code into process.
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something, did you manage to consolidate it such that BM works using common code (e.g. getPlanesIterator("data". .) ?
Additionally, there was no storage plane for SuperFiles. Should this still be added manually?
hm not sure - superfiles don't have their own plane, they could even contain files that were on different planes.
Not sure how XREF handles them tbh - will have to look more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I have responded to your comments and questions.
I think the new constructPartFilename function should be added to the CPPUNIT tests, but I am not sure how they work or the best way to extend them.
dali/base/dadfs.cpp
Outdated
@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm | |||
StringBuffer fullname; | |||
if (findPathSepChar(name)==NULL) | |||
addPathSepChar(fullname.append(partdir)); | |||
#ifdef _CONTAINERIZED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the _Containerized changes and added a comment to the old constructPartFilename.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
} | ||
expandMask(partName, pmask, partNo, max); | ||
|
||
StringBuffer fullname; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true); | ||
const char * prefix = plane->queryPrefix(); | ||
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices()); | ||
bool dirPerPart = max>1?plane->queryDirPerPart():false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added file to new constructPartFilename parameters to get dirPerPart from the flags.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
StringBuffer stripeMask; | ||
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname)); | ||
// Replace part number in stripe directory with mask | ||
mname.replaceString(stripeDir, stripeMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have restructured how it gets the dir-per-part number and added checks for the stripe number. I was able to test by changing the string before passing into the function, but I am not sure how to add to the CPPUNIT tests.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
{ | ||
DBGLOG("CONTAINERIZED(runXRef)"); | ||
numdirs = 1; | ||
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it makes more sense this way. I have moved the directory getting code into process.
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the BM code now iterates through the 'clusters' using getPlanesIterator("data", nullptr). It partially worked where the Thor and Roxie clusters showed up properly and could be run, but additional 'clusters' (mythor_mirror, hthor_hthor, etc) were found that failed when trying to generate results.
I have removed the DBGLOG with a containerized message from the generalized code to avoid confusion.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices()); | ||
|
||
// Get dir-per-part # from file metadata or storage plan info | ||
FileDescriptorFlags flags = static_cast<FileDescriptorFlags>(file.getPropInt("Attr/@flags")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I added file to the parameters for constructPartFilename, but I think that may be incorrect since it uses file in the CLogicalNameEntry constructor. Which is considered best practice, initializing CLogicalNameEntry with the dir-per-part/stripe-number and adding additional members to the class or passing file into constructPartFilename and getting the information right when we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think best to follow same pattern as for other info. extracted from the IPT file now, and pull them out and store them in CLogicalNameEntry as members. So, Attr/@flags can be read there once and used here. See previous command above. Otherwise, there is a lot of repeated work re-calling things for every part of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved IPT file extraction to the CLogicalNameEntry constructor to avoid re-calling things too many times. Added info as members of CLogicalNameEntry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackdelv - apologies for taking an age to get back to this.
Please see comments.
It's definitely getting closer.
@@ -508,7 +508,8 @@ sasha: | |||
#disabled: true | |||
#switchMinTime: 1 | |||
#queues: "*" | |||
|
|||
|
|||
xref: {} # NB: no properties defined, use defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be consistent with others, it would be good to list (commented out) common xref config options here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some options that don't really mean anything. Should they be usable options or just placeholders? I don't think any would be used even if they were defined right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be..
saserver should be loading the component configuration (line 331 of saserver.cpp) into 'serverConfig',
and various points in saxref.cpp looks for properties in it.
Some may not be relevant for containerized.
Some will need reworking, e.g. xrefMaxMemory can prob. remain as an option, but it should by default use all available memory (as resourced for the container), e.g. 90% of container memory. Probably similarly with maxThreads.
The schedule ones are still relevant.
As the code stands, it's going to be trying to pick up properties from "DFUXRef/" because in BM, the sasha service configs are subtrees within 1 config. In Containerized it should assume the root is the component config
( e.g. see createSashaFileExpiryServer's way of conditionally treating the config depending on whether containerized or not ).
It may be okay as it is for now, it should default to semi ok default, and then we can open a separate JIRA to revisit.
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mythor_mirror, hthor_hthor, etc
I think the (BM only) mirror planes should probably be ignored here. It would best if they were ignored not by name, but by 'copy', which currently isn't recorded when the plane IPropertyTree is created, but could be.
The hthor_hthor plane .. what was the error? (and can you check where "hthor_hthor" is being constructed?)
Hthor wasn't available as an option before in BM - I don't know why specifically it wasn't supported, but I think we can ignore for now this plane type, with a big fat comment saying we should revisit this functionality.
(but like ignoring the mirror, we should ignore by type, not by name, which may need a flag recording in the IPT).
|
||
BoolHash uniqueProcesses; | ||
BoolHash uniquePlanes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addUniqueXRefNode and it's use of a HT to track unique, doesn't seem useful now (and not sure it was).
i.e. each plane name is guarantee to be unique anyway..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the use of addUniqueXRef node. Should the function be deleted entirely if it isn't useful anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, but either way, we need to be careful this unification on planes doesn't break XREF in BM.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
expandMask(partName, pmask, partNo, max); | ||
|
||
// Get stripeNum from storage plane | ||
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially the plane may not exist.
This an other errors at this point should abort processing this file further I think, and somehow be part of the report of logical files which are 'bad'.
in fact it's calling this per file part.. we should minimize the work..
it looks like this and prefix (and anything else that is at the file level), should have been called/extracted once, probably in processFile, when the lnentry is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think constructPartFilename should be moved outside of xref and into dadfs.cpp (where the deprecated version is), and extended to take any necessary parameters it needs.
To avoid confusion with the deprecated, but not yet completely deleted old method, we should rename the old method to e.g. deprecatedConstructPartFilename for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check for plane. When an error occurs at this point, it is logged with CXRefManagerBase::error.
Moved file level info to lnentry creation and move constructPartFilename to dadfs.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should pass in required=false, otherwise getDataStoragePlane will throw an exception, instead of the more contextual error message below.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
for (;;) { | ||
char c=*name; | ||
if (!c) | ||
break; | ||
if ((c=='d')&&(isdigit(name[1]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This routine should really parse the filename, looking for a plane prefix match first.
Reject if no match. All files should exist in a plane.
Then check next segment for striping, and check if corresponding plane has striping, if it hasn't then complain/reject.
Then parse remainder of path, and check penultimate segment for dir-per-part pattern (i.e. only check path segment before the filename).
Because this is called a lot, it does need to be efficient.
So I would collect the data plane prefixes upfront, then initially check to see if the name has a prefix of one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to check prefixes of data storage planes at the beginning. If no match is found throw an error. If a storage plane prefix is matched then it checks numDevices>1 and looks for striping in the next segment.
Skips what has already been added, parses the remainder of the path, and avoids copying until the filename is found. Once the file name is found it checks the last segment for a dir-per-part number. If a number is found and matches the part number of the path then the mask is added to mname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay, but given how many times (maybe a billion times perhaps on prod), we may want to revisit, to cache the plane prefixes, to avoid iterating over the IPT each time.
Also, given it's fairly opaque functionality and critical, it would be good to add a CPPUNIT test with a list of paths to validate it behaves as expected for each. Could you add ?
- Not sure if any options are used
- Change deprecated constructPartFilename to deprecatedConstructPartFilename - Keeps the constructPartFilename method simply for passing members to the function - Move file level info to creation of lnentry rather than get it for each part - Put guards on variables that can be null/empty
- Find match first and throw an error if one is not found - Check next segment for striping if numDevices>1 - Only check last segment before filename for dir-per-part number - avoid copying single chars and copy chunks after we look for dir-per-part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith I responded to all your comments and left a couple small questions. Back to you.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
expandMask(partName, pmask, partNo, max); | ||
|
||
// Get stripeNum from storage plane | ||
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check for plane. When an error occurs at this point, it is logged with CXRefManagerBase::error.
Moved file level info to lnentry creation and move constructPartFilename to dadfs.cpp
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices()); | ||
|
||
// Get dir-per-part # from file metadata or storage plan info | ||
FileDescriptorFlags flags = static_cast<FileDescriptorFlags>(file.getPropInt("Attr/@flags")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved IPT file extraction to the CLogicalNameEntry constructor to avoid re-calling things too many times. Added info as members of CLogicalNameEntry.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
unsigned n; | ||
unsigned d; | ||
mspec.calcPartLocation(partNo, max, copy, grp?grp->ordinality():max, n, d); | ||
setReplicateFilename(fullname, d); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added !isContainerized().
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
@@ -923,17 +964,60 @@ static bool parseFileName(const char *name,StringBuffer &mname,unsigned &num,uns | |||
name = nonrepdir.str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added !isContainerized().
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied."); | ||
|
||
#ifdef _CONTAINERIZED | ||
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)"); | ||
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unable to find where the hthor_hthor plane is created. I don't even see it in the list of planes anymore. I'm really not sure what happened with that. I was able to see the error when hthor__hthor did show up. When I clicked generate, the status returned "Not Found".
|
||
BoolHash uniqueProcesses; | ||
BoolHash uniquePlanes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the use of addUniqueXRef node. Should the function be deleted entirely if it isn't useful anymore?
@@ -508,7 +508,8 @@ sasha: | |||
#disabled: true | |||
#switchMinTime: 1 | |||
#queues: "*" | |||
|
|||
|
|||
xref: {} # NB: no properties defined, use defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some options that don't really mean anything. Should they be usable options or just placeholders? I don't think any would be used even if they were defined right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jackdelv.
See follow up comments/replies.
Is the mirror plane being handled correctly now (being ignored) in BM?
@@ -508,7 +508,8 @@ sasha: | |||
#disabled: true | |||
#switchMinTime: 1 | |||
#queues: "*" | |||
|
|||
|
|||
xref: {} # NB: no properties defined, use defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be..
saserver should be loading the component configuration (line 331 of saserver.cpp) into 'serverConfig',
and various points in saxref.cpp looks for properties in it.
Some may not be relevant for containerized.
Some will need reworking, e.g. xrefMaxMemory can prob. remain as an option, but it should by default use all available memory (as resourced for the container), e.g. 90% of container memory. Probably similarly with maxThreads.
The schedule ones are still relevant.
As the code stands, it's going to be trying to pick up properties from "DFUXRef/" because in BM, the sasha service configs are subtrees within 1 config. In Containerized it should assume the root is the component config
( e.g. see createSashaFileExpiryServer's way of conditionally treating the config depending on whether containerized or not ).
It may be okay as it is for now, it should default to semi ok default, and then we can open a separate JIRA to revisit.
|
||
BoolHash uniqueProcesses; | ||
BoolHash uniquePlanes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, but either way, we need to be careful this unification on planes doesn't break XREF in BM.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
expandMask(partName, pmask, partNo, max); | ||
|
||
// Get stripeNum from storage plane | ||
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should pass in required=false, otherwise getDataStoragePlane will throw an exception, instead of the more contextual error message below.
makePhysicalPartName(lname, partNo+1, max, fullname, 0, DFD_OSdefault, prefix, dirPerPart, stripeNum); | ||
|
||
unsigned n = 0; | ||
if (!isContainerized()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #18798 (comment)
can you add a comment here to something like:
"revisit: constructPartFilename should probably be refactored not to deal with replicate directories, by pre-determining the alternate prefix if copy>0"
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
IGroup *grp = lnentry->queryGroup(); | ||
if (!grp) { | ||
manager.warn(lnentry->lname.get(),"No group found, ignoring logical file"); | ||
return; | ||
} | ||
constructPartFilename(grp,partno,numparts,partname,partmask,partdir,replicate,replicateoffset,rfn); | ||
lnentry->constructPartFilename(partno, replicate, rfn, file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added replicate?1:0 because that was how it was handled in the deprecated version, but is that correct? What does copy mean?
I think good enough for now. It is assuming 2 copies (in BM), but theoretically there could be more.
@@ -1581,33 +1648,32 @@ void loadFromDFS(CXRefManagerBase &manager,IGroup *grp,unsigned numdirs,const ch | |||
if (lnentry->outsidenodes.find(rep)==NotFound) | |||
lnentry->outsidenodes.append(rep); | |||
} | |||
if (isContainerized()) | |||
{ | |||
UWARNLOG("Replication not supported in containerized version"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to generate a very large number of of logged warnings.
replication should really be a property of the plane.
For now, let's just turn this into a comment. In containerized we not expect planes to be replicated.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
IGroup *grp = lnentry->queryGroup(); | ||
if (!grp) { | ||
manager.warn(lnentry->lname.get(),"No group found, ignoring logical file"); | ||
return; | ||
} | ||
constructPartFilename(grp,partno,numparts,partname,partmask,partdir,replicate,replicateoffset,rfn); | ||
lnentry->constructPartFilename(partno, replicate, rfn, file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added if (isContainerized()) to break out of loop and warn the user. Is the user the correct audience?
this will be hit very frequently. Should just be a comment. See other PR comment near code.
dali/dfuXRefLib/dfuxreflib.cpp
Outdated
for (;;) { | ||
char c=*name; | ||
if (!c) | ||
break; | ||
if ((c=='d')&&(isdigit(name[1]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay, but given how many times (maybe a billion times perhaps on prod), we may want to revisit, to cache the plane prefixes, to avoid iterating over the IPT each time.
Also, given it's fairly opaque functionality and critical, it would be good to add a CPPUNIT test with a list of paths to validate it behaves as expected for each. Could you add ?
name++; | ||
while (*name&&isdigit(*name)) | ||
name++; | ||
mname.append("d$P$"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why d$P$?
These changes add Xref functionality to the containerized version of the platform. There were a couple changes because thor file parts were striped across multiple directories.
I'm not sure that I correctly added the new XRef pod because everything still runs on the ECL Watch pod.
What is the best way to make this toggleable? An additional #ifdef? Should I still be using _CONTAINERIZED?
Type of change:
Checklist:
Smoketest:
Testing: