-
Notifications
You must be signed in to change notification settings - Fork 55
Devicemanager: New error codes + Device Manager unit tests and few other fixes. #420
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for this, both for direct mode and LVM mode?
Tests for device manager is bit challenging, as they deal with devices. I stared with LVM tests: avalluri@c7437e5 . |
1bc573f
to
3cbca7b
Compare
@pohly, I added unit tests for LVM, but I am not sure yet about covering ndctl manager, which needs a bit thinking how to mimic the pmem device. |
@@ -0,0 +1,229 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name of that file seems misspelled, should be pmem-lvm_test.go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be pmd-lvm_test.go
. I fixed it now.
3cbca7b
to
ee26179
Compare
Device manager tests failed in CI while setting up loop device:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in #383 (review), please also revise error reporting and checking such that the error code returned by ns.cs.dm.GetDevice
in nodeserver.go
is checked.
5aa343e
to
127e7ac
Compare
Makefile
Outdated
@@ -13,6 +13,7 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
GO=GOOS=linux GO111MODULE=off go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why GOOS=linux?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why GOOS=linux?
This is not added part of this change. It was existing before, I just moved from that place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't consistent then. I was looking at the first changed line and that only had:
build: $(CMDS) $(TEST_CMDS)
go test -run none ./pkg/... ./test/e2e
127e7ac
to
fc7976a
Compare
test/test.make
Outdated
$(TEST_CMD) $(shell go list $(TEST_ARGS) | sed -e 's;$(IMPORT_PATH);.;') | ||
$(TEST_CMD) $(shell $(GO) list $(TEST_ARGS) | grep -v pmem-device-manager | sed -e 's;$(IMPORT_PATH);.;') | ||
RUN_DM_TESTS = \ | ||
$(TEST_CMD) ./pkg/pmem-device-manager -exec $(SUDO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes "make test" depend on sudo, which is not currently a requirement and also shouldn't be. Developers should be able to build and test on a machine with a user account that isn't allowed to become root.
The other issue is that the test depends on /dev/loop-control
, right?
Let me propose a different approach to running tests that need to be run in a certain environment:
- in the test code, check those preconditions (are root, have /dev/loop-control): if not met, skip the test
- integrate the running of unit tests into our E2E testing:
- build test binaries with
go test -c
- copy test binaries into the first worker
- run them there as root
- build test binaries with
This is more complicated, but it also gives us better test coverage, because then operations that may depend on the OS kernel of the cluster node actually get tested with that kernel. We can also do unit tests that depend on actual PMEM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes "make test" depend on sudo, which is not currently a requirement and also shouldn't be. Developers should be able to build and test on a machine with a user account that isn't allowed to become root.
To avoid this root user. I initially tried ruining these tests in a container (#420 (comment)).
The other issue is that the test depends on
/dev/loop-control
, right?Let me propose a different approach to running tests that need to be run in a certain environment:
in the test code, check those preconditions (are root, have /dev/loop-control): if not met, skip the test
integrate the running of unit tests into our E2E testing:
- build test binaries with
go test -c
- copy test binaries into the first worker
- run them there as root
This is more complicated, but it also gives us better test coverage, because then operations that may depend on the OS kernel of the cluster node actually get tested with that kernel. We can also do unit tests that depend on actual PMEM.
Ok, Instead of running these(device manager) unit tests in full(3-node) test cluster, I would propose to run them in a single virtual machine node. I pushed changes to support this: 45f21a6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This single node vm installs only dependencies needed for PMEM, such as LVM2, ndctl. No other packages like Docker, Kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Instead of running these(device manager) unit tests in full(3-node) test cluster, I would propose to run them in a single virtual machine node.
That's a massive change to some already pretty complex scripts. What's the advantage compared to just using the normal VM cluster?
We need to consider two different use cases:
- a developer using this while making changes to the source code
- the CI jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a massive change to some already pretty complex scripts.
No, it's not that massive change as it looks in the PR. Most of the changes are indentations. It's just replacing few bits in the right place in test/start-kubernetes.sh
script.
fc7976a
to
127d02d
Compare
Defined package level error codes. The implementations are supposed to return these error codes. And also revised the error message formatting and wrapping using go-1.13 fmt package.
Discussion on issue intel#413, raised a requirement that PMEM-CSI should handle the DeleteVolume() call for volumes that the underlined device already deleted manually by the admin. So for such a volume we just ignore and return no error in DeviceManager.DeleteVolume(). This also makes DeleteVolume() idempotent.
e8b480a
to
6287946
Compare
If this what you feel right, we can still run using |
@pohly I took your change pohly@a9e7136 and squashed to ff07726. Thanks. |
Interesting question. Ya, may be true. We cannot ignore that namespace in this case. |
By the way, we already dealing with only active namespaces for ListDevices, this case affects this also. |
There is something weird going on in CI, |
So what do you think of the solution I have here: 45f7c42#diff-b99dbbc6a60be19c9eacba35ad8a5526
What do we need to fix for ListDevices? Should it report inactive, non-empty namespaces and currently ignores them? |
You could try running the E2E tests in that mode locally. |
Should work, I couldn't see any issues with that.
Yes, it also should have that fix. Maybe we better fix this ndctl layer. |
6287946
to
8f0a948
Compare
@@ -93,7 +93,7 @@ func (r *Region) ActiveNamespaces() []*Namespace { | |||
return r.namespaces(true) | |||
} | |||
|
|||
//AllNamespaces returns all namespaces in the region | |||
//AllNamespaces returns all non-zero sized namespaces in the region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here explaining why it ignores namespaces with zero size? I know its in the commit message, but that's not necessarily where someone will look when reading this description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/ndctl/region.go
Outdated
if !onlyActive || C.ndctl_namespace_is_active(ndns) == true { | ||
namespaces = append(namespaces, (*Namespace)(ndns)) | ||
ns := (*Namespace)(ndns) | ||
if (onlyActive && ns.Active()) || ns.Size() != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this break the semantic of onlyActive
?
If it is true, then inactive namespaces are returned if their size is > 0, which is not what the caller asked for, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, Fixed it now. Thanks.
@avalluri: Did you also make some more changes to the Jenkinsfile? It's no longer valid: https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/blue/organizations/jenkins/pmem-csi/detail/PR-420/44/pipeline |
63ca968
to
c3ef004
Compare
…paces While deleting a namespace ndctl might still keeep the namespace inactive and with size zero. And also there might be possible that the unsuccessful deletes via Region.DestroyNamespace() might left the namespace disabled but not deleted, so we should not ignore them too. This change, considers the size of the found namespace and considers only if it's non-zero.
Tests for both device managers: direct and LVM. LVM tests used file backed loop device to mimic an logical volume group, This requires /dev/loop-control file on host where the tests are run. Intentionally disable running these device manager unit tests as it requires sudo/root user permission.
Device manager tests need root/sudo user permission, and loop control device on host where they run. So to avoid this we run these tests under a new virtual machine. Added new make target 'run_dm_tests' which sets up pmem-csi-${CLUSTER}-worker1 node if not exists and runs the device manager unit tests in it. To make use of existing test cluster scripts, made needed changes to them to support creating a single pmem node.
c3ef004
to
b7cb533
Compare
${env.BUILD_IMAGE} bash -c 'set -x; \ | ||
swupd bundle-add openssh-client && \ | ||
make run_dm_tests; \ | ||
make stop'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step is now treated as success even if make run_dm_tests
fails. As I intend to remove this step in #473, we can merge this PR without fixing this.
This PR address a couple of issues:
Defined new error codes that are returned by DeviceManager interface methods. Made changes to implementations to use these error codes and revised the error messages using error formatting features provided by go-1.13
fmt
package.Discussion on issue cache data in LVM device manager? #413, raised a requirement that PMEM-CSI should handle the DeleteVolume() call for volumes that the underlined device already deleted manually by the admin. So for such a volume we just ignore and return no error in DeviceManager.DeleteVolume(). This also makes DeleteVolume() idempotent.
Issue DeleteVolume response is incorrect if volume is in use #402 : CSI.DeleteVolume should return FailedPreCondition error in case of the volume is in use. Test case added for the same.
Unit tests for device manager
This PR also contains a commit that removes the unused code in DeviceManager.