Skip to content

Comments

Allow more devices per gateway setup#917

Merged
grafnu merged 54 commits intofaucetsdn:masterfrom
grafnu:dlimit
Sep 30, 2021
Merged

Allow more devices per gateway setup#917
grafnu merged 54 commits intofaucetsdn:masterfrom
grafnu:dlimit

Conversation

@grafnu
Copy link
Collaborator

@grafnu grafnu commented Aug 31, 2021

No description provided.

@grafnu grafnu requested review from anurag6 and henry54809 August 31, 2021 16:17
@codecov
Copy link

codecov bot commented Aug 31, 2021

Codecov Report

Merging #917 (55b2c02) into master (de55ccc) will increase coverage by 0.20%.
The diff coverage is 94.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #917      +/-   ##
==========================================
+ Coverage   82.24%   82.44%   +0.20%     
==========================================
  Files          46       46              
  Lines        5774     5857      +83     
==========================================
+ Hits         4749     4829      +80     
- Misses       1025     1028       +3     
Flag Coverage Δ
ata 62.61% <78.97%> (-0.09%) ⬇️
aux 68.22% <75.00%> (+0.48%) ⬆️
base 66.34% <71.02%> (+0.35%) ⬆️
dhcp 67.49% <73.86%> (+0.47%) ⬆️
many 67.21% <65.34%> (+0.28%) ⬆️
mud 72.08% <71.02%> (+0.32%) ⬆️
switch 67.73% <90.90%> (+0.43%) ⬆️
topo 66.46% <70.45%> (+0.43%) ⬆️
unit 32.47% <24.43%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
daq/proto/system_config_pb2.py 100.00% <ø> (ø)
daq/topology.py 95.85% <89.65%> (-0.24%) ⬇️
daq/host.py 90.92% <92.30%> (-0.23%) ⬇️
daq/runner.py 85.34% <93.93%> (+0.48%) ⬆️
daq/base_gateway.py 93.58% <97.22%> (+1.24%) ⬆️
daq/dhcp_monitor.py 99.05% <100.00%> (+0.05%) ⬆️
daq/external_gateway.py 100.00% <100.00%> (ø)
daq/network.py 95.37% <100.00%> (+0.01%) ⬆️
daq/report.py 98.41% <100.00%> (+0.03%) ⬆️
daq/tcpdump_helper.py 78.68% <100.00%> (+0.17%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de55ccc...55b2c02. Read the comment docs.

@grafnu
Copy link
Collaborator Author

grafnu commented Aug 31, 2021 via email

@henry54809
Copy link
Collaborator

The "direct_device_traffic" or "direct_port_traffic" in target_set_activate essentially does the same thing, which is informing the network of this device and sets up necessary ports/ acls. There can be some consolidations so the network setups per device aren't in 2 places.

So, the underlying code (in topology) needs the device, not just the gateway and port set... It maintains a data structure that links devices and gateways, and therefore ports, so it's not sufficient to pass just the port_set, it needs the device (and device.gateway) to function properly without major refactoring. So, basically the gateway ports only exist if there is a device that references that gateway. There's no provision for a "gateway without a device" It comes down to this function in topology.py: def register_device(self, device): """Register a device with the network topology""" LOGGER.info('Registering device %s port_set %s', device, device.gateway.port_set) self._populate_set_devices(device, device.gateway.port_set) self._ensure_gateway_interfaces(device.gateway.port_set) ... which, as you see, needs both device and device.gateway.port_set As for the name. I was intending the function to be one level up from that specifics about what it's doing now (setting up the network ports), which is why I went with registering. It's not necessarily allocating anything, as such, and I don't think it should be specific to ports. Basically, it's trying to say "prepare yourself for this device to do something" and then do all the necessary parts of that (which includes the ports, but might include other stuff). I could imagine changing register_device() to prepare_device() or prepare_for_device() if those make more sense?

On Tue, Aug 31, 2021 at 11:02 AM henry54809 @.> wrote: @.* commented on this pull request. ------------------------------ In daq/runner.py <#917 (comment)>: > @@ -814,6 +816,9 @@ def _create_gateway(self, device): gateway.set_tap_intf(self.network.tap_intf) try: + if device: + device.gateway = gateway + self.network.register_device(device) I don't think the "register_device" name is appropriate here. This function is really for setting the network up for gateway (which should be device independent). I think rename to self.network.allocate_gateway_ports (or something like that) and pass in the port_set is all that's needed here. Then you can revert line 688, 794-795 and 819-821. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#917 (review)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIEPD6LDO4H2IEB5ITFRSDT7UKMBANCNFSM5DEPHPYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

@grafnu
Copy link
Collaborator Author

grafnu commented Sep 1, 2021

Ok yes -- good call... updated to use direct_device_traffic

The "direct_device_traffic" or "direct_port_traffic" in target_set_activate essentially does the same thing, which is informing the network of this device and sets up necessary ports/ acls. There can be some consolidations so the network setups per device aren't in 2 places.

So, the underlying code (in topology) needs the device, not just the gateway and port set... It maintains a data structure that links devices and gateways, and therefore ports, so it's not sufficient to pass just the port_set, it needs the device (and device.gateway) to function properly without major refactoring. So, basically the gateway ports only exist if there is a device that references that gateway. There's no provision for a "gateway without a device" It comes down to this function in topology.py: def register_device(self, device): """Register a device with the network topology""" LOGGER.info('Registering device %s port_set %s', device, device.gateway.port_set) self.populate_set_devices(device, device.gateway.port_set) self.ensure_gateway_interfaces(device.gateway.port_set) ... which, as you see, needs both device and device.gateway.port_set As for the name. I was intending the function to be one level up from that specifics about what it's doing now (setting up the network ports), which is why I went with registering. It's not necessarily allocating anything, as such, and I don't think it should be specific to ports. Basically, it's trying to say "prepare yourself for this device to do something" and then do all the necessary parts of that (which includes the ports, but might include other stuff). I could imagine changing register_device() to prepare_device() or prepare_for_device() if those make more sense?

On Tue, Aug 31, 2021 at 11:02 AM henry54809 @.
> wrote: _@**
.**_* commented on this pull request. ------------------------------ In daq/runner.py <#917 (comment)>: > @@ -814,6 +816,9 @@ def _create_gateway(self, device): gateway.set_tap_intf(self.network.tap_intf) try: + if device: + device.gateway = gateway + self.network.register_device(device) I don't think the "register_device" name is appropriate here. This function is really for setting the network up for gateway (which should be device independent). I think rename to self.network.allocate_gateway_ports (or something like that) and pass in the port_set is all that's needed here. Then you can revert line 688, 794-795 and 819-821. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#917 (review)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIEPD6LDO4H2IEB5ITFRSDT7UKMBANCNFSM5DEPHPYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

@grafnu grafnu requested a review from henry54809 September 1, 2021 02:43
time.sleep(self._settle_sec)

def direct_device_traffic(self, device, port_set):
def direct_device_traffic(self, device):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is gonna break FOT. By setting the arg port_set to None, the function cleans up the remote_tap.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed. Verified with the new fot tests that check for this case as well.

@grafnu grafnu requested a review from henry54809 September 23, 2021 06:07
@grafnu
Copy link
Collaborator Author

grafnu commented Sep 27, 2021

Updated to not clear directories if block mode is enabled. PTAL.

@henry54809
Copy link
Collaborator

I think by adding a run_retention and make the block_time work across restarts can achieve your intended behavior. block_existing isn't clear on what existing thing DAQ is looking for and what is being blocked. Setting block_existing also changes whether to delete existing run data is implicit and can cause confusion.

That's odd... because empirically it was not cleaning the directories. I'll investigate, but something is not WAI here. Regardless of that -- what behavior were you suggesting? The short solution to this problem is that the system wouldn't clean out run-* if that flag is set... On Mon, Sep 27, 2021 at 11:29 AM henry54809 @.> wrote:

It also clears out run-
directories. https://github.com/faucetsdn/daq/blob/master/daq/runner.py#L248 cleanup_previous_runs cleans out inst/reports, this is using inst/run- as a key for blocking runs. There is no "run directory," as such (there is a reports/ directory, and there are run-/ directories, but no singular run/). I don't know what you mean by "clear existing" and "blocked list"... Right now, the configuration and implementation work relatively straightforward and are sufficient for what I'm looking for. Specifically, "run the scan if run-MACADDR does not exist" -- what additional control are you looking for or proposing? What actual behavior or use-case are you describing? Note that this is NOT meant to be the full general-purpose solution, since that case will have more cloud-centric control and repository. This is just meant to be something that allows a CLI operator to do the needful. On Mon, Sep 27, 2021 at 11:07 AM henry54809 @.
> wrote: … <#m_-8783244528624129951_> block_existing won't work as intended considering everything in the run directory is cleared every time DAQ is restarted under cleanup_previous_runs. I think a configuration option for "clear existing" and "blocked list" to not test ones that are fully tested offers more control. Block_existing may also block devices with a run directory but aren't actually tested. block_time is so that a chatty device doesn't get tested more frequently than a non-chatty device. block_existing is to have an on-disk mechanism for doing device scans, so the process can be restarted without losing all state. Basically, it means "only test devices for which there is not already a result sitting on the disk somewhere" The problem was with S2 with ~1000 devices... I wanted to be able to run the system for a bit, and then possibly restart it, and over time form a complete version of all the devices there. Without this, it wouldn't have (device) coverage since it would keep testing the same devices each restart. … <#m-7326509685151770393*> On Mon, Sep 27, 2021 at 10:01 AM henry54809 @.*> wrote: @.** commented on this pull request. ------------------------------ In proto/system_config.proto <#917 <#917> (comment) <#917 (comment) <#917 (comment)>>>: > @@ -252,6 +252,15 @@ message RunTrigger { // limit on the number of active testing hosts int32 max_hosts = 5; + + // blockout time for tested hosts in sec + int32 block_time = 6; + + // block devices with existing run directories + bool block_existing = 7; What's the motivation for this? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#917 <#917> (review) <#917 (review) <#917 (review)>>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIEPD7KMEUXNYP7AHK6YS3UECPM7ANCNFSM5DEPHPYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub . — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#917 (comment) <#917 (comment)>>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIEPD7TX4FUSO3IK35U24DUECXHPANCNFSM5DEPHPYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub . — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#917 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIEPD74PFVOIKJM26YWXPTUECZXFANCNFSM5DEPHPYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

native_vlan: 122
max_hosts: 3
max_hosts: 10
block_time: 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and scan_interval should have a _sec postfix to match existing default_timeout_sec long_dhcp_response_sec etc. Also could there be another name for scan_interval? The name is very close to monitor_scan_sec.

@grafnu grafnu requested a review from henry54809 September 30, 2021 02:44
@grafnu
Copy link
Collaborator Author

grafnu commented Sep 30, 2021

Ok, renamed parameters and adjusted behavior!

Copy link
Collaborator

@henry54809 henry54809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about the usage of block time in non-single shot mode, everything looks good to go.

"""Discovers a host using arp-scan in a list of subnets."""
cmd = 'arp-scan --retry=2 --bandwidth=512K --interface=%s --destaddr=%s -s %s %s'
host, intf = self._get_scan_interface()
scans = 5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think about another way of doing this but we can leave it for now. Scanning it 5 times can take a long time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to make it configurable. I'm actually not sure what the right value is, so setting it down to 2 by default, and then easy to bump-up later in config if necessary. Also now possible to disable!

daq/runner.py Outdated
self._blocked_devices.add(device.mac)
LOGGER.info('Target device %s in now blocked (%s)', device, len(self._blocked_devices))
block_sec = self.run_trigger.get('device_block_sec') or LONG_TIME_SEC
device.set_block(block_sec)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the block time should also be honored in non-single shot mode right? It looks like block time is only set here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@grafnu
Copy link
Collaborator Author

grafnu commented Sep 30, 2021

Fixed/addressed both comments (making arp_scan_count configurable).

@grafnu grafnu merged commit 5a5f46d into faucetsdn:master Sep 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants