Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Facts for default interface #29

Closed
danfruehauf opened this issue Mar 20, 2013 · 20 comments
Closed

Facts for default interface #29

danfruehauf opened this issue Mar 20, 2013 · 20 comments

Comments

@danfruehauf
Copy link

Hey there, I have 4 facts that are relatively useful and I thought of joining them to this repository instead of opening a new one with just the facts.

The facts they expose are:

  • defaultgw_gw_address - default gateway address
  • defaultgw_interface - main interface on machine (one that is used with the default gateway)
  • defaultgw_address - address of main machine interface (one that is used on the default gateway)
  • defaultgw_internet_address - our external address as the internet sees us (because we might be behind NAT)

I'll attach the code in a comment as it is not that long...

@danfruehauf
Copy link
Author

Facter.add("defaultgw_gw_address") do
        setcode do
                Facter::Util::Resolution.exec('ip route get 8.8.8.8 | tr -s " " | cut -d" " -f3 | head -1')
        end
end
Facter.add("defaultgw_interface") do
        setcode do
                Facter::Util::Resolution.exec('ip route get 8.8.8.8 | tr -s " " | cut -d" " -f5 | head -1')
        end
end
Facter.add("defaultgw_address") do
        setcode do
                Facter::Util::Resolution.exec('ip route get 8.8.8.8 | tr -s " " | cut -d" " -f7 | head -1')
        end
end
Facter.add("defaultgw_internet_address") do
        setcode do
                # google returns an error with our ip address, nice!
                # this is the original way:
                #Facter::Util::Resolution.exec('curl --connect-timeout 10 -s "https://www.google.com/search?q=ip+address" | grep -o "Client IP address: [[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+" | grep -o "[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+"')
                # this is smarter as we are caching the address in /tmp/.myip
                iptables_wan_external_ipaddress_tmp = "/tmp/.myip"
                command = 'test `cat ' + iptables_wan_external_ipaddress_tmp + ' 2> /dev/null | wc -l` -eq 1 || curl --connect-timeout 10 -s "https://www.google.com/search?q=ip+address" | grep -o "Client IP address: [[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+" | grep -o "[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+" > ' + iptables_wan_external_ipaddress_tmp + ' && chmod 400 ' + iptables_wan_external_ipaddress_tmp + ' && cat ' + iptables_wan_external_ipaddress_tmp + ' | head -1'
                Facter::Util::Resolution.exec(command);
        end
end

@danfruehauf
Copy link
Author

Hope this can make it into the package.

@adrienthebo
Copy link
Member

Thanks for your interest and this contribution!

Since these facts hit the network, they could theoretically take a long time to run. If a machine has a poor connection or no Internet connection, what do you expect would happen? Would it be possible for these to start badly misbehaving?

Secondly, the current implementation of this shells out and does a lot of logic in a command, which is difficult to test and if it breaks, it can be hard to figure out why. Would you be comfortable/willing to translate this to a more pure ruby implementation?

@danfruehauf
Copy link
Author

Happy to contribute!

Regarding your questions:

  • The first 3 facts do not access the internet but merely calculate
    routing paths.
  • The fact that actually acccess the internet (default_internet_address)
    would do so only if the address is not cached in /tmp/.myip. I'm planning
    to further improve so it polls every configured number of minutes/hours
    rather than cache it just once in /tmp/.myip indefinitely. Should come up
    soon.

Just ported everything and now it is fully in Ruby.

On Fri, Mar 22, 2013 at 5:34 AM, Adrien Thebo notifications@github.comwrote:

Thanks for your interest and this contribution!

Since these facts hit the network, they could theoretically take a long
time to run. If a machine has a poor connection or no Internet connection,
what do you expect would happen? Would it be possible for these to start
badly misbehaving?

Secondly, the current implementation of this shells out and does a lot of
logic in a command, which is difficult to test and if it breaks, it can be
hard to figure out why. Would you be comfortable/willing to translate this
to a more pure ruby implementation?


Reply to this email directly or view it on GitHubhttps://github.com//issues/29#issuecomment-15257257
.

@danfruehauf
Copy link
Author

Sorry for the few quick consecutive emails, but attached is now a version
which considers also the age of the cache file holding the IP address.

On Fri, Mar 22, 2013 at 12:46 PM, Dan Fruehauf malkodan@gmail.com wrote:

Happy to contribute!

Regarding your questions:

  • The first 3 facts do not access the internet but merely calculate
    routing paths.
  • The fact that actually acccess the internet (default_internet_address)
    would do so only if the address is not cached in /tmp/.myip. I'm planning
    to further improve so it polls every configured number of minutes/hours
    rather than cache it just once in /tmp/.myip indefinitely. Should come up
    soon.

Just ported everything and now it is fully in Ruby.

On Fri, Mar 22, 2013 at 5:34 AM, Adrien Thebo notifications@github.comwrote:

Thanks for your interest and this contribution!

Since these facts hit the network, they could theoretically take a long
time to run. If a machine has a poor connection or no Internet connection,
what do you expect would happen? Would it be possible for these to start
badly misbehaving?

Secondly, the current implementation of this shells out and does a lot of
logic in a command, which is difficult to test and if it breaks, it can be
hard to figure out why. Would you be comfortable/willing to translate this
to a more pure ruby implementation?


Reply to this email directly or view it on GitHubhttps://github.com//issues/29#issuecomment-15257257
.

@danfruehauf
Copy link
Author

Please disregard the last one, this should be the final version.
Last one had a little bug when /tmp/.myip didn't exist.

On Fri, Mar 22, 2013 at 1:06 PM, Dan Fruehauf malkodan@gmail.com wrote:

Sorry for the few quick consecutive emails, but attached is now a version
which considers also the age of the cache file holding the IP address.

On Fri, Mar 22, 2013 at 12:46 PM, Dan Fruehauf malkodan@gmail.com wrote:

Happy to contribute!

Regarding your questions:

  • The first 3 facts do not access the internet but merely calculate
    routing paths.
  • The fact that actually acccess the internet (default_internet_address)
    would do so only if the address is not cached in /tmp/.myip. I'm planning
    to further improve so it polls every configured number of minutes/hours
    rather than cache it just once in /tmp/.myip indefinitely. Should come up
    soon.

Just ported everything and now it is fully in Ruby.

On Fri, Mar 22, 2013 at 5:34 AM, Adrien Thebo notifications@github.comwrote:

Thanks for your interest and this contribution!

Since these facts hit the network, they could theoretically take a long
time to run. If a machine has a poor connection or no Internet connection,
what do you expect would happen? Would it be possible for these to start
badly misbehaving?

Secondly, the current implementation of this shells out and does a lot
of logic in a command, which is difficult to test and if it breaks, it can
be hard to figure out why. Would you be comfortable/willing to translate
this to a more pure ruby implementation?


Reply to this email directly or view it on GitHubhttps://github.com//issues/29#issuecomment-15257257
.

@danfruehauf
Copy link
Author

Just in case the attachments didn't make it through, the current new implementation (in ruby) can be found here:
http://nm-ssh.cloudapp.net/defaultgw.rb

@adrienthebo
Copy link
Member

@danfruehauf thanks for the port and answering my questions, and sorry about the delayed response.

So the cache in /tmp/.myip is really interesting because I did a bunch of work on caching facter values, and it's a bit of a pain. First, /tmp/.myip is a bit fragile. What about putting something in the Puppet $vardir/facter-cache or something like that? Secondly, what happens if two instances of Facter try to modify the cache file at the same time?

@danfruehauf
Copy link
Author

Thanks for the response.

No, /tmp/.myip is definitely fragile, what is $vardir/facter-cache? Does it actually exist? I relied on you to provide as well some code review. I'm not happy with /tmp/.myip.

If a few instances of facter try to modify the cache file at the same time - undefined behaviour. Possibly what I can do is change it so it writes to a tmp file and then overwrites the file (transactional), what do you say?

@danfruehauf
Copy link
Author

The new verion is more "atomic", as renaming files should be atomic (linux and variants, not windows)

The new version is here (same URL):
http://nm-ssh.cloudapp.net/defaultgw.rb

About proper caching, this is an open issue:
http://projects.puppetlabs.com/issues/14667

If you have a better solution, please go ahead (merely modifying a single variable). I never claimed it is 200% robust, but this fact is rather useful...

@danfruehauf
Copy link
Author

Hi Adrien,

Following up on this one, as I didn't get any response for the last 7 days.

@adrienthebo
Copy link
Member

@danfruehauf apologies for my lack of response on this! I have been up to my eyeballs in backlogged work across my projects so I'm not intentionally ignoring this; I'm just having trouble making time for everything. I'll try to take a few moments today to look at this more closely.

@wolfspyre
Copy link
Contributor

@danfruehauf
This is awesome! We do something similar at $work. and I'd like to hop in here and help make this go... I have some refactoring to do, which will integrate the things I'm doing as well.

I have some concern about the namespacing of the facts, but before I twiddle with them, are you dead set on the names you've used here, or are you flexible?

@danfruehauf
Copy link
Author

@wolfspyre
Namespace? Don't really care about the namespace.

This whole thing came here as I wanted to post it for the community to use but realized that it can complement an already existing project rather than opening a new one.

It's very simple facts, but very useful as well, at least in my opinion...

@wolfspyre
Copy link
Contributor

@danfruehauf
agreed! I hope to have something to submit back here in a few days.

@wolfspyre
Copy link
Contributor

@danfruehauf
What do you think of these names:

network_nexthop_ip => 10.12.99.1
network_primary_interface => eth0
network_primary_ip => 10.12.99.99
network_public_ip => 118.231.246.201

I've refactored them so that all the route/nexthop/interface information is collected sans network activity by getting the nexthop for to 0/0 and then parsing the output of ip route get on that host.

I rewrote the public_ip fact to use ip-echo.appspot.com which is fed off google's appengine.
This avoids the need to parse the return page from google, and saves bandwidth by transmitting less data in both directions.
I elected not to use the caching you had in here, since most environments aren't going to run facter more frequently than once or twice an hour.

I am not dead set against putting it back in, but I'm uncertain the benefit outweights the added complexity. Thoughts?

I'm just cleaning the facts up a little bit more and I'll put them in a PR.
Before I do though, do you have any objections?

@danfruehauf
Copy link
Author

@wolfspyre
In terms of names - sounds good, I'm agnostic about that.

I'd still go for google.com for getting the IP, the parsing takes no time (comparing to network latency) and google.com would usually be up, whilst other providers I might not trust as much. For instance, ifconfig.me was heaps easier to parse, but significantly slower.

About caching, I'd still go for caching because when I test puppet configuration I'd like it to be fast. True, on average server would access these facts only once every 30 minutes, but still, if nothing changed, why not cache it?

@wolfspyre
Copy link
Contributor

@danfruehauf

ip-echo.appspot.com is actually hosted on google's app engine: http://googcloudlabs.appspot.com/
In cursory testing at a handful of locations around the internet, a wget to google.com in contrast to a wget to ip-echo.appspot.com was a little bit slower in the US (granted only by fractions of a second) and ip-echo is not that far off in other areas of the world:

for site in google.com ip-echo.appspot.com; do echo ${site};time wget http://${site} -qO -| wc -c;done

HOME:
google.com
10794

real 0m0.186s
user 0m0.004s
sys 0m0.010s
ip-echo.appspot.com
14

real 0m0.148s
user 0m0.003s
sys 0m0.008s

NJ:
google.com
10778

real 0m0.124s
user 0m0.010s
sys 0m0.010s
ip-echo.appspot.com
15

real 0m0.097s
user 0m0.010s
sys 0m0.020s

HKG:
google.com
11024

real 0m0.041s
user 0m0.001s
sys 0m0.005s
ip-echo.appspot.com
15

real 0m0.191s
user 0m0.002s
sys 0m0.001s

SJC:
google.com
10762

real 0m0.159s
user 0m0.001s
sys 0m0.003s
ip-echo.appspot.com
15

real 0m0.146s
user 0m0.001s
sys 0m0.003s

LON:
google.com
10858

real 0m0.119s
user 0m0.000s
sys 0m0.005s

ip-echo.appspot.com
15

real 0m0.142s
user 0m0.002s
sys 0m0.001s

add on top the extra processing time of the headers of the wget, and I'm still pretty convinced that ip-echo's going to be the more efficient solution more often than not. As you can see, the output of a wget to google.com is actually a little over 10kb, whereas the output from ip-echo is 14-15 bytes.

I'm using open-uri and timeout to add in a 1 or 2 second timeout so worst case this won't ever hold things up even on isolated systems for >2s. The reason I'm averse to the caching here, is I think it introduces fragility but doesn't bring any benefit. What situation do you see the caching helping?

@wolfspyre
Copy link
Contributor

#35

@danfruehauf
Copy link
Author

Pretty intensive testing. Difficult for me to argue with that.

I'd still go for some sort of caching as I'd like the fact to stay viable all the time. But still, looks solid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants