bootcode.bin randomly doesn't PXE boot correctly. #764
Comments
You're using a proxy server for the DHCP proxy and TFTP boot. Do you also have a standard DHCP server replying with an IP address? From the output it looks like you've got everything set up correctly (option 43 looks fine). The Pi will only continue if it finds an option 43, which tells it that the DHCP server is also going to serve the files, and it has been offered an IP address. From the information above, no IP address has been offered. Gordon |
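A minimal sketch of the two things to look for in a captured reply, assuming you have the raw UDP payload of the BOOTP/DHCP packet (for example exported from Wireshark). The helper names `parse_dhcp_reply` and `summarize_reply` are hypothetical, and this only inspects a capture; it is not the boot ROM's actual logic. In a proxy setup the IP address and option 43 may arrive in two separate replies, so check both.

```python
import socket

# Magic cookie that marks the start of the DHCP options field (RFC 2131).
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def parse_dhcp_reply(payload: bytes):
    """Return (yiaddr, options) for a raw BOOTP/DHCP UDP payload."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")
    # yiaddr ("your IP address") sits at bytes 16..19 of the BOOTP header.
    yiaddr = socket.inet_ntoa(payload[16:20])
    options = {}
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option, single byte
            i += 1
            continue
        if i + 1 >= len(payload):  # truncated option, stop parsing
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return yiaddr, options


def summarize_reply(payload: bytes):
    """Report whether a reply offers an address and carries option 43."""
    yiaddr, options = parse_dhcp_reply(payload)
    return {"offers_ip": yiaddr != "0.0.0.0", "has_option_43": 43 in options}
```

A reply from a proxy DHCP server would normally show `offers_ip` as false, and a reply from an ordinary DHCP server such as a home router would normally show `has_option_43` as false.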
My ADSL router is a DHCP server, yes. |
So can you dump that DHCP response as well? |
I ran "sudo tcpdump -A -i eth0 port tftp or port bootpc or port bootps or port 546 or port 547". I did not capture any DHCP requests from the pxeserver to the main router. Output was the same as before. |
To debug this what I would do is to use the managed switch I have on my desk to mirror all traffic to a port with a Raspberry Pi on it... I'm wondering whether it's a STP problem with the switch in the router? One thing we're going to do soon is to add the ability to provide serial debug from the bootcode, which should help with this... Gordon |
I have a dump from the router itself now: non working:
|
Someone just turned on a Windows laptop on the network and now it started working correctly: dnsmasq:
router, tcpdump -i br0 port tftp or port bootpc or port bootps or port 546 or port 547:
pxeserver, tcpdump -A -i enp0s31f6 port tftp or port bootpc or port bootps or port 546 or port 547:
|
Also I can run "brctl showstp br0" on my router. It shows a lot of information but I don't really know what I am looking for. |
It looks like a problem we had before where we needed a broadcast packet to trigger the receiving of one of the packets (which is why turning on the Windows machine suddenly starts it...) Be interesting to see if the receipt of the second DHCP reply was triggered by a broadcast packet... |
That is probably it. The Windows machine is spamming a lot of autoconfig junk continuously on both IPv4 and IPv6. The Pi and the pxe server are both connected directly to the router switch, but the windows machine is behind a second (unmanaged) switch. None of the machines are on wifi, but the router is bridging ethernet and two wifi radios. |
I might have to see if it's possible the USB->ETH bridge is holding onto a packet for some reason, or that I've dropped a packet... But can't understand why this would be... In the past we've found doing an occasional broadcast ping will fix the problem... |
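A rough sketch of that kind of periodic broadcast, run from the PXE server while the Pi is booting. It sends a small UDP broadcast datagram instead of an ICMP broadcast ping, because raw ICMP sockets need root. Whether a UDP broadcast triggers the same behaviour as the ping is an assumption this thread does not confirm. The function names, the payload and the choice of port 9 (discard) are illustrative only.

```python
import ipaddress
import socket
import time


def broadcast_address(interface_cidr: str) -> str:
    """Return the IPv4 broadcast address for e.g. '192.168.1.10/24'."""
    return str(ipaddress.ip_interface(interface_cidr).network.broadcast_address)


def send_broadcast_nudges(interface_cidr: str, count: int = 3,
                          interval: float = 1.0, port: int = 9) -> None:
    """Send `count` small UDP broadcast datagrams, `interval` seconds apart."""
    target = broadcast_address(interface_cidr)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Without SO_BROADCAST the kernel refuses to send to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for _ in range(count):
            sock.sendto(b"nudge", (target, port))
            time.sleep(interval)
```

For example, `send_broadcast_nudges("192.168.1.10/24")` would send three datagrams to 192.168.1.255, where the address and prefix are those of the server's own interface.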
I found it not working again this morning. Broadcast ping brought it to life:
|
I'm seeing this roughly one time in 3 when I network boot my RPi 3, tcpdump from the DHCP server:
I find it can take several power-off reboots of the RPi before it works correctly. When it does, it only makes a DHCP request once. I don't have any network switches on my home network which have mirror ports. |
@puck Is your test with the bootcode.bin as a single file on the SD card? |
Hey @ghollingworth, no, I'm network booting and loading it from a TFTP server. I have no SD card in my RPi 3. |
I've realised I can make a poor man's wire tap using a laptop and two USB ethernet dongles. If you'd like a traffic capture, I should be able to do that tonight. |
@ghollingworth putting bootcode.bin (only) on an SD card makes the boot process reliable - I rebooted about 10 times with no failures. However, it doesn't look for any files inside the serial number directory now. The first boot hung because I had them all in the subdirectory, so I had to symlink all the other files into my TFTP root for the boot process to succeed. |
@puck That is what I would expect to see from a bootcode.bin not built in the last week. Try a more recent one: https://github.com/Hexxeh/rpi-firmware/blob/170150d2210a3bb1801ae165d54794101f28fc54/bootcode.bin |
Heh, yes, I just found bug #754 and tested again with the newest bootcode.bin - it now works with the serial directory. I was hoping to update this issue before someone responded. ;) |
Sorry - you caught me on a good day. ;) |
Is there a workaround apart from putting bootcode.bin on an SD card? For me it also looks like the Pi is trying several times to get a DHCP offer, but it always discards the reply and never reaches out to the TFTP server. Could this be a DHCP implementation issue? Broadcast happens a lot on my network. |
If you're using bootcode.bin (and only that) on an SD card then it is using the fixed version of the code... If it is ignoring the reply then that will be because the offer doesn't contain the TFTP server address in a suitably understandable manner. Can you tcpdump (or wireshark) the reply? Thanks |
Here is a tcpdump; maybe there is an option missing, hopefully.
BOOTP/DHCP, Request from b8:27:eb:eb:cc:6e, length: 320, hops:1, xid:0x26f30339, flags: [none]
16:02:37.431011 00:50:56:b7:34:a8 > 00:00:5e:00:01:0b, ethertype IPv4 (0x0800), length 342: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: UDP (17), length: 328) 10.11.4.11.bootps > 10.11.108.4.bootps: BOOTP/DHCP, Reply, length: 300, hops:1, xid:0x26f30339, flags: [none] |
Looks to me like you've got the server and client on different subnets; the Raspberry Pi bootrom doesn't support this. The bootcode.bin option does though. |
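A quick way to check this, sketched with Python's standard `ipaddress` module. The two addresses are the ones in the tcpdump above; the prefix lengths are assumptions, since the thread never states the netmask, and 10.11.108.4 may be a relay agent rather than the Pi itself (the `hops:1` in the capture suggests a relay is involved). `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(server_ip: str, other_ip: str, prefix_len: int) -> bool:
    """True if both addresses fall inside the same IPv4 network."""
    # strict=False lets us pass a host address instead of a network address.
    network = ipaddress.ip_network(f"{server_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(other_ip) in network


# Addresses from the capture; the /24 and /16 prefixes are assumed.
print(same_subnet("10.11.4.11", "10.11.108.4", 24))
print(same_subnet("10.11.4.11", "10.11.108.4", 16))
```

With a /24 the two addresses are on different networks, and with a /16 they share one, so the real netmask decides whether a relay is needed.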
Yes, works with single bootcode.bin |
This still does not work for me. With the latest bootcode.bin the results are exactly the same: dnsmasq receives the request and sends the response, and the Pi ignores it, five times, then it stops. |
Checking in here after experiencing exactly the same problem with brand new pi3 with latest firmware update here https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=191778&p=1203002#p1203002 Reading through this issue it appears that it's not solved unless an SD card with |
@andig in my case it does not work reliably even with updated |
DHCP-reply-delay is required in some cases because there is a bug in the silicon such that if it receives both the DHCP reply and the tftpboot server address in less than 2 seconds then the device will lock up forever. bootcode.bin does not suffer from this problem. But it's possible something else is going wrong or the packets are being dropped by the switch. |
I have built a setup of 30 Raspberry Pis. It uses one 24-port PoE-enabled managed switch (ES-24-250W) and two unmanaged PoE switches (TP-Link TL-SG1008P, TP-Link TL-SF1008P). I have a Raspberry Pi acting as master:
And I have 29 Raspberry Pi 3s without any SD cards, so they boot entirely from the network and get their power from PoE. All the Raspberry Pis have the official touchscreen (even the master). I can switch each port of the managed PoE switch on and off, which is essentially the same as plugging in and unplugging the cable. Here are my experiences (22 Raspberry Pi clients, 1 Raspberry Pi server, 1 laptop):
If the screen is on, the Raspberry Pi consumes between 5.5 and 7 W.
Almost all Raspberry Pis boot up fine. The worst case was 20/22. The timing is as follows: I've been on this problem (unreliable starting) for a week now. I'm generating ICMPv6 packets, because I believed this helps: if I start a computer or a Raspberry Pi with an SD card, then all the other Raspberry Pis are more likely to start up. I'm now optimizing boot time from NFS by disabling all services which are not needed. But the hard part was figuring out the unreliability. I think a pointer on the official site or in the netboot tutorial would be more appropriate than the rather vague "packets on the network helps booting up".
phrase is totally misleading and wrong. If a raspberry pi does not start (rainbow picture) after 20 sec, 99% sure it will never start at all no matter how patient you are. When I started this adventure, I was totally unaware how untested this codepath is.:( |
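The power-cycle approach could be sketched roughly as below, assuming the 20-second rule of thumb from the comment above. Everything here is hypothetical: how you detect that a Pi has booted (a ping, an NFS mount request, a TFTP log entry) and how you toggle a PoE port depend on your switch, so only the decision logic is shown.

```python
from dataclasses import dataclass

# Rule of thumb from the comment above: no boot after 20 s means it will not boot.
BOOT_TIMEOUT_S = 20.0


@dataclass
class PortState:
    port: int             # switch port number the Pi is plugged into
    powered_on_at: float  # timestamp (seconds) when PoE was last enabled
    booted: bool          # True once the Pi has been seen booting


def ports_to_power_cycle(states, now, timeout=BOOT_TIMEOUT_S):
    """Return ports whose Pi has not booted within `timeout` seconds."""
    return [
        s.port
        for s in states
        if not s.booted and now - s.powered_on_at > timeout
    ]
```

A supervising script on the master would call this every few seconds, switch the returned ports off and on again, and reset `powered_on_at` for them.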
@bunyevacz Thanks for the thorough analysis! I think I’m seeing the same issue here. Is there any news on this matter? Were you able to improve your setup in the meantime, or is the procedure you’ve outlined (dhcp-reply-delay + mechanism to power-cycle non-booting Pis) still the best option? |
It would be interesting to know if the latest version of bootcode.bin helps things, I've found it has made my netboot RPis boot up much more reliably. Sadly, you do need to have it on an SD card in each RPi. |
Hi @puck, I just tried this (formatting an SD card with FAT32 and putting the latest bootcode.bin onto it) and I can confirm it works much better. I was able to boot up the Pi 5 times in a row from network which never worked before. Do I understand this correctly that an older version of the bootcode.bin is burned into the Pi’s chip where it cannot be updated? So when a new Pi model is released some time in the future, can we assume this will have a newer bootcode burned into the chip which will solve this issue, so I can get rid of the SD card? |
Hey @aaronk6, Good to hear that you had a positive result! Yes, your understanding is correct. I'm not sure if the version of the bootcode.bin that is burned into the ROM is updated during each RPi model's lifetime or not; that'd certainly be interesting to know. I figure that an SD card which only contains the bootcode.bin is much less likely to be corrupted (it is only read on boot), and even if it is corrupted, I haven't lost anything, so it's not something I've stressed about having in my netboot RPis. |
Hi @puck, yeah, the “bootcode.bin SD card” is definitely a workaround I can live with for the time being. As you say, when the card is rarely read from and never written to, it shouldn’t suffer from corruption any time soon, and even if it does, it’s very easy to replace with a spare SD card. Does anyone reading this know about the rollout procedure for the boot code in ROM? |
Changing the on chip ROM requires a respin of the chip, I think just the metal layer but not sure, and that is only done when a new chip stepping is produced, and this is very rare. It's very expensive. So as you will have seen from Pi history, it only really changes when a new major model with a new chip comes out, so 1->2, 2->3 3->3b+. In the 3B->3B+ case, the change was from an A0 to B0 chip. It would not be worth producing a new chip just to change the boot code because of the cost. (est. $500k at least IIRC) |
@JamesH65, thanks for the insights! So I guess I’ll re-try with the Pi 4 some time in 2020 🙂 |
Did anyone try whether things have improved with the Raspberry Pi 4? I don’t have mine yet. |
Pi 4 doesn't network- (or USB-) boot yet, and when it does it won't use bootcode.bin, so it's off-topic. |
I am using the latest version of bootcode.bin: https://github.com/raspberrypi/firmware/blob/f85646a8831d9579c2a745478149598da1ecfde5/boot/bootcode.bin
It is the only file on my SD card. I am using a Raspberry Pi 3.
Sometimes (but rarely) PXE boot works and sometimes it does not. I have to power cycle the Pi several times to make it boot.
Looking at the failed tcpdump log you can see that dnsmasq is replying to the boot request, but the Pi ignores it and sends another, for a total of 5 requests. Then it tries to request tftp files from 0.0.0.0.
dnsmasq log of failed session:
tcpdump port tftp or port bootpc from failed session: