Various connectivity stability papercuts (mostly Wi-Fi and ethernet)

I have an Evergreen Librem 5 with Redpine Wi-Fi that is now about 5 years old and that I use as my main device. Whilst applying all firmware and OS updates over the years, some paper cuts got solved, but a few things also got worse, and I want to highlight those since they involve core functionality (connectivity):

  • With the Wi-Fi kill switch off, on boot the Wi-Fi adapter usually isn’t recognised. Enabling the kill switch and disabling again fixes that.
  • If I enable the hotspot for Wi-Fi tethering on 4G, after a few minutes to an hour of use the hotspot just ‘dies’. Seems like driver or firmware errors on the adapter (see also logs below).
    • At that point, toggling Wi-Fi or the AP just reverts the toggle. The wlan0 adapter still exists at this point, but NetworkManager just claims it isn’t available. A reboot is then necessary to fix it.
      • But a reboot at this point just freezes on shutdown, probably because something got so stuck that not all services can be terminated successfully.
  • Sometimes the hotspot weirdly enough ‘stops’ forwarding some types of traffic (if that is even possible) - e.g. on my laptop connected to it pings kept working fine over data but DNS requests started hanging all of a sudden, which was fixed by rebooting the phone (toggling data on the Librem didn’t suffice).
  • Usually I have no ethernet adapter in the network manager whatsoever; the adapter isn’t even present in the device list. About half of the boots I do get the adapter, but it is greyed out and the phone doesn’t detect a USB-C cable being plugged in so I can’t use it for tethering.
    • Recognition used to work really well when the phone was younger (about 100% success rate), so unless the hardware is degrading this regressed somewhere along the years.
  • I have spotty 4G reception at my home but the Librem copes relatively well despite that (it certainly improved over the years). For some reason though to use mobile data I need to explicitly toggle Data Roaming off and back on after 4G reception was established or ping will just say I have no network connection.

I’m wondering if there are any solutions to this or, if not, if there are bug reports I can follow or if there are locations I can report these issues separately for follow-up.

Wlan0 Crash Logs

mmc1 timeout waiting for hardware interrupt
redpine_91x: rsi_core_xmit: Failed to queue packet
redpine_91x: rsi_core_xmit: FSM state not open
redpine_91x: rsi_interrupt_handler: ==> Firmware Status is 0xa4
redpine_91x: rsi_interrupt_handler: ==> ==> FIRMWARE Assert <==
...
nl80211 driver interface is not designed to be used with ap_scan=2
...
(Modem manager complains that APN for MMS suddenly can't be found any more.)
...
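
For anyone wanting to check whether their own drops match the excerpt above, here is a minimal sketch that filters kernel log lines for the same messages. The patterns are copied from my excerpt only; the function name and the idea of feeding it kernel log lines are my own assumptions, and the driver may well print other messages that this does not catch.

```python
import re

# Patterns taken from the log excerpt in this post; they are not an
# exhaustive list of what the driver can print.
FAILURE_PATTERNS = [
    re.compile(r"mmc1.*timeout waiting for hardware interrupt"),
    re.compile(r"redpine_91x: .*FSM state not open"),
    re.compile(r"redpine_91x: .*FIRMWARE Assert"),
]


def find_failures(lines):
    """Return the log lines that match any of the patterns above."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]
```

It could be fed with kernel messages, for example by piping `journalctl -k` into a small script that calls `find_failures(sys.stdin)`.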

This appears to be a known problem, discussed a few times in this forum e.g. Wifi and Bluetooth Have Never Worked - #35 by choboDOC and further link from there.

I don’t know the details of this problem but I wonder whether the only full workaround is to upgrade the WiFi card from Redpine to SparkLAN, which may or may not be something that you would contemplate doing.

I would probably use ping -s NNN ... with progressively higher packet sizes to see whether the problem is not the type of traffic but instead the packet size.
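
A rough sketch of such a sweep, in case it helps. The list of sizes is only illustrative (1472 bytes of payload plus 28 bytes of IPv4 and ICMP headers makes a 1500-byte packet), the function names are made up, and it assumes the Linux `ping` with its `-c` and `-s` options.

```python
import subprocess


def ping_commands(host, sizes=(56, 500, 1000, 1400, 1472), count=3):
    """Build one ping command per payload size (sizes are illustrative)."""
    return [["ping", "-c", str(count), "-s", str(size), host] for size in sizes]


def run_sweep(host, sizes=(56, 500, 1000, 1400, 1472)):
    """Run each ping and record whether it exited successfully."""
    results = {}
    for size, cmd in zip(sizes, ping_commands(host, sizes)):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[size] = proc.returncode == 0
    return results
```

Calling `run_sweep()` with the address being tested returns a mapping from payload size to success. If the small sizes succeed and the large ones fail, that would point at packet size rather than traffic type.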

It may help to clarify though what you are pinging and which computer is doing the ping. More generally, other questions relating to a hotspot configuration are confusing because it is not crystal clear to the casual reader which computer a comment relates to (since most of the terminology and most of the commands will apply equally to the client big computer - desktop or laptop - and the Librem 5).

What about tether via WiFi? I haven’t done a lot of tethering but I think that’s how I have done it.

Is this the correct replacement modem for the Librem5?

Looks very much like it.

Thanks for the reply.

This appears to be a known problem, discussed a few times in this forum e.g. Wifi and Bluetooth Have Never Worked - #35 by choboDOC and further link from there.

I don’t know the details of this problem but I wonder whether the only full workaround is to upgrade the WiFi card from Redpine to SparkLAN, which may or may not be something that you would contemplate doing.

If toggling power fixes it and it’s a firmware issue, I’m wondering whether the phone could automatically power-cycle the Wi-Fi chip at boot until it works.
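
To make the idea concrete, a minimal sketch of such a retry loop. Checking for the interface under /sys/class/net is standard Linux, but `power_cycle` is a purely hypothetical callback: I don’t know how (or whether) power to the card can be cut and restored from software on the Librem 5, so that part is left to the caller.

```python
import os
import time


def wifi_present(iface="wlan0"):
    """True if the kernel currently exposes the interface under sysfs."""
    return os.path.exists(f"/sys/class/net/{iface}")


def bring_up_with_retries(power_cycle, is_present=wifi_present,
                          max_attempts=5, settle_seconds=5.0, sleep=time.sleep):
    """Power-cycle until the interface appears or attempts run out.

    power_cycle is a HYPOTHETICAL callback supplied by the caller.
    Returns the number of power cycles that were needed, or None if the
    interface never showed up.
    """
    for cycles_done in range(max_attempts + 1):
        if is_present():
            return cycles_done
        if cycles_done == max_attempts:
            break
        power_cycle()
        sleep(settle_seconds)  # give the driver time to probe the card
    return None
```

The check and the sleep are passed in as parameters so the loop can be tried out without touching real hardware.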

I indeed notice more recent batches have the SparkLAN chip, but it’s a company phone so replacing it would need some ACKs. I also can’t imagine it can’t be worked around since Linux supports so many different Wi-Fi chips with their quirks. I also saw a note in another thread that Redpine was planning to pick up mainline Linux support again a few years ago, so that brings some hope I guess.

I would probably use ping -s NNN ... with progressively higher packet sizes to see whether the problem is not the type of traffic but instead the packet size.

Good idea, I will try that next time, thanks.

It may help to clarify though what you are pinging and which computer is doing the ping.

Indeed, I see how this was confusing. The above was with my laptop doing the ping whilst connected to the hotspot of the Librem which was tethering that to 4G.

What about tether via WiFi? I haven’t done a lot of tethering but I think that’s how I have done it.

Yes, that does work, when it works, i.e. with the aforementioned issues of the AP and Wi-Fi on the Librem dying every so often :sweat_smile: .

When I can get the ethernet adapter to show up and the wired connection to be recognised (which is now almost never due to the above issues), wired tethering over USB-C is noticeably more performant for me though. It works out of the box through the GUI, and I originally started investigating it because Wi-Fi was so unreliable and I remembered how reliable the cable connection used to be :sweat_smile: .

But you still need to say what you are pinging e.g. the Librem 5 on which interface or e.g. another client on the same hotspot or e.g. some device on the local network (I assume not, because this may not even be possible) or e.g. some host on the public internet (e.g. 1.1.1.1) or …

Yes, I understood that in making the suggestion. :wink: But let’s say that you can get the WiFi issues sorted out then tethering via WiFi is an alternative to tethering via USB.

I know nothing about the technical details of why this problem occurs. So if you want that info then you will probably have to contact Purism Support.

However your suggestion sounds very much like “not fixing the underlying problem” i.e. a workaround. As a hypothetical, if the probability of this problem could be reduced to 1/10 and if each attempt is independent then maybe a few toggles of the switch is an acceptable workaround, without going as far as cutting power in software.
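
To put numbers on that hypothetical, a small sketch of the arithmetic. The 1/10 figure is only the made-up value from the paragraph above, and the independence of attempts is likewise an assumption, not something anyone has measured.

```python
def prob_all_fail(p_fail, attempts):
    """Probability that every one of `attempts` independent tries fails."""
    return p_fail ** attempts


def expected_attempts(p_fail):
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / (1.0 - p_fail)
```

With a per-attempt failure probability of 0.1, three failures in a row would happen roughly once in a thousand boots, and on average about 1.1 attempts would be needed.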

Plans do not equate to actions or results.

Thanks for the response!

But you still need to say what you are pinging e.g. the Librem 5 on which interface or e.g. another client on the same hotspot or e.g. some device on the local network (I assume not, because this may not even be possible) or e.g. some host on the public internet (e.g. 1.1.1.1) or …

Indeed with a hotspot the ‘local network’ is the network of the hotspot itself so I didn’t think to mention it explicitly, but it was 8.8.8.8 in this case, so a Google DNS server to see if you have basic IPv4 routing working. This ping, as well as DNS queries were happening from the laptop connected to the Librem hotspot.

In other words, to reiterate: the ping kept working, but other types of traffic such as DNS resolution just started hanging on the laptop, as if the Librem stopped forwarding them properly. This only occurred once or twice, though; the hotspot drops occur much more frequently.

I know nothing about the technical details of why this problem occurs. So if you want that info then you will probably have to contact Purism Support.

I understand. I usually frequent Linux communities so I think I incorrectly assumed this to be more obvious since I didn’t mention it explicitly :sweat_smile: , but my intent was also to ask: which issue trackers and which projects on there (e.g. firmware jail on Purism GitLab) do I need to report these issues to or should I look at to find other issues that might already mention these problems (that I can then follow or monitor)?

For example, I know Purism has a GitLab instance, and forks and patches to various projects, so I don’t really know where to start - I know e.g. modemmanager and NetworkManager are involved in the stack but these are relatively stable components used in desktop Linux as well, so I probably need to direct my bug reports somewhere to a Purism GitLab project, I just don’t know which ones.

I could also contact support but typically with FOSS projects the issue trackers are out in the open so others can follow along as well and it is likely someone already reported some of these issues so I don’t need to create more noise for Purism support :slightly_smiling_face: .

Only with additional details of your network configuration (which obviously you have but no one here does).

Just for basic sanity, you should use route -n on the laptop to confirm that the default gateway was correct and remains correct. However if ping works but DNS does not then it is very likely that routing is correct and so you would need to move on to diagnosing DNS.

Am I to take it that you are using 8.8.8.8 as your DNS server? (That’s not great for privacy but from a technology point of view it’s OK.) So you can ping 8.8.8.8 but when using 8.8.8.8 as the DNS server, it does not work?

Have you confirmed your DNS server configuration on the laptop, e.g. with resolvectl?

(I have a feeling that in configs like this, the Librem 5 may hand out its own IP address as the DNS server and then do DNS on behalf of the tethered client - which would mean that you would need a working DNS server on the Librem 5 and for it to remain working. But you would need to confirm exactly which DNS server is being used by the laptop - or whether it is not using a recursive DNS server for its DNS requests at all and is instead resolving all requests directly with the authoritative DNS servers.)

What does “DNS requests started hanging” mean?

So an nslookup for some domain gives output

;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out
;; no servers could be reached

or something similar to that?
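
If you want to test a specific DNS server independently of the laptop’s resolver configuration, here is a minimal sketch that sends a single A-record query over UDP port 53 and reports whether anything came back. It uses only the Python standard library; the function names, the default domain and the timeout are arbitrary choices of mine, and it does not parse the answer or do any DNSSEC validation.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name="example.com", timeout=3.0):
    """Send one query to server:53 over UDP and classify the outcome."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timed out"
        except OSError as exc:
            return f"error: {exc}"
    # A genuine reply echoes the 16-bit query ID.
    return "answered" if data[:2] == query[:2] else "unexpected reply"
```

Running `probe("8.8.8.8")` and then `probe()` against whatever address the hotspot handed out as DNS server, from the laptop while the problem is occurring, should show whether plain DNS over UDP is getting through at all.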


Honestly, I think Purism is aware of the problem with the Redpine WiFi card not coming up and requiring repeated power cycles via kill switch in order for it to come up. But I haven’t looked to see whether there is an Issue recorded for that.

I can’t provide sufficiently qualified advice on how to use the Issue tracker. Just my opinion but I don’t think it is user-friendly enough for customer use. You appear to have come up against the same general challenges with it that I did.

Am I to take it that you are using 8.8.8.8 as your DNS server?

I actually use the DNS4EU ones on my laptop, indeed for privacy reasons. I mainly use 8.8.8.8 for pinging purposes to check if IPv4 internet connectivity is up and running as an old habit.

(I have a feeling that in configs like this, the Librem 5 may hand out its own IP address as the DNS server and then do DNS on behalf of the tethered client - which would mean that you would need a working DNS server on the Librem 5 and for it to remain working.

I’ve been checking my logs and this indeed seems to happen. I however use systemd-resolved on my laptop and have overridden the DNS servers, and that override seems to be applied correctly, however…

What does “DNS requests started hanging” mean?

… this made me recall that the errors in fact were DNSSEC errors. What I was trying to do was systemd-resolve www.duckduckgo.com on my laptop whilst on the hotspot, which just hung for a minute or so, after which I got DNSSEC errors similar to this one:

DNSSEC validation failed for question www.duckduckgo.com IN A: no-signature

I’ve been looking back through my system logs of that day and they are full of dozens of these, for various domains, at the time the problem occurred. Before that happened, my laptop reconnected to the hotspot due to a drop, got the Librem phone set as DNS server (10.42.0.1), but then correctly applied the DNS servers I had overridden.

This might also imply there is a bug or race condition in systemd-resolved on my laptop around DNSSEC, though. I’d have to wait until I get the problem again (which was rare) to know for sure. If that is the case the Librem isn’t really at fault beyond borking the hotspot (so one less problem for the Librem :wink:).
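
For next time, a small sketch I could use to summarise these failures from the journal instead of eyeballing them. The regular expression only covers the message format quoted above; the function name and the grouping by reason and domain are just my own choices.

```python
import re
from collections import Counter

# Matches the message format quoted above; other systemd-resolved
# messages are ignored.
DNSSEC_FAILURE = re.compile(
    r"DNSSEC validation failed for question "
    r"(?P<name>\S+) IN (?P<rtype>\S+): (?P<reason>\S+)"
)


def summarise(lines):
    """Count DNSSEC validation failures per (reason, domain) pair."""
    counts = Counter()
    for line in lines:
        match = DNSSEC_FAILURE.search(line)
        if match:
            counts[(match.group("reason"), match.group("name"))] += 1
    return counts
```

Feeding it the output of `journalctl -u systemd-resolved` for the affected time window would show whether the failures all share the same reason and how many domains were hit.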

Honestly, I think Purism is aware of the problem with the Redpine WiFi card not coming up and requiring repeated power cycles via kill switch in order for it to come up. But I haven’t looked to see whether there is an Issue recorded for that.

That would be good news at least. It worries me a bit that these areas of connectivity got worse with updates over the years instead of better since the inception, which makes me feel like they weren’t QA tested very well (despite it admittedly being nice that kernel updates did continue happening on the Librem). I understand the pace of updates or developer bandwidth has been somewhat slow lately, though, and has only recently been picking up again with Crimson.
