Wi-Fi roaming issues between access points

Hi everyone,

I live in a 100 m² apartment in an old (1890) building. The area is quite crowded, so I ended up installing a pair of Cisco AIR-3802I PoE access points with a central controller (meaning they should be handing devices over nicely from one to the other).

The setup works fine for all my other devices. My girlfriend can take FaceTime calls on her iPhone and walk from one side to the other, and I can see in the dashboard of the wireless controller that a handover happened. All good.

The Librem 5, however, seems to only stick to the access point it initially connected itself to. If I forget networks, go to the kitchen, connect it there and then come to the other side of the flat, I have almost no signal (or no signal at all) and it just never connects to the office access point. Or, if I register it to the network in the office, I have no signal in the kitchen.

I am happy to troubleshoot (I can dump and analyze the Cisco logs) but I don’t know exactly what I should be looking for on the phone. Any tips?

I can’t answer your question but I would start by wanting to know lots of things:

  • Whether those APs are dual band (yeah, I know I could look it up but …) and then whether the Librem 5 is configured to use one, the other or either band. By band I mean 2.4 GHz v. 5 GHz.
  • Do the APs have the same SSID? (Does a given AP have the same SSID on each band?) And likewise do the APs have the same security type and PSK? And same channel width?
  • Which WiFi card do you have in the Librem 5? This would seem to be particularly important.
  • Which WiFi standard is the Librem 5 using on each AP? (Which standards do those APs support? e.g. 802.11ac or ax or n or …)

Hello there,

  • I have three SSIDs: a 2.4 GHz-only one for my vacuum cleaner and my wireless lamp, which only support 2.4 GHz; a second, 5 GHz-only, for IoT; and a third, 5 GHz-only, for personal devices. The Librem 5 is connected to the third one.

  • I only configure the Cisco wireless controller, and it pushes the settings to the APs, so I believe it should be configured correctly. I can take a dump of the settings and compare them, but I find it very unlikely that the devices would be configured differently.

  • I am using ac.

  • My phone has the Redpine card. I am considering upgrading in the next few months.

You know if you asked me 10 years ago that we’d have WiFi for my vacuum cleaner I would have thought you were nuts…lol.

Yes, good point. (Cisco @ home is a bit beyond my budget. No problem with Cisco @ work, having dozens of Cisco devices, when someone else is paying. :rofl:)

With my equipment, be careful with the above-quoted statement. SSID attributes are pushed out to all APs in the relevant group of APs but channel attributes are set per AP per band.

Your setup is reasonably similar to mine at home (other than that the make of APs is definitely different). I may be able to test this systematically in my environment in the next few days. That will at least give you another data point.

Important information from my end:
My access points are not forming a mesh. They are connected via a wired trunk and, as far as I can tell from the Cisco information, they exchange client handover information via the wired network (it’s visible in the logs).

This kind of access point is beyond my budget as well but luckily my former big corp employer dumps good equipment often when moving offices and my friends always help me get something.

Back to my original question, though: What are the relevant logs I should be looking for? I can easily roll my phone and AP logs and pinpoint what is going on. When I get into the weeds these days, I am mostly on FreeBSD so my Linux knowledge is quite rusty.

sudo grep -i wpa /var/log/syslog
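The grep above keeps every line that mentions wpa, which can be noisy. As a rough sketch, the filter below keeps only the lines that mark an association attempt or a connect/disconnect. The function name `roam_events` is made up for illustration, and the patterns are the messages wpa_supplicant usually logs; I have not checked exactly what the Redpine driver path emits on the Librem 5, so treat the pattern list as an assumption to adjust.

```shell
# roam_events: hypothetical helper. Reads log text on stdin and keeps only
# the lines that look like wpa_supplicant association attempts or
# connect/disconnect events. The pattern list is an assumption; extend it
# if your logs use different wording.
roam_events() {
    grep -E 'Trying to (associate|authenticate) with|Associated with|CTRL-EVENT-(CONNECTED|DISCONNECTED)'
}
```

Usage would be something like `sudo cat /var/log/syslog | roam_events`, or `journalctl -b | roam_events` if the system only keeps a journal. Comparing the BSSIDs and timestamps in the output against the controller logs should show whether the phone ever attempts to move to the other AP.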

I see similar behaviour. Disclaimer: the following is from limited testing.

I start off in the vicinity of AP#1 and the phone associates with it. I then walk away from AP#1 until I am right on top of AP#2. The phone remains stubbornly associated with AP#1 even though the signal strength of #1 is -63dBm while the signal strength of #2 is -31dBm (i.e. a 32 dB gap, roughly 1500x stronger!).

Let’s say that now, through brute force, I manage to get it to associate with #2. All good then.

Now walk back in the other direction (away from #2 and towards #1 and even going beyond #1). #2 is now -70dBm and #1 is now -44dBm. Again, no change of association.
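For anyone wanting to check the dBm arithmetic: dBm is a logarithmic scale, so a difference of d dB corresponds to a power ratio of 10^(d/10). A small sketch using awk (the function name `dbm_ratio` is just for illustration):

```shell
# dbm_ratio STRONGER WEAKER: prints how many times more power a signal of
# STRONGER dBm carries than one of WEAKER dBm, rounded to a whole number.
# Computes 10^((STRONGER - WEAKER) / 10) via exp() and log().
dbm_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", exp((a - b) / 10 * log(10)) }'
}
```

With the readings above, `dbm_ratio -31 -63` prints 1585 (a 32 dB gap) and `dbm_ratio -44 -70` prints 398 (a 26 dB gap), so in both directions the nearer AP is stronger by a factor of hundreds or more.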

Try

nmcli device wifi rescan

Yep, that actually seems to work!

So I suppose you could bodge something up, either by running the above nmcli command periodically by itself or by using the iw command to monitor signal strength and then forcing a rescan when your chosen conditions are met. Or of course you could make a desktop shortcut to run the nmcli command manually and explicitly.
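A minimal sketch of that bodge, under some loud assumptions: the interface name (wlan0 in the usage line), the -65 dBm threshold and the 30 second interval are arbitrary picks, and all three function names are invented here. It relies on `iw dev <iface> link` printing a `signal: -NN dBm` line while associated.

```shell
# link_signal: reads `iw dev <iface> link` output on stdin and prints the
# signal level in dBm (e.g. -63). Prints nothing when not associated.
link_signal() {
    awk '$1 == "signal:" { print $2; exit }'
}

# needs_rescan SIGNAL THRESHOLD: succeeds when SIGNAL (an integer dBm value)
# is weaker than THRESHOLD. An empty SIGNAL never triggers a rescan.
needs_rescan() {
    [ -n "$1" ] && [ "$1" -lt "$2" ]
}

# watch_and_rescan IFACE [THRESHOLD] [INTERVAL]: polls the link forever and
# asks NetworkManager to rescan whenever the signal drops below THRESHOLD.
# Defaults (-65 dBm, 30 s) are assumptions, not tuned values.
watch_and_rescan() {
    iface="$1"
    threshold="${2:--65}"
    interval="${3:-30}"
    while :; do
        sig=$(iw dev "$iface" link | link_signal)
        if needs_rescan "$sig" "$threshold"; then
            # Ignore failures, e.g. if a scan is refused for being too soon.
            nmcli device wifi rescan || true
        fi
        sleep "$interval"
    done
}
```

Run it as `watch_and_rescan wlan0` (it loops until interrupted). It only forces a rescan; whether the phone then actually moves to the stronger AP is the thing being tested.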

It seems like this kind of “background scan” should be automatic but it doesn’t seem to be at the current time.

I would guess that most Librem 5 customers only have one AP (so are not affected by this) but you would think that Purism employees at head office might encounter this issue.

That’s very interesting.

It’s the first device I’ve ever had with such issues, including Linux laptops on non-free and also ath drivers.

What is more interesting is… in theory (I am a sysadmin, not a net engineer, and I am only a basic operator of Cisco IOS), as far as I understand, when the wireless controller is monitoring the signal from all clients and there’s such a discrepancy in signal levels, it should send a deauth packet/frame, which would cause the client to disconnect, re-scan and reconnect (please, if there’s any net engineer around and I am speaking out of my a**, please correct me).

I never see that kind of event with the Librem 5. Even if I restart the phone, it seems to only ever associate itself to the access point it initially connected itself to. I really must FORGET the network and rejoin. The best I could do was to create new SSIDs, one for each AP, only for the L5, and then I can just change networks when I go to the other side of the flat, but it’s not ideal.

I’m abroad for 3 weeks - when I return, I will test further and also order the newer wireless card and compare the performance. I will also try nmcli.

It’s beyond my level of knowledge but
a) that may be limited to specific more advanced APs (like Cisco), and
b) that is only one direction from which this process can be driven - the other direction is from the client side (which is in some respects easier because the client effortlessly sees the received signal power for each AP as received at the client’s current location from their beacon frames)
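To see what the client side has to work with, the sketch below picks the strongest BSSID for one SSID out of `iw dev <iface> scan` output. It assumes the usual iw layout where each `BSS <mac>(on <iface>)` header is followed by `signal:` and `SSID:` lines; the function name `strongest_bss` is made up, and iw itself warns that its output is not meant as a stable format for parsing, so this is a diagnostic aid only.

```shell
# strongest_bss SSID: reads `iw dev <iface> scan` output on stdin and prints
# "<bssid> <signal dBm>" for the strongest AP advertising SSID.
# Prints nothing if the SSID does not appear in the scan.
strongest_bss() {
    awk -v ssid="$1" '
        # Header line of a scan entry; the MAC check skips "BSS Load:" lines.
        $1 == "BSS" && $2 ~ /^[0-9a-f][0-9a-f]:/ {
            bss = $2
            sub(/\(.*$/, "", bss)
        }
        $1 == "signal:" { sig = $2 + 0 }
        $1 == "SSID:" {
            name = $0
            sub(/^[ \t]*SSID: /, "", name)
            if (name == ssid && (best == "" || sig > bestsig)) {
                best = bss
                bestsig = sig
            }
        }
        END { if (best != "") print best, bestsig }'
}
```

Something like `sudo iw dev wlan0 scan | strongest_bss "MySSID"` (interface and SSID are placeholders) can then be compared with the BSSID shown by `iw dev wlan0 link` to confirm that the phone sees the better AP and simply is not moving to it.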

OK, that I don’t experience. I was able to get the Librem 5 to change AP without rebooting much less having to forget the network. It just wasn’t automatic.

I think that might also be a factor because some of my attempts to get this to work automatically resulted in “operation not supported” errors - and there’s always the hope that the operation might be supported on another card (i.e. a different driver).

That can happen if the BSSID is stored against the SSID and it is restricted to that one BSSID. (It is normal however for it to store a list of some of the BSSIDs that it has seen.)
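One way to check for that, sketched under the assumption that the connection is managed by NetworkManager and stored as a keyfile with a `[wifi]` section: look for a `bssid=` key in the profile. The helper name `pinned_bssid` is invented for this example.

```shell
# pinned_bssid: reads a NetworkManager keyfile on stdin and prints the value
# of the bssid= key from its [wifi] section. Prints nothing when the profile
# is not restricted to a single BSSID.
pinned_bssid() {
    awk -F= '
        /^\[/ { in_wifi = ($0 == "[wifi]") }
        in_wifi && $1 == "bssid" { print $2; exit }'
}
```

For example, `sudo cat "/etc/NetworkManager/system-connections/<profile>.nmconnection" | pinned_bssid`, where `<profile>` is whatever the connection is called on your phone. The same property can be read with `nmcli -g 802-11-wireless.bssid connection show "<profile>"` and cleared with `nmcli connection modify "<profile>" 802-11-wireless.bssid ""`. I don’t know whether the Librem 5 UI ever sets this property, so an empty result would just rule this cause out.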

Thank you for the feedback.

I have two quite decent USB Wi-Fi dongles. I just got the idea that I will hook them up to the L5 and see if they roam ok.

If it works, then it’s device-specific and I will bite the bullet on a wireless upgrade for my phone. I will report back here in any case.

I think you may be on to something, but even more generically with WiFi handoff. I have noticed sometimes when coming home or to work that the L5 shows full bars connected to wifi, but in actual fact it’s not displaying the SSID at all; it just says wifi, and there is no network connection at all. Toggling the wifi then restores the connection and the SSID shows up. So I think what you’re experiencing can be applied to more generic situations, such as simply moving from one building to another with a big pause between.

I am visiting my relatives, and the house has a single access point, and I am experiencing something similar.

It shows the wifi signal with the … instead of a full signal bar. When I open the wireless settings, it shows the network as connected and says that the network is 5.6 GHz (sic) instead of the 2.4 that it should be. Sometimes it opens the popup asking for the SSID password. In either case, if I tap connect there or just tap the network name, then it reconnects.

It’s been quite a pain these days because I don’t have a data connection here and my girlfriend calls me on Signal under Waydroid (it works fine). Of course, if wifi is down, I don’t get notifications or calls.

I thought it was related to suspend, but it doesn’t seem to be the case, because I disabled suspend when connected to power.

Tomorrow I plan to roll the logs and see what happens.

Hmm. That’s possible. My normal MO is that WiFi is “off” when I leave the house and hence I do effectively toggle it when arriving home and hence if the problem happens as you describe then I will never see it.

Depends what country your rellies are in. In many countries in the world the nominally 5 GHz WiFi band goes up almost to 6 GHz (and doesn’t actually start at 5.0 GHz either). So 5.6 GHz is a plausible frequency. You probably want to use e.g. the iw command to get the actual frequency being used for the SSID with which you are associated (as well as the other available frequencies).
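As a sketch of that check: `iw dev <iface> link` prints a `freq:` line in MHz for the current association, and the helper below (the name `link_band` and the band boundaries are my own rough choices) turns it into a band label.

```shell
# link_band: reads `iw dev <iface> link` output on stdin and prints the
# frequency in MHz followed by a rough band label. Boundaries used here:
# 2400-2500 MHz counts as 2.4GHz, 5150-5925 MHz counts as 5GHz.
link_band() {
    awk '$1 == "freq:" {
        f = $2 + 0
        if (f >= 2400 && f < 2500)      band = "2.4GHz"
        else if (f >= 5150 && f < 5925) band = "5GHz"
        else                            band = "other"
        print f, band
        exit
    }'
}
```

Running `iw dev wlan0 link | link_band` (wlan0 being an assumed interface name) on a link at 5600 MHz, which is channel 120, would print `5600 5GHz`, so a "5.6 GHz" reading in the settings UI is consistent with a genuine 5 GHz association.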

I guess it also depends on how radio quiet the location of your rellies is.

I believe that there is some nmcli incantation if you want to prevent the 5 GHz band being used at all. (Some APs do “band steering” and that may be working against you if you intend to associate only at 2.4 GHz.)
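The incantation I had in mind is the `802-11-wireless.band` property of the connection profile, which takes `bg` for 2.4 GHz or `a` for 5 GHz. A small sketch follows; the wrapper names `nm_band` and `lock_band` are hypothetical, and the profile name is whatever the connection is called on your device.

```shell
# nm_band BAND: maps a human-readable band ("2.4" or "5") to the value that
# NetworkManager expects in 802-11-wireless.band. Fails on anything else.
nm_band() {
    case "$1" in
        2.4) echo bg ;;
        5)   echo a ;;
        *)   return 1 ;;
    esac
}

# lock_band PROFILE BAND: restricts the named connection profile to one band.
lock_band() {
    value=$(nm_band "$2") || return 1
    nmcli connection modify "$1" 802-11-wireless.band "$value"
}
```

For instance, `lock_band "<profile>" 2.4` keeps that profile on 2.4 GHz. Setting the property back to an empty string with `nmcli connection modify "<profile>" 802-11-wireless.band ""` should remove the restriction again.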

The thing is…
This is one of these cheapo APs that have 2 SSIDs, one for 5GHz and one for 2.4. So I think it would not try 5.

Let’s hope the weekend will allow me some time to investigate.

All right - so I just spent 3 weeks abroad, without any mobile data, relying only on Wi-Fi.

It went mostly fine. My phone is set to suspend after 1 minute and 9 out of 10 times, Wi-Fi would reconnect in a few seconds after waking up. I also managed to use Skype and Signal for audio calls within Waydroid when required.

Sometimes, however, wifi would just not reconnect. Sometimes it would give me the window asking for the network password (pressing connect would suffice), and other times I’d manage to get online only by rebooting.

My Wi-Fi setup was not special - just one of the cheap router+AP combos given by service providers. Default configuration out of the box (I didn’t have the password for that box, so I am not aware of the details of the configuration).

My father also got Starlink on his farm, and I never had any issues there. I will return for another visit in October and I will hopefully have the newer wifi card installed to compare performance.

Meanwhile, as soon as I get the chance, I will try to find out what the matter is with roaming and collect some logs.

Just an update from me … I am now not sure that roaming isn’t working. I need to test some more.

I have some spare access points that I could ship over - not sure about EU-US import charges, if it helps troubleshooting.

Thanks for the offer but the challenge is not a shortage of access points. It’s just getting the right information and interpreting what is going on and getting consistent results - and finding the time to do so.

In some respects I would do my testing with 2 access points - because that is the simplest case in order to test WiFi roaming. Having more access points could just make things less predictable and more difficult to interpret.

To be clear, I am in Australia (not the US) and I have no connection with Purism other than as a customer. I am interested to do this testing because if there is a problem with WiFi roaming then I may be one of relatively few customers who would have a problem with this at home.

Maybe we need a forum poll of Librem 5 customers for how many APs they have in active use at home. :wink: