Most https sites no longer work through wifi hotspot

I noticed a strange regression a couple of weeks ago. Wondering if anyone else is seeing it and what might be the cause:

Most websites don’t work (unable to connect; specifically, the TLS handshake never completes) when I try to access them from a computer connected to the Librem 5 wi-fi hotspot (the Librem 5 itself using a 4G connection). But all those sites load instantly when I try them on the phone itself. When I was using it in July it was working fine…

Some websites like https://en.wikipedia.org work fine.
https://puri.sm does not (curl’s last messages are client hello (1) followed by STATE: PROTOCONNECT). A website I need for work (https://lora.lombardodier.com) goes a little bit further (client and server hello, change cipher spec, then client and server hello (2)) and then hangs there.

How do I troubleshoot this? It fails the same way no matter which computer connects to the hotspot (I tried Windows, Linux and Android).

Packet length (MTU) issue?

Hi irvinewade, thanks for your message. I looked, but I don’t think so: ip link show on my computer shows exactly the same output when connected to the home wi-fi as when connected to the Librem 5 hotspot:

...
wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DORMANT group default qlen 1000
...

Running that command on the Librem 5 when it’s in hotspot mode shows:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: usb0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 36:4f:fa:c6:6c:a8 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether 88:da:1a:7c:75:6c brd ff:ff:ff:ff:ff:ff
4: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
7: wwan0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1464 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/none 

To be honest I don’t understand all those keywords, but I’m guessing the relevant part is the mtu 1500 on the wlan0 line, so the MTU should be OK, I think?

This is a Wireshark capture on the laptop doing curl https://duckduckgo.com. Packet 155’s “previous segment not captured” seems to say something from the server is missing? (And packets 157 and later are probably not related; they happened four seconds later):

Hum, so I reflashed my phone. And that did NOT fix the issue (I restored my home directory and reinstalled a couple of applications that should not be doing anything related to networking).

So I decided to be a good Linux user and find out how to configure connection sharing from the command line.

I found this tutorial.

Coup de théâtre! (how do we say that in English?) Before starting to create a new connection profile, I figured I’d try enabling the Hotspot using nmcli:

sudo nmcli connection up Hotspot

and everything is working as before! What is the GUI doing differently?


Google Translate says: Spectacular turn of events!

Good work getting it going again. A really good Linux user would then dig into the code for the GUI to answer the final question. :wink:

Maybe “plot twist”?

Yes, “plot twist” sounds right. And it turns out it was a double plot twist, because it worked only once. I tried again last night, and this morning, and it no longer works :sob:

Sometimes I’m tired of shit that just won’t work. Since reflashing I can’t connect to Signal anymore (not getting a verification code with Axolotl, and signald-purple is not recognised by Chatty). And I can’t report issues to Purism because they don’t let people create accounts on their Gitlab instance. /rant

You can, but you have to email support first and request it. They do that to mitigate spam.

Assuming that it’s the same issue as you initially posted about, it does appear most likely to be a packet size issue, as @irvinewade suggested.

From the ip link show output on the phone, the wwan interface has an MTU of 1464. In the Wireshark capture, the 10.42.0.167 machine claims an MSS of 1460; a 1460-byte payload plus 40 bytes of IP and TCP headers makes a 1500-byte packet, which won’t fit through an interface with an MTU of 1464. The maximum packet size will have been negotiated down based on 40.114.177.156’s claimed MSS of 1440, but a 1440-byte payload (a 1480-byte packet) still won’t fit through an interface with an MTU of 1464.

The quick workaround would be to clamp the MSS value to the PMTU value as it flows through the forwarding chain.


My original post may have been too terse - and it is difficult to know what level of knowledge each forum participant has.

There are three numbers here relating to packet size and usually they will all be very roughly the same.

MTU = Maximum Transmission Unit - the largest packet that can be sent between adjacent IP hosts - I would expect it to be equal to the link layer frame size minus the link layer overheads

PMTU = Path MTU - the largest packet that can be sent between the source IP host and the destination IP host - I would expect it to be equal to the minimum over the path from source to destination of the individual MTU values

PMTUs are discovered (learned; and remembered) and in principle can change in the lifetime of a TCP connection (and presumably are intentionally forgotten after a while too i.e. expired).

I probably should actually have written PMTU rather than MTU in my original post.

MSS = Maximum Segment Size - the largest TCP payload that can be sent on the TCP connection - should be equal to the PMTU minus the IP and TCP overheads (NB: overheads will differ between IPv4 and IPv6, but we appear to be using only IPv4 here)
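
As a worked example, using the 1464 figure seen on the phone’s wwan0 interface above and assuming the path MTU ends up being limited by that interface (plain IPv4, no options):

MSS = PMTU - IP header - TCP header
    = 1464 -    20     -    20
    = 1424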

Because VPNs introduce extra overhead, and for related reasons, the above is a slight simplification if a VPN is involved. (Is a VPN involved?)

Normally all of this takes care of itself.

I’m not sure of the exact incantations for overriding the MSS value (and I didn’t read the above-linked tutorial, and I don’t know precisely what commands you used to set up the hotspot), but this link gives two example commands: one for forcing MSS to a specific value (which could be useful for confirming the fix, if chosen conservatively, but is inflexible and could in theory break), and one for forcing the MSS to the value calculated from the PMTU (a better long-term choice).
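
For illustration (I haven’t checked these against the tutorial), the two kinds of commands generally look something like the following; the fixed value 1400 below is an arbitrary placeholder, not a recommendation:

# Force the MSS to a specific value (simple for confirming the diagnosis, but inflexible):
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1400

# Clamp the MSS to the discovered PMTU (the better long-term choice):
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu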


Thank you two for the very informative posts! I had encountered the concept of MTU before (and guessed what the meaning should be) but until today not the other two.
I haven’t yet had a chance to experiment but am going to. I didn’t actually do anything with nmcli yet, other than displaying information about pre-existing connections and bringing them up and down. From the Wikipedia pages I gather my problem might be that my mobile provider or someone else along the path is blocking ICMP packets, preventing proper calculation of the PMTU? Although regular ping gets through just fine…
There’s no VPN involved, I have the problem with straight https connections from the computer to some host like https://puri.sm, through the shared mobile connection.

BTW @Gavaudan I did request access to Gitlab and didn’t get it. Maybe my request got lost in the queue somewhere… I’m going to ask again.

Thanks again to all! I really appreciate the help.

That is one possibility. To be clear … blocking ICMP packets that contain error notifications, where those errors are being sent back to the sender, i.e. to you. At a minimum, you would need a packet trace to see what, if any, ICMP packets are coming back to you - and you might want to do that on the phone as well as on the client computer.
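
For example, something along these lines on the phone should show whether any ICMP errors make it back while you reproduce the failure from the hotspot client (wwan0 taken from your earlier ip link output; adjust as needed):

sudo tcpdump -i wwan0 -n icmp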

Based on the information from you in the OP, I wouldn’t be blaming the mobile provider - because you said that a) it was a regression and b) it works when accessing directly from the phone. Right?

So you might wonder whether it’s the Librem 5 that is blocking (or, perhaps more accurately, failing to forward) the ICMP error packets.

So I don’t really understand what I’m doing, but I tried this (on my laptop, command adapted from the iptables-extensions man page):

$ sudo iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1300
$ curl -v https://puri.sm/                                                                  
*   Trying 138.68.253.24:443...
* Connected to puri.sm (138.68.253.24) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: Connexion ré-initialisée par le correspondant in connection to puri.sm:443 
* Closing connection 0
* TLSv1.0 (OUT), TLS header, Unknown (21):
* TLSv1.3 (OUT), TLS alert, decode error (562):
curl: (35) OpenSSL SSL_connect: Connexion ré-initialisée par le correspondant in connection to puri.sm:443 

(five minutes elapsed between the Client hello and the next line).
Not sure the incantation was the right one though :slight_smile: “Connexion ré-initialisée” means “connection reset”.

I created this issue to ask for help from Purism people, as I finally got my account enabled.

When working with iptables it’s beneficial to know at least how to check that the rule(s) you are applying are being accepted and applied correctly and, after sending some traffic through the system that you’d expect to hit the rule(s), how to check that the rules are actually being hit.

The first point is particularly important at the moment, as many distros are moving to nftables on the back-end: the iptables executable is no longer talking to iptables directly, but rather operating as a CLI front-end to nftables, translating the given iptables rule(s) into nftables rule sets. I have seen quite a few occasions where the translation is botched and the error silently swallowed for all but the most basic of iptables rules, i.e. the rules are not being applied and no errors or warnings are shown.
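
As a sketch of what I mean (exact output varies by distro and iptables variant): list the relevant chain with counters before and after sending traffic you expect to match, and check that the counters increase.

# Show the FORWARD chain of the mangle table with per-rule packet/byte counters
sudo iptables -t mangle -L FORWARD -v -n

# With the nftables back-end you can also inspect the translated rule set directly
sudo nft list ruleset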

The cURL output suggests that you are not getting a TLS response from the server. The Wireshark capture screenshot you supplied in the issue you created shows packets getting re-transmitted; based on the info/details you’ve given in this thread, I would guess that the re-transmission is a result of packet fragmentation. I’m not aware of the finer details of TLS in this context so can’t comment in absolutes, but I do see this or very similar symptoms quite often, which leads me to think that some implementations just can’t deal with fragmented packets.

I think it would be better to determine what sort of (P)MTU figure you actually have to work with rather than throwing arbitrary numbers around. You could try some basic testing: take a problematic site/server, one that will respond to ping, and ping it with the “don’t fragment” (DF) bit set, varying the size of the packet to narrow down what size passes or fails. For example…

loki@sputnik:~$ ping puri.sm -c 4 -s 1465 -M do
PING puri.sm (138.68.253.24) 1465(1493) bytes of data.
From router.internal.com (192.168.3.12) icmp_seq=1 Frag needed and DF set (mtu = 1492)
ping: local error: Message too long, mtu=1492
ping: local error: Message too long, mtu=1492
ping: local error: Message too long, mtu=1492

--- puri.sm ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 59ms

loki@sputnik:~$ ping puri.sm -c 4 -s 1464 -M do
PING puri.sm (138.68.253.24) 1464(1492) bytes of data.
1472 bytes from static1.puri.sm (138.68.253.24): icmp_seq=1 ttl=50 time=153 ms
1472 bytes from static1.puri.sm (138.68.253.24): icmp_seq=2 ttl=50 time=153 ms
1472 bytes from static1.puri.sm (138.68.253.24): icmp_seq=3 ttl=50 time=153 ms
1472 bytes from static1.puri.sm (138.68.253.24): icmp_seq=4 ttl=50 time=153 ms

--- puri.sm ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 152.551/152.764/153.084/0.340 ms

In my case, I started with a packet size of 1465 because I know I have an MTU of 1492 on the router interface here, and a packet size of 1465 results in a 1493-byte packet, 1 byte over the MTU, which I expected to fail; decreasing the size by 1 to 1464 passes, as it results in a 1492-byte packet, equal to the MTU.

Ping implementations vary quite a bit, so you should check and confirm the correct argument switches for ping as implemented on your system. Outputs from ping also vary, so you may not see exactly the same output at your end.

If the numbers you provided previously are still accurate (an MTU of 1464 on the Librem 5 WAN interface), I would expect pinging from your hot-spotted client with a packet size of 1436 to pass, while 1437 would fail, in which case the TCPMSS should be set or clamped to 1424. Whatever the value is determined to be, it should be set on the router (in this case the Librem 5) rather than at client level, as you’ll no doubt encounter situations where you’ll have clients on the hotspot that don’t have any mechanism to change/clamp/set their TCPMSS.
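
For anyone following along, the arithmetic behind those numbers (assuming plain IPv4) is:

1464 (MTU) - 20 (IP header) -  8 (ICMP header) = 1436   largest ping -s payload that fits
1464 (MTU) - 20 (IP header) - 20 (TCP header)  = 1424   largest TCP segment (MSS) that fits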

So I ran curl https://puri.sm from my laptop connected to the hotspot, while tcpdump was running on the Librem 5 (listening on the wwan0 interface) and Wireshark on the laptop.

This is the outcome:

The top half is the laptop, the bottom half is the phone.

However, running the same curl https://puri.sm from the phone itself (while the hotspot is still active), tcpdump -i wwan0 produces this nice and happy dump:

Seems the phone is passing all the packets it receives on its wwan interface on to the laptop. So things are getting lost elsewhere? (seeing that “previous segment not captured” bit). But if so, why does it work when I initiate the connection from the phone?

I could add that I tried ping -M do commands from the phone and from the laptop, to a host where the connection doesn’t work (puri.sm) and to one where it does work (en.wikipedia.org). For all four combinations, -s 1436 works and -s 1437 doesn’t. When I do -s 1437 on the phone it says local error: message too long, mtu=1464. When I do -s 1437 on the laptop it hum… OK NOW IT’S WORKING huh

$ ping -s 1437 -M do -c 4 puri.sm
PING puri.sm (138.68.253.24) 1437(1465) bytes of data.
From _gateway (10.42.0.1) icmp_seq=1 Frag needed and DF set (mtu = 1464)
ping: local error: message too long, mtu=1464
ping: local error: message too long, mtu=1464
ping: local error: message too long, mtu=1464

--- puri.sm ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3027ms

22:57 tendays@tofu:~ {1 job} $ curl https://puri.sm
<!doctype html><html lang="en-US" prefix="og: http://ogp.me/ns#"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="profile" href="http://gmpg.org/xfn/11"><link rel="me" href="https://social.librem.one/@purism"/><link rel="icon" href="https://puri.sm/wp-content/uploads/2020/04/cropped-purism-logo-rectangle-1-32x32.png" sizes="32x32"><link rel="icon" href="https://puri.sm/wp-content/uploads/2020/04/cropped-purism-logo-rectangle-1-192x192.png" sizes="192x192"><link rel="apple-touch-icon-precomposed" href="https://puri.sm/wp-content/uploads/2020/04/cropped-purism-logo-rectangle-1-180x180.png"><meta name="msapplication-TileImage" content="https://puri.sm/wp-content/up
...

From what I remember, when it doesn’t work the ping just doesn’t do anything, it just times out. Definitely no “frag needed” when the issue is happening.

Seems when I look too closely at the trouble it goes away haha.

The results of your test and your Wireshark captures are to be expected.

The short answer is, from the terminal of the phone enter the following…

sudo iptables-legacy -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

The above command should resolve your issues as it will rewrite the MSS value on all SYN packets being forwarded through the phone based on the MTU of the phone’s interface.

Should you wish/need to remove the rule it’s almost identical except change -A to -D …

sudo iptables-legacy -D FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

The phone and the laptop are not issuing exactly the same request; it’s clear in the Wireshark captures. Specifically, look at the MSS value.

Your laptop is declaring MSS to be 1460, while the phone is declaring MSS to be 1424. Your laptop is determining its MSS value based on the MTU of its interface, which is 1500; it knows nothing of the MTU of any interfaces further along the pipe. The phone is declaring its MSS value to be 1424 based on the MTU of its interface, which is 1464.

When pinging from your laptop without -M do (the “Don’t Fragment” bit), ping just handles the fragmented packets as needed, so it is to be expected that ping will work. I would have also expected ping from the phone without -M do to fail; maybe it has different defaults and the DF bit is set by default.


@Loki thank you, it works!

So I have two questions: first, why isn’t that clamp-mss-to-pmtu the default behaviour? And second, why is the issue happening only to me and not to anyone else? Might it be because my wwan interface has had a smaller MTU than the phone’s wlan interface, while most people have an MTU larger than or equal to 1500 bytes on their wwan interface?

I just looked at it, and it turns out the wwan0 interface is back to an MTU of 1500 today! The hotspot works fine without using the iptables-legacy command, so I’m pretty sure the thing that started the problem after last July was my ISP changing the MTU down to 1464.

Also, the ping -M do command with a large packet size run on the phone used to do nothing; the Frag needed and DF set message would not appear, there would just be timeouts. When I tried on January 25th, that message appeared, and all connections were working fine from that point. As you can see, I ran a curl and it got the HTML as expected. That command was run on the laptop without the clamp-mss-to-pmtu option enabled on the phone. It was one of those 1% of times when things were working…

I guess your ISP reserves the right to change the MTU on the WAN link at any time and without notice. (Most people would have no idea what it means anyway.)

Probably because, if it were the default, there would be lots of packet headers getting rewritten for no reason, achieving nothing other than incurring unnecessary overhead. The iptables rule I gave was provided more for simplicity; even where MSS values do need to be rewritten, with that blanket rule there is still the possibility of unnecessary overhead.

There is some configuration being done under the hood when the Hotspot feature is enabled; that configuration should probably be extended to account for such things as the local interface MTU.

You are not the only person to experience this issue, directly with a Librem 5 or otherwise. There is probably a (small) number of people who could experience the issue but don’t use the connection sharing feature, so they never see it. I know of one other person directly who had the same or a very similar scenario. They ended up using a NetworkManager dispatcher script to add/remove configuration and rules on demand; for you it’s probably enough to make the iptables rule persistent (i.e. save the rules so they survive reboots). I should also mention that, as a matter of best practice, if you make that rule persistent it should live in the mangle table rather than directly in the filter table’s FORWARD chain.
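
A minimal sketch of such a dispatcher script, assuming the connection profile is named Hotspot as in the nmcli command earlier (the file name and the exact rule placement are illustrative only, not something I’ve tested on a Librem 5):

#!/bin/sh
# Hypothetical /etc/NetworkManager/dispatcher.d/90-hotspot-mss-clamp
# NetworkManager runs dispatcher scripts with the interface name ($1) and the
# action ($2, e.g. "up"/"down"), and exports CONNECTION_ID for the profile.
ACTION="$2"
RULE="FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu"

# Only act when the hotspot connection profile is involved
[ "$CONNECTION_ID" = "Hotspot" ] || exit 0

case "$ACTION" in
    up)   iptables -t mangle -A $RULE ;;
    down) iptables -t mangle -D $RULE ;;
esac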

You’ll probably find the MTU of your ISP’s connection will vary depending on what they are doing and how traffic is being routed on their end. It may be back at 1500 today, but it may drop down again at some point in the future.