Librem 5: Unexpected hard resets under high load

Hi everybody,

I have been using a Librem 5 for a long time. Since the beginning I have experienced unexpected freezes and hard reboots whenever the phone was tasked with something heavy (like accidentally opening the unusable Firefox while doing something else). I initially attributed it to the low amount of RAM and out-of-memory conditions, or maybe to a faulty MicroSD card, where I placed an encrypted swap in addition to the ZRAM configured by default.

Some time ago a Linux kernel update arrived which dramatically improved performance and made these hard resets much rarer. As mentioned in this post, the kernel is now set “never to swap out a page that was touched in the last 1 second”. Maybe this nonstop swapping out of RAM pages and swapping them back in a split second later was what triggered these resets.

Nevertheless, I recently stumbled upon a way to reproducibly trigger a hard reset. I was moving the Flatpak installation path to a MicroSD card as described here. I started removing apps from the “system” “installation” one by one and installing them to a new “installation” on the MicroSD. Each time I started flatpak install ... there was a hard reset after some work was done.

By the way, the same behavior also occurred before the kernel update and before I added the new flatpak “installation”. But back then I did not pay attention because resets happened all the time.

After a few such reboots I created a cgroup, set read and write limits on the block device of the MicroSD card through the io.max mechanism and started flatpak install ... again inside this cgroup. I also temporarily disabled swap on the MicroSD. This time it took quite a while, but it finally completed without hard resets.
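A minimal Python sketch of what such a limit looks like, assuming cgroup v2. The exact limits used are not stated above, so the device numbers (179:0), the 5 MiB/s figures and the helper names are hypothetical and only illustrate the format of an io.max entry.

```python
import os


def io_max_line(major, minor, rbps=None, wbps=None):
    """Build one cgroup v2 io.max entry, e.g. "179:0 rbps=5242880 wbps=max".

    rbps/wbps are bytes per second; None leaves that direction unlimited ("max").
    """
    def fmt(value):
        return "max" if value is None else str(int(value))

    return f"{major}:{minor} rbps={fmt(rbps)} wbps={fmt(wbps)}"


def device_numbers(path):
    """Return (major, minor) of a block device node such as /dev/mmcblk0."""
    st = os.stat(path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


# Hypothetical example: device 179:0 limited to 5 MiB/s in each direction.
limit = 5 * 1024 * 1024
print(io_max_line(179, 0, rbps=limit, wbps=limit))
```

The resulting line would be written (as root) to the io.max file of the cgroup, and the shell that runs flatpak install ... would be added to that cgroup's cgroup.procs. This only works if the io controller is enabled for the cgroup.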

Has anyone else experienced something like that? It is long past the warranty period, so I just want to know whether this is a universal problem with Librem 5 or maybe it is just a single device with faulty hardware.

Also, I am not sure whether the hard resets are really linked to high IO on the MicroSD. Maybe it is a high load on something else. Determining the exact cause will hopefully help me find out what exactly I should limit to completely avoid unexpected hard resets.

And the last question: Does anyone know how to determine which hardware batch my Librem 5 is from? I am still unsure.

3 Likes
  1. Have you looked at the log files when there are hard resets?

  2. When you “temporarily disabled swap on the microSD”, did you disable swap completely? Did you also disable ZRAM (e.g. with “sudo swapoff -a”, which would disable /dev/zram too)?

  3. How was your ZRAM set up? Some people in the thread “Why and how to extend ZRAM on L5. A revolution for Librem 5 stability!” recommended having ZRAM much greater than RAM, which could result in overcommits (i.e. if compressed RAM exceeds actual RAM, there will be a problem).

3 Likes

1. Logs

I have looked at the logs. Here are my findings:

  1. A lot of redpine_91x errors throughout the logs. There are 5663 such
    errors in 6 and a half hours (approx. 14.5 such errors per minute).

    20:30:37 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:37 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:46 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:46 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:46 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:47 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:47 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:48 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:48 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:54 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:54 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:57 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:57 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:59 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:30:59 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:31:00 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:31:08 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    20:31:08 pureos kernel: redpine_91x: Packet Dropped as Key ID not matched with both current and previous Key ID
    

    There were also 9 instances of the following error:

    22:21:46 pureos kernel: redpine_91x: Packet Dropped as RX PN is less than last  received PN
    
  2. There were 6 boots that did not have the expected “pureos systemd-journald[472]: Journal stopped” message before them. 4 of them had
    messages like this right before (or several seconds before) the sudden reboot.

    22:47:57 pureos flatpak[2503]: libostree pull from 'FlatHub' for runtime/org.freedesktop.Platform.GL.default/aarch64/23.08 complete
                                          security: GPG: summary+commit 
                                          security: SIGN: disabled http: TLS
                                          delta: parts: 1 loose: 41
                                          transfer: secs: 136 size: 150,0 МБ
    -- Boot 6679b777f1cc4ab384b20303a3ef1b0b --
    
  3. Kernel panics also happen from time to time, but there were none
    while I was trying to install something with flatpak.

    pureos kernel: ------------[ cut here ]------------
    pureos kernel: WARNING: CPU: 3 PID: 61429 at net/mac80211/scan.c:427 __ieee80211_scan_completed+0x2bc/0x320 [mac80211]
    pureos kernel: Modules linked in: sch_ingress af_key nfnetlink_log xfrm_user xfrm_algo xfrm_interface xfrm6_tunnel tunnel6 udp_diag veth iptable_mangle iptable_nat iptable_filter aes_ce_ccm algif_hash algif_skcipher af_alg rfcomm xt_MASQUERADE bridge stp bnep usb_f_acm u_serial usb_f_ncm u_ether ip6t_REJECT nf_reject_ipv6 nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_CHECKSUM xt_tcpudp nft_compat nf_tables libcrc32c nfnetlink binfmt_misc st_magn_i2c st_sensors_i2c st_magn st_sensors st_lsm6dsx_i2c st_lsm6dsx_spi st_lsm6dsx option qmi_wwan cdc_wdm usb_wwan usbnet usbserial mii caam_jr caamhash_desc caamalg_desc crypto_engine ledtrig_pattern redpine_sdio redpine_91x bluetooth mac80211 hantro_vpu snd_soc_simple_card v4l2_vp9 snd_soc_imx_hdmi snd_soc_simple_card_utils cfg80211 snd_soc_gtm601 v4l2_h264 v4l2_mem2mem snd_soc_hdmi_codec videobuf2_dma_contig videobuf2_memops leds_lm3560 v4l2_flash_led_class videobuf2_v4l2 dw9714 v4l2_fwnode v4l2_async mousedev videobuf2_common
    pureos kernel:  videodev vcnl4000 industrialio_triggered_buffer snd_soc_wm8962 kfifo_buf mc gnss_mtk snd_soc_fsl_sai gnss_serial gnss snd_soc_fsl_utils imx_pcm_dma caam error snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer imx2_wdt rfkill_hks watchdog snd soundcore rfkill imx_rproc leds_pwm_multicolor led_class_multicolor libcomposite ledtrig_timer fuse zram ip_tables x_tables ipv6 autofs4 uas usb_storage mtdblock mtd_blkdevs overlay ofpart xhci_plat_hcd xhci_hcd spi_nor mtd dwc3 ulpi udc_core aes_ce_blk cdns_mhdp_imx crct10dif_ce usbcore pwm_vibra ghash_ce cdns_mhdp_drmcore sha2_ce sha1_ce phy_fsl_imx8mq_usb usb_common drm_display_helper imx_dcss bq25890_charger edt_ft5x06 tps6598x typec clk_bd718x7 roles snvs_pwrkey imx_sdma virt_dma [last unloaded: hi846]
    pureos kernel: CPU: 3 PID: 61429 Comm: kworker/u8:4 Tainted: G        W          6.6.0-1-librem5 #1
    pureos kernel: Hardware name: Purism Librem 5r4 (DT)
    pureos kernel: Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
    pureos kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pureos kernel: pc : __ieee80211_scan_completed+0x2bc/0x320 [mac80211]
    pureos kernel: lr : ieee80211_scan_work+0x15c/0x6b8 [mac80211]
    pureos kernel: sp : ffff80008b2cbc80
    pureos kernel: x29: ffff80008b2cbc80 x28: ffff00000c630900 x27: ffff8000816f6000
    pureos kernel: x26: ffff000035f18b80 x25: 0000000000000000 x24: ffff000001105e05
    pureos kernel: x23: 0000000000000000 x22: 0000000000000000 x21: ffff00000c631b30
    pureos kernel: x20: ffff00000c630900 x19: 0000000000000000 x18: 0000000000000000
    pureos kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    pureos kernel: x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
    pureos kernel: x11: 7f7f7f7f7f7f7f7f x10: feff636d746e616d x9 : ffff80007a268c74
    pureos kernel: x8 : fefefefefefefeff x7 : 000000000000000f x6 : ffffffffffffec00
    pureos kernel: x5 : 755f73746e657665 x4 : ffff00000c631b50 x3 : ffffffffffffece8
    pureos kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    pureos kernel: Call trace:
    pureos kernel:  __ieee80211_scan_completed+0x2bc/0x320 [mac80211]
    pureos kernel:  ieee80211_scan_work+0x15c/0x6b8 [mac80211]
    pureos kernel:  cfg80211_wiphy_work+0xbc/0x108 [cfg80211]
    pureos kernel:  process_one_work+0x18c/0x428
    pureos kernel:  worker_thread+0x338/0x450
    pureos kernel:  kthread+0x120/0x130
    pureos kernel:  ret_from_fork+0x10/0x20
    pureos kernel: ---[ end trace 0000000000000000 ]---
    pureos kernel: ------------[ cut here ]------------
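The check from finding 2 (boots that are not preceded by “Journal stopped”) can be automated. Below is a minimal Python sketch, assuming the journal was exported as plain text with “-- Boot <id> --” separator lines like the one in the excerpt above. The function name, the 5-line window and the sample lines with made-up boot IDs are hypothetical.

```python
import re
from collections import deque

# Boot separator as it appears in the excerpt above: "-- Boot <32 hex chars> --"
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]{32}) --$")


def unclean_boots(lines, window=5):
    """Return IDs of boots whose marker is not preceded by "Journal stopped".

    Only the last `window` non-empty lines of the previous boot are inspected.
    A marker with no log lines before it (e.g. the first one) is skipped.
    """
    recent = deque(maxlen=window)
    unclean = []
    for raw in lines:
        line = raw.strip()
        match = BOOT_MARKER.match(line)
        if match:
            if recent and not any("Journal stopped" in r for r in recent):
                unclean.append(match.group(1))
            recent.clear()
        elif line:
            recent.append(line)
    return unclean


# Made-up sample: the second boot ends cleanly, the third follows a sudden reset.
sample = [
    "-- Boot " + "a" * 32 + " --",
    "22:40:01 pureos systemd[1]: Started something",
    "22:41:10 pureos systemd-journald[472]: Journal stopped",
    "-- Boot " + "b" * 32 + " --",
    "22:47:57 pureos flatpak[2503]: libostree pull from 'FlatHub' complete",
    "-- Boot " + "c" * 32 + " --",
]
print(unclean_boots(sample))
```

With the sample above, only the last boot ID is reported, because the lines before it contain no “Journal stopped” message.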
    

2. ZRAM

No, I did not touch ZRAM. It was not disabled and I did not change its configuration.

pureos kernel: Adding 5941244k swap on /dev/zram0.  Priority:100 extents:1 across:5941244k SS
1 Like

It looks like you’ve got 6GB of ZRAM. That’s probably (assuming you have a non-Liberty version) about 2 times the actual RAM. If the stuff in RAM is not very compressible, that can result in over-commitment, which could lead to a panic. I’m assuming Purism set up the ZRAM. However, I will note that on most systems the default ZRAM is 1/2 the actual RAM total rather than 2 times the RAM total. [Edit: Are you sure you didn’t touch ZRAM? I should note that according to the merge request “defaults: Enable zram using systemd-zram-generator (!312)” in Librem5 / librem5-base on GitLab, when Purism enabled ZRAM it was also set to have ZRAM equal to 1/2 the actual RAM.]
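The ratio can be checked rather than guessed. Here is a minimal Python sketch that pulls the swap size out of a kernel line like the one quoted above and divides it by MemTotal. The helper names are hypothetical, and the MemTotal value in the example is made up (exactly half the ZRAM size); on a real device it would be read from /proc/meminfo.

```python
import re

# Matches e.g. "Adding 5941244k swap on /dev/zram0.  Priority:100 ..."
SWAP_LINE = re.compile(r"Adding (\d+)k swap on (\S+?)\.")


def swap_size_kib(log_line):
    """Return (device, size in KiB) from a kernel "Adding ...k swap on ..." line."""
    match = SWAP_LINE.search(log_line)
    if match is None:
        raise ValueError("not a swap activation line")
    return match.group(2), int(match.group(1))


def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


log_line = ("pureos kernel: Adding 5941244k swap on /dev/zram0.  "
            "Priority:100 extents:1 across:5941244k SS")
# Hypothetical MemTotal; replace with open("/proc/meminfo").read() on the phone.
meminfo = "MemTotal:        2970622 kB\nMemFree:          123456 kB\n"

device, swap_kib = swap_size_kib(log_line)
ratio = swap_kib / mem_total_kib(meminfo)
print(f"{device}: {ratio:.2f} x RAM")
```

A ratio near 2 would match a “2 times RAM” setup, while a ratio near 0.5 would match the older 1/2 default mentioned above.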

Also: Are you sure you didn’t temporarily disable the ZRAM? You said that you “temporarily disabled swap on the MicroSD”. If you disabled swap with a “sudo swapoff -a” instead of directly specifying the device … that would have disabled ZRAM.
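One way to settle this is to look at /proc/swaps, which lists every swap device that is currently active. A minimal Python sketch follows, assuming the usual five-column layout of that file. The function name and the sample text (including the /dev/mapper/sdswap device) are hypothetical.

```python
def active_swaps(proc_swaps_text):
    """Parse the text of /proc/swaps into a list of dicts, one per active swap."""
    entries = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        name, kind, size, used, priority = fields[:5]
        entries.append({
            "name": name,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


# Made-up sample; on a real device use open("/proc/swaps").read() instead.
sample = (
    "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
    "/dev/zram0                              partition\t5941244\t\t1024\t\t100\n"
    "/dev/mapper/sdswap                      partition\t2097148\t\t0\t\t-2\n"
)

swaps = active_swaps(sample)
zram_active = any(entry["name"].startswith("/dev/zram") for entry in swaps)
print([entry["name"] for entry in swaps], "zram active:", zram_active)
```

If /dev/zram0 is missing from the list after swap was turned off, ZRAM was disabled along with the MicroSD swap.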

In regard to logs: I was mainly looking for kernel panics or OOM killer messages. It sounds like you didn’t encounter kernel panics in your most recent freezes. I will say that, other than with broken hardware (e.g. RAM), I haven’t had a kernel panic outside of OOM issues, so a kernel panic is something that should generally be tracked down.

My main speculation is that the way flatpak deals with packages results in a problem with ZRAM over-commitment. Flatpak packages are compressed and, as a result, will not compress further when in memory. What I don’t know is whether flatpak will try to load a full compressed package into memory (I doubt it, but I don’t know the inner workings of ostree). That would result in issues with ZRAM on.

3 Likes

I really wouldn’t put swap on an SD card; it’s slow and sounds like asking for trouble.

The current default on Librem 5 is to use 2 times the RAM total. Filling the RAM up will trigger the OOM killer rather than hard resets.

2 Likes

I’ve had very bad luck with the OOM killer. I’ve had it take out crucial systems even when there was plenty of swap left (e.g. 25GB). And one other time, while it wasn’t a kernel panic, I had my system freeze to the point where not even REISUB did anything.

1 Like

Yes, I checked the logs; ZRAM was not disabled.

Yes, at least none were recorded in the logs. It is possible that there was a kernel panic that never got a chance to be written to the log file on the filesystem.

Yes, I also think that swap on an SD card is unreliable. When I added it, the phone was mostly unusable for me because of low RAM, which supposedly provoked an unending cycle of swapping out and back in. It was an act of despair. The new kernel update helped a lot, so I’ll try to keep swap on the SD card disabled for some time and see if there are any new hard resets. Maybe there was memory corruption before I disabled swap on the SD card :sweat:.

Also, I have just tried another thing which previously always triggered hard resets after a short time: connecting a display through a hub. Of course, I did this after disabling swap on the SD card in /etc/fstab and rebooting. Now it is miraculously working as intended. It is something I had been trying and failing to do for years. :partying_face:

I’ll mark your (@dos) answer as solution.

1 Like
