Is anyone else experiencing freezing issues with Librem 15 v3?

I am also experiencing periodic system freezes that require restart using the power button.
It is not firefox related. I have had it happen even when I only use the terminal.

I have found that the last message on the system log is:

Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!

repeated over and over.

I’m using kernel 4.12.0-2-amd64.

I guess your terminal or Wayland compositor might be using OpenGL / graphics acceleration, so it’s hard to tell. I’ve never had a crash here with plain ol’ X11, wmii and urxvt. It only starts when I add mpv or Firefox or something else that uses more graphical features.

@kakaroto: I tried today’s master (4.14-rc2) and an old 4.4.88 (latest stable in the 4.4 branch); neither seems to change much.
I wanted to go back further but can’t boot 3.10 on this system, as PureOS formats ext4 by default with some non-backwards-compatible flags… I’ll maybe try a live OS later, but I’m not convinced at this point: the Librem 15 v3 didn’t exist back then, and 3.10 will likely just not have Skylake iGPU support.
The “good” thing is that both had neat crashes in the i915 driver, so I’ll just post to intel-gfx@lists.freedesktop.org with these traces and see if I can get anything out of it. I also noticed some “strict debug” options while compiling; I’ll turn them on now, maybe that’ll get me neater stacks.

Hummm… I didn’t experience this problem on my librems, but I use them mostly for testing/debugging, not for everyday-use, so that’s probably why I didn’t trigger the error.
Do you have the crash log from the i915 driver? It might not be the driver itself, but a combination of the driver and Wayland. I wonder how easy it would be to switch your OS to X instead of Wayland.
A quick search gave me this: https://bugs.freedesktop.org/show_bug.cgi?id=100181
It’s possibly the same bug you’re experiencing, and it seems to have been fixed in Wayland itself. Could you check which version of Wayland you have?

Hmm, actually good point. I had already switched to X11 (I’m using wmii, a lightweight tiling WM; I’m working on sway as a Wayland replacement, but it’s not good enough for me yet), so I didn’t try much on Wayland.
Even when I did, I had left mpv on its defaults, so it was using the opengl/x11 backend (sigh, mpv…).
I’ve just switched back to gnome-wayland and tried forcing opengl-output=wayland, but it made no difference.
It was worth a try, though!

This also tempted me to try without video output (--vo=null) to disable OpenGL usage, and it looks like there is no hang, so we can say it has something to do with OpenGL and not with the decoding part of ffmpeg/mpv. I’ll re-run that for longer tonight to confirm.

As for the traces, well, it’s hard to say. I had actually mistaken the boot warning stack on the 4.12 kernel (something about not enough wires for DP? I’ll post it when I get home if it’s not obvious) for an i915 crash, so only the old 4.4 kernel had an i915 stack. Frankly, I don’t want to barge in on the list with a report from such an old kernel, so this weekend I’ll post what I have without that one.

What I see basically looks like memory corruption. This morning’s logs are actually quite good. They start with something random:

Sep 29 08:03:58 fenrir kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000246
Sep 29 08:03:58 fenrir kernel: IP: __list_del_entry_valid+0x29/0x90
Sep 29 08:03:58 fenrir kernel: PGD 0 P4D 0
Sep 29 08:03:58 fenrir kernel: Oops: 0000 [#1] SMP
Sep 29 08:03:58 fenrir kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:03:58 fenrir kernel:  xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:03:58 fenrir kernel: CPU: 1 PID: 5781 Comm: mpv/ao Tainted: G        W       4.14.0-rc2 #14
Sep 29 08:03:58 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017
Sep 29 08:03:58 fenrir kernel: task: ffff924822b20040 task.stack: ffffa4fa83880000
Sep 29 08:03:58 fenrir kernel: RIP: 0010:__list_del_entry_valid+0x29/0x90
Sep 29 08:03:58 fenrir kernel: RSP: 0018:ffffa4fa83883cb0 EFLAGS: 00010203
Sep 29 08:03:58 fenrir kernel: RAX: 0000000000000000 RBX: ffffa4fa837fbd58 RCX: dead000000000200
Sep 29 08:03:58 fenrir kernel: RDX: 0000000000000246 RSI: ffffa4fa80d88448 RDI: ffffa4fa837fbd60
Sep 29 08:03:58 fenrir kernel: RBP: ffffa4fa83883cb0 R08: ffffa4fa837fbdb8 R09: ffffa4fa80d88448
Sep 29 08:03:58 fenrir kernel: R10: 0000000000000001 R11: 000000007fffffff R12: ffffa4fa837fbd60
Sep 29 08:03:58 fenrir kernel: R13: ffffa4fa837fbdd0 R14: ffffa4fa837fbdc0 R15: ffffa4fa80d88448
Sep 29 08:03:58 fenrir kernel: FS:  00007f54175c0700(0000) GS:ffff92483ec80000(0000) knlGS:0000000000000000
Sep 29 08:03:58 fenrir kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:03:58 fenrir kernel: CR2: 0000000000000246 CR3: 000000026c196004 CR4: 00000000003606e0
Sep 29 08:03:58 fenrir kernel: Call Trace:
Sep 29 08:03:58 fenrir kernel:  plist_del+0x3b/0xc0
Sep 29 08:03:58 fenrir kernel:  __unqueue_futex+0x2f/0x40
Sep 29 08:03:58 fenrir kernel:  mark_wake_futex+0x3d/0x50
Sep 29 08:03:58 fenrir kernel:  futex_requeue+0x8a9/0xa40
Sep 29 08:03:58 fenrir kernel:  do_futex+0x2ae/0xb10
Sep 29 08:03:58 fenrir kernel:  SyS_futex+0x13b/0x180
Sep 29 08:03:58 fenrir kernel:  ? SyS_write+0x79/0xc0
Sep 29 08:03:58 fenrir kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:03:58 fenrir kernel: RIP: 0033:0x7f5454a2d91d
Sep 29 08:03:58 fenrir kernel: RSP: 002b:00007f54175bf8e8 EFLAGS: 00000283 ORIG_RAX: 00000000000000ca
Sep 29 08:03:58 fenrir kernel: RAX: ffffffffffffffda RBX: 0000560690f967a0 RCX: 00007f5454a2d91d
Sep 29 08:03:58 fenrir kernel: RDX: 0000000000000001 RSI: 0000000000000084 RDI: 0000560690847fbc
Sep 29 08:03:58 fenrir kernel: RBP: 0000560690f96938 R08: 0000560690847f90 R09: 000000000001a394
Sep 29 08:03:58 fenrir kernel: R10: 000000007fffffff R11: 0000000000000283 R12: 0000000000000e50
Sep 29 08:03:58 fenrir kernel: R13: 0000560690f95a78 R14: 0000560690f95a70 R15: 0000560690f625c0
Sep 29 08:03:58 fenrir kernel: Code: 00 00 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5 48 39 c8 74 27 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2c <48> 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 
Sep 29 08:03:58 fenrir kernel: RIP: __list_del_entry_valid+0x29/0x90 RSP: ffffa4fa83883cb0
Sep 29 08:03:58 fenrir kernel: CR2: 0000000000000246
Sep 29 08:03:58 fenrir kernel: ---[ end trace a2a9a3f9d58c176b ]---
Sep 29 08:03:58 fenrir kernel: note: mpv/ao[5781] exited with preempt_count 2

Followed by something i915 related:

Sep 29 08:04:09 fenrir kernel: asynchronous wait on fence i915:gnome-shell[5272]/1:2fb1 timed out
Sep 29 08:04:09 fenrir kernel: pipe A vblank wait timed out
Sep 29 08:04:09 fenrir kernel: ------------[ cut here ]------------
Sep 29 08:04:09 fenrir kernel: WARNING: CPU: 1 PID: 5899 at drivers/gpu/drm/i915/intel_display.c:12172 intel_atomic_commit_tail+0xf7c/0xf90 [i915]
Sep 29 08:04:09 fenrir kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:04:09 fenrir kernel:  xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:04:09 fenrir kernel: CPU: 1 PID: 5899 Comm: kworker/u8:4 Tainted: G      D W       4.14.0-rc2 #14                              
Sep 29 08:04:09 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:04:09 fenrir kernel: Workqueue: events_unbound intel_atomic_commit_work [i915]
Sep 29 08:04:09 fenrir kernel: task: ffff9247cd54f040 task.stack: ffffa4fa831c0000
Sep 29 08:04:09 fenrir kernel: RIP: 0010:intel_atomic_commit_tail+0xf7c/0xf90 [i915]
Sep 29 08:04:09 fenrir kernel: RSP: 0018:ffffa4fa831c3da8 EFLAGS: 00010286
Sep 29 08:04:09 fenrir kernel: RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000000
Sep 29 08:04:09 fenrir kernel: RDX: 0000000000000000 RSI: ffff92483ec8de98 RDI: ffff92483ec8de98
Sep 29 08:04:09 fenrir kernel: RBP: ffffa4fa831c3e60 R08: 0000000000000000 R09: 00000000000002d0
Sep 29 08:04:09 fenrir kernel: R10: ffffa4fa831c3da8 R11: ffffffff8a4d7b4d R12: 000000000001c980
Sep 29 08:04:09 fenrir kernel: R13: ffff924825308000 R14: ffff924825e5e000 R15: 0000000000000001
Sep 29 08:04:09 fenrir kernel: FS:  0000000000000000(0000) GS:ffff92483ec80000(0000) knlGS:0000000000000000                              
Sep 29 08:04:09 fenrir kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:04:09 fenrir kernel: CR2: 00007f8284686340 CR3: 0000000267235006 CR4: 00000000003606e0
Sep 29 08:04:09 fenrir kernel: Call Trace:
Sep 29 08:04:09 fenrir kernel:  ? finish_wait+0x80/0x80
Sep 29 08:04:09 fenrir kernel:  intel_atomic_commit_work+0x12/0x20 [i915]
Sep 29 08:04:09 fenrir kernel:  process_one_work+0x19f/0x3c0
Sep 29 08:04:09 fenrir kernel:  worker_thread+0x39/0x3c0
Sep 29 08:04:09 fenrir kernel:  kthread+0x125/0x140
Sep 29 08:04:09 fenrir kernel:  ? process_one_work+0x3c0/0x3c0
Sep 29 08:04:09 fenrir kernel:  ? kthread_create_on_node+0x70/0x70
Sep 29 08:04:09 fenrir kernel:  ? kthread_create_on_node+0x70/0x70
Sep 29 08:04:09 fenrir kernel:  ret_from_fork+0x25/0x30
Sep 29 08:04:09 fenrir kernel: Code: ff ff ff 48 83 c7 08 e8 03 af 10 c9 4c 8b 85 70 ff ff ff 4d 85 c0 0f 85 b7 fa ff ff 8d 73 41 48 c7 c7 d8 9a 63 c0 e8 15 60 12 c9 <0f> ff e9 a1 fa ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f             
Sep 29 08:04:09 fenrir kernel: ---[ end trace a2a9a3f9d58c176c ]---

Sep 29 08:04:19 fenrir kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
Sep 29 08:04:19 fenrir kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:04:19 fenrir kernel:  xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:04:19 fenrir kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D W       4.14.0-rc2 #14
Sep 29 08:04:19 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:04:19 fenrir kernel: task: ffff924833e1b040 task.stack: ffffa4fa80cc4000
Sep 29 08:04:19 fenrir kernel: RIP: 0010:__remove_hrtimer+0x6/0x70
Sep 29 08:04:19 fenrir kernel: RSP: 0018:ffff92483ed03f18 EFLAGS: 00000046
Sep 29 08:04:19 fenrir kernel: RAX: 14e8a8f363b6b333 RBX: ffff92483ed14480 RCX: 0000000000000000
Sep 29 08:04:19 fenrir kernel: RDX: 0000000000000000 RSI: ffff92483ed14500 RDI: ffffa4fa837fbd10
Sep 29 08:04:19 fenrir kernel: RBP: ffff92483ed03f70 R08: 0000000000000101 R09: 0000000000000000
Sep 29 08:04:19 fenrir kernel: R10: 000000000000b6d9 R11: 0000000000000083 R12: ffffa4fa837fbd10
Sep 29 08:04:19 fenrir kernel: R13: ffff92483ed14500 R14: 0000000000000000 R15: ffff92483ed145a8
Sep 29 08:04:19 fenrir kernel: FS:  0000000000000000(0000) GS:ffff92483ed00000(0000) knlGS:0000000000000000                              
Sep 29 08:04:19 fenrir kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:04:19 fenrir kernel: CR2: 00007f3f13f9c000 CR3: 0000000266a5a004 CR4: 00000000003606e0
Sep 29 08:04:19 fenrir kernel: Call Trace:
Sep 29 08:04:19 fenrir kernel:  <IRQ>
Sep 29 08:04:19 fenrir kernel:  ? __hrtimer_run_queues+0xc3/0x260
Sep 29 08:04:19 fenrir kernel:  hrtimer_interrupt+0xa0/0x1e0
Sep 29 08:04:19 fenrir kernel:  smp_apic_timer_interrupt+0x5f/0x130
Sep 29 08:04:19 fenrir kernel:  apic_timer_interrupt+0x93/0xa0
Sep 29 08:04:19 fenrir kernel:  </IRQ>
Sep 29 08:04:19 fenrir kernel: RIP: 0010:cpuidle_enter_state+0x130/0x2f0
Sep 29 08:04:19 fenrir kernel: RSP: 0018:ffffa4fa80cc7e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
Sep 29 08:04:19 fenrir kernel: RAX: ffff92483ed1ae80 RBX: 000001c496916739 RCX: 000000000000001f
Sep 29 08:04:19 fenrir kernel: RDX: 000001c496916739 RSI: fffffffdf3fade99 RDI: 0000000000000000
Sep 29 08:04:19 fenrir kernel: RBP: ffffa4fa80cc7eb0 R08: 0000000000000ebe R09: 0000000000000018
Sep 29 08:04:19 fenrir kernel: R10: ffffa4fa80cc7e40 R11: 0000000000000e30 R12: ffffc4fa7fd089a0
Sep 29 08:04:19 fenrir kernel: R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff8a2adf98
Sep 29 08:04:19 fenrir kernel:  cpuidle_enter+0x17/0x20
Sep 29 08:04:19 fenrir kernel:  call_cpuidle+0x23/0x40
Sep 29 08:04:19 fenrir kernel:  do_idle+0x189/0x1e0
Sep 29 08:04:19 fenrir kernel:  cpu_startup_entry+0x73/0x80
Sep 29 08:04:19 fenrir kernel:  start_secondary+0x179/0x1c0
Sep 29 08:04:19 fenrir kernel:  secondary_startup_64+0xa5/0xa5
Sep 29 08:04:19 fenrir kernel: Code: 21 ff ff ff 48 89 df c6 07 00 0f 1f 40 00 65 ff 0d 80 b3 91 76 5b 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 89 e5 41 56 41 55 41 54 53 0f b6 47 38 4c 8b 36 88 57 38 a8             
Sep 29 08:04:19 fenrir kernel: INFO: rcu_sched self-detected stall on CPU
Sep 29 08:04:19 fenrir kernel:         0-...: (5248 ticks this GP) idle=dfa/140000000000001/0 softirq=60475/60475 fqs=2624               
Sep 29 08:04:19 fenrir kernel:          (t=5250 jiffies g=24728 c=24727 q=1279)
Sep 29 08:04:19 fenrir kernel: NMI backtrace for cpu 0
Sep 29 08:04:19 fenrir kernel: CPU: 0 PID: 5774 Comm: mpv/vo Tainted: G      D W       4.14.0-rc2 #14
Sep 29 08:04:19 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:04:19 fenrir kernel: Call Trace:
Sep 29 08:04:19 fenrir kernel:  <IRQ>
Sep 29 08:04:19 fenrir kernel:  dump_stack+0x63/0x82
Sep 29 08:04:19 fenrir kernel:  nmi_cpu_backtrace+0xca/0xd0
Sep 29 08:04:19 fenrir kernel:  ? irq_force_complete_move+0x150/0x150
Sep 29 08:04:19 fenrir kernel:  nmi_trigger_cpumask_backtrace+0x10d/0x140
Sep 29 08:04:19 fenrir kernel:  arch_trigger_cpumask_backtrace+0x19/0x20
Sep 29 08:04:19 fenrir kernel:  rcu_dump_cpu_stacks+0xa3/0xd7
Sep 29 08:04:19 fenrir kernel:  rcu_check_callbacks+0x60a/0x840
Sep 29 08:04:19 fenrir kernel:  ? account_system_index_time+0x63/0x70
Sep 29 08:04:19 fenrir kernel:  ? tick_sched_do_timer+0x50/0x50
Sep 29 08:04:19 fenrir kernel:  update_process_times+0x2f/0x60
Sep 29 08:04:19 fenrir kernel:  tick_sched_handle+0x26/0x70
Sep 29 08:04:19 fenrir kernel:  ? tick_sched_do_timer+0x3f/0x50
Sep 29 08:04:19 fenrir kernel:  tick_sched_timer+0x39/0x80
Sep 29 08:04:19 fenrir kernel:  __hrtimer_run_queues+0xe4/0x260
Sep 29 08:04:19 fenrir kernel:  hrtimer_interrupt+0xa0/0x1e0
Sep 29 08:04:19 fenrir kernel:  smp_apic_timer_interrupt+0x5f/0x130
Sep 29 08:04:19 fenrir kernel:  apic_timer_interrupt+0x93/0xa0
Sep 29 08:04:19 fenrir kernel:  </IRQ>
Sep 29 08:04:19 fenrir kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x135/0x1a0
Sep 29 08:04:19 fenrir kernel: RSP: 0018:ffffa4fa83843c68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Sep 29 08:04:19 fenrir kernel: RAX: 0000000000000101 RBX: 0000560690847f90 RCX: 0000000000000001
Sep 29 08:04:19 fenrir kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa4fa80d88504
Sep 29 08:04:19 fenrir kernel: RBP: ffffa4fa83843c68 R08: 0000000000000101 R09: 0000000000000000
Sep 29 08:04:19 fenrir kernel: R10: 0000000000000002 R11: ffff9247cd0d0040 R12: ffffa4fa83843d08
Sep 29 08:04:19 fenrir kernel: R13: ffffa4fa83843d58 R14: ffffa4fa83843d90 R15: ffffa4fa80d88500
Sep 29 08:04:19 fenrir kernel:  _raw_spin_lock+0x28/0x30
Sep 29 08:04:19 fenrir kernel:  futex_wait_setup+0x82/0x130
Sep 29 08:04:19 fenrir kernel:  futex_wait+0xed/0x260
Sep 29 08:04:19 fenrir kernel:  ? ___sys_sendmsg+0xa4/0x2e0
Sep 29 08:04:19 fenrir kernel:  do_futex+0x506/0xb10
Sep 29 08:04:19 fenrir kernel:  SyS_futex+0x13b/0x180
Sep 29 08:04:19 fenrir kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:04:19 fenrir kernel: RIP: 0033:0x7f5454a2ff5c
Sep 29 08:04:19 fenrir kernel: RSP: 002b:00007f54293878e8 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
Sep 29 08:04:19 fenrir kernel: RAX: ffffffffffffffda RBX: 00000000001c2000 RCX: 00007f5454a2ff5c
Sep 29 08:04:19 fenrir kernel: RDX: 0000000000000002 RSI: 0000000000000080 RDI: 0000560690847f90
Sep 29 08:04:19 fenrir kernel: RBP: 00007f5429387338 R08: 0000560690847f90 R09: 0000000000009e82
Sep 29 08:04:19 fenrir kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00000000001c2000
Sep 29 08:04:19 fenrir kernel: R13: 00007f53ecb7fb20 R14: 00007f5429387338 R15: 0000000000000002
Sep 29 08:04:19 fenrir kernel: [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:36:pipe A] flip_done timed out 

And more soft lockups:

Sep 29 08:04:46 fenrir kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mpv/vo:5774]
Sep 29 08:04:46 fenrir kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:04:46 fenrir kernel:  xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:04:46 fenrir kernel: CPU: 0 PID: 5774 Comm: mpv/vo Tainted: G      D W       4.14.0-rc2 #14
Sep 29 08:04:46 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:04:46 fenrir kernel: task: ffff9247cd0d0040 task.stack: ffffa4fa83840000
Sep 29 08:04:46 fenrir kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x135/0x1a0
Sep 29 08:04:46 fenrir kernel: RSP: 0018:ffffa4fa83843c68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Sep 29 08:04:46 fenrir kernel: RAX: 0000000000000101 RBX: 0000560690847f90 RCX: 0000000000000001
Sep 29 08:04:46 fenrir kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa4fa80d88504
Sep 29 08:04:46 fenrir kernel: RBP: ffffa4fa83843c68 R08: 0000000000000101 R09: 0000000000000000
Sep 29 08:04:46 fenrir kernel: R10: 0000000000000002 R11: ffff9247cd0d0040 R12: ffffa4fa83843d08
Sep 29 08:04:46 fenrir kernel: R13: ffffa4fa83843d58 R14: ffffa4fa83843d90 R15: ffffa4fa80d88500
Sep 29 08:04:46 fenrir kernel: FS:  00007f5429388700(0000) GS:ffff92483ec00000(0000) knlGS:0000000000000000                              
Sep 29 08:04:46 fenrir kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:04:46 fenrir kernel: CR2: 00007f8284498510 CR3: 000000026c196004 CR4: 00000000003606f0
Sep 29 08:04:46 fenrir kernel: Call Trace:
Sep 29 08:04:46 fenrir kernel:  _raw_spin_lock+0x28/0x30
Sep 29 08:04:46 fenrir kernel:  futex_wait_setup+0x82/0x130
Sep 29 08:04:46 fenrir kernel:  futex_wait+0xed/0x260
Sep 29 08:04:46 fenrir kernel:  ? ___sys_sendmsg+0xa4/0x2e0
Sep 29 08:04:46 fenrir kernel:  do_futex+0x506/0xb10
Sep 29 08:04:46 fenrir kernel:  SyS_futex+0x13b/0x180
Sep 29 08:04:46 fenrir kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:04:46 fenrir kernel: RIP: 0033:0x7f5454a2ff5c
Sep 29 08:04:46 fenrir kernel: RSP: 002b:00007f54293878e8 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
Sep 29 08:04:46 fenrir kernel: RAX: ffffffffffffffda RBX: 00000000001c2000 RCX: 00007f5454a2ff5c
Sep 29 08:04:46 fenrir kernel: RDX: 0000000000000002 RSI: 0000000000000080 RDI: 0000560690847f90
Sep 29 08:04:46 fenrir kernel: RBP: 00007f5429387338 R08: 0000560690847f90 R09: 0000000000009e82
Sep 29 08:04:46 fenrir kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00000000001c2000
Sep 29 08:04:46 fenrir kernel: R13: 00007f53ecb7fb20 R14: 00007f5429387338 R15: 0000000000000002
Sep 29 08:04:46 fenrir kernel: Code: 66 31 c0 41 39 c0 74 ea 4d 85 c9 c6 07 01 74 2d 41 c7 41 08 01 00 00 00 eb 96 83 fa 01 0f 84 f4 fe ff ff 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09             


Sep 29 08:05:14 fenrir kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mpv/vo:5774]
Sep 29 08:05:14 fenrir kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:05:14 fenrir kernel:  xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:05:14 fenrir kernel: CPU: 0 PID: 5774 Comm: mpv/vo Tainted: G      D W    L  4.14.0-rc2 #14
Sep 29 08:05:14 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:05:14 fenrir kernel: task: ffff9247cd0d0040 task.stack: ffffa4fa83840000
Sep 29 08:05:14 fenrir kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x135/0x1a0
Sep 29 08:05:14 fenrir kernel: RSP: 0018:ffffa4fa83843c68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Sep 29 08:05:14 fenrir kernel: RAX: 0000000000000101 RBX: 0000560690847f90 RCX: 0000000000000001
Sep 29 08:05:14 fenrir kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa4fa80d88504
Sep 29 08:05:14 fenrir kernel: RBP: ffffa4fa83843c68 R08: 0000000000000101 R09: 0000000000000000
Sep 29 08:05:14 fenrir kernel: R10: 0000000000000002 R11: ffff9247cd0d0040 R12: ffffa4fa83843d08
Sep 29 08:05:14 fenrir kernel: R13: ffffa4fa83843d58 R14: ffffa4fa83843d90 R15: ffffa4fa80d88500
Sep 29 08:05:14 fenrir kernel: FS:  00007f5429388700(0000) GS:ffff92483ec00000(0000) knlGS:0000000000000000                              
Sep 29 08:05:14 fenrir kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:05:14 fenrir kernel: CR2: 00007f8284498510 CR3: 000000026c196004 CR4: 00000000003606f0
Sep 29 08:05:14 fenrir kernel: Call Trace:
Sep 29 08:05:14 fenrir kernel:  _raw_spin_lock+0x28/0x30
Sep 29 08:05:14 fenrir kernel:  futex_wait_setup+0x82/0x130
Sep 29 08:05:14 fenrir kernel:  futex_wait+0xed/0x260
Sep 29 08:05:14 fenrir kernel:  ? ___sys_sendmsg+0xa4/0x2e0
Sep 29 08:05:14 fenrir kernel:  do_futex+0x506/0xb10
Sep 29 08:05:14 fenrir kernel:  SyS_futex+0x13b/0x180
Sep 29 08:05:14 fenrir kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:05:14 fenrir kernel: RIP: 0033:0x7f5454a2ff5c
Sep 29 08:05:14 fenrir kernel: RSP: 002b:00007f54293878e8 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
Sep 29 08:05:14 fenrir kernel: RAX: ffffffffffffffda RBX: 00000000001c2000 RCX: 00007f5454a2ff5c
Sep 29 08:05:14 fenrir kernel: RDX: 0000000000000002 RSI: 0000000000000080 RDI: 0000560690847f90
Sep 29 08:05:14 fenrir kernel: RBP: 00007f5429387338 R08: 0000560690847f90 R09: 0000000000009e82
Sep 29 08:05:14 fenrir kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00000000001c2000
Sep 29 08:05:14 fenrir kernel: R13: 00007f53ecb7fb20 R14: 00007f5429387338 R15: 0000000000000002
Sep 29 08:05:14 fenrir kernel: Code: 66 31 c0 41 39 c0 74 ea 4d 85 c9 c6 07 01 74 2d 41 c7 41 08 01 00 00 00 eb 96 83 fa 01 0f 84 f4 fe ff ff 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09             

Sep 29 08:05:22 fenrir kernel: INFO: rcu_sched self-detected stall on CPU
Sep 29 08:05:22 fenrir kernel:         0-...: (21001 ticks this GP) idle=dfa/140000000000001/0 softirq=60475/60475 fqs=10501             
Sep 29 08:05:22 fenrir kernel:          (t=21003 jiffies g=24728 c=24727 q=1887)
Sep 29 08:05:22 fenrir kernel: NMI backtrace for cpu 0
Sep 29 08:05:22 fenrir kernel: CPU: 0 PID: 5774 Comm: mpv/vo Tainted: G      D W    L  4.14.0-rc2 #14
Sep 29 08:05:22 fenrir kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017                      
Sep 29 08:05:22 fenrir kernel: Call Trace:
Sep 29 08:05:22 fenrir kernel:  <IRQ>
Sep 29 08:05:22 fenrir kernel:  dump_stack+0x63/0x82
Sep 29 08:05:22 fenrir kernel:  nmi_cpu_backtrace+0xca/0xd0
Sep 29 08:05:22 fenrir kernel:  ? irq_force_complete_move+0x150/0x150
Sep 29 08:05:22 fenrir kernel:  nmi_trigger_cpumask_backtrace+0x10d/0x140
Sep 29 08:05:22 fenrir kernel:  arch_trigger_cpumask_backtrace+0x19/0x20
Sep 29 08:05:22 fenrir kernel:  rcu_dump_cpu_stacks+0xa3/0xd7
Sep 29 08:05:22 fenrir kernel:  rcu_check_callbacks+0x60a/0x840
Sep 29 08:05:22 fenrir kernel:  ? account_system_index_time+0x63/0x70
Sep 29 08:05:22 fenrir kernel:  ? tick_sched_do_timer+0x50/0x50
Sep 29 08:05:22 fenrir kernel:  update_process_times+0x2f/0x60
Sep 29 08:05:22 fenrir kernel:  tick_sched_handle+0x26/0x70
Sep 29 08:05:22 fenrir kernel:  ? tick_sched_do_timer+0x3f/0x50
Sep 29 08:05:22 fenrir kernel:  tick_sched_timer+0x39/0x80
Sep 29 08:05:22 fenrir kernel:  __hrtimer_run_queues+0xe4/0x260
Sep 29 08:05:22 fenrir kernel:  hrtimer_interrupt+0xa0/0x1e0
Sep 29 08:05:22 fenrir kernel:  smp_apic_timer_interrupt+0x5f/0x130
Sep 29 08:05:22 fenrir kernel:  apic_timer_interrupt+0x93/0xa0
Sep 29 08:05:22 fenrir kernel:  </IRQ>
Sep 29 08:05:22 fenrir kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x137/0x1a0
Sep 29 08:05:22 fenrir kernel: RSP: 0018:ffffa4fa83843c68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Sep 29 08:05:22 fenrir kernel: RAX: 0000000000000101 RBX: 0000560690847f90 RCX: 0000000000000001
Sep 29 08:05:22 fenrir kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa4fa80d88504
Sep 29 08:05:22 fenrir kernel: RBP: ffffa4fa83843c68 R08: 0000000000000101 R09: 0000000000000000
Sep 29 08:05:22 fenrir kernel: R10: 0000000000000002 R11: ffff9247cd0d0040 R12: ffffa4fa83843d08
Sep 29 08:05:22 fenrir kernel: R13: ffffa4fa83843d58 R14: ffffa4fa83843d90 R15: ffffa4fa80d88500
Sep 29 08:05:22 fenrir kernel:  _raw_spin_lock+0x28/0x30
Sep 29 08:05:22 fenrir kernel:  futex_wait_setup+0x82/0x130
Sep 29 08:05:22 fenrir kernel:  futex_wait+0xed/0x260
Sep 29 08:05:22 fenrir kernel:  ? ___sys_sendmsg+0xa4/0x2e0
Sep 29 08:05:22 fenrir kernel:  do_futex+0x506/0xb10
Sep 29 08:05:22 fenrir kernel:  SyS_futex+0x13b/0x180
Sep 29 08:05:22 fenrir kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:05:22 fenrir kernel: RIP: 0033:0x7f5454a2ff5c
Sep 29 08:05:22 fenrir kernel: RSP: 002b:00007f54293878e8 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
Sep 29 08:05:22 fenrir kernel: RAX: ffffffffffffffda RBX: 00000000001c2000 RCX: 00007f5454a2ff5c
Sep 29 08:05:22 fenrir kernel: RDX: 0000000000000002 RSI: 0000000000000080 RDI: 0000560690847f90
Sep 29 08:05:22 fenrir kernel: RBP: 00007f5429387338 R08: 0000560690847f90 R09: 0000000000009e82
Sep 29 08:05:22 fenrir kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 00000000001c2000
Sep 29 08:05:22 fenrir kernel: R13: 00007f53ecb7fb20 R14: 00007f5429387338 R15: 0000000000000002

etc etc.
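
In case it helps anyone condense a trace like the one above before posting it to a list, here is a rough sketch that keeps only the frame names and drops the `?` entries (which the kernel marks as unreliable guesses) as well as the register dumps. The function name `trace_frames` is made up, and the pattern assumes the `... kernel:  symbol+0xoff/0xsize` layout seen in this log.

```shell
# trace_frames: read kernel log lines on stdin, print one symbol per line
# for every frame of the form "symbol+0xOFF/0xSIZE".
# Frames prefixed with "?" and RIP/RSP/register lines do not match and are skipped.
trace_frames() {
    sed -n 's|^.* kernel: *\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*[[:space:]]*$|\1|p'
}
```

With a persistent journal, something like `journalctl -k -b -1 | trace_frames` should give the frames logged during the previous boot.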

FWIW, I’ve also tried X11 with Option “DRI” “False” on the card (so it shouldn’t be using any acceleration feature?!), I’m really surprised that didn’t help.

Hopefully the intel-graphics list will have an idea what this is about…

EDIT: Also found out about /sys/class/drm/card0/error thanks to that bug you pointed at, I’ll try to see if I can have a look at it before ssh stops responding next time as well. It’s tricky to get traces on this stuff…
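
For anyone else trying to grab that file before the machine locks up, here is a minimal sketch of a helper that saves a copy of the i915 error state when there is one. The name `save_gpu_error` and the output file naming are made up for illustration, and the check for a "no error state" placeholder on the first line is an assumption about what the driver prints when nothing has been recorded.

```shell
# save_gpu_error [ERROR_FILE] [OUT_DIR]
# Copies the error state to a timestamped file and prints the new path.
# Exit status: 0 = saved, 1 = file not readable, 2 = nothing recorded.
# The body runs in a subshell, so "exit" only leaves the function.
save_gpu_error() (
    src=${1:-/sys/class/drm/card0/error}
    out_dir=${2:-.}
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        exit 1
    fi
    # Assumption: with no recorded hang the first line is a short
    # placeholder starting with "no error state" (any capitalisation).
    if head -n 1 "$src" | grep -qi '^no error state'; then
        exit 2
    fi
    dest="$out_dir/i915-error-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$dest" && echo "$dest"
)
```

Running it in a loop from an ssh session every few seconds gives at least a chance of keeping the last state before a freeze; reading the file may need root.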

Uh, mpv just crashed with -vo null as well… So we’re actually looking at some weird CPU microcode issue? Go figure what ffmpeg does to decode h264 :confused:

No big crash this time, just mpv segfaults - but it doesn’t segfault playing the very same video with the very same software if I only have one mpv instance running, or with cpufreq governor set to performance. I’ll retry that one more time tomorrow to double check.
I guess I can give up on intel-graphics though, let’s try the intel community “processor” forum instead… There were other threads about freezes a while back; although prime95 with these settings didn’t crash at all here, there might have been another problem…

EDIT: Created a post over there, let’s see: https://communities.intel.com/thread/118352

I experienced a major freeze today. I had Firefox open with 2 tabs running live streams, as well as the Chrome browser open with another live stream video. I am going to try to reproduce the crash and report back here.

I’ve had three freezes today alone!
The first one unlocked itself after waiting ~10 minutes but shortly thereafter it froze again and after 20 minutes I did a hard reboot.

My suspicion is that it is graphics related (wayland?) but I don’t know how to confirm or deny.

Yeah this is annoying :confused:

I was also thinking graphics at first, but I was able to get crashes without any graphics (removed all graphics drivers, rebooted in single-user mode and just ran mpv, decoding and throwing the result away, and got it to crash just as it does with wayland/gnome running).

I’m fairly sure it has to do with frequency changes - if you want a quick workaround you can run either of these commands:

# disable 3 of the 4 CPUs and set the policy to powersave
sudo sh -c 'for f in /sys/devices/system/cpu/cpu[123]/online; do echo 0 > "$f"; done; echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'

# or -- re-enable/leave all 4 CPUs online, but set them to performance
sudo sh -c 'for cpu in /sys/devices/system/cpu/cpu[123]; do echo 1 > "$cpu/online"; echo performance > "$cpu/cpufreq/scaling_governor"; done; echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'

Both of these are working rather well for me atm. I usually stick to one CPU and it honestly is good enough for most of what I’m doing, and it’s simple enough to switch whenever I need to.
I would have liked to try something else (e.g. limiting the maximum frequency), but the scaling_max_freq tuning does not seem to work…
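
To double check that either workaround actually took effect, a small status helper along these lines can be handy. This is only a sketch: the function name `cpu_state` and the output format are made up, and it assumes the usual sysfs layout (cpu0 often has no `online` file, and an offline CPU has no `cpufreq` directory, hence the fallbacks).

```shell
# cpu_state [CPU_SYSFS_DIR]
# Prints one line per CPU: name, online flag and current governor.
cpu_state() (
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        online=1    # fallback when there is no "online" file (typically cpu0)
        if [ -r "$dir/online" ]; then
            online=$(cat "$dir/online")
        fi
        gov=-       # fallback when cpufreq is not exposed (e.g. CPU offline)
        if [ -r "$dir/cpufreq/scaling_governor" ]; then
            gov=$(cat "$dir/cpufreq/scaling_governor")
        fi
        echo "${dir##*/} online=$online governor=$gov"
    done
)
```

After the first command I would expect cpu1 to cpu3 to show `online=0`, and after the second one every line to show `governor=performance`.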

My current hunch is that there is a power-related problem with frequent frequency changes. Graphics and video decoding (vector operations) are power hungry, and I noticed I only get crashes if the CPUs are “half used” (and looking at stats I would see the frequency swing between 1 and 3GHz all the time).
If there is a sudden drop in the CPU’s input voltage there is no shortage of bad behavior we could observe, but I’m not too sure how to confirm that.
I’m talking with @mladen and the support team by email about how to help investigate…

Asmadeus, the problem is that I do not have the hardware (Librem 15v3) so I cannot reproduce these issues. I must point out again that this looks to me like a software issue, not hardware. I’ll ping our devs who have the hardware to try to reproduce this bug.

Interesting analysis. I hope we are able to nail this one.

I really do not think this can be a software issue. I’ve tried many different kernels and different distros (with slightly different versions of ffmpeg/mpv as well as many different media). The exact same software on the same laptop works if I just set the cpu governor to performance.
Heck, from what I can understand Linux is actually hardly involved with the variable cpu frequency once the governor is set… (and having checked the driver, I see it really does ignore the max freq directive /sigh. intel pstate is complicated stuff.)

After disabling all drm modules / running mpv with null output from the framebuffer I really do not see what else I could try to convince you otherwise – I agree this likely can be solved by a microcode update (I guess it depends on how you define that, firmware is somewhere in between) but it’s not really something we can deal with directly and the feedback I have had on the intel ‘community forum’ is pretty disappointing so far.

Anyway, it would already be great if you could just get them to see if they can reproduce. It really shouldn’t take much time as mpv is packaged in PureOS; it’s really just a matter of installing a package, downloading a file, starting a couple of playbacks with default settings and waiting :slight_smile:
The main thing I am curious about right now is whether all the librem 15v3 have the problem or if only a handful do, and if so maybe find a common factor (does the nvme drive consume too much power or something? I can’t really play with that until I get back from this trip)
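
To make the reproduction steps concrete, here is a rough sketch of what I mean by "a couple of playbacks". It uses null video output as in my earlier test, so no graphics stack is involved; `--ao=null` and `--loop-file=inf` are my additions to keep it silent and running, and the function name `start_playbacks` and the `PLAYER` override variable are made up for this example.

```shell
# start_playbacks FILE [COUNT]
# Starts COUNT background players decoding FILE in a loop and discarding
# the result; prints the PID of each one so they can be stopped later.
# PLAYER is a hypothetical override for the player binary (default: mpv).
start_playbacks() (
    file=$1
    count=${2:-2}
    n=0
    while [ "$n" -lt "$count" ]; do
        "${PLAYER:-mpv}" --vo=null --ao=null --loop-file=inf --really-quiet \
            "$file" >/dev/null 2>&1 &
        echo "$!"
        n=$((n + 1))
    done
)
```

Something like `pids=$(start_playbacks video.mkv 2)` starts two decoders, and `kill $pids` stops them again once the machine has survived long enough (or after the reboot, if it has not).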

Thanks!

I am actually experiencing this with different distros after updates, but at the moment it looks like a combination of several stacks (GNOME, Wayland, mpv, Firefox changes); there are upstream reports of the same or similar bugs. That said, I have been fine for the last few days, so the new flow of updates will probably fix things. I have now tested mpv on my Librem15v3 for more than half an hour and it is all running fine (I have tons of tabs in FF, and Thunderbird is open as well). This is all on Debian unstable, so PureOS should get those updates once they land in Debian testing.

Ooh. Thanks for the news!

Would it be too much to ask to either leave the video in the background while you do other things, or run more than one? Depending on the video, I have found that one playback is not enough to generate crashes; my first problems were always when I had playback plus something actively running in the background (like apt update/install; firefox/thunderbird might eat a lot of ram, but they should not be hogging CPU in the background… hopefully).
I’ve had the best results with two (recent) 720p h264 videos, but basically you just need to check through /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq that the frequencies do vary wildly from <=1GHz to >=2.8GHz.
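
A minimal sketch of that check, in case it saves someone from eyeballing the files: it samples scaling_cur_freq (which reports kHz) for a while and says whether both thresholds were crossed. The function name `freq_swing`, its arguments and the output format are made up for illustration; only the two thresholds come from the numbers above.

```shell
# freq_swing [SAMPLES] [INTERVAL_SECONDS] [CPU_SYSFS_DIR]
# Prints the lowest and highest frequency seen across all CPUs and whether
# the range spans from <= 1 GHz up to >= 2.8 GHz.
freq_swing() (
    samples=${1:-30}
    interval=${2:-1}
    base=${3:-/sys/devices/system/cpu}
    low_khz=1000000     # <= 1 GHz
    high_khz=2800000    # >= 2.8 GHz
    min=
    max=
    i=0
    while [ "$i" -lt "$samples" ]; do
        for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
            [ -r "$f" ] || continue
            khz=$(cat "$f")
            if [ -z "$min" ] || [ "$khz" -lt "$min" ]; then min=$khz; fi
            if [ -z "$max" ] || [ "$khz" -gt "$max" ]; then max=$khz; fi
        done
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done
    if [ -z "$min" ]; then
        echo "no scaling_cur_freq files found under $base" >&2
        exit 1
    fi
    swing=no
    if [ "$min" -le "$low_khz" ] && [ "$max" -ge "$high_khz" ]; then
        swing=yes
    fi
    echo "min=$min max=$max swing=$swing"
)
```

Run `freq_swing 60 1` while the videos are playing; if it reports `swing=no`, the load is probably either too light or too heavy to match what triggers the crashes for me.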

I’ll also install a debian sid to check, probably next week though.

Here’s a datapoint. Running Librem 15v3 with an up-to-date PureOS green (installed from the Calamares ISO) with no post-install power configuration that I’m aware of:

~> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave

~> cpufreq-info | grep 'current CPU fr' # no videos running
  current CPU frequency is 500 MHz.
  current CPU frequency is 488 MHz.
  current CPU frequency is 500 MHz.
  current CPU frequency is 500 MHz.

Just ran 5 background videos in MPV for ~1 hour with no issues. I have previously had occasional system freeze-ups (forced to hard reboot) while interacting with PureBrowser (and no video running) – I assumed it was a PureBrowser issue that will be resolved when it’s next synched with Firefox.

Cheers! 5 playbacks at a time is probably overdoing it, as the CPU might be stuck at high usage all the time, but it’s good to know that isn’t a proper reproducer (especially since you say you’ve had system freezes before, we can assume your librem has a similar problem…)

I’ll try to write a simple C program that attempts reproducing for me.

Any update on this issue? I recently updated my Librem 15v3, I am now running kernel 4.13.0-1-amd64 and I got 3 freezes in the last 24 hours.

I was running Firefox and a youtube video in a tab when the first freeze occurred.
Chrome with Hangouts video when the 2nd/3rd freeze occurred.

Any luck in reproducing the bug systematically?

I’m now running with intel_pstate=disable on kernel args and haven’t had the problem since (looks like it still happened when setting everything to performance or with just one core on powersave, just way less often), so the problem kind of lost priority for me given the lack of interest.
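
For anyone wanting to try the same thing, a quick way to confirm the argument really made it onto the running kernel's command line is sketched below. The function name `has_kernel_arg` is made up; where the argument gets set depends on the bootloader (on a Debian-style GRUB setup I would assume `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub followed by `update-grub`, but check your own install).

```shell
# has_kernel_arg ARG [CMDLINE_FILE]
# Succeeds if ARG appears as a whole word on the kernel command line.
has_kernel_arg() (
    arg=$1
    cmdline_file=${2:-/proc/cmdline}
    set -f    # no glob expansion while splitting the command line into words
    for word in $(cat "$cmdline_file"); do
        if [ "$word" = "$arg" ]; then
            exit 0
        fi
    done
    exit 1
)
```

For example: `has_kernel_arg intel_pstate=disable && echo "pstate disabled at boot"`.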

Playing multiple videos really is a reliable reproducer for me (with default powersaving options). But given that some said they can’t get it to crash and, more importantly, that almost no-one else reported the problem (we’re, what, 5 so far? For a problem that happens multiple times a day with regular computer usage anyway!), I’m inclined to think this is some subtle hardware problem in a bad series of chips and call it a day. If more folks show up here it might revive some interest though; curiosity still makes me want to probe the cpu voltage when the frequency varies and things like that, there is just not enough time to do everything.

Thanks for that. So from what I understand this disables the intel cpu built-in governor. However, does that mean that a generic ACPI module takes over the cpu governance as suggested in this discussion?

I actually had no replacement (acpi-cpufreq module is not compiled on my kernel), my original goal in disabling intel_pstate was to use the userspace governor and manually change frequency when going on battery but I never really finished that either… I assume that if nothing cares the cpu frequency is stuck at some default value and never changes from there, which should be pretty much the same as the userspace governor with a fixed value.

I’ll play along and test acpi-cpufreq a bit today.

I checked my currently loaded kernel modules with lsmod but don’t see intel_pstate listed, does that mean it ain’t running on my system?

[EDIT]
OK, I found that to check if intel_pstate is enabled, I need to run the following command:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

it is indeed running.
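
For completeness, here is a small sketch that wraps that check. As far as I know intel_pstate is built into the kernel rather than loaded as a module, which would explain why lsmod does not list it. The function names `scaling_drivers` and `pstate_active` are made up for this example.

```shell
# scaling_drivers [CPU_SYSFS_DIR]
# Prints the distinct cpufreq drivers in use, one per line.
scaling_drivers() (
    base=${1:-/sys/devices/system/cpu}
    cat "$base"/cpu[0-9]*/cpufreq/scaling_driver 2>/dev/null | sort -u
)

# pstate_active [CPU_SYSFS_DIR]
# Succeeds if at least one CPU is driven by intel_pstate.
pstate_active() (
    scaling_drivers "${1:-/sys/devices/system/cpu}" | grep -qx 'intel_pstate'
)
```

`pstate_active && echo "intel_pstate is in use"` gives a yes/no answer, and an empty result from `scaling_drivers` would suggest that no cpufreq driver is active at all.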