Asmadeus, the problem is that I do not have the hardware (Librem 15v3) so I cannot reproduce these issues. I must point out again that this looks to me like a software issue, not hardware. I’ll ping our devs who have the hardware to try to reproduce this bug.
interesting analysis. I hope we are able to nail this one.
I really do not think this can be a software issue. I’ve tried many different kernels and different distros (with slightly different versions of ffmpeg/mpv as well as many different media). The exact same software on the same laptop works if I just set the cpu governor to performance.
Heck, from what I can understand linux is actually hardly involved with the variable cpu frequency once the governor is set… (and while I checked the driver, I see it really does ignore the max freq directive /sigh. intel pstate is complicated stuff.)
After disabling all drm modules / running mpv with null output from the framebuffer I really do not see what else I could try to convince you otherwise – I agree this likely can be solved by a microcode update (I guess it depends on how you define that, firmware is somewhere in between) but it’s not really something we can deal with directly and the feedback I have had on the intel ‘community forum’ is pretty disappointing so far.
Anyway, it would already be great if you can just get them to see if they can reproduce. It really shouldn’t take much time as mpv is packaged in PureOS; it’s really just a matter of installing a package, downloading a file, starting a couple of playbacks with default settings and waiting.
The main thing I am curious about right now is whether all the librem 15v3 have the problem or if only a handful do, and if so maybe find a common factor (does the nvme drive consume too much power or something? I can’t really play with that until I get back from this trip)
Thanks!
I am actually experiencing this with different distros after updates, but at the moment it looks like a combination of several stacks (GNOME, Wayland, mpv, Firefox changes); there are upstream reports of some similar bugs. That said, I have been fine for the last few days, so the new flow of updates will probably fix things. I just tested mpv on my Librem 15v3 for more than half an hour and it all ran fine (I have tons of tabs open in FF, and Thunderbird is open as well). This is all on Debian unstable, so PureOS should get those updates once they land in Debian testing.
Ooh. Thanks for the news!
Would it be too much to ask to either leave the video in the background while you do things, or to have more than one playing? Depending on the video I have found that one playback is not enough to generate crashes; my first problems were always when I had playback plus something actively running in the background (like apt update/install. Firefox/Thunderbird might eat a lot of RAM, but they should not be hogging CPU in the background… hopefully).
I’ve had best results with two 720p (recent) h264 videos, but basically you just need to check through /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq that the frequencies do vary wildly from <=1GHz to >=2.8GHz.
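For anyone who would rather not eyeball those files by hand, here is a minimal sketch in Python that samples them for a few seconds and prints the lowest and highest value seen. The sampling interval and sample count are arbitrary choices of mine, and it assumes the files report kHz, which is what cpufreq normally does.

```python
import glob
import time

CPUFREQ_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq"


def read_freqs_khz(pattern=CPUFREQ_GLOB):
    """Return the current frequency (kHz) reported for each CPU matching pattern."""
    freqs = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            freqs.append(int(f.read().strip()))
    return freqs


def observed_range_mhz(samples):
    """Given a list of per-sample kHz lists, return (min, max) in MHz, or None if empty."""
    flat = [khz for sample in samples for khz in sample]
    if not flat:
        return None
    return min(flat) / 1000, max(flat) / 1000


if __name__ == "__main__":
    samples = []
    for _ in range(10):  # roughly 5 seconds of sampling
        samples.append(read_freqs_khz())
        time.sleep(0.5)
    result = observed_range_mhz(samples)
    if result is None:
        print("no cpufreq files found")
    else:
        print(f"lowest {result[0]:.0f} MHz, highest {result[1]:.0f} MHz")
```

Run it while the videos are playing; if the lowest and highest values are close together, the frequency is probably not varying enough for the test to mean much.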
I’ll also install a debian sid to check, probably next week though.
Here’s a datapoint. Running Librem 15v3 with an up-to-date PureOS green (installed from the Calamares ISO) with no post-install power configuration that I’m aware of:
~> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
~> cpufreq-info | grep 'current CPU fr' # no videos running
current CPU frequency is 500 MHz.
current CPU frequency is 488 MHz.
current CPU frequency is 500 MHz.
current CPU frequency is 500 MHz.
Just ran 5 background videos in MPV for ~1 hour with no issues. I have previously had occasional system freeze-ups (forced to hard reboot) while interacting with PureBrowser (and no video running) – I assumed it was a PureBrowser issue that will be resolved when it’s next synched with Firefox.
Cheers! 5 playbacks at a time is probably overdoing it, as the CPU might be stuck at high usage all the time, but it’s good to know that isn’t a proper reproducer (especially since you say you’ve had system freezes before, we can assume your Librem has a similar problem…)
I’ll try to write a simple C program that attempts reproducing for me.
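This is not the C program mentioned above, just a rough guess in Python at what a minimal reproducer might look like: alternate short busy bursts with idle pauses so that the governor has a reason to keep ramping the frequency up and down. The burst and pause lengths are arbitrary assumptions, it only loads one core, and there is no evidence here that it actually triggers the freeze.

```python
import sys
import time


def busy_for(seconds):
    """Spin on a counter for roughly `seconds`; return the number of iterations."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def burst_cycle(cycles, busy_s=0.2, idle_s=0.2):
    """Alternate busy and idle phases `cycles` times; return total busy iterations."""
    total = 0
    for _ in range(cycles):
        total += busy_for(busy_s)  # load phase: should push the frequency up
        time.sleep(idle_s)         # idle phase: gives it a chance to drop again
    return total


if __name__ == "__main__":
    # Number of cycles can be given as the first argument; 10 is a short default.
    cycles = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    burst_cycle(cycles)
```

Starting one copy per core, in separate terminals, would be closer to the "two playbacks plus background work" situation described earlier.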
Any update on this issue? I recently updated my Librem 15v3; I am now running kernel 4.13.0-1-amd64 and I got 3 freezes in the last 24 hours.
I was running Firefox and a youtube video in a tab when the first freeze occurred.
Chrome with a Hangouts video when the 2nd/3rd freezes occurred.
Any luck in reproducing the bug systematically?
I’m now running with intel_pstate=disable on kernel args and haven’t had the problem since (looks like it still happened when setting everything to performance or with just one core on powersave, just way less often), so the problem kind of lost priority for me given the lack of interest.
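To double-check that the parameter really made it onto the running kernel's command line, something like the following Python sketch can be used. It reads /proc/cmdline and splits on whitespace, which is a simplification: it assumes no quoted values containing spaces.

```python
import os


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def pstate_disabled(cmdline):
    """True if the command line contains intel_pstate=disable."""
    return kernel_params(cmdline).get("intel_pstate") == "disable"


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("intel_pstate disabled:", pstate_disabled(f.read()))
    else:
        print("/proc/cmdline not available on this system")
```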
Playing multiple videos really is a reliable reproducer for me (with default powersaving options). Given that some said they can’t get it to crash, and more importantly that almost no-one else reported the problem (we’re, what, 5 so far? For a problem that happens multiple times a day with regular computer usage anyway!), I’m inclined to think this is some subtle hardware problem in a bad series of chips and call it a day. If more folks show up here it might revive some interest though; curiosity still wants me to probe the CPU voltage when the frequency varies and things like that, there’s just not enough time to do everything.
Thanks for that. So from what I understand this disables the Intel CPU built-in governor. However, does that mean that a generic ACPI module takes over the CPU governance as suggested in this discussion?
I actually had no replacement (acpi-cpufreq module is not compiled on my kernel), my original goal in disabling intel_pstate was to use the userspace governor and manually change frequency when going on battery but I never really finished that either… I assume that if nothing cares the cpu frequency is stuck at some default value and never changes from there, which should be pretty much the same as the userspace governor with a fixed value.
I’ll play along and test acpi-cpufreq a bit today.
I checked my currently loaded kernel modules with lsmod but don’t see intel_pstate listed; does that mean it isn’t running on my system?
[EDIT]
ok, I found that to check if intel_pstate is enabled, I need to run the following command:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
it is indeed running.
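As far as I know intel_pstate is normally built into the kernel rather than loaded as a module, which would explain why lsmod does not list it. The same check as the cat command can be sketched in Python, counting how many CPUs report each driver:

```python
import glob
from collections import Counter

DRIVER_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_driver"


def scaling_drivers(pattern=DRIVER_GLOB):
    """Return a Counter mapping cpufreq driver name to the number of CPUs using it."""
    names = []
    for path in glob.glob(pattern):
        with open(path) as f:
            names.append(f.read().strip())
    return Counter(names)


if __name__ == "__main__":
    drivers = scaling_drivers()
    if not drivers:
        print("no cpufreq driver reported")
    for name, count in drivers.items():
        print(f"{name}: {count} CPU(s)")
```

An empty result would suggest that no cpufreq driver is active at all, which is what one would expect with intel_pstate disabled and no replacement available.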
Just some general thoughts, hopefully helpful in some way:
Userspace code can never (directly) cause a real freeze, it is either hardware or kernel/driver.
Almost always if a system appears frozen, it can still do more than most people assume:
- try to ping the system
- try to ssh into the system
  If successful, kill/restart possibly offending processes or do a “reboot”
- try Alt+SysRq+R,E,I,S,U,B
  This is the lowest possible access to the kernel
  I don’t have a Librem, but I assume you have to press Fn+PrtSc for SysRq and release it after each letter
  Or, just use an external keyboard
It would be interesting to know if the result is the same for everybody affected.
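The ssh check above can be scripted from a second machine, so that it is ready the moment the laptop appears frozen. A minimal sketch in Python; the host name is whatever your laptop answers to on the local network, and port 22 assumes a default sshd setup:

```python
import socket
import sys


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # address of the (apparently) frozen machine
    state = "still answering" if port_open(host) else "not reachable"
    print(f"ssh port on {host}: {state}")
```

If the port still answers while the screen is dead, that points more towards the display stack than towards a full kernel hang.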
On the video decoding:
This might only create reproducible results if everybody decodes the same file, or at least the same encoding, maybe resolution etc. If hardware decoding on the GPU is involved, your screen may be dead, but ssh could still work. Then, a kernel (GPU driver) update could be the cause.
But even if “only” the GPU is stuck, I could (hardly, but still) imagine that some hardware problem is at the root, like the voltage regulator can’t handle high GPU usage plus CPU freq changes at the same time. This, in turn, could have been hidden with an older (less optimized?) kernel.
I went back to debian’s kernel with acpi-cpufreq available, but the module won’t load despite disabling intel_pstate as your link suggested it would. There’s nothing in dmesg either.
Looking it up, it seems a BIOS setting is normally necessary for that, but with coreboot it should just be possible to enable the CPU configuration bits later on instead. I just wish I had a clue how.
@Caliga: I don’t know what others have, but what I observe is “just” some operations sometimes not behaving as expected. Sometimes only the video player crashes, sometimes the compositor crashes, sometimes something in the kernel oopses to the point even sysrq won’t work anymore… It’s the same with video - playing the same media twice can work without any error once and crash something the next time; I have tried various media and they all exhibit the same behavior when there are frequency changes.
More than 5 freezes since I received my Librem 15 v3 last week, mostly when I am watching video content using PureBrowser; that makes me the 6th person so far. I also tried Firefox on Debian 9 GNOME, but got the same problem. I have been trying to disable intel_pstate in PureOS for several hours without any success. I need help with that.
thanks for the detailed post. I will test out this lead in my next freeze session. However, now that you mention it I get the following msg on my boot screen,
firmware: failed to load i915/skl_dmc_ver1_26.
i915 is the open source Intel graphics driver, so does the above refer to a specific version of the driver?
lsmod shows that the i915 module is loaded, furthermore
$ modinfo i915
filename: /lib/modules/4.13.0-1-amd64/kernel/drivers/gpu/drm/i915/i915.ko
license: GPL and additional rights
description: Intel Graphics
author: Intel Corporation
author: Tungsten Graphics, Inc.
firmware: i915/bxt_dmc_ver1_07.bin
firmware: i915/skl_dmc_ver1_26.bin
firmware: i915/kbl_dmc_ver1_01.bin
firmware: i915/kbl_guc_ver9_14.bin
firmware: i915/bxt_guc_ver8_7.bin
firmware: i915/skl_guc_ver6_1.bin
firmware: i915/kbl_huc_ver02_00_1810.bin
firmware: i915/bxt_huc_ver01_07_1398.bin
firmware: i915/skl_huc_ver01_07_1398.bin
the i915/skl_dmc_ver1_26 firmware is listed; does that mean it is loaded after all?
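As far as I know, the firmware: lines in modinfo only list files the module may ask for; they say nothing about whether a file was actually found or loaded, and the "failed to load" boot message suggests this one was not found. A rough way to check which of the listed files are present is sketched below in Python. It assumes the files live directly under /lib/firmware, so it will not see compressed files or alternative firmware directories.

```python
import os
import subprocess


def missing_firmware(names, firmware_dir="/lib/firmware"):
    """Return the firmware names that do not exist as files under firmware_dir."""
    return [n for n in names if not os.path.exists(os.path.join(firmware_dir, n))]


def module_firmware(module):
    """Return the firmware names a module declares, using `modinfo -F firmware`."""
    out = subprocess.run(
        ["modinfo", "-F", "firmware", module],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


if __name__ == "__main__":
    try:
        names = module_firmware("i915")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("could not query modinfo for i915")
    else:
        for name in missing_firmware(names):
            print("not installed:", name)
```

Most of the listed files are for other chip generations, so a long "not installed" list is normal; only the one named in the boot message matters here.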
@gEck0: Thanks for letting us know! That definitely renews interest a bit
Since you’re using PureOS I can help: you need to add intel_pstate=disable to GRUB_CMDLINE_LINUX_DEFAULT in your /etc/default/grub file, then regenerate your grub config with update-grub. The whole session should look like:
sudo nano /etc/default/grub
# at this point find the GRUB_CMDLINE_LINUX_DEFAULT and add
# intel_pstate=disable within the quotes, then exit with ctrl+X,
# press Y to save changes, enter to validate filename
sudo update-grub
You’ll need to reboot after this change for it to take effect.
@vrata: this file is “just” a firmware blob that helps with a feature called DMC, which provides additional graphics low-power idle states according to the documentation. I have tried installing it (you can find it in the non-free linux-firmwares package, or straight in the linux firmware git https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git ) – that did not help me.
Thanks @Asmadeus, that worked without any problems. One small addition: the ‘GRUB_CMD…’ line must look like this to work:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_pstate=disable"
I will keep posting the results for next couple of days. Hope this solves freezing problem on this Librem.
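For anyone who wants to avoid hand-editing mistakes in that line (missing quotes, the parameter added twice), here is a small Python sketch of the text transformation. It only works on a string and does not touch /etc/default/grub itself; writing the result back and running update-grub is still up to you. The function name is my own.

```python
import re


def add_kernel_param(grub_text, param):
    """Return grub_text with param appended inside GRUB_CMDLINE_LINUX_DEFAULT's quotes.

    The parameter is not added again if it is already present.
    """
    def fix(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        joined = " ".join(current)
        return match.group(1) + '"' + joined + '"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        fix,
        grub_text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "intel_pstate=disable"))
```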
@Asmadeus Today (means after 1 day) I again experienced a freeze, I just clicked on a youtube link listed here:
with intel_pstate=disable?