Librem 14 sudden crash when unplugged

Handle 0x0009, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 64 GB
	Error Information Handle: Not Provided
	Number Of Devices: 2

Handle 0x000A, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 16 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-0-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2133 MT/s
	Manufacturer: Corsair
	Serial Number: 00000000
	Asset Tag: Channel-0-DIMM-0-AssetTag
	Part Number: CMSO16GX4M1A2133C15
	Rank: 2
	Configured Memory Speed: 2133 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

@drs 2133 MT/s is a slower bus.

Not a big difference, however. @pini, can you show your dmidecode?


mine

Handle 0x000A, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-0-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2667 MT/s
	Manufacturer: Samsung
	Serial Number: 018801e0
	Asset Tag: Channel-0-DIMM-0-AssetTag
	Part Number: M471A4G43MB1-CTD
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x000B, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-1-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2667 MT/s
	Manufacturer: Samsung
	Serial Number: 01982098
	Asset Tag: Channel-1-DIMM-0-AssetTag
	Part Number: M471A4G43MB1-CTD
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Here it is (one RAM stick ATM):

Handle 0x000A, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 16 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-1-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2667 MT/s
	Manufacturer: Crucial
	Serial Number: e5e0a9f4
	Asset Tag: Channel-1-DIMM-0-AssetTag
	Part Number: CT16G4SFRA266.M16FRS
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

I am searching because I remember @nicole.faerber once posted how to switch the CPU to a lower-wattage mode. It will decrease power consumption, as it lowers some frequencies on the board.
When I find it I will post it. I will ask one of the affected users to switch his CPU to a lower power mode (set a power cap) and test the stability of the system.
I know it’s not a solution, but I am interested in whether the instabilities are connected with the power regulators (there are many of them on the board).

Everyone affected, could you try:

echo 10000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
echo 15000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw

and then do your test run?
That will reduce the CPU power usage by 5 W.
The defaults are:

echo 15000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
echo 20000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw
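Before changing anything it can help to check what the limits currently are. The following is a minimal sketch, assuming the sysfs path quoted above exists on your system; the function name show_rapl_limits and the RAPL_DIR variable are made up for this example.

```shell
#!/bin/sh
# Print the current RAPL package power limits in watts.
# RAPL_DIR defaults to the path quoted in this thread; it can be overridden.
RAPL_DIR="${RAPL_DIR:-/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0}"

# show_rapl_limits [dir]  (hypothetical helper, not part of any tool)
show_rapl_limits() {
    dir="${1:-$RAPL_DIR}"
    for n in 0 1; do
        file="$dir/constraint_${n}_power_limit_uw"
        if [ -r "$file" ]; then
            uw=$(cat "$file")
            # The kernel reports microwatts; divide by 1,000,000 for whole watts.
            echo "constraint_${n}: $((uw / 1000000)) W ($uw uW)"
        else
            echo "constraint_${n}: not readable at $file" >&2
        fi
    done
}
```

Call show_rapl_limits without arguments to read the real values. Note that writing the limits needs root, and that sudo echo ... > file does not work because the redirection is done by the unprivileged shell; echo 10000000 | sudo tee "$RAPL_DIR/constraint_0_power_limit_uw" is one way around that. Values written to sysfs are not expected to survive a reboot.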

I don’t know if that helps, but here are the specs of my RAM setup 2x32GB.

dmidecode 3.3

Getting SMBIOS data from sysfs.
SMBIOS 3.0 present.

Handle 0x0009, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 64 GB
	Error Information Handle: Not Provided
	Number Of Devices: 2

Handle 0x000A, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-0-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2667 MT/s
	Manufacturer: Samsung
	Serial Number: 019820d8
	Asset Tag: Channel-0-DIMM-0-AssetTag
	Part Number: M471A4G43MB1-CTD
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x000B, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0009
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: SODIMM
	Set: None
	Locator: Channel-1-DIMM-0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Unknown Synchronous
	Speed: 2667 MT/s
	Manufacturer: Samsung
	Serial Number: 0198202a
	Asset Tag: Channel-1-DIMM-0-AssetTag
	Part Number: M471A4G43MB1-CTD
	Rank: 2
	Configured Memory Speed: 2667 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

OK so it’s not memory model related.

OK @gam, your setup is closest to mine.
So let’s try to find a pattern.
Describe what exactly you are doing when the laptop crashes:
which programs you are running, what the system load is, and what kind of external hardware you are using.
What does your crash look like? Try to be as precise as possible,
because I will try to replay the same scenario (or one as similar as possible) and make my laptop crash in order to find a pattern.


So, it has been a while since my last comment on this thread. Here is my current status.

Again, a big thank you to @NineX for his commitment to help us all with this issue.
What I can say so far is that reducing the power usage of the CPU by 5 W helps to avoid crashes.

echo 10000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
echo 15000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw

For example, I watched two Youtube Videos and did some programming on the side.
My L14 did not crash when the battery level dropped below the 20% mark. But at around 5% the laptop still switched off. Better than before, but not really satisfying.

So, I gave PopOS 21.04 a shot. With similar stress tests the L14 works great. Furthermore, when you reach a critical battery level, the OS starts to notify you that you should consider plugging in the power supply. But no hard sudden crash.

So what I can say is that the power management of the latest PureOS somehow seems to have some issues handling low battery levels. PopOS is an alternative but may not be a solution for people who want to stay on PureOS.

I have to admit that I really like PureOS, but as long as this battery issue is not solved, I will have to use PopOS.

That’s all, folks. Thank you for all the support so far.

A hard switch-off at around 5% is not a “crash”;
it’s normal behavior for every system that I know.
To be more precise: if you switch off a couple of settings in Windows or macOS, you will get the same result on any hardware.

So please do not mix the two things.

We had crashes like those documented above: those with graphical “fireworks”, total freezes, or reboots.

Those are the ones I am interested in, because the only thing I was able to find, reproduce, and help fix was memory corruption in PureBoot, and that was PureBoot-specific.
I have not managed to crash coreboot/SeaBIOS (the regular BIOS), nor to reproduce any of the crashes reported.
And that is actually the case I am seeking to solve.

And then we have the system just switching off without warning when the battery is critical.
That case is just a matter of tuning settings, which each individual can do according to their own preferences (I agree the current defaults are not the best). However, we can’t protect users from their own actions. (The battery indicator in PureOS goes red at around 10-15% [I am not 100% sure] and the system shows a notification about the battery running low.)

Unfortunately these settings are not available under Qubes. Any way to use intel-rapl power caps under Qubes?

Then I guess this rules out Pureboot or EC having issues with battery management, doesn’t it?

Yes, it looks like it. Though, my L14 runs on Seaboot and the latest EC.

I don’t know. I had the same issues yesterday. It always crashed after around 10-20 minutes on battery, although the battery was over 20%. Then I did a PureBoot and EC firmware update yesterday evening, and today it ran on battery until it was under 10%, completely without any crashes.

But I also only noticed the issue after upgrading the RAM. Unfortunately I didn’t worry about it at first and haven’t used the Librem 14 much in the last few weeks after changing the RAM. So I will see if it really works reliably right now.

It happened again today at 18% battery. But it seems to be better than before the update.

Hey I am just posting here to let others know that this problem seems completely resolved by the most recent EC updates. I hadn’t used my L14 for a long while because of my annoyance with this issue, but it looks like Purism pulled through and fixed things through the EC update process.

The laptop had previously been crashing completely, graphical fireworks or freezing with looped audio at battery levels anywhere between 30%-70%, effectively making it unusable. Since I updated the EC I have been testing for two days and none of these problems have happened. The battery runs all the way down to 5%. I’m not sure if it still does a hard shutdown at low levels because I plugged it back in (I was in the middle of something).

I’ll do some more testing over the coming days and if any issues persist I will post another comment here. Thanks to Purism for getting this update out. I just wish the laptops had been more thoroughly tested before shipping to avoid these issues.

Thank you very much for this valuable information. I think it is important for new customers who buy an L14.

Because of these hard shutdowns I switched to PopOS, even though I was happy in general with PureOS.

Yes, the end-of-October EC/PureBoot update now seems to have fixed it for me too (at least so that I can live with it). It was now reliable down to around 10% every time. But then it still crashes somewhere between 5 and 10%. So if I set the OS to shut down the laptop or go to sleep at 5%, it still crashes before that.

Same with my L14. At around 10% or less it might crash.
I hope the next EC update will fix this issue.

There was an EC update release last week (v1.6). I’ve flashed it and sadly it brought the problem back as it was before.
At around 50% I can reliably crash the Librem 14 with some CPU load. Something like stress --cpu 2 --timeout 30, using only two threads, works pretty reliably.
With a power supply connected it doesn’t crash. Weirdly, the fan also ran a lot less (during the stress command) when plugged in compared to the three times I tried on battery, even at full CPU utilization. @nicole.faerber it seems like you’re working on the EC firmware. Can you have a look at this problem? And is it safe for me to re-flash the old firmware (v1.5) over the new one?
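To make reports like this easier to compare, the battery state can be written to a log file right before and after the stress run, so the last entry survives a hard power-off. This is only a sketch: the battery directory BAT0, the log file location, and the function names are assumptions and may differ on your system; the stress invocation is the one quoted above.

```shell
#!/bin/sh
# Log the battery state around a short CPU stress run.
# BAT_DIR is an assumption; check /sys/class/power_supply/ for the real name.
BAT_DIR="${BAT_DIR:-/sys/class/power_supply/BAT0}"
LOG_FILE="${LOG_FILE:-$HOME/l14-crash.log}"

# log_battery LABEL: append a timestamped line and flush it to disk.
log_battery() {
    printf '%s %s capacity=%s%% status=%s\n' \
        "$(date +%FT%T)" "$1" \
        "$(cat "$BAT_DIR/capacity")" "$(cat "$BAT_DIR/status")" >> "$LOG_FILE"
    # sync so the entry is on disk even if the machine dies a moment later
    sync
}

# run_stress_test: the reproduction from the post, wrapped in two log entries.
run_stress_test() {
    log_battery before
    stress --cpu 2 --timeout 30
    log_battery after
}
```

If the log ends with a "before" line and no matching "after" line, the machine went down during the stress run, and the logged capacity shows at which battery level it happened.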

Oh dear.
So, in this new release we reverted a change in the EC firmware that was made in the previous release; more precisely, the setting of the Intel CPU’s PL4 based on the charger state.

Let’s take a step back for a second. A laptop is a pretty complex system when looking at it from a power consumption standpoint. There are a number of components in it that can draw various amounts of power at different times and depending on their use - the main CPU is of course one (15W TDP), the DDR4 RAM, SSD(s), the LCD backlight, WiFi/BT, SD card reader etc. And then there are the external USB ports - we have four, two type-A and two type-C ports, which can draw significant power if populated. The total power consumption of the device can thus vary a lot.

Coming back to the issue: what we tried to do is to set the Intel CPU’s PL4 to some sane value when in battery-only mode in order not to overload the battery. The battery has a limited power budget which is lower than the total maximum power that can be consumed by the system. In order not to overload the battery we clamped down the Intel CPU’s PL4 to 20W and allowed it to go up to a much higher value when on AC.

Since about the time we introduced this change we have observed two things: 1. the sudden power-down seemed to be gone (and is now reappearing), but worse, 2. we also saw quite a number of dying main boards; you have probably read about this already. On the broken main boards, in most cases charging does not work anymore: the boards will still run from battery but will not charge. In the path of the charging current is a main power switching IC that has to regulate the current from the DC charger input to the whole system, i.e. all power supplies plus charging the battery.

These currents are hard to measure; you would have to break up traces etc. to measure individual current flows. My current working hypothesis is that we set the max power when connected to the charger too high and overloaded this switch when the system was under higher load (plus charging), and thus caused the dying main boards.

By reverting this change we may now have some default PL4 (which is badly documented in the Intel docs) that is higher than the value we set for the battery case in the previous version, and thus the sudden power-off happens again. But I hope that the dying main boards do not happen anymore! It may sound a bit crude, but for now I prefer the system to shut down rather than cause permanent hardware damage.

If during the next couple of weeks we do not see the dying mainboard problems happening anymore we will need to revisit the PL4 settings or other means again.

In the meantime you could also try to limit the CPU power consumption while using it from battery using the PL1 and PL2 settings within Linux:

/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw

These two values are for the long-term (0 = PL1) and short-term (1 = PL2) power budget of the Intel CPU, in µW. The TDP of the 10710U is 15 W; PL2 can be something like PL2 = PL1 + (PL1 / 3).
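As a quick worked example of that rule of thumb, here is a small sketch that computes PL2 from PL1 in the microwatt unit the sysfs files use. The function names are made up for this illustration, and the formula is only the approximation given above, not an official Intel recommendation.

```shell
#!/bin/sh
# Convert whole watts to microwatts, the unit used by *_power_limit_uw.
watts_to_uw() {
    echo $(( $1 * 1000000 ))
}

# Rule of thumb from the post: PL2 = PL1 + PL1 / 3 (integer microwatts).
pl2_from_pl1_uw() {
    pl1_uw="$1"
    echo $(( pl1_uw + pl1_uw / 3 ))
}

# Example: derive PL2 for the 15 W TDP mentioned above.
pl1=$(watts_to_uw 15)
pl2=$(pl2_from_pl1_uw "$pl1")
echo "PL1=$pl1 uW PL2=$pl2 uW"
```

For PL1 = 15 W this gives PL2 = 20 W, which matches the L14 defaults. For PL1 = 10 W the same rule gives roughly 13.3 W, so the 15 W value suggested in this thread is a little more generous than the formula.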

With that you can limit the Intel CPU’s short- and long-term average power consumption. In the L14 the default PL1 is 15W and PL2 is 20W; bringing these down by a few watts should help already, e.g. 10/15W:

echo 10000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
echo 15000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw
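Since the lower limits are only meant for battery use, one could switch between the two sets of values depending on the power source. The following is a rough sketch under several assumptions: it treats any entry in /sys/class/power_supply whose type is "Mains" and whose online file reads 1 as "on AC" (a USB-C charger may show up with a different type on your machine), and the function and variable names are invented for this example. It has to run as root to write the limits.

```shell
#!/bin/sh
# Apply the 10/15 W limits on battery and the 15/20 W defaults on AC.
RAPL_DIR="${RAPL_DIR:-/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0}"
PSU_DIR="${PSU_DIR:-/sys/class/power_supply}"

# Succeeds if any supply of type "Mains" reports online = 1 (assumption:
# the charger is exposed this way on the machine in question).
on_ac_power() {
    for supply in "$PSU_DIR"/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    return 1
}

# set_rapl_limits PL1_UW PL2_UW: write both limits in microwatts.
set_rapl_limits() {
    echo "$1" > "$RAPL_DIR/constraint_0_power_limit_uw" &&
    echo "$2" > "$RAPL_DIR/constraint_1_power_limit_uw"
}

apply_limits_for_power_source() {
    if on_ac_power; then
        set_rapl_limits 15000000 20000000   # L14 defaults from this thread
    else
        set_rapl_limits 10000000 15000000   # reduced values for battery use
    fi
}
```

Calling apply_limits_for_power_source after plugging or unplugging the charger applies the matching pair of values. Hooking it into a udev rule or a timer is left out here on purpose, because the right trigger depends on the distribution.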

Please accept my apologies for this inconvenience, we are working on it and will implement better / safe defaults as soon as we have narrowed down the reason for the dying main boards, which right now is my primary concern.

Cheers
nicole

Sorry for the long delay. I read your update at the time and really appreciated the amount of background information and technical detail! This is something you won’t ever get from other companies, and to me it makes it OK to wait a little until everything works completely.
I didn’t have time to test the manual configuration back then (I needed the laptop to just work and used a USB-C power bank). And then you published v1.7, which resolved the issues, so there was no need anymore :)

EC Version v1.7 all in all worked pretty flawlessly.

Unfortunately, last weekend I updated to v1.9 (and PureBoot 21), which again brought some problems. The laptop now doesn’t charge anymore while it’s turned on (over USB-C; I haven’t tried the DC jack, but expect it to be the same). The LED turns green when the charger is connected and the laptop doesn’t lose any battery charge, but it won’t recharge (or only around 1% per hour). But when I turn the laptop off and reconnect the charger, it will charge.

Also I had one sudden power off so far with 1.9, but this time it was while it was connected to my USB-C to DP docking station and thus also to power (it was also connected to a USB-A HUB + 5GbE adapter and keyboard and mouse, also HDMI was connected).
I haven’t experienced this with 1.7. But unlike with 1.6 I couldn’t reproduce this so far.

It could also have something to do with my USB equipment. I’ve experienced before that the USB-C 5GbE adapter led to a sudden power-off or system freeze (one of my USB-C hubs did too). I thought it was a driver/software problem (or a hardware problem of the hub). But maybe it was also related to power management. My PC (running the same Debian version) hasn’t had such a problem with these adapters, but it was also a bit picky about how the 5GbE adapter was connected. So far, using a different hub and the laptop’s USB-A port instead of its USB-C port has helped mitigate these problems.