Problems with memory upgrade: Librem 14 won't boot

stefann · November 7, 2024, 7:38pm

Up until now I used only one RAM module with 8 GB, it was a Samsung SO-DIMM 8GB module, DDR4-3200, CL22 (product number M471A1G44AB0-CWE).

Now I decided to upgrade my Librem’s memory, so I took a look at the Intel specs for the i7-10710U and decided to buy a kit with two 16 GB DDR4-2666 modules. So I bought a Kingston FURY Impact SO-DIMM Kit 32GB, DDR4-2666, CL16-18-18 (product number HX426S16IB2K2/32).

I put in the new modules and started my Librem 14, but it wouldn’t boot. The LEDs are on, but the display stays black. When I put in only one of the new modules (no matter which), I have no problem. And when I pick one of the new modules and put it in together with my old module – no problem…

I tried every combination and the only combination which doesn’t work is when I put in both modules of my new memory kit. Am I the only one with this problem? Can someone give me a hint what I could do to solve this? Thanks!

I use PureBoot-Release-30 and Librem EC 1.13_2023-03-22.

Edit 2024-11-19: It’s quite embarrassing, but after more testing I can say that the test results stated above are wrong… I don’t know if I was sloppy at first, but I guess so. Sorry. The only way I can use the new RAM is to pair one of the new modules with my old one. This always works, no matter which of the modules is in Slot 0 / Slot 2. There’s no other way I can use the new modules. It doesn’t work with one of those modules alone (no matter which slot I use) and it doesn’t work with both (no matter which module is inserted in Slot 0 / Slot 2).

Privacy2 · November 7, 2024, 8:33pm

It sounds like there is a problem with those two sticks operating in dual channel mode. It could be an issue with the EC or Coreboot or even the sticks themselves (one not quite meeting the timing specs they advertise during dual channel access). I would contact support.

FranklyFlawless · November 7, 2024, 10:56pm

Yes.

Try swapping the locations of each RAM module.

irvinewade · November 7, 2024, 11:37pm

More comprehensive fault isolation would say to try those two modules together in a different computer. However I understand that you may not have another computer or, if you do, another suitable computer.

Theoretically sudo dmidecode --type memory | grep Channel would possibly diagnose whether dual channel is working. However obviously that only works if you can boot! Have you tried any Live Boots? Have you tried a Live Boot that gives you a memory test option?
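Once the machine boots with some working combination, the same dmidecode output can be condensed to one line per populated slot. This is only a rough sketch: the function name summarise_dimms is made up for illustration, and it assumes the usual "Size:", "Locator:" and "Part Number:" fields of the "Memory Device" records. Locator strings differ between firmware builds, so two modules listed on different channels hint at dual channel but do not prove it.

```shell
#!/bin/sh
# Condense `dmidecode --type memory` output (read from stdin) to one line
# per populated slot: locator | size | part number.
# Assumption: each "Memory Device" record carries Size, Locator and
# Part Number fields and records are separated by a blank line.
summarise_dimms() {
    awk -F': *' '
        function emit() {
            # Skip empty slots, which report "No Module Installed".
            if (size != "" && size != "No Module Installed")
                print loc " | " size " | " part
        }
        /^Memory Device$/        { in_dev = 1; size = ""; loc = ""; part = ""; next }
        in_dev && /^[ \t]*Size:/        { size = $2 }
        in_dev && /^[ \t]*Locator:/     { loc = $2 }   # does not match "Bank Locator:"
        in_dev && /^[ \t]*Part Number:/ { part = $2 }
        in_dev && /^[ \t]*$/            { emit(); in_dev = 0 }
        END { if (in_dev) emit() }
    '
}
```

Typical use would be: sudo dmidecode --type memory | summarise_dimms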

I guess it depends on how much you want to persevere with this, but have you tried switching (reflashing the firmware) to coreboot/SeaBIOS?

And are the two existing firmware versions the latest?

FranklyFlawless · November 7, 2024, 11:47pm

Both @stefann’s PureBoot and EC firmware are the latest version, and I have very similar RAM within my Librem 14 as well, except that mine functions just fine.

stefann · November 8, 2024, 9:16am

Thanks for your replies!

That’s what I thought and why I asked if I’m the only one affected by this. Because I’m quite sure there are some people out there using two modules together in dual channel mode. So if this was an issue with the EC or Coreboot I shouldn’t be alone in this… I’ll contact support later, maybe it sounds familiar to them.

I did that, but no luck.

Yes, that makes sense, but the only other computers I have left are my ROCK Pi 4 and my Pinebook Pro… So no suitable ones.

Normally I see the “Librem 14” lettering first, long before boot devices come into play… The screen stays black and I wouldn’t even be able to enter PureBoot menu. However, for the sake of completeness I’ll try a live system later.

Oh, that’s an idea. I’ll try it with coreboot/SeaBIOS. Thanks!

FranklyFlawless · November 8, 2024, 11:18am

For reference, here is my post regarding my Librem 14’s RAM:

Here is a specification sheet from Kingston that is more accurate than the one within my referenced post:

stefann · November 8, 2024, 12:22pm

Thanks, one difference I read about on an Austrian price comparison website (in German) is that your modules are “dual rank” and mine are “single rank”… Apart from that (and the amount of memory) they seem to be quite similar…

I tried that without any luck, same problem. I’ll contact support.

stefann · November 19, 2024, 10:56am

I added an Edit-paragraph to my original post because the test results stated there were simply wrong. Apart from that I bought another memory kit – and I get the same results… The part number of the new memory kit is HX426S15IB2K2/32.

For the new memory I used Memtest86+ to look for errors. But the memory seems to be OK (one screenshot for each of the new modules together with the old module):

(Memtest86+ shows a different part number for my new memory modules than what is printed on them, these KHX2666C15S4/16G are my HX426S15IB2K2 modules)

I wanted to find a way to see what happens (apart from LEDs on and display off), so I had a look at the coreboot cbmem tool (which I found on the website of MrChromebox) to get some logs. I put in the new RAM, tried to boot, then turned the Librem 14 off, put in the old memory module, booted and then used the command cbmem -2 to “print cbmem console for the boot that came before the last one only” – but it was empty…

I didn’t get an answer from support yet, maybe someone here can give me a hint what I could try next.

Privacy2 · November 19, 2024, 5:30pm

I think that you will need to go to support. The only thing I noticed is that the new sticks had XMP support while the old ones didn’t. The i7-10710U supports XMP so that shouldn’t be a problem, but it’s possible that coreboot is having a problem reading/following the XMP profiles which it might only try to do if there isn’t a non-XMP stick slotted. At least that would explain (as you’ve now clarified) that it won’t boot with a single new stick, double new stick … but boots only when a new stick is paired with an old (which is non-XMP) stick.

If I’ve read things correctly, Coreboot should work with XMP sticks … but I’m not sure how well it is supported.

stefann · November 19, 2024, 8:59pm

Thank you for your help. I thought about XMP support before but FranklyFlawless posted his/her RAM specs above and it supports Intel XMP 2.0 too…

I was able to get a coreboot log file, I had forgotten to add the -V option to get verbose (debugging) output… But it’s strange. It stops before memory initialization as far as I understand it: cbmem-1.log is cbmem console for last (successful) boot (with my old memory module) only and cbmem-2.log is cbmem console for the boot that came before the last one only (with my new memory modules).

Yes, I’m just waiting for them. And I sent them a link to this thread.

FranklyFlawless · November 21, 2024, 1:34am

@jonathon.hall

TiX0 · November 21, 2024, 2:23am

I am not entirely sure about this, but:

The cbmem buffer is constructed in RAM, hence when you turned off the machine to change DRAM, it was erased (RAM is volatile). This is the reason you can only see what was logged during the last boot. I could never see an earlier log, only the last one still in memory. For me, the command cbmem -2 never worked and the result is ./cbmem: invalid option -- '2'
On the other hand ./cbmem -1 gives me the log for the current coreboot run, which is still there in the cbmem buffer.

jonathon.hall · November 21, 2024, 1:52pm

That’s a good idea, I’m impressed you got that far. But as others said, yes, these logs are held in RAM. You’d only see a “prior boot” if you had done a warm reboot or a suspend/resume cycle.

I’ve made a build that will write the console to SPI flash, so you can collect it. Just to set expectations, memory training is performed by FSP, and we don’t have access to debug builds of FSP from Intel, so we may have limited ability to address this. I’d still like to try though.

Could you please flash this preview build of coreboot/SeaBIOS?

mkdir ~/updates
cd ~/updates
wget https://source.puri.sm/firmware/utility/-/raw/24.02.01-Purism-1-flashconsole-1/coreboot_util.sh
sudo bash coreboot_util.sh
# Choose 'update' if you have coreboot/SeaBIOS already, or switch firmware otherwise
# When it asks to reboot, say 'n' and shut down instead, then follow the instructions below

After flashing that build and shutting down, do the following:

1. Install the non-working memory
2. Try to boot once, wait 2 minutes for memory training to complete
3. If the system fails to boot after 2 minutes (as before), hold the power button to shut off
4. Install any working memory
5. Boot your OS and open a terminal
6. Collect the log using flashrom that was prepared by the coreboot_util.sh script (if you did not put the script in ~/updates, adjust the path)
   - sudo ~/updates/tools/flashrom/flashrom -p internal -r dump.rom
7. Send me dump.rom via email or DM and I will check out the log

The SPI flash log has limited space, so be sure to do this test soon after flashing the preview firmware. After that, you can return to the release firmware or stay on the preview firmware and wait for me to check out the logs. There’s no harm in staying on the build with SPI flash logging; it stops logging once the log area is full to ensure it does not wear out the flash.

stefann · November 21, 2024, 8:58pm

Oh, thanks for the info!

I tried again now to (re)boot two times with my old memory and cbmem option -2 seems to work. I’m using cbmem from MrChromebox’s website (link see above), that’s version 1.1.
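For anyone repeating this, here is a small sketch for saving both console buffers to files right after a warm reboot. It assumes a cbmem build that accepts the -1 and -2 options (the older build mentioned above rejects -2), and it has to run as root. The function name save_boot_logs and the CBMEM override variable are only illustrative.

```shell
#!/bin/sh
# Save the coreboot console of the last boot (-1) and of the boot before
# it (-2) into cbmem-1.log and cbmem-2.log inside the given directory.
# Assumption: the cbmem binary supports both options; set CBMEM to point
# at a specific binary, e.g. CBMEM=./cbmem
save_boot_logs() {
    dir=${1:-.}
    cbmem_bin=${CBMEM:-cbmem}
    "$cbmem_bin" -1 > "$dir/cbmem-1.log" || return 1
    "$cbmem_bin" -2 > "$dir/cbmem-2.log" || return 1
    # An empty second log usually means the buffer did not survive,
    # for example after a full power-off.
    [ -s "$dir/cbmem-2.log" ] || echo "cbmem-2.log is empty (cold boot?)" >&2
}
```

Run as root, for example: CBMEM=./cbmem save_boot_logs ~/updates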

stefann · November 21, 2024, 9:52pm

Great, thank you! I sent you a private message with the dump file.

I tried cbfstool (part of Debian package coreboot-utils) myself to get the console log out of that dump file. I have no experience with all this but especially this one line looks quite discouraging: FspMemoryInit returned with error 0x80000007!…
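The exact command is not given above, so the following is only a sketch of one way to do it. It assumes the preview build stores the console in an FMAP region named CONSOLE (an assumption based on how coreboot's flash console driver is usually configured, not something confirmed in this thread) and that unwritten flash reads back as 0xFF bytes, which are stripped. The function names are made up for illustration.

```shell
#!/bin/sh
# Remove the 0xFF filler bytes that erased (unwritten) SPI flash reads as.
strip_flash_padding() {
    LC_ALL=C tr -d '\377'
}

# Read the console region out of a flashrom dump and write it as text.
# Assumption: the FMAP region is called CONSOLE in this firmware build.
extract_flash_console() {
    rom=$1
    out=${2:-console.log}
    raw=$(mktemp) || return 1
    cbfstool "$rom" read -r CONSOLE -f "$raw" &&
        strip_flash_padding < "$raw" > "$out"
    status=$?
    rm -f "$raw"
    return "$status"
}
```

Example: extract_flash_console dump.rom console.log, then search console.log for FspMemoryInit.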

TiX0 · November 22, 2024, 4:34am

That’s bad news.
FSP-M (DRAM) and FSP-S (Silicon init) are closed source blobs and undocumented - so we will probably never know what this error returned by FspMemoryInit means…
But I remember that a former employee (pseudonym Kakaroto) had started reverse-engineering this particular blob and had succeeded in posting the main routine and another key routine. Positive Technologies (now banned) had done some work on this too. This was - at the time - stopped by Intel under the threat of NDA infringement… everybody got scared of being sued and this work never went any further.
The PT blog post showing the code does not exist anymore but Purism had also posted this work so maybe it can be found and there could be some valuable information.

TiX0 · November 23, 2024, 2:47am

More on this after doing some research:
I have found the above mentioned Purism blog post by Kakaroto

The code has been removed from the post.
So has the attribution.
The last update from this post states that:

2018-05-10 UPDATE: Intel politely asked Purism to remove this document which Intel believes may conflict with a licensing term. Since this post was informational only and has no impact on the future goals of Purism, we have complied. If you would like the repository link of the Intel FSP provided from Intel, please visit their publicly available code on the subject.

BUT:
I have finally found the original work posted on Kakaroto’s Blog
https://kakaroto.ca/2019/10/intel-fsp-reverse-engineering-finding-the-real-entry-point/
it has an interesting disclaimer explaining why the article was put back up again.
Anyway, it’s a very difficult technical article; you have to be an experienced systems software engineer to fully understand it.
But it tells us at least this: things you would like to make disappear from the public Internet…in reality are never quite entirely gone!

FranklyFlawless · November 23, 2024, 7:45am

Wayback Machine (April 3rd, 2018):

Intel FSP reverse engineering: finding the real entry point! – Purism

irvinewade · November 24, 2024, 12:56am

And even if it disappears from the public web, that doesn’t mean that it disappears. Any number of people may have downloaded a copy before it disappeared from the web (if it had done so).

Coming back to the specific problem … just because Intel may be the fascist gatekeeper doesn’t mean that they would definitely refuse to explain what that error code means and what can be done about it. In fact, being the gatekeeper puts that responsibility solely on them.

The public web produces some hits on that error.
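As a rough aid for reading such values: the FSP 2.0 API returns 32-bit EFI-style status codes, where a set top bit marks an error and the low bits carry the code. Under that assumption, 0x80000007 corresponds to EFI_DEVICE_ERROR, a generic "initialization failed" that does not say which part of memory training went wrong. The sketch below covers only a handful of well-known values, and the function name decode_fsp_status is made up for illustration.

```shell
#!/bin/sh
# Map a 32-bit FSP/EFI status value to its symbolic name.
# Assumption: only the common values listed here are covered; anything
# else is reported as unknown rather than guessed.
decode_fsp_status() {
    # Normalise the input (decimal or 0x-prefixed hex) to 0x%08x form.
    code=$(printf '0x%08x' "$1") || return 2
    case $code in
        0x00000000)     echo "EFI_SUCCESS" ;;
        0x80000002)     echo "EFI_INVALID_PARAMETER" ;;
        0x80000003)     echo "EFI_UNSUPPORTED" ;;
        0x80000006)     echo "EFI_NOT_READY" ;;
        0x80000007)     echo "EFI_DEVICE_ERROR" ;;
        0x80000009)     echo "EFI_OUT_OF_RESOURCES" ;;
        0x4000000[1-8]) echo "FSP_STATUS_RESET_REQUIRED" ;;  # reset request, not a failure
        *)              echo "unknown ($code)" ;;
    esac
}
```

Calling decode_fsp_status 0x80000007 prints EFI_DEVICE_ERROR.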