Alternative CPU for a Librem Product

Was just recently made aware of this CPU.

Judging from the benchmarks, this is quite the capable ARM chip: https://www.phoronix.com/scan.php?page=news_item&px=SolidRun-ClearFog-ARM-ITX

It is marketed for embedded/networking, but that much speed makes it suitable for a small workstation.

I kind of wish the Librem Mini had the LX2160. Maybe a future product?

Workstation sounds a bit optimistic.

Sure, it has many cores, but in the single-core tests it does not seem to be much faster than an Intel Atom (2x faster on PyBench, and roughly equivalent on WAV-to-FLAC encoding).

Workstation use-cases benefit from many cores. I can’t think of any such situation where you’d be forced to use just one core.
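For what it's worth, here is a toy Python sketch of the kind of embarrassingly parallel job where more cores pay off directly. Everything in it (the crunch() function, the job count) is invented for illustration; it is not a real benchmark of any of the CPUs discussed here.

```python
# Toy sketch of an embarrassingly parallel workload. The crunch() function
# and job count are invented for illustration - not a real benchmark.
import time
from multiprocessing import Pool

def crunch(n: int) -> int:
    # Stand-in for one independent CPU-bound task (a render tile, a test case, ...).
    total = 0
    for i in range(2_000_000):
        total += (i * n) % 7
    return total

if __name__ == "__main__":
    jobs = list(range(16))

    start = time.perf_counter()
    serial = [crunch(n) for n in jobs]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:  # defaults to one worker process per core
        parallel = pool.map(crunch, jobs)
    t_parallel = time.perf_counter() - start

    assert serial == parallel
    print(f"serial: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
```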

The thing that stands out to me about this CPU is the fact that it uses A72s rather than the slower A53s found in other ARM workstations that I’ve seen.

I’d really love to get away from Intel some day. But it always boils down to two things:

  1. GPU with free drivers and no firmware blobs needed - this ARM CPU is a nice compute horse but has no GPU, so you would need to add a PCIe graphics card, and there the choices for GPUs without proprietary drivers and/or firmware blobs are close to zero - and it adds to:
  2. price - the HoneyComb LX2K board alone costs $750, which gives you a pretty good idea of what the hardware cost is - plus case, power supply, etc. This becomes pretty expensive quickly, more than most people would be willing to pay for such a type of Mini workstation.

Apple shows that it is possible to make workstation-class ARM-based CPUs which are at least on par with, or even better than, comparable Intel CPUs. But sadly this kind of CPU is not so easily available on the free market for companies like us to use. And the few that are in a similar ballpark concerning CPU and GPU performance are not usable for us due to tons of proprietary bits and pieces - which will, BTW, also apply to the Apple ARM CPUs; Apple of course does not care about that, but we do.

Cheers
nicole


There is also OpenPOWER for the CPU, which has more power, but for the GPU there is the same issue.

@nicole.faerber What about the Vivante GPU? Doesn't it run without proprietary firmware?


Not all workloads are parallelizable, for example when compiling big projects with lots of static linking.


In that context, as one of the commenters says, it would be nice to see all those benchmark charts include at least one somewhat current x86 CPU (e.g. the i7-8565U of the actual Librem Mini, or e.g. the i7-10710U of the newly announced Librem 14). That would put things in much better perspective.

Regarding the number of cores … when looking at CPU chip grunt, I always look both at single-threaded performance and aggregate multi-threaded performance. Which one is most relevant depends on the intended nature of and mix of workload.

As an example of a poor workload for multiple threads: any iterative compute-intensive process where the output of one iteration is the input to the next iteration - such as occurs in many hashing or encryption scenarios. In fact, they are sometimes designed to be hard to parallelize. In that case you just want one core, or a small number of cores, that run like the clappers.
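A minimal sketch of that kind of chained workload, using SHA-256 as a stand-in (the seed and round count are arbitrary): no matter how many cores you have, each round has to wait for the previous one.

```python
# Minimal sketch of an inherently serial workload: each round's input is the
# previous round's output, so extra cores don't help - only single-core speed
# does. SHA-256 is just a stand-in; the seed and round count are arbitrary.
import hashlib

def hash_chain(seed: bytes, rounds: int) -> bytes:
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

print(hash_chain(b"seed", 1_000_000).hex())
```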

In a closed-source environment, you can even be hosed on multi-core hardware with an algorithm that parallelizes well, if the software happens not to have been written to take advantage of multiple cores, or of all your cores. That doesn't generally apply here, though.


As I understand it, gaming is pretty much single-core, though that's not the primary purpose of Librem computers.

Modern AAA games are very rarely single core, but a few years ago that was the case.

With modern games, performance scales badly beyond around 4-6 threads and can sometimes even get worse if the scheduler does not pin the threads to the same core when context switching.
The maximum frequency also usually isn't as high on CPUs with more cores, because of either heat or because it's harder to manufacture quality chips with larger die sizes.
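For what it's worth, on Linux you can inspect and pin a process's allowed cores yourself. A minimal sketch, assuming a Linux machine with at least two cores (os.sched_setaffinity is Linux-only):

```python
# Minimal Linux-only sketch: pin the current process to cores 0 and 1 so the
# scheduler cannot bounce it around the package. os.sched_setaffinity() is
# not available on macOS/Windows, and this assumes at least 2 cores exist.
import os

PID = 0  # 0 means "the calling process"
print("allowed cores before:", os.sched_getaffinity(PID))
os.sched_setaffinity(PID, {0, 1})
print("allowed cores after: ", os.sched_getaffinity(PID))
```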


Yes it does, and that's what we are using in the Librem 5. But AFAIK there is no Vivante GPU available as an external GPU for PC-like systems, so this does not help much. And conversely there is no PC-like CPU using the Vivante - except for the i.MX8 family of course, but that's not what I would call a "PC-like CPU"; it is an embedded SOC. It is pretty powerful for what we need and want to do with the Librem 5, but compared to the Core i7 in the Librem laptops it clearly lags behind. So while, sure, you could build a PC-like system such as a laptop computer with it, what would be the point then? The only point I would see is that you get rid of more Intel legacy cruft and get more control over BIOS/firmware. But concerning performance it would clearly be a significant regression.

Cheers
nicole


@nicole.faerber
Could it be an idea to join forces with Raptor CS, System76, and others to make a discrete Vivante PCIe card?

@johan-bjareholt @kieran This comment on r/linuxhardware is the first place I read about it. The poster says it compiles FreeBSD in nearly half the time of their i7. I don't know what generation of i7, but it's still impressive.

This blog post by an NXP employee discusses how people are using this CPU for general purpose computing.

Not to deviate too far off topic, but I saw it come up in Ryzen vs. Intel comparisons. Intel keeps up in gaming because it does better in single-threaded processing - but perhaps my reading was too cursory.

The Vivante GPU can use the free Etnaviv driver.

On the question of firmware, Vivante doesn’t publicly release info on its GPUs or the software for them. However, in the latest i.MX Linux Release, the description of the firmware-imx-8.8.bin file says “i.MX Firmware including firmware for VPU, DDR, EPDC, HDMI, and SDMA” so it looks like the Vivante GPU doesn’t require firmware (or at least doesn’t require updates).
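If you want to check that on a running system, one rough way is to look for firmware messages in the kernel log. A sketch, assuming a Linux box where dmesg is readable by the current user; this is not a definitive test, since a driver could fetch firmware without logging it this way:

```python
# Rough sketch: scan the kernel log for firmware-related messages to see
# whether any driver asked the kernel to load a blob. Assumes Linux and
# that dmesg is readable by the current user. Not definitive - drivers
# can load firmware without emitting a log line that matches this filter.
import subprocess

log = subprocess.run(["dmesg"], capture_output=True, text=True, check=False).stdout
hits = [line for line in log.splitlines() if "firmware" in line.lower()]
print("\n".join(hits) if hits else "no firmware load messages found")
```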

You will need a lot more demand than that to convince Vivante to make a discrete GPU, and I'm not sure what you would gain by doing this. We have plenty of GPU choices with free drivers (Vivante, Mali, VideoCore, Adreno, Intel, AMD). Vivante isn't the best choice anyway, since its free driver doesn't support OpenGL ES 3.0+ and Vulkan. Go with VideoCore, Intel or AMD if you want that.

It does not (need any firmware), and that's why we are using it - or rather why we chose the i.MX8 in the first place - along with the availability of free drivers.

This is close to impossible. The Vivante is not a chip; it is a building block for chip designers like NXP. So you would not just have to create a PCIe card, you would first have to license this IP and create a piece of silicon with it. Unlikely to happen, even joining forces with someone, since such an enterprise runs into the several million US$.

Cheers
nicole


@nicole.faerber
Thank you for the explanation. If no one else will make an open GPU, I think we will have to deal with that. I know Raptor will “isolate” system access from the PCIe lane with the IOMMU; I'm not so technical, but as I understand it, the only option we have is to “jail” the untrusted GPU as much as possible.
For 2D graphics they are using the AST2600; I don't know how it performs for office work, browsing, and playing Full HD content, but if it does the job it could be worth looking into. What do you think about these two ideas (isolating the closed GPU, and the AST2600)?

It would be helpful to find out i.e. to ask the person who made the comment.

It has been far too long since I compiled any C - so take this with a grain of salt - but I think it is hard to generalize about compiling a large project because it depends on:

  • the sophistication of the build environment, i.e. its ability to detect opportunities for parallelism, to use those opportunities, and to adjust (limit) itself to the available CPU resources
  • there's a fair bit of disk I/O going on to muddy the picture for a straight CPU speed comparison (maybe the poster has RAID-0 dual PCIe 3.0x4 NVMe disks, but you don't 🙂) - and the amount of RAM available for disk caching affects things too
  • the specific dependencies between modules, which make it hard to generalize across projects, i.e. a flatter structure of layers will parallelize better than a deeper structure of layers

I understand that the poster was basically compiling the same thing, which could eliminate the first and third bullet points - but then you get complications about whether the compiler on x86 is performing more substantial optimization (which makes the resulting code better but makes the compile take longer).
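To make the point about dependency structure concrete, here is a toy sketch; the graphs, module names, and uniform one-unit cost are all invented. With unlimited parallel workers, total build time is the length of the critical path through the dependency graph, so a flat layout of five modules finishes in two units while a deep chain of the same five takes five.

```python
# Toy sketch: with unlimited parallel workers, total build time is the
# length of the critical path through the dependency graph. The graphs,
# module names, and uniform one-unit cost are invented for illustration.

def critical_path(deps: dict[str, list[str]], cost: float = 1.0) -> float:
    finish: dict[str, float] = {}

    def finish_time(mod: str) -> float:
        # A module finishes one cost-unit after its slowest dependency.
        if mod not in finish:
            finish[mod] = cost + max((finish_time(d) for d in deps[mod]), default=0.0)
        return finish[mod]

    return max(finish_time(m) for m in deps)

# Five modules in a flat layout vs. the same five in a deep chain.
flat = {"main": ["a", "b", "c", "d"], "a": [], "b": [], "c": [], "d": []}
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "main": ["d"]}

print("flat layout, parallel build time: ", critical_path(flat))   # 2.0
print("chain layout, parallel build time:", critical_path(chain))  # 5.0
```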

That's why I went for a more vanilla example - a CPU-intensive iterative algorithm.

See also Raspberry pi software information
(NB: That’s running something that actually would parallelize well but I believe has not been written to do that.) Any followup comments on that topic are probably best in that topic.