Dogwood Shipping Out Today

Customers who have chosen to receive Dogwood should have their Dogwood shipments go out today. But wait, you might say, didn’t you post that Dogwood was already “finalizing testing before shipping to those who are part of this batch” a few weeks ago?

It’s true, at the time of that post we had started our final burn-in testing and started contacting Dogwood backers to confirm their current shipping address and also to give them an opportunity to switch to Evergreen if they wanted. It took a week or more of back-and-forth emails until we started getting responses settled.

In parallel with contacting customers we prepared each Dogwood device and performed a 96-hour burn-in test to identify any problematic hardware before we would ship it out. During this burn-in we started to discover that devices were hard shutting down out of the blue somewhere around the 5-6 hour mark on average (although some lasted longer than a day before this happened). We then started an incredibly extensive set of tests across a fleet of Dogwood devices to troubleshoot the issue.
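To illustrate how a burn-in harness might spot a hard shutdown like this, here is a minimal sketch. It is not the tooling used for these tests: the idea that each device records a periodic heartbeat timestamp, the function names `find_shutdowns` and `hours_until_first_fault`, and the 120-second gap threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: a silence longer than this between two
# heartbeats is treated as a hard shutdown (assumption, not a real setting).
DEFAULT_MAX_GAP = timedelta(seconds=120)


def find_shutdowns(heartbeats, max_gap=DEFAULT_MAX_GAP):
    """Return (last_seen, next_seen) pairs where consecutive heartbeats
    are further apart than max_gap."""
    ordered = sorted(heartbeats)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps


def hours_until_first_fault(heartbeats, max_gap=DEFAULT_MAX_GAP):
    """Return hours from the first heartbeat to the last heartbeat seen
    before the first detected shutdown, or None if no shutdown was found."""
    ordered = sorted(heartbeats)
    gaps = find_shutdowns(ordered, max_gap)
    if not gaps:
        return None
    return (gaps[0][0] - ordered[0]).total_seconds() / 3600


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    # One heartbeat per minute for six hours, then 30 minutes of silence.
    beats = [start + timedelta(minutes=i) for i in range(361)]
    beats += [start + timedelta(hours=6, minutes=30 + i) for i in range(10)]
    print(hours_until_first_fault(beats))
```

A check like this only tells you when a device went silent, not why, which is part of what makes a fault that takes hours or days to appear slow to chase.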

This has proven to be a really tricky issue to troubleshoot, mainly because of the long debug cycle (to test whether a fix works, you have to demonstrate that a device can run for multiple days without a fault). Over the past few weeks we’ve gone down a number of false paths in the troubleshooting process, and each time we thought we had tracked down a cause, further tests disproved it.

At this point we are still researching whether this is an actual hardware fault and, if so, whether it is something that can be corrected or mitigated in software, or whether it’s a software configuration issue, such as incorrect voltage or clock settings that need to be changed for the new Dogwood hardware. As of the time of this post, while we have tightened the debug cycle in terms of triggering the bug, and conversely have found settings and tweaks that make the hardware more stable, we haven’t yet tracked down the root cause or a fix.

As I mentioned, in parallel we had been getting responses back from customers who still wanted Dogwood. Since we have not yet tracked down the cause of this hardware issue, we didn’t want Dogwood customers to be surprised by the flaw. We went through that list and contacted them again to inform them of this bug and advised them that:

If you are a developer wanting the Librem 5 to hack on, test, write software, submit an App to PureOS, these issues are likely not going to impact you too significantly.

If you are desiring Dogwood to be mass-production quality like Evergreen and be your daily driver without hardware issues, this will impact you significantly.

Please let us know which you prefer:

A. Ship Dogwood with known issues now!
B. Move your order to the front of the line for Evergreen.

So after another round of back-and-forth emails last week into the weekend, we now have a list of customers who still want Dogwood and are shipping those out today.

As someone who does use a Dogwood device as a daily driver, for me personally I tend to mostly hit this flaw when I wake up in the morning and pick up the device to listen to podcasts. Sometimes the shutdown has happened overnight and sometimes not. For the most part in my case, at least, I tend to use my own phone throughout the day without hitting the flaw. So at least in my case, it’s been annoying but not a showstopper.

Obviously we are working incredibly hard to get to the bottom of the issue and this issue will not be present in Evergreen.

40 Likes

Thanks for the update! I look forward to hearing from those who chose to receive a Dogwood device given this news. Is there somewhere I can find more information on what is currently known about the issue? Personally, I’d like to hear more about the settings and tweaks that make the hardware more stable, in addition to what has been found to trigger the bug more quickly.

6 Likes

I completely understand where you are coming from. The problem with posting this (and the reason we haven’t been giving a blow-by-blow of each day’s troubleshooting progress) is that often the advice we’d give one day (when we think we’ve found the problem) is outdated the next day when we discover it’s not the solution or it has a bad side-effect of its own. Even though I have a Dogwood device and have been following the troubleshooting process, I haven’t been applying any of the tweaks that have been suggested yet myself, because so far a day or two later we’ve discovered (after a system using them displays the bug) the problem lies elsewhere.

Ultimately if we find a solution that can be applied in software, it’ll be something ideally people could apply just by updating the software on the device.

16 Likes

Good luck finding the root cause of that problem!

4 Likes

Thanks for the honesty and for sharing the news, which is always welcome, even if it is about a problem that, I’m sure, you will solve.
Good job.

13 Likes

I see. Thanks again for providing us with this update.

Ultimately if we find a solution that can be applied in software, it’ll be something ideally people could apply just by updating the software on the device.

I’m interested to see what the fix ends up being for this issue. Like you said, hopefully it is merely a software issue.

3 Likes

YEEEEEET, one down, Evergreen left to go before my Fir batch! Albeit probably sometime a year or so from now, but.

2 Likes

That may be challenging, but it’s the wrong instinct not to tell us about them. Those are exactly the kind of things that should be shared. It’s only a question of how to do that. A weekly rundown, for instance, and it could be less than a full blog post.

“These 6 things are happening and at first it seemed it could be related to this because of X. So we tried Y but it didn’t really do what we expected and now there is also D. Possibly we need to test Z with it to rule out F. There are about 345 variations left on this but this still seemed likely… And after testing all those combinations we only know that X2Z15 gives slightly better emotional connection with the device and the OS has 0.0001% more giraffes. Next we took a look at K and Q, which felt hard, because I’ve never liked those. Luckily person T wanted to spend a couple of days going over the 50000 lines of code related to it, just to learn it - or avoid doing anything else. Meanwhile person B was running 10 different patches for a day while simultaneously streaming SD Comic Con and GUADEC to give networking a heavy load. Apparently that may have enhanced tester morale a bit but effect may have been temporary. All in all this week…”

I see no difference between this and the previous progress reporting, other than that it should now be weekly instead of monthly (and it hasn’t even been monthly for a while - the last one is more than a month late), now that deadlines have gotten closer and been skipped due to problems. It’s not all about the results. It’s nice to have your report and Guido’s blog post - both are great - but overall this part could be done better. With challenges, more communication is needed, not less. I hope to hear more - even if I won’t understand everything.

5 Likes

This is not a new debate in the development of the Librem 5. The counterargument is that every minute spent misinforming us about the latest red herring is a minute not spent finding the true source of the crash problem.

I would be happy with a weekly “The crash problem is not yet solved” until eventually “The crash problem is believed to be solved”.

5 Likes

And that’s the wrong point to argue. It’s about how it’s done, and “every minute” is a wrong, illogical extreme. That’s nothing but an excuse - or a red herring. Hence “weekly”, and not even a blog post is needed. I agree, this has been talked about before, but since it keeps repeating, I don’t see it as unnecessary to bring it up again - perhaps one day…

2 Likes

I guess that means that you’re… dogfooding Dogwood? (sorry)

(not sorry at all)

8 Likes

What kind of burn-in test did you put those Dogwoods through in the first place to observe the problem?
What kind of “extensive testing” are you referring to?
Is that detectable by using a current L5 image in QEMU?

2 Likes

It doesn’t sound like a software problem to me - those should be relatively easy to debug. I would guess there’s some hardware part overheating or something like that. I would remove any removable parts (wifi, modem) and test again. But I think that’s something the Purism employees have already done :sweat_smile: But the fact that it is perhaps (to my mind) a hardware thing doesn’t mean it isn’t fixable with software. Maybe something is not running within the specification because of some initial driver or firmware.

4 Likes

Since we are throwing ideas out there, I would run the same test software on Chestnut batch in order to see whether this might be a regression.

7 Likes

Regular updates concerning one confusing and perplexing bug equate to a “string of failures” and “why can’t you figure this out? It’s your job!” from the average customer. I can’t imagine the type of press that would come from a week or two of “this didn’t work, trying that…” reports.

I’m currently in the exact same boat. I am facing program builds that keep hanging in the exact same place of the build process with exactly nothing else in common: build times, build machines, code commits, build machine OSes (either Windows or CentOS), time of day, phase of moon… the only consistency is “it hangs at this spot.” I’ve told the customer we’re having build issues, but not the details because A) they wouldn’t understand it because it doesn’t make any damned sense and B) all they’re going to come away with is “these idiots are bumbling around wasting money.” I don’t think, for the Public At Large, it would be any different here. Details are OK and perhaps preferable for one person, but when the crowd/mob mentality gets ahold of them, bad things happen.

9 Likes

I’m in the Dogwood batch, yet I haven’t gotten any emails from Purism since the June update.

2 Likes

One thing we are pretty sure about is that the issue is unrelated to heat. Beyond the fact that Dogwood runs much cooler than previous batches, the issue happens even when the phone is cool and idle.

4 Likes

You may want to contact the support team, as I believe we contacted everyone that was tagged as being in Dogwood.

1 Like

I just got your email

2 Likes

I appreciate @Kyle_Rankin keeping us informed with a post on the forum about what is happening, but it would be better for us to just follow what is happening through the public bug tracker. That way Purism employees don’t have to take any extra time to inform us.

Unfortunately, I can’t find any reports about this issue in the public bug tracker, so I would encourage Purism to do more of its development in public. Then, we wouldn’t have a constant string of questions about why Dogwood wasn’t shipping and what is happening.

9 Likes