File names not sorted properly

AlexYst · August 30, 2019, 6:50am

PureBrowser uses its own internal file browser for, say, uploads and downloads. However, after you download something, there’s a folder icon that appears next to the download. When you click on that folder icon, it opens the directory that you downloaded the file to in Nautilus, not in its own, built-in file manager.

Also, the internal file manager PureBrowser/Firefox uses has the same obnoxious behaviour, at least on my system.

I’ve managed to track the behaviour to glib now. Glib seems to be unconfigurable in this regard, and most of the applications that use this functionality don’t seem to offer the configuration option themselves. It seems I’ve got to edit and recompile glib to fix the problem. That’s rather obnoxious. It means I can’t simply use the package manager to install updates, but instead compile glib from source every time there’s an update. I’m experimenting on my Debian system to figure out how to get the recompiled glib to install properly (I have it compiled, but applications still used the packaged version for now), and once I figure that out, I’ll try the same on PureOS. What a pain though.

Oh, yeah, it also seems that numerous people have complained about this issue in the past and the GNOME team just don’t care. Several patches to add the option to disable this obnoxious feature have been written and submitted, but they always get rejected because the GNOME team want to enforce this one-size-fits-all solution that doesn’t actually work for everyone’s use cases. Wow.

kieran · August 30, 2019, 8:00am

Surely you would be better off compiling nautilus from source every time there’s an update - less overall system impact?

AlexYst · August 30, 2019, 9:19am

If you know how to fix the problem from the Nautilus source code, that may be a better idea, but I have no idea how to even locate the problem from there. So far, everyone that has offered suggestions in this thread has given vague ideas, such as “edit the Nautilus source”, but when I ask how to even do that, they go silent and don’t offer any help as to how to actually do what they recommend. If you want to be different and actually help, I’d love to edit the Nautilus source to lessen the impact on the system. In particular, this would lessen the chance of issues popping up during package updates.

Also, this feature ranges from mildly annoying in some contexts to outright aggravating in others. There’s not a single application in which this obnoxious sorting algorithm is the way I want things sorted. At least if I edit glib, all applications on the system should start sorting things correctly.

kieran · August 30, 2019, 12:22pm

I’ll try next week to work it out, and if so, post more specific instructions.

SteveC · August 30, 2019, 4:19pm

Oh, yeah, it also seems that numerous people have complained about this issue in the past and the GNOME team just don’t care. Several patches to add the option to disable this obnoxious feature have been written and submitted, but they always get rejected because the GNOME team want to enforce this one-size-fits-all solution that doesn’t actually work for everyone’s use cases. Wow

That sounds like XFCE’s response to complaints from people who find the window borders too skinny to “grab” with the mouse. Just basically: We like it better this way, go pound sand.

AlexYst · August 30, 2019, 7:17pm

I recompiled glib on my Debian machine overnight as an experiment, seeing as I have to wipe that machine soon anyway. It’s running Debian 9, but I recently learned Debian 10 is out, and I always wipe the machine and perform a clean install, so it really doesn’t matter how much I break that system before the update. When I installed the fixed version this morning, everything worked perfectly. Well, almost. Some applications still impose case-insensitivity, but for the most part, everything is now sorted correctly in every application as long as I make sure LC_COLLATE is set to C. With help on this forum, I’ve located the PureOS source code repository, so tonight, I’ll get glib recompiled on the PureOS machine, and try installing that tomorrow morning. It should work as well as on the Debian machine, and if it does, the problem will be solved. It’s really stupid that this isn’t a configuration option though, especially considering how many complaints the GNOME team have gotten over this and how many working patches to make this an option have been submitted to them (so they wouldn’t even need to put any effort toward coding this themselves).
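For anyone unsure what setting the collation locale to C changes, here is a minimal sketch in plain C. It is not glib code, and the helper name collate_in_c_locale is made up for illustration. It relies only on the standard setlocale() and strcoll() functions; in the "C" locale, strcoll() orders strings by byte value, so upper-case letters sort before lower-case ones.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch collation to the "C" locale, then compare
 * two strings with strcoll(). Returns -1, 0 or 1. */
int collate_in_c_locale(const char *a, const char *b)
{
    /* Only the collation category is changed; other locale settings
     * (messages, number formats, ...) are left alone. */
    setlocale(LC_COLLATE, "C");

    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

With this setting, "B" compares as less than "a", which is the case-sensitive ordering discussed in this thread. Other locales may interleave upper and lower case instead.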

*UPDATE*: I had more time than I thought I did, so I got this compiled while I was in the shower. Instead of fixing the problem though, the newly-compiled package is causing the packages that depend on glib to crash. I made the exact same change on the PureOS system as I did on the Debian system, so I’m not sure why there are now problems on the PureOS system.

AlexYst · August 31, 2019, 7:52am

I did some stupid stuff despite knowing it was stupid (I installed the Debian version of my edited glib on the PureOS machine), broke the desktop, and ended up needing to reinstall the system in order to get the graphical interface back up and running. Long story short, I’m on a fresh and up-to-date installation of PureOS now. The problem persists though. When I edit glib on PureOS, using the PureOS glib sources, many of the GUI applications start crashing. However, when I do the same thing on the Debian machine using the Debian glib sources, it fixes the problem in Debian’s GNOME’s file-sorting. I’m not sure what I’m doing wrong, or what the difference is between Debian and PureOS in this regard.

Basically, all I do is replace the body of g_utf8_collate_key_for_filename() so it simply returns its first argument. Again, this has exactly the intended effect when I do it on Debian, but instead breaks many of the graphical applications when I do it on PureOS.

Any ideas?

*UPDATE*: Alright, I’ve gotten the problem solved on PureOS. I’m a bit behind in logging though, so I can’t write up a guide to the solution just now. I should have a solution post written up tonight or tomorrow though for anyone else that can’t stand this obnoxious sorting algorithm. It was a pain to solve because I had next to nothing to go off of, but it would actually be exceedingly easy to implement the solution if you had a short guide to follow.

*UPDATE*: Sorry, it’s taking longer than I thought. I had a sudden extra workload to take care of from work, taking more of my time and putting me even further behind here at home. I should have the solution written up nicely by Wednesday though at the latest. If I’m not caught up by then, I’m really in trouble.

kieran · September 3, 2019, 11:09am

You may have to duplicate the string that is the first argument and return the duplicate, rather than returning the first argument.

AlexYst · September 4, 2019, 9:28am

Now that you mention it, I bet you’re exactly correct as to what I was doing wrong. The comments in the file said something about how the return value needed to have its memory freed when you’re done with it. The impression I got was that you should do this to avoid memory leaks. But if the return value is getting freed before the program is done with the argument value, and the two are the same exact string with the same exact memory address, freeing this string would make it so it’s no longer there to be referenced later. I’m not great with low-level stuff, but I think that might be a segmentation fault, right there. Copying the value and returning the copy as you recommended would clear that right up.
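To make the ownership point concrete, here is a minimal sketch of the "return a copy" idea in plain C. It is not the actual glib patch: the name passthrough_collate_key is hypothetical, and the parameter types mirror g_utf8_collate_key_for_filename(), which takes a string and a length (with -1 meaning NUL-terminated) using glib's own gchar and gssize types. Inside glib, g_strndup() and g_free() would be the natural counterparts of the malloc() and free() used here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pass-through key function: the "key" is simply a fresh copy
 * of the input. Because the result is newly allocated, the caller may free
 * it without affecting the original string. Returning str itself would let
 * the caller free memory that is still in use elsewhere. */
char *passthrough_collate_key(const char *str, long len)
{
    /* A negative length means the string is NUL-terminated. */
    size_t n = (len < 0) ? strlen(str) : (size_t)len;

    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, str, n);
    copy[n] = '\0';
    return copy;
}
```

A caller that frees the returned key now releases only the copy, so the original filename stays valid for later use.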

I’ll need to look into this more later, and try some experiments. I guess I’ll postpone writing up a solution post until then. I don’t exactly know C, which makes working in C a tad bit difficult, while I’m also dealing with a bunch of other life garbage at the moment, eating up the time I’d otherwise use toward learning how to copy strings in C. For anyone looking for a solution in the mean time, the problem is the g_utf8_collate_key_for_filename() function, which is located in the glib/gunicollate.c file. It’s actually the last function in the file, making it very easy to find. You’ve got to change the output. My current solution copies the body of the g_utf8_collate_key() function in the same file over the body of the g_utf8_collate_key_for_filename(), so they both behave the same way, but returning a copy of the string as Kieran mentioned would be so much more concise and should fix some issues that pop up in regards to having to set the localisation exactly right. With the wrong localisation, my solution doesn’t work, while Kieran’s probably would.

There’s a problem with some non-GNOME applications (mostly image viewers, as far as I can tell) that don’t use the output of g_utf8_collate_key_for_filename() correctly though. g_utf8_collate_key_for_filename() is supposed to take care of the case insensitivity, but some applications that use that function redundantly add their own case insensitivity as well. This means that my solution, as well as probably Kieran’s, doesn’t fully fix the problem in such applications. One thing I’m considering is trying to make the function return double-length strings in which each character of the input string is translated into what is essentially a hexadecimal pair, although instead of ranging from “0” to “f”, the sixteen digits would be represented as “a” through “p” or something. Because every character in the string would be a lower-case letter, nothing that treats non-letters as special and nothing that tries to force case insensitivity on the output would have any effect. Instead, basic Unicode sort-per-bit order would be preserved. I’m going to need a bit more time to work on this though.
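The double-length encoding described above could be sketched roughly as follows. This is an illustration only, written in plain C rather than against glib, and the function name encode_key_a_to_p is invented here. It assumes an ASCII-compatible character set, where the letters "a" through "p" are contiguous, and it works on bytes, so a UTF-8 filename is encoded byte by byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoder: each input byte becomes two letters in 'a'..'p',
 * one per 4-bit half. The mapping is monotonic, so comparing two encoded
 * keys byte-wise gives the same order as comparing the original bytes. */
char *encode_key_a_to_p(const char *str)
{
    size_t n = strlen(str);
    char *key = malloc(2 * n + 1);
    if (key == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        unsigned char byte = (unsigned char)str[i];
        key[2 * i]     = (char)('a' + (byte >> 4));    /* high nibble */
        key[2 * i + 1] = (char)('a' + (byte & 0x0F));  /* low nibble  */
    }
    key[2 * n] = '\0';
    return key;
}
```

Since the output contains only lower-case letters, folding case on it changes nothing, and there are no digits left for number-aware comparison to treat specially. Whether a given application then compares the keys in a way that preserves this order is something that would still need testing.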

kieran · September 4, 2019, 12:04pm

It has been quite some years since I looked at C.

Yes. It could be a use-after-free, often with unpredictable results.

Regarding your other comments, maybe the 80/20 rule applies here.

I don’t exactly like the numeric sorting behaviour. If I had the option, as a preference within nautilus, to turn it off, I probably would (can always turn it on again for the occasions that it is doing the right thing). However nautilus is far and away the most common place that I see this behaviour. So I would be happy if that’s the only place that it was fixed.

In that case, it is enough to replace the calls in nautilus-file.c with something more flexible e.g. a wrapper for g_utf8_collate_key_for_filename() that can either invoke it or override it with a simple strdup() (?).
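A rough sketch of such a wrapper, in plain C so it can be tried outside of Nautilus. Everything named here is hypothetical: filename_sort_key is the wrapper, use_natural stands for whatever preference would control it, and stub_natural_key is only a stand-in for g_utf8_collate_key_for_filename() so the two paths can be told apart. In real code the function pointer would use glib's gchar and gssize types.

```c
#include <stdlib.h>
#include <string.h>

/* Signature modelled on g_utf8_collate_key_for_filename(). */
typedef char *(*collate_key_fn)(const char *str, long len);

/* Plain copy of the name; comparing such keys with strcmp() gives
 * simple byte order. */
static char *plain_key(const char *str)
{
    size_t n = strlen(str);
    char *copy = malloc(n + 1);
    if (copy != NULL)
        memcpy(copy, str, n + 1);
    return copy;
}

/* Stand-in for the real natural-sort key function: it just prefixes "N:"
 * so callers can see which branch produced the key. */
char *stub_natural_key(const char *str, long len)
{
    size_t n = (len < 0) ? strlen(str) : (size_t)len;
    char *key = malloc(n + 3);
    if (key == NULL)
        return NULL;
    memcpy(key, "N:", 2);
    memcpy(key + 2, str, n);
    key[n + 2] = '\0';
    return key;
}

/* The wrapper: use the natural key only when the preference asks for it,
 * otherwise fall back to a plain copy. The caller frees the result. */
char *filename_sort_key(const char *name, collate_key_fn natural_key,
                        int use_natural)
{
    if (use_natural && natural_key != NULL)
        return natural_key(name, -1);
    return plain_key(name);
}
```

Either branch returns newly allocated memory, so existing callers that free the key would not need to change.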

While looking into this I noticed that a Save As dialog box in LibreOffice behaves the same as nautilus i.e. numeric sorting. So this behaviour is potentially embedded in lots and lots of places - hence the suggestion just to cover the one case that you see “80%” of the time.

AlexYst · September 5, 2019, 12:23am

Like you said though, it doesn’t only appear in Nautilus. It outright aggravates me when I see it in Nautilus, but it also bothers me to a lesser extent when I see it in, as you pointed out that it’s used in, LibreOffice. And it bothers me in PureBrowser. And it bothers me in the image viewers that do it. And it bothers me in other applications that list files.

You mention wanting the option to enable this sorting behaviour in certain circumstances. I am absolutely in favour of making this an option and being able to enable or disable it wherever we please. But for me, that only helps if that ability to enable/disable it is accepted into some project that I’m getting the software from. If it was accepted upstream by the GNOME project - which the GNOME team has made clear isn’t happening - I could disable the feature without editing the source and recompiling. If it was accepted as a distribution-specific patch in either Debian or PureOS, I again wouldn’t need to run a custom version of the software specific to my machine.

However, if I’m editing the source code and running a version custom to me specifically, there is absolutely zero advantage in limiting myself to disabling this only in certain places. It takes a lot more effort to edit it out of each and every package individually, and on my system, this behaviour is unwanted in any place whatsoever. This is never the sorting algorithm I would choose. I find it very difficult to work with in a few cases, and a mild nuisance in all the rest. I understand that for some people and for some use cases, this obnoxious sorting algorithm can actually provide value, which is why I would love to have it as an optional feature instead of having it patched out of the upstream source code entirely. But if I have to patch the source code and recompile for my local machine, there’s no reason not to completely purge my system of this algorithm.

If upstream - any of the three points I know of upstream from me, really (GNOME, Debian, or PureOS) - implemented the wrapper and configuration option you mentioned, that would be absolutely fantastic. It’d be sad to see it only in Nautilus, but that’d be far better than the current situation. I have no idea how to write code for that myself though. I still can’t get the algorithm to do custom work for me yet, let alone have it check user settings and other more-complex tasks. That said, if upstream hasn’t fixed this by the time I’ve managed to replace this algorithm with one that not only stops the stupid digit-handling, but also enforces case sensitivity in applications that redundantly apply case insensitivity, I’ll be patching all my future copies of glib too. At that point, I’ll have the knowledge I need to force all applications that use g_utf8_collate_key_for_filename() to sort files in Unicode point order, even the ones that lack configuration options, and I’m not going to want to give that proper filename sorting up again. Having this as a per-application configuration option would be nice, but no application I know of that uses g_utf8_collate_key_for_filename() even offers the option to turn it off at all. They all just sort of assume that the user is going to be okay with certain characters being treated as somehow special and different from other characters.

This obnoxious sorting algorithm has been the bane of my computing existence for years. The entire reason I switched to LXDE on my other machine was just to rid the file manager of this mess. Otherwise, I’d still be on Xfce. In fact, now that I’ve got a working method here, I’ve switched the other laptop back to Xfce. I like that desktop a lot better. The sorting algorithm still showed up in other applications, such as Firefox and the LibreOffice suite, but while I put up with it in these places, it was never ideal. I brought it up here in the PureOS forums because now I’m on a PureOS machine and PureOS is GNOME-centric, but if it weren’t for this machine coming with PureOS and my desire to at least make an honest attempt to get PureOS to be a functional system, I wouldn’t even use GNOME at all. Even without this problem, GNOME3 isn’t something I particularly like, but the rest, I can at least tolerate.

I feel like I’ve made a lot of progress here. With this algorithm patched out, PureOS is definitely something I can get used to and even enjoy using. I still have a few other issues I’d like to bring up in the forum when I have time, but they’re all minor issues, and even if I can’t get them fixed, it’ll be plenty easy to just deal with them.

kieran · September 5, 2019, 1:07am

All good.

Just remember that just when you have fixed g_utf8_collate_key_for_filename() and all applications that use it, you will find an application that completely rolls its own filename sorting and is a law unto itself, not behaving like anything else, or even anything else you have ever seen.

After reading all of this topic, I have a feeling that someone is going to ask for the existing algorithm to work correctly with hex numerics.

Someone else will ask for per-directory configurability.

AlexYst · September 5, 2019, 4:17am

You make a good point, but by fixing g_utf8_collate_key_for_filename(), I’ll fix nearly every application that has been bothering me with its sorting order. If I have to patch one other application because it does its own thing, that’s still much easier than patching a few dozen applications to make them not call g_utf8_collate_key_for_filename() in the first place.

Hex numerics are one of the many things I find wrong with this algorithm. It treats decimal numerics specially, but other number bases aren’t handled correctly. It really shows how treating arbitrary characters as a special case like this is such a broken concept. Personally, I use the case of hex numerics as a counterpoint when people claim this to be a more-intuitive way to sort. Hex numerics can be sorted correctly if they’re the same length when using a normal sorting algorithm, but when using this algorithm, hex numerics can’t be sorted correctly any more.
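The hex case can be reproduced with a small sketch. The natural_compare function below is a simplified digit-run comparison written for this illustration; it is not glib's actual algorithm, only the general idea of comparing runs of decimal digits by numeric value. It is set against the standard strcmp(), which compares byte by byte.

```c
#include <ctype.h>
#include <string.h>

/* Simplified "natural" comparison: runs of decimal digits are compared by
 * numeric value (leading zeros ignored), everything else byte by byte.
 * Returns a negative, zero or positive value like strcmp(). */
int natural_compare(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Skip leading zeros, then measure each digit run. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;

            /* A longer run of digits is a larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;

            /* Same length: digit-by-digit order is numeric order. */
            int c = strncmp(a, b, la);
            if (c != 0)
                return c;
            a += la;
            b += la;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```

Take the two-digit hex names "0a", "10" and "9f". strcmp() orders them "0a", "10", "9f", which matches their hex values. The digit-run comparison sees "9f" as the number 9 followed by a letter and places it before "10", so the hex order is lost, while a pair like "file9" and "file10" comes out in the order the algorithm is designed for.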

I’ve never understood why people like per-directory configuration of file-sorting. If it helps their workflow, more power to them, but it seems like you have to switch in your mind where to expect files based on where you are in the file tree, which seems like a pain. Before we can get per-directory configurability though, we need to have basic configurability at all, which is something we don’t have.

kieran · September 5, 2019, 4:35am

As an example, as mentioned above, FileZilla by default does normal (lexicographic) sorting i.e. Case Sensitive. However it also offers Case Insensitive and it also offers Natural (those are the three choices in the Settings). That would be a bit of a clue that it isn’t using the glib routine.

PS (maybe for someone else because maybe you already know) There’s a reason why this stuff exists. Pure lexicographic sorting simply doesn’t work in some cultures. The Unicode code points aren’t in the ‘right’ order. Not for words and hence not for filenames.

AlexYst · September 5, 2019, 6:08am

The problem isn’t finding out which applications use g_utf8_collate_key_for_filename(). If I’m not using the application enough to notice the behaviour, the behaviour isn’t bothering me in that application and I don’t need to patch it. If I’m seeing that behaviour though, and I’ve tried several times to find a way to configure the application to knock it off and failed every time, that application would be one I need to patch. Telling which applications need to be patched is the easy part. The problem is patching each and every one of them to make them not use g_utf8_collate_key_for_filename(). If the behaviour of g_utf8_collate_key_for_filename() isn’t wanted under any circumstances on my system, patching the behaviour of g_utf8_collate_key_for_filename() - being a single patch instead of a dozen - is the easier and more-effective option.