Motion JPEG is hazardous

You might want more details then. :slight_smile:

The term is used in a number of different ways.

The distinction that I would make is:

  • to capture an image that exceeds the dynamic range of your hardware (your sensor), you may use the technique that I described
  • to store an image that exceeds the dynamic range of the standard encoding, you may need to use a different encoding, one that supports a higher dynamic range (and that’s where the greater color depth comes in)

I suggest reading https://en.wikipedia.org/wiki/High-dynamic-range_imaging for the former and https://en.wikipedia.org/wiki/High-dynamic-range_video for the latter.

Good question. That’s the non-beauty of a closed source system. There is often no way of answering a question like that.

The answer, on the client, is going to depend on how the system presents the image to any client that is able to access images on it. It could present it as a single, safely combined JPEG; as 3 x JPEGs / MJPEG; or as a single combined JPEG but with enough bad stuff hiding in the JPEG metadata. The answer on the original device is anyone’s guess.

However, as I started out by saying, due to the short interval of time in which the 3 images are captured, it is unlikely that there will be anywhere near as much risk with this scenario as compared with what prompted you to start this topic.

Not before image sensors detect light polarization. But you can use a polarizing filter in the meantime, although it might be difficult to attach one to a phone.


See kieran’s wiki. It depends on what exactly one means by “HDR”. In the JPEG context, it means finding an exposure level which maximizes decompressed image entropy. What you actually want is more than 8 bits per RGB component, but you can’t get that with JPEG because the spec was frozen in ancient history, so you need to solve the problem on the analog side. Presumably other image formats can employ deeper pixels, in which case HDR would just be “photography as usual”.
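To make that “maximizes decompressed image entropy” idea concrete, here’s a minimal sketch (my own illustration, nothing vendor-specific) that scores an image by the Shannon entropy of its 8-bit luminance histogram; a blown-out or crushed exposure piles pixels into a few bins and scores lower:

```python
# Score an image by the Shannon entropy of its luminance histogram.
import numpy as np
from PIL import Image

def image_entropy(path):
    gray = np.asarray(Image.open(path).convert("L"))
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())  # bits per pixel, 8 max

# e.g. pick the bracketed shot with the most information:
# best = max(["under.jpg", "normal.jpg", "over.jpg"], key=image_entropy)
```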

I toggled HDR on my vanilla Android phone, taking a photo of the same complex but very constant scene each time. The file size difference was trivial, so it’s clear that they’re just taking the exposure with the least amount of saturation (minimum information loss), rather than concatenating multiple JPEGs (which could make sense if the point was to provide downstream image processing software with more choices of dynamic range, similar to what is done with after-the-fact focus). You’ve convinced me that HDR isn’t nearly as hazardous as MJPEG. It’s nice to have my paranoia mooted, for once.

Nope, not unless the reflections are entirely polarized, which is rare and tends to occur on chemically uniform surfaces such as oil or ice. Apart from this, a polarizing filter would just reduce the brightness of everything in the scene, although I do admire your lateral thinking.

On the spiPhone, with HDR enabled but set to also retain the “normal exposure” version, the HDR file of a test photo that I took was smaller than the normal one.

However, the documentation strongly suggests that the HDR processing involves not concatenating multiple JPEGs (really, images) but combining them.

I note also that the spiPhone is able to tell which JPEGs were taken with HDR enabled and which were taken with HDR disabled. My informed guess is that this information is encoded in the CustomRendered tag value in the metadata.
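If anyone wants to test that guess, here’s a quick sketch with Pillow. 0xA401 is the standard tag ID for CustomRendered and it lives in the Exif sub-IFD (0x8769); “photo.jpg” is a placeholder:

```python
# Read the EXIF CustomRendered tag (0xA401). Per the EXIF spec,
# 0 = normal process and 1 = custom process; HDR shots are often
# tagged with a non-zero value, though exact values are vendor-specific.
from PIL import Image

exif = Image.open("photo.jpg").getexif()
exif_ifd = exif.get_ifd(0x8769)  # the Exif sub-IFD
print("CustomRendered:", exif_ifd.get(0xA401))
```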

JPEG however is intrinsically hazardous from a privacy point of view. There are a myriad of places where a blackbox camera or encoder may be hiding privacy-sensitive content (which you may then leak unwittingly).

  • in the value of the MakerNote tag
  • in extraneous data in any tag’s value or in any segment
  • in extraneous data beyond the EOI segment
  • in the value of any non-standard or unrecognized tag or segment
    just for some examples. (The last case, extraneous data beyond EOI, is easy enough to check for yourself; see the sketch below.)
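A minimal sketch of that check, assuming nothing beyond the file itself (“photo.jpg” is a placeholder): find the final EOI marker (0xFF 0xD9) and report whatever follows it:

```python
# Report any bytes trailing the final EOI (End Of Image) marker.
# A clean JPEG ends at 0xFF 0xD9; anything after that is extraneous.
# Note: a payload that itself ends in 0xFF 0xD9 would evade this
# naive check, so treat a result of 0 as "nothing obvious" only.
def trailing_bytes(path):
    data = open(path, "rb").read()
    eoi = data.rfind(b"\xff\xd9")
    if eoi < 0:
        return None  # no EOI found; not a complete JPEG
    return len(data) - (eoi + 2)

print(trailing_bytes("photo.jpg"), "bytes after EOI")
```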

So even for the L5, if the JPEG encoding is being done by a blackbox then the user should have some control over metadata purification - which is what you started out by saying(!).

I think at this stage we have no information about this, e.g. whether the camera hardware in the L5 supplies a raw image that is then JPEG-encoded in the GPU or in software, or whether the camera hardware supplies an already JPEG-encoded image.

The spiPhone equivalent appears to be called LIVE mode. In LIVE mode the camera captures a 3-second video when you take a photo, thereby affording opportunity for unintended panning. This presents to the client (my Linux computer when I plug the spiPhone into it) as two files: a .MOV file and a .JPG file. The latter is some kind of representative still frame from the video.

The former reportedly even includes audio(!). I think no one has mentioned that so far. Who expects, when taking a photo, to capture sound? It could be some other person in the room saying something / talking on the phone / … completely unrelated to the photo. Again, the possibilities for unintended leakage of privacy are significant.

LIVE mode is probably closest to what you were originally expressing concern about.

It would probably be best if the L5 simply didn’t offer this functionality.


I think by “combining” they mean “choosing the highest-entropy one”, because otherwise the combined image would be less faithful to reality than any of the individual shots. If the HDR image was actually smaller in your test, then I suspect that was just due to shot-to-shot variance in noise level. If it was more than 5% smaller, that would warrant further investigation.

I guess that’s OK because HDR status is less than a bit of information.

Indeed, but while the camera may be a blackbox, it’s undoubtedly sending a raw pixel array to the encoder, which I certainly expect would be open-source in the case of the L5. I don’t see any monkey business going on there, apart from the addition of metadata, which is already known to be hazardous.

Great research, kieran! What a horror. Tell your iFriends to turn it off:

https://www.digitaltrends.com/mobile/how-to-turn-off-live-photos

Taking photos is not about being faithful to reality (whatever “reality” means; everyone’s got different eyes), it’s about making a beautiful picture. Compare a picture from a security camera with one from a smartphone and you’ll see how completely different they look, due to completely differently optimized image processing algorithms.

Also, HDR is not about taking multiple pictures and choosing the one with the right exposure. It’s about doing what others mentioned before: taking multiple exposures and combining them to get the highest amount of dynamic range in every part of the scene, as different parts of the scene require different exposures. Selecting just a single right exposure is no better than taking a digital camera from 2010 and setting it to “auto” so it finds the right exposure itself.
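For the curious, that “combine multiple exposures” step is available off the shelf. Here’s a minimal sketch using OpenCV’s exposure fusion (Mertens et al.); the three file names are hypothetical stand-ins for bracketed shots of the same scene:

```python
# Fuse three bracketed exposures into a single image. Mertens fusion
# weights each pixel by contrast, saturation and well-exposedness,
# then blends across the frames; no explicit tone mapping is needed.
import cv2
import numpy as np

frames = [cv2.imread(f) for f in ("under.jpg", "normal.jpg", "over.jpg")]
fused = cv2.createMergeMertens().process(frames)  # float32, roughly in [0, 1]
cv2.imwrite("fused.jpg", np.clip(fused * 255, 0, 255).astype("uint8"))
```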

Using HDR does not necessarily mean making the picture “further from reality”, because in reality your eyes change exposure depending on which part of the scene you are looking at. In a still picture the exposure is fixed at the moment it was shot, and your eyes cannot compensate for the lost information afterwards; with HDR they don’t need to, because the picture can have different exposures in different parts. Technically that could be considered closer to reality than a picture without HDR. I’d consider HDR more “realistic” at the primary point you are looking at, although the peripheral areas might be better exposed and more detailed than your eyes would make them out to be if you were looking at the real scene.

I’m not an expert in this field, but I’ve had many discussions with people who are and work professionally with image tuning and creating digital cameras.

“If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn’t. And contrariwise, what is, it wouldn’t be. And what it wouldn’t be, it would. You see?” from Alice in Wonderland :wink:

I think of HDR in terms of ‘hacking-the-matrix’ :sweat_smile:


Also look at what AV1 video encodes look like with the proper settings, comparing 8-bit only vs 10-bit; then compare that to x264.
10-bit ends up with higher quality per bit-rate and smaller size vs 8-bit. These encoders are optimized for HDR, but 10 bits per color channel isn’t exactly true HDR; still, it’s way more than 8-bit RGB.
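If anyone wants to reproduce that comparison, here’s a rough sketch that shells out to ffmpeg with libaom-av1; “input.mkv” and the CRF value are placeholders, and encode times will be long:

```python
# Encode the same clip as 8-bit and 10-bit AV1 at the same quality
# setting (constant-quality mode), then compare the file sizes.
import os
import subprocess

for pix_fmt, out in (("yuv420p", "av1_8bit.mkv"),
                     ("yuv420p10le", "av1_10bit.mkv")):
    subprocess.run(["ffmpeg", "-y", "-i", "input.mkv",
                    "-c:v", "libaom-av1", "-crf", "30", "-b:v", "0",
                    "-pix_fmt", pix_fmt, out], check=True)
    print(out, os.path.getsize(out), "bytes")
```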

By contrast, let’s look at https://www.openexr.com/documentation.html, which says that it can store up to 32-bit color information, and:

OpenEXR v2 also introduces the concept of “Multi-Part” files that contain a number of separate, but related, images in one file. Access to any part is independent of the others.

Good insights. That’s really surprising. In order to do that you need either (1) heavy AI to decide which linear combination of input brightness values to assign to which output pixel such that edges are sharply preserved, or (2) much simpler logic which sacrifices edges to some extent. Based on kieran’s comment that the HDR image was smaller, I think #2 is what’s going on. Blurrier images will be sparser in discrete cosine components, so the resulting JPEG will be smaller. The net of all this is that HDR sacrifices some image sharpness in exchange for fewer saturated pixels. All considered, that’s probably a good trade, considering that option #1 is extremely challenging to implement. (The blurring comes from the fact that edges always move between frames, to whatever small extent. Think of a pic of someone’s hair in the wind.)
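That “blurrier means sparser in cosine components” step is easy to sanity-check; here’s a minimal sketch comparing significant DCT coefficients before and after a Gaussian blur (“photo.jpg” is a placeholder and the 1.0 threshold is arbitrary):

```python
# Blurring concentrates energy in the low frequencies, so fewer DCT
# coefficients survive a fixed threshold, and JPEG compresses better.
import numpy as np
from PIL import Image
from scipy.fft import dctn
from scipy.ndimage import gaussian_filter

img = np.asarray(Image.open("photo.jpg").convert("L"), dtype=float)
for label, arr in (("sharp", img), ("blurred", gaussian_filter(img, sigma=2))):
    coeffs = dctn(arr, norm="ortho")
    frac = float((np.abs(coeffs) > 1.0).mean())
    print(f"{label}: {frac:.3f} of coefficients above threshold")
```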

It’s worth mentioning, too, that pixel values are not a simple function of energy received at the CCD element. This goes way back to the way vacuum tubes work. I would bet that the people who designed HDR image combination logic didn’t properly account for this.
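Assuming the nonlinearity in question is the usual gamma / transfer curve (the vacuum-tube legacy), combining exposures correctly means decoding to linear light first. A minimal sketch of the standard sRGB decoding formula:

```python
# Decode sRGB-encoded values (0..1) to linear light. Exposure maths
# (averaging frames, scaling by exposure time) is only physically
# meaningful on the linear values, not the gamma-encoded ones.
import numpy as np

def srgb_to_linear(v):
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

print(srgb_to_linear([0.0, 0.5, 1.0]))  # [0.0, ~0.214, 1.0]
```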

That would be like 11-11-10 RGB. I’m not sure I can see that deep, but I wouldn’t complain if it were available.

Oh great! Another “hidden frames” image format purporting to be a “photo” format. Thanks for the warning.


Thanks for educating me! It’s clear now that my experience using photography gizmos is lacking :slight_smile:

Yes.

No.

At least I don’t currently know about any mode that would do that on either camera. Even then I guess we’d get no metadata attached.

Objection! Plaintiff overgeneralizes.

At the same time, it does necessarily make it farther from the reality in which the relative differences between large-scale features are preserved. It’s a matter of choosing which reality you wish to see (as you wrote later in the post).


Yes, exactly. That’s how I use HDR. If you want fidelity then there are other modes that may be better suited to your purposes (for example, modes that behave the way you describe in capturing multiple images but then selecting one image). HDR images do sometimes look a bit weird. It is a trade-off between reality (fidelity) and capturing detail.

Anyway, @johan-bjareholt explains it better and in more detail.


Indeed, a useful warning from @reC, but hidden frames are both good things and bad things, depending on how they are used. For example, they may be used in your service for the purposes of steganography, or they may be used against you for the purposes of watermarking.

OK, thanks for that info. Is the JPEG encoding then done in the GPU or in software?

There is no hardware video/image encoding on the i.MX 8M Quad (that will be coming in the i.MX 8M Plus), so it has to be done in software. Maybe the software could use the GPU, but the free Etnaviv driver doesn’t support OpenCL, so I doubt that the software uses the GPU for image encoding.


That typo appears once in the FAQ.

Oh thanks. I fixed that typo.

No.

:wink: Encoding is not what I’m working on right now. ffmpeg’s command line does it for me so far.
