Image metadata parsing & modification
For in-memory and stored images, we wish to control which metadata is stored with the image, both to remove sensitive information added by the camera (date, GPS, camera brand/ID) or add useful information that we choose (date, GPS, subject, consent).
We also wish to redact the image (and the thumbnail!) to hide sensitive information. The redaction should be reversible, with the redacted information stored separately.
JFIF files contain JPEG-compressed image data in a tagged format. One of those tags can be an EXIF data block with image metadata.
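To make the "tagged format" concrete, here is a minimal Python sketch that walks the segments of a JPEG byte string up to the start of the compressed scan. The function name `iter_segments` and the hand-built sample bytes are illustrative assumptions, not part of the existing C++ program, and the sketch ignores fill bytes and multi-scan files.

```python
import struct

SOI, EOI, SOS = 0xD8, 0xD9, 0xDA  # standard JPEG marker codes


def iter_segments(data: bytes):
    """Yield (marker, payload) for each header segment, up to and including SOS.

    Hypothetical helper: assumes every marker before SOS carries a 2-byte
    big-endian length (which counts itself but not the marker bytes).
    """
    if data[:2] != bytes([0xFF, SOI]):
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == EOI:
            return
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length
        if marker == SOS:
            return  # entropy-coded image data follows; stop walking


# Hand-built sample: SOI, a tiny APP0 segment, an empty SOS, two data bytes, EOI.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 7) + b"JFIF\x00"
    + b"\xff\xda" + struct.pack(">H", 2)
    + b"\x12\x34\xff\xd9"
)
segments = list(iter_segments(sample))
for marker, payload in segments:
    print(f"marker 0xFF{marker:02X}, {len(payload)} payload bytes")
```

Everything after the SOS header is entropy-coded data, which cannot be walked this way.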
Roughly, image data is converted into a YCbCr colourspace and each channel is encoded separately, but similarly. Images are encoded in 8x8 pixel blocks, which are grouped into Minimum Coded Units (MCUs); the colour channels are interleaved within an MCU.
Each 8x8 block is transformed with a Discrete Cosine Transform (DCT), which separates the "DC" (average) part from the "AC" coefficients that encode deviation from it. The coefficients are quantized and Huffman encoded as variable-length bit strings. In addition, the quantized coefficients are sparse, so the runs of zeros are run-length encoded.
Unfortunately, parsing the encoded data is rather involved: each coefficient's length is Huffman encoded in a variable number of bits, so even working out which bits correspond to which coefficient involves a lot of decoding. Even working out where the next MCU starts involves disentangling the bit lengths of all the components of the previous MCU. (JPEG does allow for a restart marker every N MCUs, but in the files I've looked at this is not used.)
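The following Python sketch illustrates why no coefficient can be located without decoding everything before it: a Huffman symbol gives the bit length ("size") of the value, and only then can the value bits be read and sign-extended. The Huffman table here is a toy, hypothetical one (real tables come from the file's DHT segments), and the sketch ignores run lengths and the 0xFF byte stuffing used in real scans.

```python
class BitReader:
    """Reads bits most-significant first from a byte string."""

    def __init__(self, data: bytes):
        self.bits = "".join(f"{byte:08b}" for byte in data)
        self.pos = 0

    def read(self, n: int) -> int:
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) < n:
            raise EOFError("ran out of bits")
        self.pos += n
        return int(chunk, 2) if n else 0


def extend(value: int, size: int) -> int:
    """Map `size` raw bits to a signed value (the JPEG EXTEND rule)."""
    if size == 0:
        return 0
    if value < (1 << (size - 1)):
        return value - (1 << size) + 1  # leading 0 bit means negative
    return value


def decode_values(data: bytes, table: dict, count: int) -> list:
    """Decode `count` values; each is a Huffman-coded size plus `size` bits."""
    reader = BitReader(data)
    values = []
    for _ in range(count):
        code = ""
        while code not in table:
            code += str(reader.read(1))
            if len(code) > 16:
                raise ValueError("no matching Huffman code")
        size = table[code]
        values.append(extend(reader.read(size), size))
    return values


# Toy prefix code (an assumption for illustration): bit string -> size.
TOY_TABLE = {"0": 0, "10": 1, "110": 2, "111": 3}

# Bits: 110 01 | 10 1 | 0, then padding with 1s.
decoded = decode_values(bytes([0xCD, 0x7F]), TOY_TABLE, 3)
print(decoded)
```

The position of the third value is only known once the first two have been fully decoded, which is the difficulty described above.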
Exif data is stored in an APP1 tag. Cameras may add various photographic parameters (such as focal length, shutter speed, flash use etc.) as well as several tags that may be considered sensitive for our use case (date/time, GPS location, camera brand).
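As a rough illustration of removing camera-added metadata, the sketch below drops Exif segments from a JPEG byte string and copies everything else unchanged. It assumes the usual convention that Exif lives in an APP1 segment (marker 0xFFE1) whose payload starts with `Exif\0\0`; the function name `strip_exif` and the sample bytes are hypothetical.

```python
import struct

APP1, SOS = 0xE1, 0xDA
EXIF_ID = b"Exif\x00\x00"  # identifier at the start of an Exif APP1 payload


def strip_exif(data: bytes) -> bytes:
    """Return a copy of the JPEG with Exif APP1 segments removed.

    Only header segments are inspected; the compressed scan data after SOS
    is copied through untouched.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        segment = data[pos:end]
        is_exif = marker == APP1 and segment[4:10] == EXIF_ID
        if not is_exif:
            out += segment
        pos = end
        if marker == SOS:
            break
    out += data[pos:]  # scan data and EOI
    return bytes(out)


# Hand-built sample with one APP0 and one (dummy) Exif APP1 segment.
app0 = b"\xff\xe0" + struct.pack(">H", 7) + b"JFIF\x00"
app1 = b"\xff\xe1" + struct.pack(">H", 12) + EXIF_ID + b"II*\x00"
sos = b"\xff\xda" + struct.pack(">H", 2)
scan = b"\x12\x34\xff\xd9"
sample = b"\xff\xd8" + app0 + app1 + sos + scan
stripped = strip_exif(sample)
```

Dropping the whole block is the bluntest option; selectively removing or adding individual tags would require parsing and rewriting the TIFF structure inside the segment.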
IPTC metadata (location, subject, copyright etc.) is stored in a "Photoshop 3.0" block (an APP13 tag). It's not used by cameras, but is a way for us to add this data where we want to.
I have a C++ program that parses JPEG and EXIF data. It correctly parses the encoded data and correctly extracts the DC-grey component of the image. (No interpretation of Chroma or DCT coefficients.) It can write out all the JPEG data to a file (but not yet EXIF).
- Redacting the image data.
- Reversible redaction
- Writing EXIF information out
- Modifying EXIF data.
Decompressing an image on the phone is impractical (e.g. a 5MP image is 15MB uncompressed). Redaction can be done in the compressed domain. Redaction should have the properties of being:
- Information destroying [the redacted image hides information which cannot be reconstructed]
- Reversible with stored information [We keep compact information separately which can be used to restore the original image, hopefully without any loss]
- Resistant to automatic and human recognition techniques, with or without enhancement technology. [Note that blurring has been found to improve face recognition accuracy, particularly in matched conditions]
- Visually pleasant [e.g. streetview noise+blurring rather than black boxes]
It seems sufficient to modify entire macroblocks; we don't need to reverse the DCT or look at sub-macroblock regions.
Removing (and preserving separately) the AC coefficients (probably all of them, but optionally just the higher frequency ones) of affected macroblocks should have the desired properties at a fine resolution. For high-resolution images, sufficient detail may be present in the DC components, so these need to be flattened/have noise added. Note that the DC components are delta-encoded so if we modify the DC components, the DC component of the first macroblock after the redaction needs to be fixed up.
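A minimal sketch of the "remove and preserve separately" idea for AC coefficients, in Python. It works on already-decoded blocks, each modelled as a list of 64 quantized coefficients with the DC first; the real operation would have to happen on the Huffman-coded bit stream. The names `redact_blocks` and `restore_blocks` and the `keep` parameter are assumptions for illustration.

```python
def redact_blocks(blocks, indices, keep=1):
    """Zero every coefficient from position `keep` onward in the chosen blocks.

    keep=1 retains only the DC; a larger value keeps some low-frequency ACs.
    Returns (redacted_blocks, removed), where `removed` maps block index to
    the coefficients that were taken out.
    """
    redacted = [list(block) for block in blocks]
    removed = {}
    for i in indices:
        removed[i] = redacted[i][keep:]
        redacted[i][keep:] = [0] * (len(redacted[i]) - keep)
    return redacted, removed


def restore_blocks(redacted, removed):
    """Put the separately stored coefficients back, undoing redact_blocks."""
    restored = [list(block) for block in redacted]
    for i, tail in removed.items():
        start = len(restored[i]) - len(tail)
        restored[i][start:] = tail
    return restored


# Four made-up blocks of 64 coefficients each (DC first).
blocks = [[i * 10 + 5] + [(i + j) % 5 - 2 for j in range(63)] for i in range(4)]
redacted, removed = redact_blocks(blocks, [1, 2])
restored = restore_blocks(redacted, removed)
```

Because the removed coefficients are kept verbatim, the round trip is lossless in this model; `removed` is the data that would be encrypted and stored separately.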
It is hoped that keeping average DC (or smoothed + noise added) in each macroblock (for both Y and chroma) will be visually pleasant enough.
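The DC flattening and the fix-up it forces can be sketched as follows. Since each DC value is stored as a difference from the previous block, replacing the DCs of a region by their average also changes the stored difference of the first block after the region, even though that block's actual DC is unchanged. This Python sketch handles a single component with a predictor starting at 0 and ignores restart intervals; the function names are hypothetical.

```python
def dc_to_absolute(diffs):
    """Turn delta-encoded DC values into absolute DC values."""
    values, prev = [], 0
    for d in diffs:
        prev += d
        values.append(prev)
    return values


def absolute_to_dc(values):
    """Turn absolute DC values back into delta-encoded form."""
    diffs, prev = [], 0
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs


def flatten_dc(diffs, start, stop):
    """Replace the DC of blocks [start, stop) by their rounded mean.

    Returns the new delta-encoded sequence, including the fixed-up
    difference for the first block after the region.
    """
    values = dc_to_absolute(diffs)
    region = values[start:stop]
    mean = round(sum(region) / len(region))
    values[start:stop] = [mean] * len(region)
    return absolute_to_dc(values)


diffs = [10, 2, -5, 8, 1, -3]          # absolute DCs: 10, 12, 7, 15, 16, 13
flattened = flatten_dc(diffs, 2, 4)    # blocks 2 and 3 become 11, 11
print(flattened)                       # block 4's difference changes from 1 to 5
```

Adding noise instead of a flat mean would follow the same pattern: edit the absolute values, then re-derive the differences.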
The bits that we remove from the encoded bit stream will, for most applications, need to be stored separately to enable reversing of the redaction. Ideally:
- Lossless reconstruction
- Redaction data encrypted
- Optionally separate keys for different data types (faces, background)
- Redacted file should be a fully standards-compliant file that can be viewed (in its degraded state) on any viewer/browser/camera.
- Stored within the JPEG file.
  - Possibly in a standard (e.g. unused APPn) tag, though this allows discoverability. It might allow the data to be preserved through manipulation by image processing tools (or at least reinsertion by standard tools such as
  - Optionally hidden (e.g. appended to the file, with a "header" at the very end)
  - Possibly stored in a separate file, or in a database (then hard to transfer to another client)
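The "appended to the file, with a header at the very end" option could look roughly like the Python sketch below. The 4-byte magic value `RDCT`, the trailer layout (payload, then a 4-byte big-endian length, then the magic) and both function names are assumptions made up for illustration. The payload is assumed to be already encrypted, and whether every viewer ignores bytes after the EOI marker would need to be checked.

```python
import struct

MAGIC = b"RDCT"  # hypothetical trailer tag, not part of any standard


def append_payload(jpeg: bytes, payload: bytes) -> bytes:
    """Append `payload` after the JPEG, followed by a length + magic trailer."""
    trailer = struct.pack(">I", len(payload)) + MAGIC
    return jpeg + payload + trailer


def extract_payload(blob: bytes):
    """Return (jpeg, payload); payload is None when no trailer is present."""
    if len(blob) < 8 or blob[-4:] != MAGIC:
        return blob, None
    (size,) = struct.unpack(">I", blob[-8:-4])
    end = len(blob) - 8
    if size > end:
        return blob, None  # length field is implausible; treat as no trailer
    return blob[:end - size], blob[end - size:end]


# Stand-in for a real file: just SOI ... EOI.
jpeg = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"
secret = b"encrypted redaction data"
combined = append_payload(jpeg, secret)
recovered_jpeg, recovered_payload = extract_payload(combined)
```

Reading the trailer from the end means the redaction data can be found without parsing the JPEG at all, at the cost of being lost if a tool rewrites the file and discards trailing bytes.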
## Resources for understanding the formats:
- JPEG on Wikipedia
- JPEG Huffman encoding scheme details.
- Some JPEG tag details.
- EXIF format on Wikipedia
- EXIF 2.3 standard document, April 2010
The same desiderata apply to video. Here speed and compression become more important. As with the current JPEG goals, we expect to be rewriting existing files, rather than doing redaction in-memory or uncompressed.
We expect the target format to be MPEG4.
- MPEG-4 on Wikipedia (note that the standard encompasses all kinds of other streams beyond video and audio)
- MPEG4 Standard document
- A tool for indexing (hinting) MP4 files that also gives some information about the content
- An open source project including mp4info (a utility to dump information from an MP4 file) and mp4f (a library for parsing MP4 files)
- Closed source MPEG4 Parser