Projects and developments by Supragya Raj

Google Summer of Code 2018

Google Summer of Code is an annual program run by Google to promote open source development and bring prospective developers (students) closer to open source organizations. I was a part of Google Summer of Code 2018 as a student software developer, working with Apertus Association.

My project on the Google Summer of Code website can be found here. It was one of six projects undertaken by Apertus Association in 2018.

The mentors for this project were Georg Hofstetter (g3gg0) and Andrej Balyschew (BAndiT1983).

GSoC codebase submission:


Aim of the project

The project is the result of lab task T951: Raw Video Container Format, which aims at comparing containerization systems and finding one suitable for the AXIOM Beta camera: fast to encode, yet close enough to a popular standard that a wide range of professional software can work with it easily. Multiple formats were considered and studied along these lines. A working proof of concept was to be built as part of this GSoC project to show that the proposed system would actually work. These are discussed in the paragraphs below.

System under consideration – Apertus AXIOM Beta

The system under consideration is the Apertus AXIOM Beta.

This camera uses the RAW12 file format to encode its frame data. RAW12 is a very simple, headerless, payload-only format that stores 12-bit raw sensor data in an RGGB bayer pattern. More information on the RAW12 file format can be found here.
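To make the packing concrete, here is a minimal sketch of unpacking a RAW12 payload into 16-bit samples, assuming the usual packing in which two 12-bit pixels share three bytes, most significant bits first; the function name is my own, not part of any AXIOM codebase:

```c
#include <stdint.h>
#include <stddef.h>

/* Unpack a RAW12 payload: two 12-bit pixels per 3 bytes.
 * Assumed layout: byte0 = p0[11:4], byte1 = p0[3:0] | p1[11:8],
 * byte2 = p1[7:0]. `npix` is the number of pixels to produce. */
static void raw12_unpack(const uint8_t *in, size_t npix, uint16_t *out)
{
    for (size_t i = 0; i + 1 < npix; i += 2) {
        const uint8_t *b = in + (i / 2) * 3;
        out[i]     = (uint16_t)((b[0] << 4) | (b[1] >> 4));
        out[i + 1] = (uint16_t)(((b[1] & 0x0F) << 8) | b[2]);
    }
}
```

Note how the second pixel straddles a byte boundary: this is exactly the kind of bit-level shuffling that, as discussed below, becomes a cost factor when a container format demands a different payload layout.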

The proposed image recording pathway (USB3 interface based) for the camera is given as follows:

The system is supposed to have the following characteristic features:

  • The USB3 interface (not fully developed as of writing this article) is supposed to be a high speed interface providing data at up to 3 Gbps.
  • The Gigabit Ethernet interface is a relatively low speed data bus, more suitable for transporting metadata from the camera to the recording end.
  • The camera uses a CMV12000 image sensor for capturing images. This is supposed to be replaceable by design (not currently implemented) and provides a 4096×3072 bayer pattern at a maximum of 300 fps.
  • Metadata captured by the camera is currently just the set of register values of the CMV12000 image sensor, transported when necessary over the GbE link.

The system developed in this GSoC is the recording computer end.

An analysis of different file formats

Multiple file formats were analyzed as part of this GSoC project, including CinemaDNG / TIFF-EP, ARRIRAW, Cineform, and Magic Lantern Video among others. The deciding factors in selecting one container format over another are the following:

  • The RAW12 payload (camera output) should be easily encoded into the final container; the easier, the better. Note that even simple transformations such as byte swapping imposed by format constraints pose problems when encoding “in camera”.
  • The encoding should be feasible “in camera”: the camera should be able to encode the RAW12 data into the container at a fast enough rate. The camera is computationally low powered, and if encoding takes too much computation per frame it can become a serious bottleneck.
  • Compression is welcome but not mandatory. As point 1 suggests, even simple transformations can significantly lower the upper limit on recording speed.
  • CBR vs VBR: constant bit rates and constant size headers make offset calculations simpler, allowing stream output to be faster, although heavier. VBR, on the other hand, needs quite a bit of processing when reading and writing, but often makes streams more efficient.
  • PLR HDR: PLR is a multi slope high dynamic range mode (discussed in the paragraphs below) through which the CMV12000 image sensor achieves HDR. Any format that wants to be applicable to the AXIOM Beta needs to accommodate PLR.
  • Room for changes: the AXIOM Beta is designed for swappable sensors, so PLR may not always be needed; something else may come up with a different sensor. Hence, the ease with which the recording system can be adapted to different sensors (and thus different payloads) also affects the choice of a raw video container format.

For a detailed analysis of the different file formats one can consider for the AXIOM Beta, refer to Apertus’ lab task T1093.

Raw Video Container Formats

|  | Magic Lantern MLV | CinemaDNG | Cineform RAW | ProRes RAW | ARRIRAW | Redcode RAW |
|---|---|---|---|---|---|---|
| License / Availability | Open source | Open source | Open source | Proprietary | Proprietary | Proprietary |
| File structure | Block based, timestamp ordered sequential file structure | Header / TIFF-EP tag based file structure | Active metadata + Cineform RAW compressed payload | Undocumented | Undocumented | Undocumented |
| Compression ratio | Uncompressed available; LJ92 compressed, 55-60% | Uncompressed available; variable payload | Lossy, 4:1 - 10:1 | Lossy, 50% - 70% | Uncompressed | 3:1 - 18:1 |
| Compression method | Lossless JPEG92 | Archival format based (RAW), e.g. JPEG; ratio and method depend on the archival format | Wavelet | ProRes RAW HQ / ProRes RAW | Undocumented | Wavelet (JPEG2000) |
| Metadata | Saved in MLV blocks | Saved in TIFF-EP headers | Saved in headers | Undocumented | Saved in file headers | Undocumented |
| Acceptance / Software support | MlRawViewer, mlv_dump, mlvfs (conversion to DNG) | Industry standard: Adobe CC suite, DaVinci Resolve etc. | GoPro systems, The Foundry Nuke, Adobe CC etc. | Industry standard: Adobe CC suite, DaVinci Resolve etc. | Industry standard: Adobe CC suite, DaVinci Resolve etc. | Industry standard: Adobe CC suite, DaVinci Resolve etc. |

Comparison of different raw video container formats, analyzed as a part of GSoC 2018.

Proof of concept – Emulation design and codebase

Magic Lantern’s MLV file format was chosen for testing / emulating the recording scenario because of a few clear advantages it presented:

  • MLV is very easy to encode on the fly. It consists of blocks of data that are not necessarily sequential in the stream but are arranged logically by timestamp.
  • Since MLV consists of simple blocks – metadata blocks and frame blocks – which can be sent over different channels (high speed USB3 and GbE respectively) and joined on the recording end to form an MLV file, no intermediate representation is needed to transport metadata and video frames.
  • MLV is based on a logical sequence of timestamps, so the settings of upcoming frames can be changed at any point using a few metadata blocks (e.g. mlv_lens_hdr_t, mlv_expo_hdr_t). This can be called a “set mode” way of storing metadata: unless explicitly overridden, individual video frames take up the metadata of the previous video frame, avoiding redundancy. It is thus possible to extract either per frame metadata from the camera, or only trigger based metadata (e.g. an ISO dial rotation while recording), and encode it into the MLV.
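The timestamp based logical ordering can be sketched as follows. Every MLV block starts with the same prefix (the field layout follows the MLV convention; the comparator and sizes here are illustrative), so a reader can reconstruct the logical sequence simply by sorting block references by timestamp:

```c
#include <stdint.h>
#include <stdlib.h>

/* Common prefix shared by every MLV block: a four-byte type tag, the
 * total block size, and a timestamp used for logical ordering. */
typedef struct {
    uint8_t  blockType[4];  /* e.g. "VIDF", "EXPO", "LENS" */
    uint32_t blockSize;     /* size of the whole block in bytes */
    uint64_t timestamp;     /* relative time; orders blocks logically */
} mlv_hdr_t;

/* qsort comparator: order an array of block pointers by timestamp,
 * the way a reader rebuilds the logical sequence of an MLV stream. */
static int mlv_cmp_timestamp(const void *a, const void *b)
{
    const mlv_hdr_t *x = *(mlv_hdr_t * const *)a;
    const mlv_hdr_t *y = *(mlv_hdr_t * const *)b;
    return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
}
```

Because on-disk order carries no meaning beyond the few mandatory leading blocks, a writer is free to append blocks in whatever order they arrive from the two channels.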

An example MLV file looks as follows:

In this file, the yellow blocks must appear in the given order on disk (mlv_rawi_hdr_t is not strictly required immediately after mlv_file_hdr_t, but it must appear before the first mlv_vidf_hdr_t; our current implementation, however, assumes the strict order shown in the figure above), while the other blocks are logically sorted by timestamp. This allows us to form two sets of MLV blocks: one containing the bulk of the data – mlv_vidf_hdr_t blocks with video frames – and one containing everything else. Different sources can also provide different parts of this file:

The AXIOM Beta camera will provide two or more interfaces to the recording system: (1 low speed + 1 high speed), (1 low speed + 2 high speed), (1 low speed + 3 high speed) and so on. High speed channels provide the mlv_vidf_hdr_t blocks along with frame data. A lower speed channel (compared to the frame providing USB3, probably GbE) provides the meta blocks (all MLV blocks except mlv_vidf_hdr_t). These are saved to a high speed disk (an SSD RAID 0 configuration) in real time (this has been benchmarked, although not on SSD RAID 0). In the figure above, sources 2, 3 etc. are high speed channels while source 1 is the low speed channel.

When the recording stream has sent all its blocks, the cache is combined to form the following MLV file:

An emulation was set up to show this system could work. It is the primary codebase for this GSoC project and can be found on GitHub using this link. The figure below explains how the emulation is designed and implemented.

The video frames are transported as a series of mlv_vidf_hdr_t blocks along with frame data through high speed USB3 ports (find all MLV file structures here). Other important metadata is transported over the low speed Gigabit Ethernet port.

The low speed port transports the following important metadata blocks:

  • mlv_file_hdr_t
  • mlv_rawi_hdr_t
  • mlv_rtci_hdr_t
  • mlv_expo_hdr_t
  • mlv_idnt_hdr_t
  • mlv_lens_hdr_t
  • mlv_wbal_hdr_t
  • … other mlv metadata blocks according to camera setting changes

Creating the above two streams for emulation is the job of the Generator. It compiles two files – rawdata and rawinfo – that model the two streams coming out of the camera: rawdata being the high speed video frame transport and rawinfo the low speed metadata transport.
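The split the Generator performs amounts to a routing decision on the four-byte block type. The sketch below is illustrative (the function and file handles are mine, not the actual Generator code): video frame blocks go to the rawdata stream, everything else to rawinfo:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Route one serialized MLV block: "VIDF" blocks model the high speed
 * video frame stream (rawdata); all other blocks model the low speed
 * metadata stream (rawinfo). `size` is the full block size in bytes. */
static void route_block(const uint8_t type[4], const void *block,
                        uint32_t size, FILE *rawdata, FILE *rawinfo)
{
    FILE *dst = (memcmp(type, "VIDF", 4) == 0) ? rawdata : rawinfo;
    fwrite(block, 1, size, dst);
}
```

Since each block carries its own size, the two output files remain independently parseable streams, which is what allows them to be recombined later without an intermediate representation.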

The output of the Generator can be hosted over the network or loaded into memory directly and served through FUSE. This removes the secondary storage speed bottleneck, so the Stream Handler (the recording end, discussed below) gets full speed access to the HDD/SSD.

Stream Handler: this application models the primary interface code for the recording unit. It uses two threads to store the two streams on disk as fast as it can. It is a heavily I/O bound application with very little processing involved.
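A minimal sketch of that two-thread layout, using plain FILE streams as stand-ins for the network endpoints (names and buffer size are illustrative, not taken from the actual Stream Handler):

```c
#include <pthread.h>
#include <stdio.h>

/* One copy job per stream: source endpoint and on-disk cache file. */
typedef struct {
    FILE *src;   /* incoming stream (stand-in for USB3 or GbE endpoint) */
    FILE *dst;   /* on-disk cache (frameCache or metaCache) */
} copy_job_t;

/* Thread body: shovel bytes from src to dst until the stream ends.
 * Deliberately does no parsing; the Stream Handler is I/O bound. */
static void *copy_stream(void *arg)
{
    copy_job_t *job = arg;
    char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, job->src)) > 0)
        fwrite(buf, 1, n, job->dst);
    return NULL;
}
```

Two such threads – one per stream – can run independently since the streams share no state; the only synchronization point is joining both threads before handing the caches to the Joiner.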

Joiner: a program that runs after stream handling is done. It joins the high speed output (video frames) and the low speed output (metadata) into one MLV file, using cat internally. There are, however, a few considerations on whether this module is really needed; see MLV recording end – Thoughts on joining for more details.

mlv_dump / MLVFS: the final “publisher” system (third party), used to convert MLV files into corresponding DNG files.

One can find these subsystems clearly in the repository for this GSoC project.

PLR high dynamic range – thoughts and implementation ideas

PLR, or multi slope exposure, is one of the two ways in which the CMV12000 sensor achieves high dynamic range, the other being interleaved exposure / dual exposure mode. While in dual exposure mode the odd and even rows of the sensor have different exposure times, PLR takes the following approach:

This feature partially resets those pixels which reach a programmable voltage while leaving the other pixels untouched. This can be done twice within one exposure time, for a maximum of three exposure slopes. The partial reset can be seen in the following figure:

The red line here is the readout of a sensor cell receiving a larger amount of light; the blue line, left untouched, is a cell reading out a darker portion of the image. As shown in the figure, the bright pixel is held at a programmable voltage for a programmable time during the exposure. This happens twice to ensure that at the end of the exposure time the pixel is not saturated. The darker pixel is not influenced by the multiple slopes and has a normal response. More information on the PLR mode can be found in the CMV12000 datasheet and Sebastian Pichelhofer’s post on PLR experiments.

To encode this multiple slope information, the following MLV struct is proposed as an addition:

struct mlv_cmv12kplr_hdr_t {
    uint8_t  blockType[4];  // "PLR_"
    uint32_t blockSize;
    uint64_t timestamp;
    uint32_t expTime;       // Exposure time Texp used by the sensor
    uint8_t  numSlopes;     // 1, 2 or 3
    uint32_t expKp1;        // First exposure time in PLR, invalid if numSlopes = 1
    uint32_t expKp2;        // Second exposure time in PLR, invalid if numSlopes = 1 or 2
    uint8_t  vtfl2;         // Hold voltage, invalid if numSlopes = 1
    uint8_t  vtfl3;         // Hold voltage, invalid if numSlopes = 1 or 2
};
Currently, if this block were added to MLV files, it would go unprocessed by both mlv_dump and mlvfs. Changes would be required in both tools to make PLR induce a linearization table (tag 50712 (C618.H)) in the DNG files. Simple mathematical calculations in a newly created function in mlv_dump and mlvfs can introduce this linearization table in the exported CDNG files.
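As an illustration of what such a function could compute, the sketch below builds a 12-bit linearization table for a two-knee piecewise-linear response. The knee positions and slope factors here are placeholders; real values would have to be derived from expKp1/expKp2 and vtfl2/vtfl3 per the CMV12000 datasheet:

```c
#include <stdint.h>

/* Illustrative only: fill a 4096-entry DNG-style linearization table
 * for a response with up to two knees. Raw codes below knee1 map
 * one-to-one; above each knee the compressed slope is expanded by
 * gain2 and gain3 respectively (placeholders, e.g. 4 and 16). */
static void plr_build_lintable(uint16_t knee1, uint16_t knee2,
                               uint32_t gain2, uint32_t gain3,
                               uint32_t table[4096])
{
    for (uint32_t code = 0; code < 4096; code++) {
        if (code <= knee1)
            table[code] = code;
        else if (code <= knee2)
            table[code] = knee1 + (code - knee1) * gain2;
        else
            table[code] = knee1 + (knee2 - knee1) * gain2
                        + (code - knee2) * gain3;
    }
}
```

The table is monotonically increasing by construction, which is what a DNG reader expects: each compressed raw code maps back to an estimate of the linear scene luminance it represents.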

As a part of testing linearization tables in CDNG, a synthetic linearization table was added in mlvfs for each of the MLV frames it rendered. We were able to reproduce the effects in the final output of mlvfs as follows:

Figure showcasing the state of image before adding linearization table through mlvfs

Figure showcasing the state of image after adding the linearization table.

MLV recording end – Thoughts on joining

Currently, the Stream Handler handles both the frame data from the high speed link and the metadata from the low speed link, stores them in separate caches and finally merges them into an MLV file. This merge is an expensive operation. While the current emulation does join the two caches this way, it can be done much more simply using the following method:

  • Create two caches while receiving the data from the links – metaCache and frameCache.
  • Start reading the metadata blocks into metaCache using meta handling thread.
  • Before writing to frameCache, add an mlv_null_hdr_t block at its start with a conservative size denoting the maximum size the metadata blocks may take – say, 10 KB for now. Reserving this space early on allows frames to be written at an offset in frameCache rather than from the beginning of the file, which helps later in constructing the MLV file.
  • Once both metaCache and frameCache are written, the files look as shown below. Joining them is now very easy: metaCache can be overwritten into the reserved free space, and the unused space between the meta blocks and the video frames can be filled with a new, updated mlv_null_hdr_t.
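The reservation step can be sketched as follows, assuming the common MLV block prefix (the helper name is mine): writing a NULL block of the reserved size lets readers skip the region and lets the metadata be back-filled later without moving any video frames:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Common MLV block prefix (layout as in the MLV convention). */
typedef struct {
    uint8_t  blockType[4];
    uint32_t blockSize;
    uint64_t timestamp;
} mlv_hdr_t;

/* Reserve `bytes` at the current position with a NULL block.
 * Readers skip it via blockSize; the Joiner later overwrites this
 * region with metadata plus a smaller, updated NULL filler.
 * Returns 0 on success, -1 on failure. */
static int write_null_block(FILE *f, uint32_t bytes)
{
    mlv_hdr_t hdr;
    if (bytes < sizeof hdr)
        return -1;
    memcpy(hdr.blockType, "NULL", 4);
    hdr.blockSize = bytes;
    hdr.timestamp = 0;
    if (fwrite(&hdr, sizeof hdr, 1, f) != 1)
        return -1;
    return fseek(f, (long)(bytes - sizeof hdr), SEEK_CUR) ? -1 : 0;
}
```

After the metadata is back-filled, the same helper can be called again with the remaining gap size, keeping the file a valid chain of self-describing blocks at every step.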

Another way to reduce the joining overhead is to use multi file MLV, although this method is more suited to keeping two frameCaches in different files that are combined later.

Results and conclusion

The emulation built showed promise and could export DNGs as shown below.

However, there are still things that need to be addressed, primarily adding a custom color matrix, camera model information and PLR block support. These could be part of future projects undertaken by Apertus or future GSoC projects under the Apertus Association.

Important Links / References

  1. Comparisons of various Raw Video Container Formats:
  2. GitHub link to proof of concept emulation: