Friday 10 February 2012

Tuning ...

Had a poke at some performance tuning of jjmpeg.

I took 2 videos:
PAL
A PAL DVD, half hour show.
1080p
A half hour show recorded directly with a DVB-T receiver. 1440x1080p, ~30fps, 10MB/s.

I then used JJMediaReader to scan the files and decode the video frames to their native format. I then took this frame and converted it to an RGB format using one of the tests below.
ByteBuffer
Code uses libswscale to write to an avcodec allocated frame in BGR24 format. The frame is not accessed from Java: this is the baseline performance of using a ByteBuffer, and it could be the end point if then passing the data to JOGL or JOCL.
ByteBuffer to Array
Perform the above, then use nio to copy the content to a Java byte array.
IntBuffer
Code uses libswscale to write to an avallocated frame in ABGR format. Similar to the first test, but a baseline for ABGR conversion.
IntBuffer to Array
Perform the above, then use nio to copy the content to a Java int array.
int array
Use JNI function GetPrimitiveArrayCritical, form a dummy image that points to it, and write to it directly using libswscale to ABGR format. This gives the Java end an integer array to work with directly.

In all cases the GC load was zero for reading all frames (i.e. no per-frame objects were allocated). I'm using JDK 1.7. The machine is an intel i7x980. I'm using a fairly old build of ffmpeg (version 52 of libavcodec/libavformat).

The timing results (in seconds):

Test \ Video PAL 1440x1080p

ByteBuffer 81.5 237
ByteBuffer to byte[] 86.0 279

IntBuffer 81.3 242
IntBuffer to int[] 86 297
int[] 81.9 242

Discussion

So ... using GetPrimitiveArrayCritical is the same speed as using a Direct ByteBuffer - but the data is faster to then access from Java as it can just be indexed.

Using RGB and ByteBuffer's is a bit quicker than using RGBA. Apart from the differences down to libswscale there seems some overhead using an IntBuffer (derived from a ByteBuffer) to write to an Int array.

Using RGB is marginally quicker than using RGBA - although that's mostly down to libswscale, and for my build nothing is accelerated. When I move to ffmpeg 0.10 I will re-check the default formats i'm using are the quick(?) ones.

When using a direct buffer and then copying the whole array to a corresponding java array, the overhead is fairly small until the video size increases to HD resolutions. At 23% for 1440x1080xABGR, it is approaching a significant amount: but this application does nothing with the data. Any processing performed will reduce this quickly. At PAL resolution it's only about 5%.

Conclusions

For modern desktop hardware, it probably doesn't really matter: the machine is fast enough that a redundant copy isn't much overhead, even at HD resolution.

Possibly of more interest is how the rest of the pipeline copes. Obviously with JOGL or JOCL the work is already done when using ByteBuffers, or ideally you'd process the YUV data yourself. I'm not sure about Java2D though, from a previous post there's a suggestion integer BufferedImage is the fastest.

However there are possibly cases where it would be beneficial and for Java image processing it is probably easier to use anyway: so I will add this new interface to jjmpeg after confirming it actually works.

I also found a bug in AVPlane where I wasn't setting the JNI-allocated ByteBuffer to native byte order. This made a big difference to the IntBuffer to int[] version (well 44% over no array copy in PAL), but wouldn't have been hit with my existing code.

No comments: