diff options
Diffstat (limited to 'DOCS/OUTDATED-tech/dr-methods.txt')
-rw-r--r-- | DOCS/OUTDATED-tech/dr-methods.txt | 129 |
1 files changed, 0 insertions, 129 deletions
diff --git a/DOCS/OUTDATED-tech/dr-methods.txt b/DOCS/OUTDATED-tech/dr-methods.txt deleted file mode 100644 index c8eac2aae3..0000000000 --- a/DOCS/OUTDATED-tech/dr-methods.txt +++ /dev/null @@ -1,129 +0,0 @@ -DIRECT RENDERING METHODS -- by A'rpi -======================== (based on a mail to -dev-eng) - -Ok. It seems none of you really knows what direct rendering means... -I'll try to explain now! :) - -At first, there are 2 different way, both called direct rendering. -The main point is the same, but they work different. - -method 1: decoding directly to externally provided buffers. -so, the codec decodes macroblocks directly to the buffer provided by the -caller. as this buffer will be read later (for MC of next frame) it's not -a good idea to place such buffers in slow video ram. but. -there are many video out drivers using buffers in system ram, and using some -way of memcpy or DMA to blit it to video ram at display time. -for example, Xv and X11 (normal and Shm too) are such thingie. -XImage will be a buffer in system ram (!) and X*PutImage will copy it to -video ram. Only nvidia and ati rage128 Xv drivers use DMA, others just -memcpy it. Also some opengl drivers (including Matrox) uses DMA to copy from -texsubimage to video ram. -The current mplayer way mean: codec allocates some buffer, and decode image -to that buffer. then this buffer is copied to X11's buffer. then Xserver -copies this buffer to video ram. So one more memcpy than required... -direct rendering can remove this extra memcpy, and use Xserver's memory -buffers for decoding buffer. Note again: it helps only if the external -buffer is in fast system ram. - -method 2: decoding to internal buffers, but blit after each macroblocks, -including optional colorspace conversion. -advantages: it can blit into video ram, as it keeps the copy in its internal -buffers for next frame's MC. skipped macroblocks won't be copied again to -video ram (except if video buffer address changes between frames -> hw -double/triple buffering) -Just avoiding blitting of skipped MBs mean about 100% speedup (2 times -faster) for low bitrate (<700kbit) divxes. It even makes possible to watch -VCD resolution divx on p200mmx with DGA. -how does it work? the codec works as normally, decodes macroblocks into its -internal buffer. but after each decoded macroblock, it immediatelly copies -this macroblock to the video ram. it's in the L1 cache, so it will be fast. -skipped macroblocks can be skipped easily -> less vram write -> more speedup. -but, as it copies directly to video ram, it must do colorspace conversion if -needed (for example divx -> rgb DGA), and cannot be used with scaling. -another interesting question of such direct rendering is the planar formats. -Eugene K. of Divx4 told me that he experienced worse performance blittig -yv12 blocks (copied 3 blocks to 3 different (Y,U,V) buffers) than doing -(really unneeded) yv12->yuy2 conversion on-the-fly. -so, divx4 codec (with -vc divx4 api) converts from its internal yv12 buffer -to the external yuy2. - -method 2a: -libmpeg2 already uses simplified variation of this: when it finish decoding a -slice (a horizontal line of MBs) it copies it to external (video ram) buffer -(using callback to libvo), so at least it copies from L2 cache instead of -slow ram. for non-predictive (B) frames it can re-use this cached memory -for the next slice - so it uses less memory and has better cache utilization: -it gave me 23% -> 20% VOB decoding speedup on p3. libavcodec supports -per-slice callbacks too, but no slice-memory reusing for B frames yet. - -method 2b: -some codecs (indeo vfw 3/4 using IF09, and libavcodec) can export the 'bitmap' -of skipped macroblocks - so libvo driver can do selective blitting: copy only -the changed macroblocks to slow vram. - -so, again: the main difference between method 1 and 2: -method1 stores decoded data only once: in the external read/write buffer. -method2 stores decoded data twice: in its internal read/write buffer (for -later reading) and in the write-only slow video ram. - -both methods can make big speedup, depending on codec behaviour and libvo -driver. for example, IPB mpegs could combine these, use method 2 for I/P -frames and method 1 for B frams. mpeg2dec does already this. -for I-only type video (like mjpeg) method 1 is better. for I/P type video -with MC (like divx, h263 etc) method 2 is the best choice. -for I/P type videos without MC (like FLI, CVID) could use method 1 with -static buffer or method 2 with double/triple buffering. - -i hope it is clear now. -and i hope even nick understand what are we talking about... - -ah, and at the end, the abilities of codecs: -libmpeg2,libavcodec: can do method 1 and 2 (but slice level copy, not MB level) -vfw, dshow: can do method 2, with static or variable address external buffer -odivx, and most native codecs like fli, cvid, rle: can do method 1 -divx4: can do method 2 (with old odivx api it does method 1) -xanim: they currently can't do DR, but they exports their -internal buffers. but it's very easy to implement menthod 1 support, -and a bit harder but possible without any rewrite to do method 2. - -so, dshow and divx4 already implements all requirements of method 2. -libmpeg2 and libavcodec implements method 1 and 2a (lavc 2b too) - -anyway, in the ideal world, we need all codecs support both methods. -anyway 2: in ideal world, there are no libvo drivers having buffer in system -ram and memcpy to video ram... -anyway 3: in our really ideal world, all libvo driver has its buffers in -fast sytem ram and does blitting with DMA... :) - -============================================================================ -MPlayer NOW! -- The libmpcodecs way. - -libmpcodecs replaced old draw callbacks with mpi (mplayer image) struct. -steps of decoding with libmpcodecs: -1. codec requests an mpi from libmpcodecs core (vd.c) -2. vd creates an mpi struct filled by codec's requirements (size, stride, - colorspace, flags, type) -3. vd asks libvo (control(VOCTRL_GET_IMAGE)), if it can provide such buffer: - - if it can -> do direct rendering - - it it can not -> allocate system ram area with memalign()/malloc() - Note: codec may request EXPORT buffer, it means buffer allocation is - done inside the codec, so we cannot do DR :( -4. codec decodes one frame to the mpi struct (system ram or direct rendering) -5. if it isn't DR, we call libvo's draw functions to blit image to video ram - -current possible buffer setups: -- EXPORT - codec handles buffer allocation and it exports its buffer pointers - used for opendivx, xanim and libavcodec -- STATIC - codec requires a single static buffer with constant preserved content - used by codecs which do partial updating of image, but doesn't require reading - of previous frame. most rle-based codecs, like cvid, rle8, qtrle, qtsmc etc. -- TEMP - codec requires a buffer, but it doesn't depend on previous frame at all - used for I-only codecs (like mjpeg) and for codecs supporting method-2 direct - rendering with variable buffer address (vfw, dshow, divx4). -- IP - codec requires 2 (or more) read/write buffers. it's for codecs supporting - method-1 direct rendering but using motion compensation (ie. reading from - previous frame buffer). could be used for libavcodec (divx3/4,h263). - IP buffer stays from 2 (or more) STATIC buffers. -- IPB - similar to IP, but also have one (or more) TEMP buffers for B frames. - it will be used for libmpeg2 and libavcodec (mpeg1/2/4). - IPB buffer stays from 2 (or more) STATIC buffers and 1 (or more) TEMP buffer. |