From 46d86da6300ebcd2134996c76b9238fcf8e0fb6e Mon Sep 17 00:00:00 2001
From: Niklas Haas <git@haasn.xyz>
Date: Wed, 16 Aug 2017 22:13:51 +0200
Subject: vo_opengl: refactor RA texture and buffer updates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- tex_upload args are moved to a struct
- the ability to directly upload texture data without going through a
  buffer is made explicit
- the concept of buffer updates and buffer polling is made more explicit
  and generalized to buf_update as well (not just mapped buffers)
- the ability to call tex_upload/buf_update on a tex/buf is made
  explicit during tex/buf creation
- uploading from buffers now uses an explicit offset instead of
  implicitly comparing *src against buf->data, because not all buffers
  may actually be persistently mapped
- the initial_data = immutable requirement is dropped. (May be re-added
  later for D3D11 if that ever becomes a thing)
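
To illustrate the list above, here is a minimal, self-contained sketch of
an upload call that takes a params struct, accepts either direct host
memory or a buffer plus explicit offset, and checks a creation-time
"may be uploaded to" flag. Every name in it (sketch_buf, sketch_tex,
sketch_upload_params, sketch_tex_upload, host_mutable) is hypothetical
and chosen for illustration only; it is not the actual RA API, and a
plain memcpy stands in for the real GPU transfer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT mpv's real ra_buf / ra_tex definitions. */
struct sketch_buf {
    uint8_t *storage;  /* backing memory; need not be persistently mapped */
    size_t size;
};

struct sketch_tex {
    uint8_t *pixels;
    size_t size;
    bool host_mutable; /* fixed at creation: may uploads target this tex? */
};

/* All upload arguments live in one struct instead of a long arg list. */
struct sketch_upload_params {
    struct sketch_tex *tex;
    const void *src;         /* direct upload from host memory, or NULL */
    struct sketch_buf *buf;  /* upload from a buffer, or NULL */
    size_t buf_offset;       /* explicit offset into buf, no pointer math */
    size_t len;              /* number of bytes to upload */
};

/* Returns false if the request is invalid; copies the data otherwise. */
static bool sketch_tex_upload(const struct sketch_upload_params *p)
{
    if (!p->tex || !p->tex->host_mutable || p->len > p->tex->size)
        return false;
    /* Exactly one of src / buf must be provided. */
    if ((p->src != NULL) == (p->buf != NULL))
        return false;

    const uint8_t *from;
    if (p->buf) {
        /* Bounds check written to avoid overflow in offset + len. */
        if (p->buf_offset > p->buf->size ||
            p->len > p->buf->size - p->buf_offset)
            return false;
        from = p->buf->storage + p->buf_offset;
    } else {
        from = p->src;
    }
    memcpy(p->tex->pixels, from, p->len);
    return true;
}
```

Because the offset is passed explicitly, the callee never has to compare
a source pointer against a mapped address, so the same call works for
buffers that are not mapped at all.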

This change helps the Vulkan abstraction immensely and also helps move
common code (like the PBO pooling) out of ra_gl and into
opengl/utils.c.

This also technically has the side-benefit / side-constraint of using
PBOs for OSD texture uploads as well, which actually seems to help
performance on machines where --opengl-pbo is faster than the naive code
path. Because of this, I decided to hook up the OSD code to the
opengl-pbo option as well.

One drawback of this refactor is that the GL_STREAM_COPY hack for
texture uploads "got lost", but I'm happy to see it go: DR almost fully
deprecates it, and it was never the "right thing", just an nvidia-only
hack to make this stuff work somewhat better on NUMA systems with
discrete GPUs.

Another change is that due to the way fencing works with ra_buf (we get
one fence per ra_buf per upload) we have to use multiple ra_bufs instead
of offsets into a shared buffer. But for OpenGL this is probably better
anyway. It's possible that in the future we could support having
independent “buffer slices” (each with its own fence/sync object), but
this would be an optimization more than anything. I also think that we
could address the underlying problem (memory closeness) differently by
making the ra_vk memory allocator smart enough to chunk together
allocations under the hood.
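
The "one fence per buffer, so use several buffers" idea can be sketched
as a small round-robin pool. This is only an illustration under stated
assumptions: all names (sketch_pbuf, sketch_pool, sketch_buf_poll,
sketch_pool_acquire, sketch_fence_signal, SKETCH_POOL_SIZE) are
hypothetical, and a plain bool stands in for a real GPU fence/sync
object.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 3  /* hypothetical pool depth */

/* One buffer with its own "fence"; a bool stands in for a sync object. */
struct sketch_pbuf {
    bool in_flight;
};

struct sketch_pool {
    struct sketch_pbuf bufs[SKETCH_POOL_SIZE];
    size_t next;  /* index of the buffer to try next */
};

/* Polling: true if the buffer's last upload has completed. */
static bool sketch_buf_poll(const struct sketch_pbuf *b)
{
    return !b->in_flight;
}

/* Hand out the next buffer in round-robin order and mark it busy.
 * Returns its index, or -1 if that buffer is still in use. */
static int sketch_pool_acquire(struct sketch_pool *pool)
{
    struct sketch_pbuf *b = &pool->bufs[pool->next];
    if (!sketch_buf_poll(b))
        return -1;
    int idx = (int)pool->next;
    b->in_flight = true;
    pool->next = (pool->next + 1) % SKETCH_POOL_SIZE;
    return idx;
}

/* Stand-in for the GPU signalling that the upload from bufs[idx] is done. */
static void sketch_fence_signal(struct sketch_pool *pool, int idx)
{
    pool->bufs[idx].in_flight = false;
}
```

With separate buffers, each upload's completion can be tracked on its
own, which a single shared buffer with one fence cannot express.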
---
 video/out/opengl/common.c | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'video/out/opengl/common.c')

diff --git a/video/out/opengl/common.c b/video/out/opengl/common.c
index c7a714817a..f2550e8b8c 100644
--- a/video/out/opengl/common.c
+++ b/video/out/opengl/common.c
@@ -309,6 +309,13 @@ static const struct gl_functions gl_functions[] = {
             {0}
         },
     },
+    {
+        .ver_core = 430,
+        .functions = (const struct gl_function[]) {
+            DEF_FN(InvalidateTexImage),
+            {0}
+        },
+    },
     {
         .ver_core = 430,
         .ver_es_core = 300,
-- 
cgit v1.2.3