Of course I know it's pretty much always impossible to give time frames, but what does this "special handling" include, and how long do you think it will take to support them properly?
So far all shader variables are 32-bit. Booleans are not. This complicates doing the endianness conversion when uploading the uniform variables to the GPU. From there, the GPU represents booleans as a 64-bit mask, with one bit for each thread.
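To see why a single non-32-bit type complicates the upload, here is a minimal sketch in C of an "everything is 32-bit" copy-convert loop. The function names are made up for illustration and this is not Warp3D Nova's actual code; the point is only that such a loop is correct as long as every element in the buffer is exactly four bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Copy `count` 32-bit words from src to dst, byte-swapping each one.
 * Assumption: the whole buffer consists of 4-byte elements. */
static void copy_swap32(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = swap32(src[i]);
}
```

As soon as the uniform block contains an element of another size, a blanket 4-byte swap like this would scramble it, so the conversion has to know the layout.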
@Hans Wait, but why does glslangValidator fail in the first place with Raziel's example? I mean, even if w3dnova doesn't support something, glslangValidator shouldn't fail if the shader is correct.
@Hans It's not so important to add bool support, as we can use an int as a bool and test bits with an AND operator. I mean, slightly modifying a shader (that used bool) will allow it to work...
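The int-as-bool idea can be sketched on the application side as follows. This is only an illustration: the flag names and helper functions are hypothetical, and the resulting value would be uploaded as an ordinary int uniform.

```c
#include <stdint.h>

/* Hypothetical flag bits packed into one int uniform. */
enum {
    FLAG_FOG      = 1 << 0,
    FLAG_LIGHTING = 1 << 1,
    FLAG_TEXTURED = 1 << 2
};

/* Build the int value that replaces three separate bool uniforms. */
static int32_t pack_flags(int fog, int lighting, int textured)
{
    int32_t flags = 0;
    if (fog)      flags |= FLAG_FOG;
    if (lighting) flags |= FLAG_LIGHTING;
    if (textured) flags |= FLAG_TEXTURED;
    return flags;
}

/* Test one bit with AND; returns 1 if the bit is set, 0 otherwise. */
static int flag_is_set(int32_t flags, int32_t mask)
{
    return (flags & mask) != 0;
}
```

In the shader itself, the same AND test needs a GLSL version that has integer bitwise operators. Where those are not available, one int per former bool, compared against 0, would be the simpler substitute.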
Quote:
So far all shader variables are 32-bit. Booleans are not. This complicates doing the endianness conversion when uploading the uniform variables to the GPU. From there, the GPU represents booleans as a 64-bit mask, with one bit for each thread.
I see.
And do I understand correctly that not all "uniforms" are boolean, but all "uniforms" are affected by this 32/64-bit conversion?
Because I tried another project, and while it gives me a different error, "uniform" is there again.
kas1e: "Warp3D Nova itself is currently missing (taken from Daniel's ogles2.library readme): sample coverage, dithering, compressed textures and render-to-texture, which means they are also missing in ogles2 because of that."
Hans: "Correction, render-to-texture has been in there for ages now (over a year)."
Sorry about this, I forgot about that section of the readme... it's outdated. The change-log has the correct information though. Of course render-to-texture exists, otherwise e.g. Spencer would have a hard time doing its shadow effects. This feature has been in Nova since version 1.37 IIRC, and our OGLES2 has supported it since version 1.14 (30 April 2017).
@Hans, Daniel: Tested the latest beta of w3d nova (1.62), where the endian conversion of mixed data works fine. So yeah, now, when I use the ogles2.library v1.22 with the patching code removed, which Daniel gave us for tests, everything looks correct.
But! A really strange issue has come up:
If I use warp3dnova 1.62 + the public ogles2.library v1.22 (the one with the patch applied), then I get, let's say, 100 fps in q3 at some resolution.
Now, if for testing I put the ogles2.library v1.22 with the patch removed on top of the same warp3dnova 1.62, then, while everything still looks correct (so the endian conversion works on the w3dnova side), q3 speed is REDUCED A LOT: instead of 100 fps, it gives 65!
I.e. the strange thing is that when both warp3dnova and ogles2 have the necessary conversion code, things are much faster than when only warp3dnova has it.
It's like... dunno, it's like the extensions stopped working, while in the console output I can see they are found and used.
Dunno what to think, but I can think of 2 possibilities:
1). when the patch was removed from ogles2.library for testing purposes, something else was broken
2). the conversion code of warp3dnova takes care of some data sizes but not others, and Daniel's patch makes it faster, even when it sits on top of the conversion code in warp3d nova.
@Hans As for the rest: in terms of cube2, 3 of the 4 issues are fixed, thanks! The only remaining one is that OpImageSampleProjImplicitLod is unimplemented. Maybe for the time being just make it an empty stub?
Quote:
If I use warp3dnova 1.62 + the public ogles2.library v1.22 (the one with the patch applied), then I get, let's say, 100 fps in q3 at some resolution.
Now, if for testing I put the ogles2.library v1.22 with the patch removed on top of the same warp3dnova 1.62, then, while everything still looks correct (so the endian conversion works on the w3dnova side), q3 speed is REDUCED A LOT: instead of 100 fps, it gives 65!
I.e. the strange thing is that when both warp3dnova and ogles2 have the necessary conversion code, things are much faster than when only warp3dnova has it.
Sounds like Daniel's workaround code is more efficient than the new Warp3D Nova "complex endian convert" code. Maybe it's using the caches more efficiently, somehow. Daniel knows a thing or two about that...
EDIT: On what hardware is this? I'm converting the endianness and copying 16 KiB at a time, which I thought was small enough to fit in all of the L1 caches. No cache prefetch instructions used, unlike the functions used when copying and endianness converting a buffer with a single data size.
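For readers following along, a chunked copy-and-convert for mixed element sizes might look roughly like the sketch below. It is not the Warp3D Nova implementation: the `ElemDesc` layout description and the function names are hypothetical, and only the 16 KiB chunk size is taken from the post. It assumes `stride` is greater than zero and that every described element lies inside one vertex.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES (16u * 1024u)   /* chunk size quoted in the post */

/* Hypothetical layout description: one entry per element of a vertex. */
typedef struct {
    size_t offset;  /* byte offset inside the vertex */
    size_t size;    /* element size in bytes, e.g. 1, 2 or 4 */
} ElemDesc;

/* Reverse the n bytes starting at p (no-op for n < 2). */
static void reverse_bytes(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Copy vertices chunk by chunk, then swap each element in the copy
 * while the chunk is (hopefully) still in the L1 cache. */
static void copy_convert_mixed(uint8_t *dst, const uint8_t *src,
                               size_t vertex_count, size_t stride,
                               const ElemDesc *elems, size_t elem_count)
{
    size_t per_chunk = CHUNK_BYTES / stride;   /* whole vertices per chunk */
    if (per_chunk == 0)
        per_chunk = 1;                         /* very large vertices */

    while (vertex_count > 0) {
        size_t n = vertex_count < per_chunk ? vertex_count : per_chunk;
        memcpy(dst, src, n * stride);
        for (size_t v = 0; v < n; v++)
            for (size_t e = 0; e < elem_count; e++)
                reverse_bytes(dst + v * stride + elems[e].offset,
                              elems[e].size);
        dst += n * stride;
        src += n * stride;
        vertex_count -= n;
    }
}
```

Compared with a plain 32-bit swap loop, this does a per-element lookup for every vertex, which is one plausible (but unverified) source of extra cost.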
Quote:
The only remaining one is that OpImageSampleProjImplicitLod is unimplemented. Maybe for the time being just make it an empty stub?
Having an unimplemented stub won't help anyone because the rendered image will be wrong. I prefer that anything that's not implemented yet will trigger an error. That way we know exactly why something doesn't work.
IIRC, you can emulate the shader's textureProj() function with texture() by dividing the texture coordinates by their last element.
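The arithmetic behind that workaround, written out in C for clarity (the struct and function names are made up for this sketch). In shader terms it corresponds to replacing `textureProj(s, p)` with `texture(s, p.xy / p.z)` for a vec3 coordinate, or `texture(s, p.xy / p.w)` for a vec4 coordinate on a 2D sampler:

```c
/* 2D texture coordinate after the projective divide. */
typedef struct {
    float s;
    float t;
} TexCoord2;

/* vec3 case: divide the first two components by the last one. */
static TexCoord2 project_coord3(float s, float t, float q)
{
    TexCoord2 out = { s / q, t / q };
    return out;
}

/* vec4 case on a 2D sampler: the third component is ignored and the
 * first two are divided by the fourth. */
static TexCoord2 project_coord4(float s, float t, float r, float q)
{
    (void)r;
    return project_coord3(s, t, q);
}
```

The divide happens per fragment in the shader, so the result should match what the projective sampling instruction would produce, as long as the last component is non-zero.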
Quote:
EDIT: On what hardware is this? I'm converting the endianness and copying 16 KiB at a time, which I thought was small enough to fit in all of the L1 caches. No cache prefetch instructions used, unlike the functions used when copying and endianness converting a buffer with a single data size.
That's on an x5k with some RadeonHD Verde (a 250 or 250X). Strange that it slows things down _that_ much... You could try the test q3 archive I sent you before, as well as those ogles2.libraries (the public one and the one with the workaround removed), to see how it behaves on your hardware (and it will probably be easier to debug, if it needs any debugging).
Probably when Daniel's patch sits on top of the warp3dnova conversion code, ogles2 sends already-patched data to warp3dnova, so the conversion code in Nova is skipped as there's nothing to convert, and because of this we get the same fps as with previous Nova versions.
@Daniel Is that test ogles2.library with the patching code removed compiled as usual, with nothing else disabled?
Edited by kas1e on 2018/11/28 9:58:19 Edited by kas1e on 2018/11/28 10:03:20
Is this conversion stuff really a good thing in every case? I mean, as in most cases we must pick out the data and read it manually to write it into the VBO, perhaps we could reorder it during that process.
I mean, if we read XYZ and copy it elsewhere into the VBO, then read UVW and copy it elsewhere into the VBO, then RGBA, then perhaps we can reorder them too...
I mean: disabling the auto-reorder feature could help in some cases.
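What is being proposed here, doing the byte reordering while the application assembles its interleaved vertices, could be sketched like this. The vertex layout (XYZ floats, UV floats, RGBA bytes) and the function names are assumptions for illustration, and the code assumes `float` is a 32-bit IEEE-754 value:

```c
#include <stdint.h>
#include <string.h>

/* Store a float as 4 little-endian bytes, whatever the host byte order. */
static void put_f32_le(uint8_t *dst, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret without aliasing issues */
    dst[0] = (uint8_t)(bits & 0xFFu);
    dst[1] = (uint8_t)((bits >> 8) & 0xFFu);
    dst[2] = (uint8_t)((bits >> 16) & 0xFFu);
    dst[3] = (uint8_t)((bits >> 24) & 0xFFu);
}

/* Write one 24-byte interleaved vertex: XYZ, UV, then RGBA.
 * Single bytes have no byte order, so RGBA is copied as-is. */
static void put_vertex_le(uint8_t *dst, const float xyz[3],
                          const float uv[2], const uint8_t rgba[4])
{
    put_f32_le(dst + 0,  xyz[0]);
    put_f32_le(dst + 4,  xyz[1]);
    put_f32_le(dst + 8,  xyz[2]);
    put_f32_le(dst + 12, uv[0]);
    put_f32_le(dst + 16, uv[1]);
    memcpy(dst + 20, rgba, 4);
}
```

This only pays off if the driver's own conversion is switched off for that buffer; otherwise the data would be swapped twice.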
Quote:
Is this conversion stuff really a good thing in every case? I mean, as in most cases we must pick out the data and read it manually to write it into the VBO, perhaps we could reorder it during that process.
In most cases, you upload the data to the GPU once, and leave it there. So there's zero speed advantage to shifting where the endianness conversion is done beyond the initial copy. This is true even with skeletal animation, which can be done entirely on the GPU.
The one exception I see is older games like Quake 3, which were created before hardware skeletal animation was possible. In that case, the character models have new poses uploaded every frame.
Quote:
I mean, if we read XYZ and copy it elsewhere into the VBO, then read UVW and copy it elsewhere into the VBO, then RGBA, then perhaps we can reorder them too...
I mean: disabling the auto-reorder feature could help in some cases.
Do you really want the responsibility of doing all the endianness conversion? Bear in mind that you're not guaranteed that the GPU will even need it. Right now, yes, all GPUs are little-endian and we use big-endian CPUs. In future... who knows?
If there's enough interest, I could add the ability to check the GPU's endianness and disable the endianness conversion, making it the app's/game's responsibility (or the GLES2 wrapper's).
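On the application side, such a check could boil down to something like the sketch below. No such query exists in the API as far as this thread says, so the GPU's byte order is simply passed in as a parameter here; the enum and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef enum { ENDIAN_LITTLE, ENDIAN_BIG } Endian;

/* Detect the CPU's byte order at run time by looking at the first
 * byte of a known 16-bit value. */
static Endian host_endianness(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1 ? ENDIAN_LITTLE : ENDIAN_BIG;
}

/* Conversion is only needed when CPU and GPU disagree.
 * `gpu` would come from the (hypothetical) driver query. */
static int needs_conversion(Endian host, Endian gpu)
{
    return host != gpu;
}
```

An app that takes over the conversion would call something like `needs_conversion(host_endianness(), gpu_order)` once at start-up and pick its vertex-writing path accordingly, which is exactly the extra responsibility the post warns about.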
Quote:
That's on an x5k with some RadeonHD Verde (a 250 or 250X). Strange that it slows things down _that_ much... You could try the test q3 archive I sent you before, as well as those ogles2.libraries (the public one and the one with the workaround removed), to see how it behaves on your hardware (and it will probably be easier to debug, if it needs any debugging).
Probably when Daniel's patch sits on top of the warp3dnova conversion code, ogles2 sends already-patched data to warp3dnova, so the conversion code in Nova is skipped as there's nothing to convert, and because of this we get the same fps as with previous Nova versions.
@Daniel Is that test ogles2.library with the patching code removed compiled as usual, with nothing else disabled?
Yes, Daniel's patch bypasses the new endianness code, and uses the "everything is 32-bit" copy-convert code.
There's no bug in the way; it's all about optimization. However, I'm not sure why it's 30% slower, or how to optimize it. We don't have tools that could identify the bottlenecks (e.g., cache misses, etc.), so it's more guesswork than anything else. I suppose I could try inserting some cache prefetching instructions to see if that helps.
Suggestions from those who are experienced in this kind of optimization are welcome.
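As one possible starting point for such experiments, here is a sketch of a 32-bit copy-convert loop with a software prefetch hint. It uses the GCC/Clang builtin `__builtin_prefetch`; the prefetch distance and the once-per-8-words interval are guesses that would need tuning on real hardware, and whether this helps at all on the X5000 is unknown.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD_WORDS 64u   /* hypothetical distance: 256 bytes ahead */

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)  /* read, low reuse */
#else
#define PREFETCH_READ(p) ((void)0)                      /* no-op elsewhere */
#endif

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Copy and byte-swap `count` words, hinting upcoming source data. */
static void copy_swap32_prefetch(uint32_t *dst, const uint32_t *src,
                                 size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Every 8 words, request data we will read later; the bounds
         * check keeps the address expression inside the buffer. */
        if ((i & 7u) == 0 && i + PREFETCH_AHEAD_WORDS < count)
            PREFETCH_READ(&src[i + PREFETCH_AHEAD_WORDS]);
        dst[i] = swap32(src[i]);
    }
}
```

Timing this against the same loop without the hint, on the same buffer sizes q3 uploads per frame, would show whether prefetching is worth pursuing.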