|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 6/12 5:13
#581 |
---|---|---|
Home away from home
Joined:
2007/1/26 21:48 From New Zealand
Posts: 2196
|
@kas1e
Quote: I know you are busy with everything, but just as a reminder after 2 months: is it possible that you can find some time soon to work on Nova again?
Not right now, sorry. I really need to get the things I'm working on finished ASAP, so I don't have much time for other things.
Quote: Also, what can you suggest for debugging why Nova fails to make the FrickingShark shaders work correctly, even though it compiles them fine? (Taking into account that the shaders are OK, and that the same code using those shaders works correctly on the win32 and Linux versions.)
From memory, last time we talked about this you had simplified the shaders (disabled the fogging), and it was still drawing a single colour. You weren't sure if the textures were set up correctly.
The best way I can think of is to simplify the shader until it works, and then work back up. I'd start by simply outputting Texture0 to gl_FragColor, to make sure that works. If it doesn't work, then either the texture isn't bound correctly, or something is going wrong with the texture coordinates (in which case, check the vertex shader). After that, I'd output Texture1, to make sure that one works, and then gradually add in the rest of the code again.
Hans |
|
_________________
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. https://keasigmadelta.com/ - more of my work |
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/15 19:34
#582 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@All
Some info about gl4es: judging from the commits, ptitSeb has started working on precompiled shader support (which means no pauses in games that change state often, like FrickingShark, LugaruHD and some others).
Another thing worth mentioning: I was able to build the old SuperTuxKart (version 0.6.2a) over gl4es, and the good thing is that rendering and everything else looks fine right from the beginning (though there is a little issue when you run it: for some reason the whole menu is not drawn, and you need to go into any entry and exit from it to make the menu appear).
The bad thing is that the expected speed boost didn't happen :( On some tracks the gl4es build is a little bit faster, on some it is slower than the MiniGL build. So generally speaking, at the moment there is nothing worth releasing in those terms.
Here are videos of one track called "racetrack" to see the difference:
minigl build: https://youtu.be/e6GUwU2OANk
gl4es build: https://youtu.be/6Pf7WoZvSEo
As you can see, gl4es is a little bit faster there (especially under the bridge), but not _that_ fast as one would expect.
I will play with different gl4es settings, as well as ask ptitSeb (the gl4es author) if he has any ideas about it. I may also try to build the whole of SuperTuxKart with LTO enabled; maybe it can give some boost (at least with foobillard++ it gave +10 fps), so maybe I will be lucky with the 0.6.2a version too. |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/15 20:55
#583 |
---|---|---|
Just can't stay away
Joined:
2009/5/1 17:57 From Czech Republic
Posts: 1059
|
@kas1e
Quote: i was able to build old supertuxkart (that 0.6.2a version) over gl4es Does that mean the game would run on Polaris cards, where minigl is currently a no-go? |
|
_________________
Smoke me a kipper, I'll be back for breakfast! AmigaOne X5000 @ 2GHz / 4GB RAM / Radeon RX 560 / ESI Juli@ / AmigaOS 4.1 Final Edition SAM440ep-flex @ 667MHz / 1GB RAM / Radeon 9250 / AmigaOS 4.1 Final Edition |
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/15 20:56
#584 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@trixie
Quote:
Yes. You are right indeed; maybe it is worth releasing just because of that. |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/17 8:56
#585 |
---|---|---|
Just can't stay away
Joined:
2009/5/1 17:57 From Czech Republic
Posts: 1059
|
@kas1e
That would be very welcome! |
|
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/17 19:32
#586 |
---|---|---|
Home away from home
Joined:
2006/11/26 21:45 From a story that hasn't been written yet
Posts: 3550
|
@kas1e
I still get the same visual error I got with v1.58 of the Warp3DNova drivers: ![]() ![]() The left one is with OGLES2, the right one with OpenGL. I'm now using Warp3DNova.library 1.65 (31.03.2019) and W3DN_SI.library 1.65 (31.03.2019), but it doesn't seem to be fixed, as this post (#483) from you assumed back then. |
|
_________________
If slaughterhouses had glass walls, everyone would be a vegetarian. ~ Sir Paul McCartney - Did everything just taste purple for a second? ~ Philip J. Fry - Ain't got no cash, ain't got no style, ladies vomit when I smile. ~ Dr. |
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/21 9:37
#587 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@All
As the profiler in glSnoop seems to be working, I wanted to understand why SuperTuxKart compiled over gl4es is exactly as slow as the MiniGL version. So I ran the game on one track, and here is the output I got: OGLES2:
OpenGL ES 2.0 profiling results:
Warp3DNOVA:
Warp3D Nova profiling results:
As I see from both reports, the time waster is DrawArrays, which takes about 60% of everything, and only 12-20% is taken by glDrawElements. But as I do not understand why it can be that slow and what can be done about it, I am just posting that info here in the hope that someone more skilled can offer some ideas :) For the sake of testing I also profiled Quake3, and this is what I got: OGLES2:
OpenGL ES 2.0 profiling results:
Warp3DNOVA:
Warp3D Nova profiling results:
So in the case of Quake3, it can be seen that everything is limited by glDrawElements and its optimisation. Probably those two tests mean that gl4es needs optimisation in its glDrawArrays() handling (batching and so on). But of course I can't be sure. |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/21 11:18
#588 |
---|---|---|
Quite a regular
Joined:
2007/7/14 20:30 From Lothric
Posts: 809
|
@kas1e
In the STK case, it seems that OGLES2 spent 13 % of "context lifetime" in known functions (that figure also includes W3DNova). This amount may change once we add the rest of the functions, so that glSnoop becomes more aware of everything. But it's also possible that 87 % of the time is spent doing something other than drawing.
Compare that to the Q3 case, where OGLES2 spent 58 % of context lifetime; my interpretation, based on these rough stats, is that Q3 is more GPU-oriented than STK.
It's also important to note that Pause doesn't affect profiling, at the moment anyway. So if you let the game sit in some menu, it may skew the stats. |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/21 11:39
#589 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Capehill
Yeah, those results were taken without any pause: I just ran the game, chose Play in the menu, chose a track, and played on it a little.
What is interesting is that there are (summing both glDrawXXX calls) 137928+7109 draw commands for 538 frames (the number of SwapBuffer calls), which makes 269 draw commands per frame! (For comparison, Quake3 issues 30 draw commands per frame!) That of course includes the menu and "a little playing" in the game itself. But when I play more in the game itself, the results are even worse: 489 draw commands per frame!
Also, the problem I see is that this STK version works pretty fast on an old 1 GHz PC, so on our 2 GHz machines (even taking into account that it is CPU limited) it should surely be fast too. It all looks like something is blocking the speed there, and I can't work out what exactly. Probably such a high number of draw commands per frame is the key to the slowdowns here, and they somehow need to be batched or similar. Though that of course doesn't explain why the MiniGL build and the gl4es one are exactly the same speed in most cases.
Edit: and yeah, for profiling we probably need to add every OGLES2/Warp3D function, so that we get real/full profiling. |
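The per-frame figure in this post is simply the total number of draw calls divided by the number of buffer swaps. A minimal sketch of that arithmetic, using the counts quoted above (the function name `draw_calls_per_frame` is made up for illustration and is not part of glSnoop):

```python
def draw_calls_per_frame(total_draw_calls: int, frames: int) -> float:
    """Average number of draw calls issued per presented frame."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return total_draw_calls / frames


# Counts quoted in the post: the two glDrawXXX totals and the
# number of buffer swaps (used here as the frame count).
total_calls = 137928 + 7109
stk_per_frame = draw_calls_per_frame(total_calls, 538)
print(f"STK: {stk_per_frame:.1f} draw calls per frame")
```

The result lands between 269 and 270, which matches the "269 draw commands per frame" figure when rounded down.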
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 4:55
#590 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Capehill
I ran the latest glSnoop over STK, and as we can now use "PROFILE" without tracing, I could play one track in full and so profile everything that happens there. So, here are the results:
OpenGL ES 2.0 profiling results for Shell Process 'supertuxkart_gl4es_1915':
That is 469 draw calls per frame (!). Visually, when you play, it varies from 5 to 20 fps: 5 fps when many cars are close to you, and 20 fps when you drive alone on the track.
What I also notice now (after being able to play a full track, thanks to the new PROFILE option in glSnoop) is that the actual time waster is W3DN_BufferUnlock() (taking into account that we have already added the most-used functions and that this % does not include OGLES2 values).
Edited by kas1e on 2019/7/22 5:12:40
|
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 5:56
#591 |
---|---|---|
Home away from home
Joined:
2007/1/26 21:48 From New Zealand
Posts: 2196
|
@kas1e
Quote: That is 469 draw calls per frame (!). Visually, when you play, it varies from 5 to 20 fps: 5 fps when many cars are close to you, and 20 fps when you drive alone on the track.
Back in August of 2016, the Sam460ex managed about 25.6k draw-calls/s (measured using Daniel's boingtest).** IIRC, the X1000 managed about double that, but I can't find the data to confirm. Your test had about 9.2K draw calls/s with the driver using about 47.12% of CPU time (scales up to about 19.6K calls/s if GLES2 + Nova were using 100% of CPU time).
The number of draw calls/s goes down as the amount of data that's transferred to VRAM increases. So, more shader constants will slow it down, as will more and more vertices. This is why developers should use VBOs instead of vertex arrays...
What system was this test on? If it's a Sam460, then the draw-calls/s bottleneck is probably a big factor. If you're using an X1000 or X5000, then the number of vertices/s being copied to VRAM is probably having a sizeable effect.
Quote: What I also notice now (after being able to play a full track, thanks to the new PROFILE option in glSnoop) is that the actual time waster is W3DN_BufferUnlock() (taking into account that we have already added the most-used functions and that this % does not include OGLES2 values).
That's to be expected. Data is copied to VRAM when the buffer is unlocked, so this is probably the slowest operation. We still lack GART support, so the transfer is slow...
Hans
** Boingtest uses VBOs, so this can be considered the draw-calls/s limit back in 2016 in the absence of large vertex data-sets being uploaded to VRAM. Things have been optimized a bit since then, but lack of GART support is the biggest factor limiting the draw-calls/s limit. |
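The scaling in the estimate above can be written out explicitly. This is only a sketch of the arithmetic, assuming throughput scales linearly with the share of CPU time the driver gets; the helper name is hypothetical, and with the rounded inputs quoted in the post the result comes out near 19.5K rather than exactly 19.6K:

```python
def calls_per_second_at_full_cpu(measured_calls_per_s: float,
                                 driver_cpu_share: float) -> float:
    """Extrapolate draw-call throughput to 100% CPU, assuming linear scaling."""
    if not 0.0 < driver_cpu_share <= 1.0:
        raise ValueError("driver_cpu_share must be in (0, 1]")
    return measured_calls_per_s / driver_cpu_share


measured = 9_200.0   # about 9.2K draw calls/s, as quoted in the post
share = 0.4712       # driver using about 47.12% of CPU time
projected = calls_per_second_at_full_cpu(measured, share)
print(f"projected: {projected / 1000:.1f}K draw calls/s")
```

Linear scaling is the simplest possible model; as the post notes, the real rate also drops as more data is transferred to VRAM per call.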
|
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 8:20
#592 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Hans
Quote:
I am on an X5000, so that means the ~500 draw calls per frame are not what makes it slow, as there is still a lot of headroom left. But for comparison, Quake3 does only 30 draw calls per frame, so such a difference probably still matters too. Quote:
Yes, a lot of time is spent in the creation of buffers. Is this operation really necessary? I mean, can't Nova work without buffers (with just main CPU memory) via some environment setting or something?
And yeah, as usual I will state the obvious: GART was and still is much more important than other things, but of course those who pay make the decisions :)
Btw, ptitSeb also built that version of SuperTuxKart for tests, and on the Pandora he also gets a very low framerate. He made a GL capture on his side, and it seems there are a lot of glCallList(...) calls involved. If each glDrawArrays(...) is in a list, then the BATCH mode will not merge them => slow.
As for GART being missing on our side, which unfortunately is a reason for the slow gameplay in STK 0.6.2a, he says:
---
Mmm. I see, but that's unfortunate. Most of those legacy games are not using VBOs, so they need to transfer data every time (and even if they were using VBOs, gl4es would not anyway).
VBOs are a bit special. Testing on the Pandora showed me that use of VBOs actually slowed things down in a graphics-intensive game (that was Doom III). And because gl4es is targeted at embedded SoCs, and SoCs mostly have shared memory for VRAM, I'm still not convinced VBOs would speed up anything. I still have some ideas on where to use VBOs easily, but I don't have much incentive to spend time on something that I'm not confident will improve anything (except complexity in gl4es).
---
Edited by kas1e on 2019/7/22 8:45:58
|
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 8:56
#593 |
---|---|---|
Home away from home
Joined:
2007/1/26 21:48 From New Zealand
Posts: 2196
|
@kas1e
Quote: Yes, a lot of time is spent in the creation of buffers. Is this operation really necessary? I mean, can't Nova work without buffers (with just main CPU memory) via some environment setting or something?
Yes, it's necessary. Firstly, you need GART support for the GPU to read directly from main memory. Otherwise, you need to copy stuff over. Next, the GPU is little-endian so at a bare minimum all data needs to be byte-swapped.
Quote: As for GART being missing on our side, which unfortunately is a reason for the slow gameplay in STK 0.6.2a, he says:
That makes sense assuming that Pandora and other hardware ptitSeb is testing on have shared graphics memory (I know the Raspberry Pi has shared graphics memory). In that case, yes, VBOs won't help. That's because VRAM is just a part of main memory, and so the GPU's access speed is the same as for the rest of RAM.
On desktop systems it's different. The GPU has very fast dedicated VRAM, and everything else has to be copied across the PCIe bus. VRAM access is noticeably faster than PCIe access, so VBOs make a big difference.
Hans |
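To illustrate the byte-swapping step mentioned above, here is a small sketch using Python's standard `struct` module. It assumes 32-bit float vertex data on a big-endian host (as on PowerPC) and shows only the endian conversion; it says nothing about how the Nova driver actually implements the copy, and the function name is made up for this example:

```python
import struct


def to_little_endian_floats(big_endian_bytes: bytes) -> bytes:
    """Re-encode a buffer of big-endian 32-bit floats as little-endian."""
    if len(big_endian_bytes) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(big_endian_bytes) // 4
    # Decode with an explicit big-endian layout, then re-encode little-endian.
    values = struct.unpack(f">{count}f", big_endian_bytes)
    return struct.pack(f"<{count}f", *values)


# One vertex position as it would sit in memory on a big-endian CPU.
vertex = struct.pack(">3f", 1.0, 2.0, 0.5)
swapped = to_little_endian_floats(vertex)
print(vertex.hex(), "->", swapped.hex())
```

Every 4-byte element has to be touched once on the CPU, which is part of why uploading large vertex arrays each frame is costly compared with data that already lives in VRAM.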
|
||
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 17:13
#594 |
---|---|---|
Quite a regular
Joined:
2007/7/14 20:30 From Lothric
Posts: 809
|
@kas1e
Can you reproduce these issues with Foobillard+ and Prototype: Quote:
Quote:
|
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 18:21
#595 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Capehill
Yeah. With Prototype (I noticed it even before, just didn't report it), when you run glSnoop I can see a strange visual distortion while playing the actual game: a vertical line near the ship appears from time to time, which is not fully black. When I first saw it I feared it was a bug in the game that had gone unnoticed, but then I ran the game without glSnoop, and there is no such line. So I conclude that it may be something to do with the patching of framebuffer-related functions, etc. But I can be wrong of course, and it may be unrelated.
As for foobillard++, yeah, I have the same. And when I do just PROFILE (for speed) and play the game, I see no visual glitches (I was hoping it might be the same as with Prototype).
The good thing is that now, even if the GUI window is not active, it refreshes the error count once errors occur (for foobillard++ it shows 672 GL errors and 0 for Nova). For Prototype it shows 0 GL errors and 4 W3D errors, which brings me to the next enhancement request, if you don't mind: once errors occur in OGLES2 or in W3DN, maybe show a button next to the count, like "Show errors", and pressing it would just open a window with only the list of errors, i.e. not a full logging/tracing copy but just a clean summary (for the full details the user can check the full logs). What do you think?
Also, can you clear up a few things for me if you have time :) :
Let's say we have "44.17% of total context life-time" for Warp3D Nova and "60.54% of total context life-time" for OGLES2. Does that mean that we have a full 100%, of which 40 goes to Warp3D Nova and 60 to OGLES2? Or does it mean that we only catch 60% of the whole (with 40% just untraced/unpatched at the moment), and that those traced 60% include the 44% of Warp3D Nova (so the actual OGLES2 usage is roughly 16%)? I mean, maybe something cleaner and more general could be split out into another field, not related to OGLES2 / Warp3D, but just a general summary of everything at once.
I am just starting to get a bit lost in these %, which are % of % of % :)) |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 18:55
#596 |
---|---|---|
Quite a regular
Joined:
2007/7/14 20:30 From Lothric
Posts: 809
|
@kas1e
Regarding errors, the simplest might be to display the error count in one column at the end.
"44 % of total context life-time" means that if the context exists for 100 milliseconds, then known calls used 44 milliseconds of that.
Let's assume for simplicity that both the OGLES2 and Nova contexts existed for exactly the same duration, 1000 ms. Now if OGLES2 spends 80 % of (its) total context life-time and Nova 40 % of (its own) total context life-time, then OGLES2 and Nova actually spent the same amount of time processing, because the OGLES2 figure includes Nova (so you can do 80 % - 40 % = 40 %, and both use the same amount of CPU time).
Let's see if it can be simplified somehow. (My head is spinning too.) |
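The subtraction described above can be sketched as follows. It assumes, as in the example, that both contexts existed for the same duration and that the time measured for OGLES2 includes the time spent inside Nova; the function name is hypothetical and not part of glSnoop:

```python
def ogles2_exclusive_share(ogles2_pct: float, nova_pct: float) -> float:
    """Share of context lifetime spent in OGLES2 itself, excluding Nova calls."""
    if nova_pct > ogles2_pct:
        raise ValueError("Nova time cannot exceed the OGLES2 time that contains it")
    return ogles2_pct - nova_pct


# The worked example from the post: 80 % for OGLES2, 40 % for Nova.
example = ogles2_exclusive_share(80.0, 40.0)

# The figures asked about earlier in the thread: 60.54 % and 44.17 %.
reported = ogles2_exclusive_share(60.54, 44.17)

print(f"example: {example:.2f} %  reported figures: {reported:.2f} %")
```

Under the same assumptions, the remainder up to 100 % is time spent outside the functions the profiler currently tracks.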
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/22 18:59
#597 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Capehill
Quote:
After checking almost all the OGLES2 stuff we have, I can say that most of the time OGLES2 takes around 40-60%, so usually some functions somewhere else eat a lot of time. Quote:
At least I get it now, so even if it stays the same, others can understand it by reading that info (we can put it into the docs/readme/guide/etc). |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/23 12:11
#598 |
---|---|---|
Quite a regular
Joined:
2007/7/14 20:30 From Lothric
Posts: 809
|
@kas1e
Added a check for W3DN_Submit() return value, and it seems that Foobillard+ and Prototype have one-shot errors where Submit (first, maybe?) returns 0. |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/24 4:46
#599 |
---|---|---|
Home away from home
Joined:
2007/9/11 11:31 From Russia
Posts: 5546
|
@Capehill
If only it gave some pointer to what should be fixed, and where :) |
|
|
Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress |
Posted on: 7/24 5:33
#600 |
---|---|---|
Home away from home
Joined:
2007/1/26 21:48 From New Zealand
Posts: 2196
|
@Capehill
Quote: Added a check for W3DN_Submit() return value, and it seems that Foobillard+ and Prototype have one-shot errors where Submit (first, maybe?) returns 0. Check the errCode value. If it's W3DNEC_QUEUEEMPTY, then it's not really an error. It just means that you submitted an empty queue. Otherwise, errCode should give you a hint as to what's wrong. Hans |
|
||