Re: Need help with shader optimisation
Home away from home
@Spectre
Yes, I see the gfxbench results, of course. What I mean is that, for example, CopyToVram tops out at about 530 MiB/s on the fastest hardware. Are those the values we should expect with DMA, GART and everything else working? Shouldn't it be 2-4 times faster?

Join us to improve dopus5!
AmigaOS4 on youtube
Re: Need help with shader optimisation
Quite a regular
@kas1e

Note the very good X1000 memcopy score of 2,668.66 in this one:

https://hdrlab.org.nz/benchmark/gfxben ... phicsCard/598/Result/2432

Is that due to the screenmode, or something else?

All of these results use cards that appear identical on the outside, but one of the two versions seems to actually be a bit slower. The benchmark above looks very good, though.

Re: Need help with shader optimisation
Home away from home
@Spectre
Not that great: on my X5000 I get 530 MiB/s for CopyToVram, while the X1000 shows 430 MiB/s, so that roughly explains the speeds we see in the Irrlicht test case: 260 fps vs 220 fps, about what you would expect.

As for WritePixelArray and ReadPixelArray being so much faster there: that is because the tested X1000 machines used DMA acceleration for the WPA functions in graphics.library, while the X5000 did not. A year ago DMA was added to the X5000's graphics.library too, with about the same values; it just still hasn't been released (don't ask why :) ).


The question is: why is 530 MiB/s the maximum we get for CopyToVram? Is it expected to be that slow, half a gigabyte per second? And what is the theoretical limit anyway, 2 GB/s? More?
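As a rough back-of-the-envelope illustration (my own sketch, assuming a PCIe 2.0 link; the lane width actually wired to the slot on these machines is an assumption, not a confirmed fact), the bus-level numbers work out like this:

// pcie_bw.cpp -- back-of-the-envelope PCIe throughput, illustration only.
// Assumption: PCIe 2.0 (5 GT/s per lane, 8b/10b encoding); the lane count
// actually available on an X1000/X5000 slot is not confirmed here.
#include <cstdio>

int main() {
    const double gtps     = 5.0;        // PCIe 2.0: 5 GT/s per lane
    const double encoding = 8.0 / 10.0; // 8b/10b line code costs 20%
    const int lane_counts[] = {1, 4, 8, 16};
    for (int lanes : lane_counts) {
        // usable Gbit/s -> GB/s: divide by 8
        double gbs = gtps * encoding * lanes / 8.0;
        std::printf("x%-2d link: ~%.1f GB/s per direction (before packet overhead)\n",
                    lanes, gbs);
    }
    return 0;
}

Even a x4 link comes to about 2 GB/s per direction before packet overhead, so 530 MiB/s would sit well below the bus ceiling unless the link is effectively x1.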

Join us to improve dopus5!
AmigaOS4 on youtube
Re: Need help with shader optimisation
Home away from home
@Spectre660

That is my test result from when I first had the RX560 card running (stable) in my X1000.

The slightly lower results stem from having to use INTERRUPT=No, and probably from me forgetting to turn off CANDI.

It was just a quick stress test to see if I could provoke a freeze... if I don't forget, I'll do another run with CANDI off once I get back home.

People are dying.
Entire ecosystems are collapsing.
We are in the beginning of a mass extinction.
And all you can talk about is money and fairytales of eternal economic growth.
How dare you!
– Greta Thunberg
Re: Need help with shader optimisation
Quite a regular
@Raziel

Note that card model 598 in the GfxBench2D database is actually a Polaris 11 part, so in reality a Radeon RX550.

The majority of the results for card model 598 are probably from my machines.

I have an identically branded card that looks the same, which I just started using with the X5000/40; it shows up as card model 617.
I swapped card model 598 and card model 617 between my machines this morning to get complete test results across both.


https://hdrlab.org.nz/benchmark/gfxben ... /AmigaOS/GraphicsCard/598

https://hdrlab.org.nz/benchmark/gfxben ... /AmigaOS/GraphicsCard/617

Re: Need help with shader optimisation
Home away from home
@kas1e

A few comments:
1. If I'm expected to personally do everything, then you're going to wait forever. Seriously, I'm just one guy who's already overloaded with work. It's time for people to stop expecting the same few developers to do everything. And no, this is *NOT* a case where "only the driver developer" can do it.
2. CopyTo/FromVRAM are CPU RAM<=>VRAM transfer tests, so the presence of GART support is irrelevant. Similarly, WritePixelArray/ReadPixelArray do *NOT* use GART or the graphics card's DMA engines; they rely entirely on the graphics.library's code (see the sketch below).
3. GfxBench2D tests are interesting, but they tell us nothing about GL4ES, GLES2, etc. (because they don't use them). They also don't give us the detailed data needed to figure out what to do about anything.
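To illustrate point 2, a CopyToVRAM-style test essentially times a CPU-driven loop like the minimal sketch below. A plain heap buffer stands in for the memory-mapped VRAM aperture (an assumption made for portability; the real test writes through the actual aperture via graphics.library):

// copy_bench.cpp -- minimal sketch of a CPU-driven CopyToVRAM-style timing.
// Assumption: dst is a plain heap buffer standing in for mapped VRAM.
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const size_t size  = 64u << 20;  // 64 MiB per pass
    const int    loops = 16;
    std::vector<char> src(size, 1);
    std::vector<char> dst(size);     // in the real test: mapped VRAM

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i)
        std::memcpy(dst.data(), src.data(), size);  // pure CPU copy, no GPU DMA
    auto t1 = std::chrono::steady_clock::now();

    const double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("CPU copy: %.1f MiB/s\n",
                (double)size * loops / (1u << 20) / secs);
    return 0;
}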

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: Need help with shader optimisation
Quite a regular
@kas1e

Any general wisdom to be gleaned here?

https://raphlinus.github.io/gpu/2021/04/28/slow-shader.html

Radeon shader analyzer:
https://shader-playground.timjones.io/

Re: Need help with shader optimisation
Just can't stay away
@kas1e

Quote:

What about writing an analogue of the "slow for us" quake3map Irrlicht test, working directly over Nova and/or OGLES2, so we can see whether it gives us at least 500 fps, or stays at the same level as it is now with GL4ES?


What is the status of Irrlicht's OpenGLES2 renderer?

Re: Need help with shader optimisation
Home away from home
@Hans

Quote:

And no, this is *NOT* a case where "only the driver developer" can do it


Why is it not the case? :) Who else can find the low-level speed issues, if not the driver developers? :) As third parties we can only provide test cases showing that something is wrong (which is what I do with Irrlicht), but how can we, not being low-level devs, write internal benchmark tools to check the raw speed via DMA/GART to/from the video card?

Quote:

2. CopyTo/FromVRAM are CPU RAM<=>VRAM transfer tests, so the presence of GART support is irrelevant. Similarly, WritePixelArray/ReadPixelArray do *NOT* use GART or the graphics card's DMA engines; they rely entirely on the graphics.library's code.


Then we skip those as not important for our tests. But then which tests will show us whether or not we are reaching the maximum sensible limits for raw copies via DMA/GART?




Quote:

3. GfxBench2D tests are interesting, but they tell us nothing about GL4ES, GLES2, etc. (because they don't use them). They also don't give us the detailed data needed to figure out what to do about anything.


I am talking about the general raw speed of copying to/from the video card via DMA/GART. Can a benchmark be written that shows us that, so we know the maximum possible limits on OS4 with the current drivers? As I remember, you had some utility of that kind back in the day, and it showed very slow results, which we didn't expect to be that slow.

By pointing at GfxBench I mean finding out whether the basics are OK or not. I.e., is the raw copy speed already at least around 2 GB/s, or is it limited to 400-500 MB/s? I remember we discussed a problem around this before (at exactly the time we found that the X5000/040 was slower in some aspects of those tests, and exactly when you created a feature request about PCIe optimisation code in the kernel, or something of that sort).

Has that problem been dealt with, so that we now get 2 GB/s in your tests?

If the maximum raw speed via DMA/GART is OK and at a sane level, then we can start digging into other software to find the culprit. But if we are still at 400-500 MB/s or thereabouts, and not at the theoretical ~2 GB/s (or at least 1.5 GB/s), then we can go in circles optimising other things all we want and still hit the same wall.


Join us to improve dopus5!
AmigaOS4 on youtube
Re: Need help with shader optimisation
Home away from home
@Capehill
Quote:

What is the status of Irrlicht's OpenGLES2 renderer?


It is in working shape and the devs do use it, though with two caveats:

1) It uses EGL context management, but it should be more or less easy to switch to SDL. Just today I asked the remaining dev: "WTF, why is the OGLES2 renderer limited to EGL when it can do all the context management through SDL?"

2) The current public Irrlicht ogl-es trunk still uses SDL1, but that is also easy to switch to SDL2.

2-3 years ago I already built an OGLES2-only version of Irrlicht, but it was done over Huno's EGL wrapper, which at the time was very unstable; for our tests that is of course a no-go, as we need a plain one.

I may try to create the OGLES2 port first, but then we need to know whether the raw copy speed via DMA/GART is what we expect it to be, at a sane level. From the discussion years ago it was clear that we didn't reach the limits and hit a barrier around 500 MB/s or so (I hope Hans can explain more, but I can dig through my mails if he doesn't mind the topic being refreshed).

We need to know whether everything is OK in "raw" terms, because if I make an OGLES2 port of Irrlicht, it will give us nothing beyond a comparison between GL4ES and OGLES2; it won't tell us anything about whether the AmigaOS4 drivers' GART/DMA are already at the hardware's maximum limits.

Join us to improve dopus5!
AmigaOS4 on youtube
Re: Need help with shader optimisation
Home away from home
@Capehill
You may also be interested in this: I found out what was wrong with the OGLES2 build back in the day. See this piece of code in the SDL device:

https://sourceforge.net/p/irrlicht/cod ... rrlicht/CIrrDeviceSDL.cpp

Starting at line 493 we have:

case video::EDT_OGLES2:
#if defined(_IRR_COMPILE_WITH_OGLES2_) && defined(_IRR_EMSCRIPTEN_PLATFORM_)
        {
            video::SExposedVideoData data;

            ContextManager = new video::CEGLManager();
            ContextManager->initialize(CreationParams, data);

            VideoDriver = video::createOGLES2Driver(CreationParams, FileSystem, ContextManager);
        }
#else
        os::Printer::log("No OpenGL-ES2 support compiled in.", ELL_ERROR);
#endif
        break;


So while they apparently think OGLES2 + SDL can only be used on EMSCRIPTEN (a check I commented out), for unknown reasons they still use the EGL manager there to fill in the params and so on.

The remaining author who still works on Irrlicht does not know why it was done like this (he didn't code that part), nor why EGL is used there at all.

If you look a few lines above, you will see pure OpenGL used without any EGL, and everything is fine in those terms. Maybe it was done this way because it can't be any different for EMSCRIPTEN?

And if we go to line 44, we see this kind of code:

#ifdef _IRR_COMPILE_WITH_OPENGL_
        IVideoDriver* createOpenGLDriver(const SIrrlichtCreationParameters& params,
                io::IFileSystem* io, CIrrDeviceSDL* device);
#endif

#if defined(_IRR_COMPILE_WITH_OGLES2_) && defined(_IRR_EMSCRIPTEN_PLATFORM_)
        IVideoDriver* createOGLES2Driver(const irr::SIrrlichtCreationParameters& params, io::IFileSystem* io, IContextManager* contextManager);
#endif


See: for pure OpenGL they use the SDL device, while for OGLES2 they use this "contextmanager" created from EGL.

Why, I do not know. What's more, I don't know why this code is placed in CIrrDeviceSDL.cpp at all if it uses the EGL manager there.

See CEGLManager.cpp there, just in case:

https://sourceforge.net/p/irrlicht/cod ... /Irrlicht/CEGLManager.cpp


But basically it is clear that, for unknown reasons, they use EGL rather than SDL for context creation and the rest on Emscripten + OGLES2.


I feel there really should be an easy way to make it work for us without EGL. I can also easily put SDL2 in there instead of SDL1 (just as I already did for the 1.8.4 OpenGL port); I just need to understand first how to get rid of EGL.


EDIT: I compiled it for SDL1; at least there are no linking or compiling errors. All I did was replace IContextManager* contextManager everywhere with CIrrDeviceSDL* device, the same way it is done for pure OpenGL.
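In code terms the change amounts to something like this sketch (the createOGLES2Driver() overload taking the SDL device is the one added by this edit, by analogy with the OpenGL path; upstream only has the EGL-manager variant):

// Sketch of the modified EDT_OGLES2 case in CIrrDeviceSDL.cpp, mirroring
// the EDT_OPENGL path above it. Assumes a createOGLES2Driver() overload
// taking CIrrDeviceSDL* has been added; upstream only has the EGL variant.
case video::EDT_OGLES2:
#ifdef _IRR_COMPILE_WITH_OGLES2_
        VideoDriver = video::createOGLES2Driver(CreationParams, FileSystem, this);
#else
        os::Printer::log("No OpenGL-ES2 support compiled in.", ELL_ERROR);
#endif
        break;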

So I compiled a simple test case, and it shows me just a black screen. I assume that happens because our SDL1 can't handle OGLES2? I should probably build a debug version of SDL1 to see what happens at runtime. But then, if we don't tell SDL1 to enable OGLES2 (which probably happens by default on EMSCRIPTEN), it stays on OpenGL, and our SDL1 probably doesn't support OGLES2 at all, right?

Though it's not really about direct support of OGLES2, but about the "rendering window context". Basically, if I open an SDL1 window and create a context with it, then also open ogles2.library and use its calls against this window, linking against -lSDL -logles2, then theoretically it should render into the window, right? Or do we need to somehow point ogles2 at the SDL1 window explicitly on AmigaOS (while on EMSCRIPTEN that probably happens by default)?


Join us to improve dopus5!
AmigaOS4 on youtube
Re: Need help with shader optimisation
Home away from home
@kas1e
Quote:
Why is it not the case? :) Who else can find the low-level speed issues, if not the driver developers? :) As third parties we can only provide test cases showing that something is wrong (which is what I do with Irrlicht), but how can we, not being low-level devs, write internal benchmark tools to check the raw speed via DMA/GART to/from the video card?

Anyone can write a test-case Q3 map renderer with a GL4ES version, a GLES2 version, etc. Likewise, anyone could profile the game that this thread was originally about, to see what it's doing and what's taking the most time.

Interpreting the benchmark results is a different matter. I thought Huno had already established that GL4ES is responsible for a significant slow-down with Doom 3. The next step would be figuring out what GL4ES is doing that causes the trouble. AFAIK, nobody has dug deeper yet...

Quote:
I am talking about the general raw speed of copying to/from the video card via DMA/GART. Can a benchmark be written that shows us that, so we know the maximum possible limits on OS4 with the current drivers? As I remember, you had some utility of that kind back in the day, and it showed very slow results, which we didn't expect to be that slow.

...

Those benchmark results were biased by additional overhead; measuring the actual transfer speed is quite hard. Added to that, those tests used the CPDMA engine, which some of AMD's developers said was slow (although others said it was fast enough on Polaris). I don't have a way of measuring the performance of the DMA engine used by the command processor & GPU, because I can't use it independently of the command processor and GPU.
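One standard way to reduce that overhead bias (a general technique, not a description of the benchmark used back then) is to time two transfer sizes and solve t(n) = a + n/B for the fixed per-call overhead a and the bandwidth B. A minimal sketch, with std::memcpy standing in for the real transfer call under test (an assumption):

// overhead_split.cpp -- separating fixed per-call overhead from raw
// bandwidth by timing two sizes and solving t(n) = a + n/B.
// Assumption: std::memcpy stands in for the real transfer call.
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Average seconds for one copy of n bytes over `reps` repetitions.
static double time_copy(size_t n, int reps) {
    std::vector<char> src(n, 1), dst(n);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        std::memcpy(dst.data(), src.data(), n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}

int main() {
    const size_t small = 64u << 10, big = 16u << 20;  // 64 KiB, 16 MiB
    const double t_small = time_copy(small, 256);
    const double t_big   = time_copy(big, 16);
    // Two sizes give two equations in the two unknowns (a, B):
    const double bw = (double)(big - small) / (t_big - t_small); // bytes/s
    const double a  = t_small - (double)small / bw;              // fixed cost, s
    std::printf("bandwidth ~%.0f MiB/s, per-call overhead ~%.2f us\n",
                bw / (1u << 20), a * 1e6);
    return 0;
}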

My conclusion was that there's nothing wrong with the DMA transfer rates and PCIe settings. I could be wrong, but it's more likely that the performance-killing bottleneck(s) lie elsewhere.

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work