Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Just popping in
Here are my X1000 results:

02.Quake3Map ~114 FPS
16.Quake3MapShader ~60 FPS
18.SplitScreen ~28 FPS
20.ManagedLights ~235 FPS

PROCESSOR: P.A. Semi PWRficient PA6T-1682M
VERSION:
Kickstart version 53.89
Exec version 53.89
Disk version 53.15
graphics.library 54.226 (13.09.2016)
RadeonHD.chip 2.22 (25.03.2017)
Warp3DNova.library 1.65
ogles2.library 2.8

GfxCard: RADEON HD 7800

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@TearsOfMe
Thanks. So it's not because of missing DMA on the X5000. Less to check, then.

But the trace logs clearly show that the bottleneck is the drawing function itself. Why that is, and what makes it THAT much slower than an old 1.6 GHz AMD with a crappy built-in gfx board, is still unknown.

Join us to improve dopus5!
AmigaOS4 on youtube

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
@kas1e
>not the VBO creation/handling it seems

Are you sure?
When I coded Wazp3D57 with Nova rendering, I tried different methods for updating a VBO, and all of them seemed slow.

Perhaps a patched MiniGL that updates the VBO (say) 11 times instead of once would show how long a VBO update REALLY takes in a REAL program
(delta time / 10).
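
A minimal sketch of the arithmetic behind that "11 updates" idea, written in Python only for brevity. The function name and the frame times are made up for illustration; nothing here is MiniGL or Nova code.

```python
def estimate_update_cost(time_one_update_ms, time_many_updates_ms, extra_updates=10):
    """Per-update cost derived from two timings of the same frame.

    time_one_update_ms:   frame time with the normal single VBO update
    time_many_updates_ms: frame time with 1 + extra_updates VBO updates
    """
    return (time_many_updates_ms - time_one_update_ms) / extra_updates


# Hypothetical numbers: 20 ms with one update, 50 ms with eleven updates.
print(estimate_update_cost(20.0, 50.0))  # 3.0 ms per extra update
```

Everything else in the frame cancels out in the subtraction, so only the cost of the extra updates is left.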

I mean, when I was testing Cow3D on Wazp3D57 -> Nova, it ran at (say) 80% of the speed of real Warp3D (a massive VBO update, but only once), so bandwidth seems more or less OK.

But when I was testing Quake2 (a real-life test) it was 1-2 FPS... weird.

Another thing:
When I was testing a simple raymarching test, I found that Nova GLSL seems to have very strange bugs: the GLSL code compiles fine, but strange artefacts appear, as if the fragment shader were computing wrongly at some pixels (like an FPU rounding bug).

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@thellier

Quote:

Are you sure?
When I coded Wazp3D57 with Nova rendering, I tried different methods for updating a VBO, and all of them seemed slow.


In the tracer logs I posted, you can see two tables at the end: the warp3dnova profile and the ogles2 profile. The warp3dnova one shows which of its functions take time, and how much. VBO creation looks OK-ish there; it doesn't take up all the time.

Besides, VBOs are used everywhere in other games and examples, but only these examples are that slow. So they seem to do something that makes our drivers that slow compared even with a 1.6 GHz AMD with built-in Intel graphics. I'm not even talking about modern computers; being THAT slow shows that something is really wrong somewhere.

Quote:

When I was testing a simple raymarching test, I found that Nova GLSL seems to have very strange bugs: the GLSL code compiles fine, but strange artefacts appear, as if the fragment shader were computing wrongly at some pixels (like an FPU rounding bug).


The Nova shader compiler is still WIP, but lately it has been getting better and better. Sadly, it has optimisation disabled because of some bug, but visual bugs can be checked and reported so that Hans can fix them.

If you have that raymarching test, we can test it on the latest Nova, then test it on ogles2 on Windows (to be sure your code is correct), and then create a bug report. I can help with the tests (but let's create another topic for that?).

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Just can't stay away
@kas1e

I tried the 02.Quake3Map test, letting it run for at least 10 seconds, and it seems that (roughly):

OpenGL:
- DrawElements 90%

Nova:
- BufferUnlock 50%
- DrawElements 30%

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Capehill
Yeah, I now waited 15 seconds too, to let everything stabilize, and in my case I have:

OpenGL:
- DrawElements 90%

Nova:
- DrawElements 60%
- BufferUnlock 15%

and everything else takes up the remaining percent, bit by bit.

(At least, I checked the table that says "% of 1880.049764 ms".)

Here is my new log (22 MB unpacked, 1 MB packed):
http://kas1e.mikendezign.com/aos4/irrlicht/02.quake3map_trace.zip


Dunno what it can tell us, though. I mean, it still doesn't explain why it's that much slower compared with an old 1.6 GHz AMD with a shitty gfx card.

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Just can't stay away
@kas1e

I should have mentioned that the tracing was disabled in my test. Anyway, hope we can do more comparable measurements soon.

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Capehill
Quote:

I should have mentioned that the tracing was disabled in my test.


With tracing disabled I get almost the same results as you:

ogles2:
DrawElements - 93%

nova:
BufferUnlock - 50%
DrawElements - 30%

We're talking about the "% of 8344.085574 ms" table, right? Not % of CPU time or anything else?


Edited by kas1e on 2019/9/15 7:40:12

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Just can't stay away
@kas1e

Yes, I mean exactly that column (% of ms).

Our results are now aligned. BufferUnlock takes more time than DrawElements in Nova in that QuakeMap test.


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Capehill
I talked with ptitSeb about all that, and this is what he said:

---
VBOs are one thing: vertex data in VRAM, i.e. in the graphics card's memory, ready to use. So the main thing that consumes time is transferring the data to VRAM. If you reuse that data, you just activate the VBO; there is no need to transfer the data again.

So, when a program or game uses a VBO, it creates one, fills it with data, and then simply uses the transferred data (sometimes it changes part of the VBO to update some of the data).

The transfer of data traditionally happens at the "Unlock" part of the VBO (Lock gives you an address where to put the data, Unlock transfers from that address to VRAM).

On the Amiga, the transfer to VRAM can be slow if you don't have some kind of DMA for that (that's the first thing). On top of that, all data needs to be little-endian, because the graphics card is little-endian (so you need to analyse the VBO to know which data needs swapping and which doesn't).

So yes, I think this VBO transfer can be a bottleneck.
---
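
To illustrate what "analysing the VBO to know which data needs swapping" means, here is a minimal Python sketch. The vertex layout (3 x float32 position plus 4 x uint8 colour) and all names are assumptions chosen for the example; it says nothing about how warp3dnova actually does it.

```python
import struct

# Hypothetical interleaved vertex: 3 x float32 position + 4 x uint8 colour.
VERTEX_BE = struct.Struct(">3f4B")  # big-endian, as laid out by the PowerPC CPU
VERTEX_LE = struct.Struct("<3f4B")  # little-endian, as the graphics card expects


def swap_vertices(data: bytes) -> bytes:
    """Re-pack every vertex from big-endian to little-endian."""
    out = bytearray()
    for fields in VERTEX_BE.iter_unpack(data):
        out += VERTEX_LE.pack(*fields)
    return bytes(out)


big = VERTEX_BE.pack(1.0, 2.0, 3.0, 10, 20, 30, 40)
little = swap_vertices(big)
print(big[:4].hex(), "->", little[:4].hex())    # float bytes are reversed
print(big[12:].hex(), "->", little[12:].hex())  # colour bytes are untouched
```

Only the multi-byte fields change; single bytes have no byte order, which is why a buffer of plain bytes would need no conversion at all.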

But we also tested with working DMA in graphics.library on the X1000, and the results are still the same. Maybe by DMA he means something like GART, dunno. Is the DMA in graphics.library perhaps a different kind of DMA from the one expected for VRAM transfers?

Interestingly, the warp3dnova documentation for BufferUnlock makes no mention of any big->little endian conversion...


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Just can't stay away
@kas1e

http://www.amigans.net/modules/xforum ... id=115027#forumpost115027

Even if graphics.library has DMA support on the X1000, it "only" has the capability to transfer bitmap data to VRAM (as far as I know). I suppose Nova needs a way to pull data from RAM using DMA, and this is what GART would provide.

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Capehill
But isn't a "bitmap" just the same kind of data? I mean, if there is DMA to transfer a bitmap to VRAM (and a bitmap is ordinary data, isn't it?), then shouldn't it transfer any other data to VRAM the same way?

Or is the DMA in graphics.library only for transferring some specific data to VRAM that no one uses? :)

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
> the same data ? I mean , if there is DMA to transfer bitmap to VRAM

Yes. For Wazp3D57 I wrote some code that hacks a bitmap transfer to copy vertices to a VBO
(ouch, what a hack, but it works).

It changed almost nothing speed-wise on Sam460 & X5000, but I don't know whether those machines use DMA for that.

>about BufferUnlock of warp3dnova, there is no mention about any big->little endian conversion

IMHO, what I understood:
There are different methods for updating the VBO,
but in fact Nova just reads and/or writes the data.
When you lock, it (can) read/reorder the VBO data into a buffer that you will access.
When you unlock, it (can) write/reorder the buffer back to the VBO data.
As the reordering is done on the Nova side, you never access the real data in the GPU's VRAM, only a reordered buffer.

You can also do write only (i.e. write all new vertex values from your buffer),
or read only (i.e. read some GPU data),
or read/write (i.e. change some vertices).

Letting Nova do the reordering was certainly not a good idea, as the data is then accessed several times (vs. the CPU writing the reordered data directly to the real GPU VRAM).

See the Nova doc below:
// W3DN_STATIC_DRAW:  Written: (CPU) once          Read: rendered many times
// W3DN_STATIC_READ:  Written: (GPU) once          Read: CPU many times
// W3DN_STATIC_COPY:  Written: (GPU) once          Read: rendered many times
// W3DN_DYNAMIC_DRAW: Written: (CPU) occasionally  Read: rendered many times
// W3DN_DYNAMIC_READ: Written: (GPU) occasionally  Read: CPU many times
// W3DN_DYNAMIC_COPY: Written: (GPU) occasionally  Read: rendered many times
// W3DN_STREAM_DRAW:  Written: (CPU) frequently    Read: rendered a few times
// W3DN_STREAM_READ:  Written: (GPU) frequently    Read: CPU a few times
// W3DN_STREAM_COPY:  Written: (GPU) very often    Read: rendered a few times



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Capehill, thellier

Hans explained these things to me a bit, and this is what I understood:

That "DMA support in graphics.library" on Sam440, Sam460 & X1000 is not real DMA. It's just a hack that uses the CPU's DMA to speed up those RAM->VRAM transfers, but only for internal use inside graphics.library, and only for bitmaps. I.e. it's "DMA", but the CPU's DMA, not real DMA or GART or anything of that sort. Just a little "speed-up".

In other words, it is of no use to drivers (MiniGL, warp3dnova, the Radeon drivers, etc.). What's more, it only helps apps that use graphics.library and rely on the parts that are "hacked" inside graphics.library to speed things up (namely graphics.library's bitmaps).

It probably helps somewhere, and not only in benchmarks (it wasn't added for no reason), but it doesn't help drivers at all. That explains why we didn't see any difference when testing the Irrlicht engine examples between the X5000 (without the "CPU DMA hack") and the X1000 (with the "CPU DMA hack"). Those examples are built for gl4es, which works over ogles2.library, which works on top of warp3dnova, which in turn has no DMA acceleration of any kind for RAM->VRAM transfers. Only a proper implementation of GART can help there.

Quote:

Letting Nova do the reordering was certainly not a good idea, as the data is then accessed several times (vs. the CPU writing the reordered data directly to the real GPU VRAM).


As far as I'm aware now, warp3dnova's BufferUnlock() not only writes from RAM to VRAM, but also does the endian conversion from big-endian to little-endian (as the gfx card is little-endian). So we have two limiting factors there:

1. No real DMA (GART) is used, which means transfers from RAM->VRAM are slow. We should be happy that we even have something usable without it. We at least get 100 FPS in Quake 3 without GART, which is certainly not bad.

2. The endian conversion inside BufferUnlock() may slow everything down, especially if it wasn't compiled with -O3 optimisation for whatever reason. It also means extra buffers and extra work on them, so it's quite possible that this adds another bottleneck.


All of this probably explains well why some things run at almost the same speed on MiniGL and on gl4es, like Quake 3, Lugaru and SuperTuxKart: all those games do a lot of drawing per frame, which is speed-limited by the lack of GART, so both minigl.library (which works over warp3dnova) and gl4es are limited by the same thing.

But code that is written "right", i.e. without thousands of draw calls per frame with lots of data, gets sped up nicely by using VBOs and co.

IMHO, but I think it's pretty close to the truth.

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
@kas1e
Quote:
Interestingly, the warp3dnova documentation for BufferUnlock makes no mention of any big->little endian conversion

That's right: that documentation is not in the Unlock function's description but in the Lock function's.

In VBOLock's description it is mentioned that you must call VBOSetArray first to tell Nova the data types and other info, so that it knows for which parts of the buffer's data it has to perform endian conversion. Although it's not explicitly stated, you may safely assume that the actual conversion is then done inside BufferUnlock (and then back again in VBOLock if a read is requested, unless the driver keeps a copy of the original data; no idea if it does).
If the conversion took place later, internally, this info in the docs wouldn't make sense, because then you could just as well tell Nova the data layout later (prior to first use).

@kas1e
@thellier
The fun part with that VBOSetArray convention:
you can (ab)use it to trick Nova into not doing its slow endian conversion. Simply tell it beforehand that the VBO is just a package full of plain bytes.

But keep your seat, yes, it works, but unfortunately another Nova-slowdown-area kills the potential gain again:
you may remember that ogles2 contains a workaround for plain byte stuff like RGBA8 data. I found upload of such endian-free-simple-data to be so extremely dead-slow for unknown reasons, so the lib converts those to RGBAfloat32 data... You'd expect it to be much slower then (the 4x byte-to-float conversion, 4x as much data to transfer) but it's muuuuuuch faster than letting Nova do the simple job on the plain bytes.

So, unfortunately, avoiding Nova's endian conversion also means switching to that slow byte-data layout, so we end up with a net fps loss.
Damn. I will experiment a bit more, but this one looks like a dead end. So we'll have to wait for Hans to improve the speed and eventually also implement that requested manual-endian-conversion option.
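
For a feel of the "4x as much data" part of the RGBA8 workaround, here is a small Python sketch. It is only an illustration: the function name is made up, and the 0.0..1.0 scaling is an assumption, since the post does not say how ogles2 scales the values.

```python
import struct


def rgba8_to_float32(colors: bytes) -> bytes:
    """Expand each 8-bit channel to a big-endian float32.

    Assumption: channels are scaled to 0.0..1.0; the thread does not say
    what scaling ogles2 actually applies.
    """
    return struct.pack(f">{len(colors)}f", *(c / 255.0 for c in colors))


rgba8 = bytes([255, 0, 128, 255])    # one RGBA8 colour, 4 bytes
rgba_f32 = rgba8_to_float32(rgba8)   # the same colour as 4 floats
print(len(rgba8), len(rgba_f32))     # 4 16
```

So every colour costs a conversion on the CPU and four times the upload size, and according to the post it is still faster than uploading the plain bytes.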

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Daytona675x

Quote:
The fun part with that VBOSetArray convention:
you can (ab)use it to trick Nova into not doing its slow endian conversion. Simply tell it beforehand that the VBO is just a package full of plain bytes.

But keep your seat, yes, it works, but unfortunately another Nova-slowdown-area kills the potential gain again:
you may remember that ogles2 contains a workaround for plain byte stuff like RGBA8 data. I found upload of such endian-free-simple-data to be so extremely dead-slow for unknown reasons, so the lib converts those to RGBAfloat32 data... You'd expect it to be much slower then (the 4x byte-to-float conversion, 4x as much data to transfer) but it's muuuuuuch faster than letting Nova do the simple job on the plain bytes.

Huh? If a VBO contains only uint8 data, then it should be using a straight copy routine (one that uses doubles if possible).

You do need to make sure that *all* VBO arrays are 8-bit or disabled (W3DNEF_NONE), otherwise it'll fall through to the complex case of handling mixed data.
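
The rule described above can be written down as a tiny decision function. This is a Python sketch of the intended behaviour as stated in this post only, not Nova's real code; the string labels are illustrative stand-ins and not actual Nova constants (only W3DNEF_NONE is named in the thread).

```python
def pick_copy_path(array_formats):
    """Return which copy routine a VBO with these array formats would take.

    "NONE" stands in for W3DNEF_NONE (array disabled); "UINT8" and "FLOAT"
    are illustrative labels, not real Nova constants.
    """
    enabled = [fmt for fmt in array_formats if fmt != "NONE"]
    if all(fmt == "UINT8" for fmt in enabled):
        return "straight copy"
    return "mixed-data conversion"


print(pick_copy_path(["UINT8", "NONE", "NONE"]))  # straight copy
print(pick_copy_path(["UINT8", "FLOAT"]))         # mixed-data conversion
```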

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
@Hans
Quote:
You do need to make sure that *all* VBO arrays are 8-bit or disabled (W3DNEF_NONE), otherwise it'll fall through to the complex case of handling mixed data.

No, unfortunately it's not like this. Even setting all "unused" arrays to W3DNEF_NONE and size / stride 0 doesn't change anything.
The only thing that helps is to create a simple 1-array VBO in the first place. I should have done that, and usually do for pure index VBOs, but it wasn't enforced in this case. Thanks for pointing me to it.

And oh yes, that makes a difference indeed! However, not for the "own" vs. "Nova" endian conversion, there's no measurable difference here in this simple 1-array-scenario.

But, damnit, all this revealed again just how slow Nova buffer copy becomes as soon as you don't have the most trivial 1 array VBO layout! Here it's the difference between 7 and 30 fps! And this happens for every VBO you create with 2 or more arrays inside.

Now the thing is:
obviously there is huge optimization potential here. Whatever you do in your multi-array-copy function, it's very bad. And if ogles2 could get rid of it that would result in an incredible speedup for sure.

But: unless you make VBOSetArray with W3DNEF_NONE work as you described above, I cannot implement it, because obviously a 1-array VBO is useless in that case. Or is any special parameter combination required for VBOSetArray with W3DNEF_NONE to make it work as promised?

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
@Daytona675x
>VBO is just a package full of plain bytes
But will it avoid the "write from buffer to GPU vram" part ?
I mean, perhaps Nova will proceed as usual: copy the reordered data from a buffer to VRAM, but just skip the reordering of each item.

>RGBA8 data converted to RGBA float
Interesting too.

Anyway, we just need a new VBO mode (let's call it W3DN_RAW_ACCESS) that doesn't copy GPU VRAM to/from a reordered buffer but simply lets you access the VBO data in place in GPU VRAM (so it is the CPU that manages the reordering & copy).


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk
@thellier
Quote:
But will it avoid the "write from buffer to GPU vram" part ?

In theory it could. I mean, we explicitly tell Nova "look, what's coming is just a plain byte buffer, no conversion whatsoever required, and write-only" even before locking the VBO! So in theory it could give us a pointer into VRAM at this point in such a scenario. But I bet it doesn't. Actually, I know it doesn't, because Hans just told us:
Quote:
Hans:
Huh? If a VBO contains only uint8 data, then it should be using a straight copy routine (one that uses doubles if possible).

If we had direct VRAM access, then no such extra copy at his end would be necessary at all.

Quote:
I mean, perhaps Nova will proceed as usual: copy the reordered data from a buffer to VRAM, but just skip the reordering of each item.

Yes, that's the case. But while we don't have direct VRAM access, we still have a potential up-to-factor-4 (7 vs 30 fps in my test) performance gain around the corner with the plain-bytes trick nevertheless, if VBOSetArray with W3DNEF_NONE would work as promised.
But yes, completely getting rid of the Nova copy in such a scenario would be great. This here is meant to be a hack using what we already have (well, what I thought we already had). And for sth. like that, the potential gain is unbelievably big already.

Quote:
>RGBA8 data converted to RGBA float
Interesting too

Yes, but that's not everything ogles2 does under the hood to work around some Nova slowdowns through type conversions. E.g. it also silently converts 16-bit index data to 32-bit because native 16-bit indices are slow (and 8-bit indices because they aren't supported at all). ogles2 even does so if you supply your indices in your own VBO: then it's not your VBO that is used but an internal 32-bit version of it.
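
As a rough illustration of such an index conversion, here is a Python sketch that widens 16-bit indices to 32-bit. The function name is hypothetical and the big-endian packing is just an assumption for the example; it is not taken from ogles2.

```python
import struct


def widen_indices(indices16: bytes) -> bytes:
    """Convert a buffer of big-endian uint16 indices to uint32 indices."""
    count = len(indices16) // 2
    values = struct.unpack(f">{count}H", indices16)
    return struct.pack(f">{count}I", *values)


src = struct.pack(">3H", 0, 1, 65535)
dst = widen_indices(src)
print(len(src), len(dst))  # 6 12
```

The index values stay the same; only the storage doubles, which is the price paid for avoiding the slow native 16-bit path.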

Quote:
Anyway, we just need a new VBO mode (let's call it W3DN_RAW_ACCESS)

Yes, that won't hurt; the potential speedup is gigantic. Although for now I'd already be happy if VBOSetArray(W3DNEF_NONE) did what it's supposed to do.

Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Home away from home
@Daytona675x

Try setting the unused arrays to UINT8 as well. That should work for now, until I add a proper option to disable endianness conversion.

@thellier

Quote:
Anyway, we just need a new VBO mode (let's call it W3DN_RAW_ACCESS) that doesn't copy GPU VRAM to/from a reordered buffer but simply lets you access the VBO data in place in GPU VRAM (so it is the CPU that manages the reordering & copy).

That would be a workaround for the lack of GART/DMA. We really want to move away from the CPU accessing VRAM directly, because it's really slow.

At some point we'll finally have GART support, which will allow us to use DMA to copy data to/from VRAM, and even allow the GPU to slurp data directly from RAM.

Hans
