All Posts (Daytona675x)




Re: Replacing 3.5" harddisk with 2.5" SSD on my SAM460ex
Not too shy to talk


@JosDuchIt
Quote:
tested a lot of them and only had success with a Samsung 840 evo (250 GB)

Yes - but only regarding the sam460ex's internal SATA.
If you use a dedicated additional PCI SATA card, then I suppose pretty much any SSD will do. It's only the built-in SATA that is problematic.



Re: Replacing 3.5" harddisk with 2.5" SSD on my SAM460ex
Not too shy to talk


@JosDuchIt
Are you going to use an extra SATA card or the sam460ex's internal connector? If you're trying the latter, you'll run into guaranteed trouble with certain (most?) SSDs!

The sam460ex's internal SATA controller is extremely picky regarding SSDs. I tried countless devices (some Kingstons, Crucials, Corsairs, SanDisk, you name it) and the only one that worked flawlessly for me turned out to be a Samsung 840 evo (250 GB). Any other I tried either didn't work at all or quickly resulted in corrupted data.

My filesystem settings are:
SFS\00
Blocksize: 4096
Buffers: 600
Maxtrans: 0x7FFFFFFF
Mask: 0xFFFFFFFE
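For reference, settings like these would typically live in the partition's MountList / DOSDriver entry. A sketch only: the filesystem path is an assumption, the DosType value is the standard one for SFS\00, and the device/geometry entries are omitted:

```
/* Hypothetical MountList fragment matching the settings above */
FileSystem   = L:SmartFilesystem   /* assumed install path */
DosType      = 0x53465300          /* "SFS\00" */
BlockSize    = 4096
Buffers      = 600
MaxTransfer  = 0x7FFFFFFF
Mask         = 0xFFFFFFFE
```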

It has been running rock-stable for ~3.5 years now.



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@Hans
Optimizations being turned off is most likely the culprit for the performance penalty here; at least it could easily explain a penalty of that order of magnitude.

Weird with your smart pointers, though. STL? For ogles2 I use my own templated ref-counting smart pointers without any issues (I like to write my own stuff and avoid the STL whenever I can). Crap.
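For illustration, a minimal STL-free templated ref-counting smart pointer of the kind mentioned could look like the following. This is only a sketch of the general technique; all names are invented and it is not the actual ogles2 code.

```cpp
#include <cassert>

// Intrusive ref-count base class: objects carry their own counter.
class RefCounted {
public:
    RefCounted() : m_refs(0) {}
    void AddRef() { ++m_refs; }
    void Release() { if (--m_refs == 0) delete this; }
    int Refs() const { return m_refs; }
protected:
    virtual ~RefCounted() {}
private:
    int m_refs;
};

// Minimal smart pointer driving the counter of a RefCounted object.
template <typename T>
class RefPtr {
public:
    RefPtr() : m_obj(0) {}
    explicit RefPtr(T* obj) : m_obj(obj) { if (m_obj) m_obj->AddRef(); }
    RefPtr(const RefPtr& other) : m_obj(other.m_obj) { if (m_obj) m_obj->AddRef(); }
    ~RefPtr() { if (m_obj) m_obj->Release(); }

    RefPtr& operator=(const RefPtr& other) {
        if (other.m_obj) other.m_obj->AddRef(); // addref first: safe for self-assignment
        if (m_obj) m_obj->Release();
        m_obj = other.m_obj;
        return *this;
    }

    T* operator->() const { return m_obj; }
    T* Get() const { return m_obj; }

private:
    T* m_obj;
};

// Hypothetical resource type, just for demonstration.
struct Texture : RefCounted { int id; };
```

Copying a RefPtr bumps the count and destroying the last one deletes the object, so shared resources clean themselves up without any STL involvement.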



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@Hans
Quote:
Do you really want the responsibility of doing all the endianness conversion?

No, the driver knows best which endianness is needed.

Quote:
If there's enough interest, I could add the ability to check the GPU's endianness and disable the endianness conversion, making it the app's/game's responsibility (or the GLES2 wrapper's).

The only thing that comes to my mind where it could in theory result in somewhat better performance is if you are updating a VBO every frame and have direct access to the VRAM (if you don't, then another internal copy is required anyway, and then e.g. a tight stwbrx loop is likely better) and if you can do the conversion without polluting your code. IMHO it's not worth the trouble.
Although... if you can offer that feature for free and optionally, and if you don't have more important things to do, then go ahead.
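The "tight stwbrx loop" refers to PPC's byte-reversed store instruction. An endianness-converting copy of that kind might be sketched as follows; the function name is invented, and __builtin_bswap32 is a GCC/Clang intrinsic that a PPC compiler can lower to stwbrx:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Copy 32-bit words while reversing each word's byte order.
// On PPC the swap+store can compile down to a single stwbrx per word.
void copySwapped32(std::uint32_t* dst, const std::uint32_t* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = __builtin_bswap32(src[i]); // GCC/Clang intrinsic
}
```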

Quote:
There's no bug in the way; it's all about optimization. However, I'm not sure why it's 30% slower, or how to optimize it. We don't have tools that could identify the bottlenecks (e.g., cache misses, etc.), so it's more guesswork than anything else. I suppose I could try inserting some cache prefetching instructions to see if that helps.


Maybe the opposite is better. Did you by accident call dcbz on the (VRAM) destination? This could probably result in such a dramatic slowdown.
Also: are you 100% sure that there aren't any debugging artefacts remaining?

Other than that: we are talking about Roman's performance numbers here. He has an X5000 and from my experience (!) its automatic cache prefetching works pretty well, in contrast to other PPCs. So, unless you have any weird access pattern (which you should not have?) manual prefetching shouldn't change the picture toooo much in this case here.

But it's hard to come up with concrete hints without source, of course




Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@Hans
Aaaah, a true classic



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@Hans
Looking at the crash code line
Quote:
stw r9,0(r7)

and at the whole fragment
Quote:

7e7d4abc: 91260000 stw r9,0(r6)
7e7d4ac0: 812400b8 lwz r9,184(r4)
*7e7d4ac4: 91270000 stw r9,0(r7)
7e7d4ac8: 812400c0 lwz r9,192(r4)
7e7d4acc: 91280000 stw r9,0(r8)

while considering that r7 and r8 are both 0 in this stack trace, I'd like to place a bet on the following:
Did you maybe forget to check the optional pointer-return parameters of GetStencilFunc for NULL?
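The suspected bug pattern, sketched in C++. The signature below is purely hypothetical, only loosely modeled on a GetStencilFunc-style query; the real Warp3D Nova API may differ. The point is that every optional out-pointer must be NULL-checked before the store (the crashing stw):

```cpp
#include <cassert>
#include <cstdint>

struct StencilState { std::uint32_t func, ref, mask; };

// Hypothetical query with optional out-parameters: a caller may pass
// NULL for any value it doesn't care about.
void getStencilFunc(const StencilState& s,
                    std::uint32_t* func, std::uint32_t* ref, std::uint32_t* mask) {
    if (func) *func = s.func; // without these checks, a NULL out-pointer
    if (ref)  *ref  = s.ref;  // would crash right on the store -
    if (mask) *mask = s.mask; // exactly the stw-through-r7/r8 pattern above
}
```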





Re: how PPC Cpu handle "double" and "float" in terms of speed
Not too shy to talk


@kas1e
If the game has a well-tested explicit single-precision code path, I strongly suggest: go for it!



Re: how PPC Cpu handle "double" and "float" in terms of speed
Not too shy to talk


@kas1e
Roughly speaking: for the CPU it usually doesn't really matter too much whether you're using double or single precision. Single-precision values are converted to their double representation for free when loaded into / stored from FPU registers for internal use. What's not for free are explicit conversion instructions.
On e.g. Tabor it's different: there the FPU internally distinguishes between single and double precision and also offers simple SIMD capabilities for two single-precision values. AltiVec, btw., wants single precision too.
For many commands you can specify whether to use single or double precision (e.g. fadd vs fadds). As far as I know some commands are a bit faster in single-precision mode.

Anyway, in real world you almost always want to follow this rule of thumb:
use the lowest precision required to do the task.

Double precision puts twice the pressure on caches and memory, which is the main reason why using single precision most often results in significantly higher performance on our systems.
I can definitely tell you: if you manage to kill your cache, your performance is gone and you won't get it back, simple as that. And the other way around: you can do tons of calculations on your data if you manage to have your stuff in the cache in time.
Therefore single-precision floats are usually one of your best friends when it comes to performance.

However, *don't* naively change a program's float / double usage unless you know exactly what you're doing. Although, for example, the average 3D game's vertex data and internal calculations are most often done with single-precision data, you shouldn't just blindly tell your compiler to compile everything for single precision! You may end up with very subtle bugs.

The Vampire guys ran into such an issue recently (which was fixed quickly). They gave their FPU a slightly too low internal precision (which is somewhat equivalent to telling your compiler to always use single precision). While most things worked, the timing of certain demos went bogus depending on the system time: a demo would run fine in the year 1970 but mess up when run in 2018. That phenomenon is called "catastrophic cancellation", which may happen if you take two really big numbers that are close to each other, subtract one from the other, and falsely expect the result to still have any sort of precision or any value other than 0. With a bit more precision those calculations with those input values were okay again.
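The epoch-time effect is easy to reproduce in a few lines (a sketch for illustration): at ~1.5e9 seconds since 1970 (the year 2018) the spacing between adjacent floats is 128, so a 5-second delta cancels to exactly zero, while near 1970 - or with doubles - it survives.

```cpp
#include <cassert>

// Advance a timestamp by 5 seconds and return the measured delta.
// With floats the delta vanishes once the timestamp's magnitude makes
// the float spacing (ulp) larger than the delta itself.
float deltaAsFloat(float base) {
    float later = base + 5.0f; // rounds back to 'base' when ulp > 5
    return later - base;
}

double deltaAsDouble(double base) {
    double later = base + 5.0;
    return later - base;
}
```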

So, to sum it up:
- use single prec whenever you can for best performance because of cache/memory.
- use doubles when you have to.
- don't change the float / double behaviour of other people's code unless you know exactly what you're doing.

@Deniil
Are you sure that you really measured the right thing (e.g. not some explicit conversions) and a real-world situation (e.g. not just some tight loop with a fixed 32 byte dataset)? Because that's really not the result one would expect.


Edited by Daytona675x on 2018/11/21 8:41:34


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


Quote:

kas1e: "Missing in Warp3D Nova itself at the moment (taken from Daniel's ogles2.library readme): sample coverage, dithering, compressed textures and render-to-texture, which means they are also missing in ogles2 because of that."

Hans: "Correction, render-to-texture has been in there for ages now (over a year)."

Sorry for this, I forgot about that readme section... it's outdated. The change-log contains the correct information, though. Of course render-to-texture exists, otherwise e.g. Spencer would have a hard time doing its shadow effects.
This feature is inside Nova since version 1.37 IIRC and our OGLES2 supports it since version 1.14 (30. April 2017).



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@kas1e
Quote:
Dunno about that; at least ogles2 works fine over Nova. Can you bring some test cases to Hans so we can see if anything is wrong at all?

I assume he's simply talking about the fact that updating VBOs in general is a comparatively slow process. Which is of course simply in the nature of things. And until Nova gets DMA/GART support, things here are even slower on AOS4 compared to e.g. PC.
Those are all well-known facts, nothing new at all. And of course Hans knows that.
So I guess this isn't something anybody needs to come up with test cases for.



Re: Removing an OS4 .library from memory
Not too shy to talk





Re: Warp3d and AHI problem on Pegasos 2...
Not too shy to talk


MiniGL is not guilty, apparently the Warp3D implementation is, or more precisely the R200 driver in combination with certain Peg2 boards.
I once tried to find the cause of this inside the R200 driver, but without luck; I gave up at some point.
I found out that the symptoms can be minimized by reducing the rasterizer's workload. So especially things like compressed textures, a lower display resolution, a 16bit depth buffer, etc. significantly lower the chance / frequency of those symptoms appearing. With some games, using compressed textures was good enough to make the problem vanish (which is of course no real solution).



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@Hans
Quote:
That'll change when uniform booleans are finally supported.

Yes, if Nova / the damn SI then expects 1-byte DBO data internally (and from what you told me it will), then this will become an issue. Luckily the respective DBO-update code is well isolated inside ogles2, and integrating a type conversion of this kind should be rather (!) easy.

@kas1e
Quote:
In the case of Q3 exactly, at 1600x1200 the difference is about 3 fps. I.e. the workaround done in ogles2 is better and gives 3+ fps compared to the workaround in gl4es.

Hehe. But yes, actually something like that was to be expected after looking at the workaround in gl4es. ogles2.lib does the conversion on the fly while copying the data to the internal VBO. That this is faster than looping over the memory twice is no wonder.

Quote:
Probably when it is all done in hardware, in Warp3D, and without workarounds, it will give us another little speed increase of a few more fps?

Of course it will be faster not to do those workarounds, but I can't say by how much. It depends on how often the respective game or lib triggers it per frame and how big the data is (that's of course also of interest when it comes to sending that stuff to VRAM, e.g. 4 bytes vs. 16 bytes per color per vertex). And it depends on how the game works: if a game used true VBOs (which aren't modified all the time), then the impact would probably be near zero, because all the conversion and sending to VRAM would ideally be done only once per polygon soup.

Quote:
Anyway, ptitSeb says it isn't done yet, so all VBOs in gl4es are only emulated as a whole for now; what was done as initial VBO support was just a fast test hack, so -> no VBOs at the moment.

Alright, then I'll stick to my boing-ball test for now.



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@kas1e
Great!
Did you check the performance impact of the workaround (by comparing it to a version where you did the ubyte-to-float conversion of the color inside gl4es)?
Next to come is the workaround for VBOs. Do you have something for testing here too?

@Hans
Quote:
Wish I'd thought of the existing Nova limitations sooner.

And I wish I had not overlooked this limitation info in this readme

Quote:
Sorry for the confusion. I'll do my best to clarify, and also correct what I've been saying (got myself confused too ).

And sorry for my initial rant. Although it's a really hefty limitation that came as a big surprise to me, it was my fault to miss the note in the first place.

Quote:
NOTE: DBOs are still 32-bit only.

Luckily that's no problem. The respective commands of ogles2, namely glUniformXX, only operate with 32bit-floats and 32bit-ints. So this limitation "only" affects VAs, and out of those only VAs that are defined via glBufferXXX and glVertexAttribPointer (the glVertexAttribXf functions only work with floats).

Quote:
Quick summary: restrict each VBO to data of one size each (8-bit data in one, 16-bit in another, etc.). The latest beta sets the VA descriptors correctly based on type, so VAs of all types might work provided the restriction above is observed (untested so far).

I could do this for the internal client-RAM-emu-VBOs, since I have full control over those. However, I already made the workaround for that situation, which essentially patches the data when the emu-VBO is being built up internally and which seems to work well judging by kas1e's latest feedback; no need for extra VBOs here.

For "real" VBOs supplied by the user via glBufferXXX etc. this could be exploited, though. However, in my current WIP workaround for this I already took another route:
when I detect that a VBO contains critical element types, I create one additional internal "sub"-VBO and copy-convert the critical data over to it whenever a draw call with that VBO is issued and there have been data modifications since the last sub-VBO refresh. So it's one sub-VBO with x float arrays for all x critical arrays of the user-supplied VBO.
Patching the original VBO is a no-go, of course. Just imagine the fun if somebody does a partial glBufferSubData...
Adding the patched arrays at the end of the original VBO is no good idea either; it just complicates all the internal book-keeping.
The solution I'm implementing now is the "easiest" in this regard.

In both cases interleaved and linear memory layouts are supported, of course.

However, I won't put too much effort into optimizing those workarounds. I hope that this is only a temporary necessity and that it can be removed rather soon again



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@kas1e
With extensions enabled? Note that this first workaround version only tackles the situation where you don't use VBOs of any kind (by that I mean ogles2 VBOs)!
So what this first workaround is supposed to patch is e.g. gl4es doing a call like

glVertexAttribPointer(1,4,GL_UNSIGNED_BYTE,TRUE,28,pointer_to_first_color);

where "pointer_to_first_color" is really a pointer and no VBO is bound. So it's the typical old-school vertex-data-via-client-RAM-pointer setup.

Your statement about "enabled extensions" somehow sounds as if you're trying to use VBOs? That's not done yet.

Anyway, for the above situation ogles2 already has to manage an internal (hidden) Nova VBO. Therefore the workaround is not that hard (it only costs performance and wastes RAM):
if such a glVertexAttribPointer call is made, then instead of simply copying the data over, copy-convert it to float (incl. normalization if requested) and tell Nova that the internal VBO now contains std. floats.
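The copy-convert step could look roughly like this sketch (a hypothetical helper, not the actual ogles2 code). Normalization maps the ubyte range 0..255 to 0.0..1.0; interleaved layouts are handled via the source stride:

```cpp
#include <cassert>
#include <cstddef>

// Convert a GL_UNSIGNED_BYTE vertex attribute (e.g. an RGBA8 color)
// to tightly packed 32-bit floats while filling the internal VBO.
void convertUByteToFloat(float* dst, const unsigned char* src,
                         std::size_t vertexCount, std::size_t components,
                         std::size_t srcStride, bool normalized) {
    for (std::size_t v = 0; v < vertexCount; ++v) {
        const unsigned char* in = src + v * srcStride;
        for (std::size_t c = 0; c < components; ++c)
            *dst++ = normalized ? in[c] / 255.0f : float(in[c]);
    }
}
```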

I have no idea where endian-issues should come from (just as I don't know why not supporting uchar8 x 4 has anything to do with any endian-issues in the first place).
For Nova those are simple floats now, just like e.g. the xyz coordinates.

Anyway, check out the fresh version on the FTP. I found a typo that probably was the reason for this ;)



Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@kas1e
Quote:
Does OGLES2 transfer GL_UNSIGNED_BYTE as-is or does it transform it to GL_FLOAT?
If NO: can GL_UNSIGNED_BYTE (with normalization) be implemented in Warp3D?
If YES: there is something odd in that conversion.
If NO: ptitSeb can make a workaround in gl4es (unless OGLES2 handles that, as GL_UNSIGNED_BYTE support by a GLES2 driver is mandatory).


ogles2 followed the Nova docs and what seemed to work until now...
As such it does not do any conversion of GL_UNSIGNED_BYTE VAs, of course.

If it turns out that Nova cannot handle this, then I am going to interpret this as a severe Nova bug. I won't add any workarounds in ogles2 for this but wait until it gets fixed, sorry.

EDIT:
see EDIT of previous post.
In the meantime I found out that I really overlooked a limitation mentioned in one of the readmes.
So in contrast to what I said above I will add an internal uchar8 converter to ogles2.
This will stay active until this gets fixed inside Nova.
Don't add such a conversion to gl4es, it really makes no sense to mess around there.

EDIT #2:
The first workaround has been added, check it out on my FTP, ogles2_wtf_ubyte_1.zip
This version will internally convert every GL_UNSIGNED_BYTE VA for client memory usage. For safety I am converting to float internally; I'm not relying on the new 32bit integer support, because I prefer to do my own normalization and because that way it should work with previous Nova versions too.
Note that I'm only patching GL_UNSIGNED_BYTE for now since this is the one that's used in 99% of the non-32bit use-cases.

Next to come is the patching of VBOs supplied by the client application. This will be really ugly since internally a totally different looking VBO has to be created and maintained as soon as there's at least one GL_UNSIGNED_BYTE VA. :P


Edited by Daytona675x on 2018/6/11 15:24:47
Edited by Daytona675x on 2018/6/11 17:03:00


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


EDIT:
Taking another close look at one of the two readmes reveals that its "known limitations" section does indeed contain information about such limitations: only 32bit float vertex data is supported, which indeed renders 4x8bit ubyte color VAs officially unsupported.

Okay, I have no idea how I could overlook this. Most likely because a) the reason mentioned regarding endianness makes no sense in this case, b) it (seemed to have) worked so far, c) it's something so basic, and d) we had lots of talk about ubyte colors and the idea of this not being supported at all didn't pop up until now.

So, due to the fact that I overlooked it and did not take care of this limitation (if it exists for real; as said, so far it seemed to work), I will add a workaround for it to ogles2.
This will probably result in a severe performance penalty, especially if using the lib in a non-VBO-way.

However, now that I became aware of this limitation:
IMHO this should be the very very first entry on the Nova todo-list!
4x8bit color (or similar) VAs are used extremely often to have compact vertex structures, so this can be considered an absolute basic building block.

Also, the doc should be edited to no longer show completely misleading / false information. The doc is what people use when coding, and this limitation requires a fat warning note in VBOSetArray: "only supports this and that for now".


Before EDIT:
@Hans
Quote:
However, the W3DN_SI driver currently can't handle anything other than 32-bit datatypes (it says so in the readme). So that excludes GL_UNSIGNED_BYTE.

The very latest Nova 1.53 readme from two days ago suddenly contains new information:

Quote:
Support 32-bit integer vertex attributes (both normalized and unnormalized). NOTE: Only 32-bit datatypes supported at present

So from this it now sounds as if e.g. VA colors composed of 4 unsigned chars / GL_UNSIGNED_BYTE suddenly became unsupported, excluded en passant with this readme entry.

But: this is real news. OGLES2 has used ubyte VA colors with normalization since day one without issues (well, at least I thought so...).
And so far no readme or doc of Nova contained info saying that this was not supported!
Also, although I'm not using it: until two days ago I also believed that 32bit VA ints would work. The docs didn't give any reason to think otherwise. So what's presented as news in the latest readme is something everybody reading the docs until now expected to work anyway.

The only limitation of Nova of that style known until two days ago regarding 32bit concerned using the data as indices for vertex data. However, even that limitation was lifted long ago, so 16bit indices are possible natively.
And then there was / is the "recommendation" to internally align every VA to 32bit.

That was it. There was zero hint until now that unsigned byte VAs and thus the typical RGBA8 colors won't work or won't work reliably.

Quote:
So the driver has to convert the endianness as it's copied to the GPU.

As kas1e correctly said, there is no endian problem with single unsigned bytes. There is also no endian problem with 4 RGBA unsigned bytes. The client application has to take care that the components are in the correct order; neither ogles2 nor Nova has to be "smart" here.
Endian issues are no explanation for missing / broken unsigned byte VA support.
Besides that (although not of interest in this case), the doc for VBOSetArray explicitly says that calling it before VBOLock would "allow the driver to know what endianness conversion to perform beforehand". It does not mention that such endian conversion is not implemented or broken.
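The byte-array point can be demonstrated directly: a 4 x ubyte RGBA color has the same memory layout on every host, while a color packed into one 32-bit word does not. A small sketch (illustration only; helper names invented):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Report whether the host stores the least significant byte first.
bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char low;
    std::memcpy(&low, &probe, 1);
    return low == 1;
}

// First byte in memory of a packed 32-bit word: host-dependent.
unsigned char firstByteOfPacked(std::uint32_t packed) {
    unsigned char mem[4];
    std::memcpy(mem, &packed, sizeof mem);
    return mem[0];
}
```

An `unsigned char rgba[4] = {R,G,B,A}` array always has R at offset 0, which is why such attributes need no endianness handling; only multi-byte element types have a host-dependent layout.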

Quote:
Right now it assumes that everything is 32-bit, and it probably returns an error if you try to use 8/16-bit vertex data.

"Probably"? Until now Nova did not return an error. It simply swallowed such VA colors / VBO layouts, and it worked. Well, unless the semi-random vertex trash that e.g. Q3-gl4es generates under certain circumstances, and which has kept us puzzled for a long time now, is such a "probable" effect...
Which in turn would raise the question of why we weren't informed that such VA attributes are potentially broken in Nova, despite the tons of discussions / reports on this topic.

Quote:
So long as you're feeding it with floats and have given it the right pointers, strides, etc., it works.

Now it's even floats only all of a sudden?


Sorry, man, clarification please! Apparently you cannot rely on the information in the docs at all?
Simple question: which are the allowed / (not just probably) functional values for the following parameters of VBOSetArray?

W3DN_ElementFormat elementType:
so far the doc didn't forbid any (the aforementioned index limitation aside). Nothing about 32bit float only.

BOOL normalized:
for which elementTypes does it work? So far the doc said that it worked for ALL signed / unsigned integer types; actually for everything but floats.

Hopefully this is all just some kind of weird misunderstanding

Thanks,
Daniel


Edited by Daytona675x on 2018/6/11 15:21:39


Re: GL4ES: another OpenGL over OpenGLES2 emulation - some tech. info and porting progress
Not too shy to talk


@samo79
Quote:
Perhaps for the MiniGL bug you could contact Daytona; recently he has still been working on the MiniGL side

Actually the recent bug fixes were only a side effect of what I was really doing:
adding the std. PPC <-> SPE ABI bridge to the libs so that programs compiled for std. PPC can attach to (at least internally) native SPE MiniGL / mglut libs.
Such a bridge is always necessary if std. PPC code shall talk to PPC-SPE code and float parameters are involved, because the PPC-FPU and PPC-SPE ABIs are naturally incompatible when it comes to floats: usually float parameters are passed via FPU registers, which simply don't exist on PPC-SPE (on such systems those parameters are expected in GPRs instead).
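The core of such a bridge boils down to moving a float's raw bit pattern between register classes: the value is never converted, only reinterpreted. A minimal sketch of that reinterpretation (illustrative; not the actual MiniGL bridge code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// On PPC-SPE a float argument travels in a GPR as its raw 32-bit
// pattern. memcpy is the well-defined way to reinterpret bits in C++.
std::uint32_t floatBitsToGPR(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float gprToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```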
Anyway, I simply fixed what I saw while doing some cleanup. I don't plan to invest much more time into regular MiniGL, especially because my own MiniGL replacement lib is progressing well (some of the MiniGL fixes also came up because I noticed differences when comparing my lib's output with that of std. MGL, which in some cases was due to std. MGL bugs).

That Q3 issue with the broken geometry in the mirrors / portals smells like clipping bugs in MGL. However, I noticed that W3D for e.g. old Radeons and W3D for SI behave differently when it comes to clipping (and I don't mean R200-guard-band-related effects). It may also be an incompatibility (compared to other, earlier W3D implementations) inside the W3D-SI driver.

EDIT: no, I just checked it on my sam440; it happens there too, so you can safely assume that it's a MiniGL bug, most likely related to clipping.


Edited by Daytona675x on 2018/5/30 10:54:05


Re: AllocVec vs AllocMem
Not too shy to talk


@Deniil
On current AmigaOS4, AllocMem uses AllocVecTags internally. So at least here you actually save exactly 0 bytes by using AllocMem.



Re: BOH Deluxe and Huenison physical releases
Not too shy to talk


Great stuff -> ordered both!



