
Board index » All Posts (geennaam)




Re: x5000 benchmarks / speed up
Quite a regular


Unfortunately, the dcbtl opcode is not recognized by any of the GCC versions in the latest SDK unless I use the compiler switch -mcpu=G5.
But the generated code will then crash immediately, of course.

Even -mcpu=e5500 doesn't recognize those 64-byte cache line opcodes.

DCBA is actually still supported by the e5500 (see the e5500 RM). But depending on a bit in the L1 cache control register, it works on half or full cache lines. The new e5500 dcbal opcode always works on full cache lines. But again, it is not recognized by GCC.

Anyways, I am stuck now.


Edited by geennaam on 2023/2/21 20:21:20


Re: x5000 benchmarks / speed up
Quite a regular


@joerg

I am probably doing something wrong, but as soon as the buffers no longer fit into the CPU cache, the copy performance is very poor.

I tried bcopy(), memcpy() and IExec->CopyMemQuick().

All more or less the same result:
Copy (CopyMemQuick):
--------------------------------------
Block size         1 Kb   3423.14 MB/s
Block size         2 Kb   3657.73 MB/s
Block size         4 Kb   3751.87 MB/s
Block size         8 Kb   3845.11 MB/s
Block size        16 Kb   3722.72 MB/s
Block size        32 Kb   2970.01 MB/s
Block size        64 Kb   2959.63 MB/s
Block size       128 Kb   2979.04 MB/s
Block size       256 Kb   2940.45 MB/s
Block size       512 Kb   1487.38 MB/s
Block size      1024 Kb   1313.59 MB/s
Block size      2048 Kb    541.94 MB/s
Block size      4096 Kb    501.63 MB/s
Block size      8192 Kb    496.15 MB/s
Block size     16384 Kb    496.39 MB/s


I am not doing any DCBT/DCBA because it is my assumption that those functions are optimised already.
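
For reference, a measurement of this kind can be sketched in portable C as follows. This is only an illustration of the method, not the Memspeed source: copy_mbps is a hypothetical name, memcpy() stands in for CopyMemQuick(), and clock() measures CPU time rather than wall-clock time.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy a block of `size` bytes (size > 0) `reps` times with memcpy() and
   return the throughput in MB/s (1 MB = 1024 * 1024 bytes).
   Returns -1.0 if allocation fails and 0.0 if the run was too short
   for clock() to measure. */
double copy_mbps(size_t size, unsigned reps)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    volatile unsigned char sink = 0;
    double mbps = -1.0;

    if (src && dst) {
        memset(src, 0xA5, size);
        memset(dst, 0, size);

        clock_t t0 = clock();
        for (unsigned i = 0; i < reps; i++) {
            src[0] = (unsigned char)i;  /* change the input on every pass */
            memcpy(dst, src, size);
            sink ^= dst[size - 1];      /* read back so the copy is not optimised away */
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        double mb = (double)size * reps / (1024.0 * 1024.0);

        mbps = secs > 0.0 ? mb / secs : 0.0;
    }
    free(src);
    free(dst);
    return mbps;
}
```

Calling it for block sizes from 1 Kb up to 16384 Kb, with reps chosen so that each run moves the same total amount of data, gives a table like the one above.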



Re: x5000 benchmarks / speed up
Quite a regular


@joerg

It's now clear to me that, unlike modern AMD/Intel CPUs, the NXP PowerPCs lack hardware cache management. Hence the slow transfer speeds with GCC-generated code.

I have implemented quick-and-dirty DCBT/DCBA and I can already see a speedup to DDR3.

(My assumption is that the cache line size of the e5500 is 64 bytes.)
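
A minimal sketch of the read side of this idea, assuming GCC and the 64-byte cache line size guessed above. It uses GCC's __builtin_prefetch instead of hand-written DCBT; on PowerPC targets GCC is expected to lower this to a dcbt instruction, but that has not been checked against the SDK compilers. CACHE_LINE, PREFETCH_AHEAD and sum_with_prefetch are hypothetical names, and this is not the Memspeed code. The write side (DCBA) has no comparable builtin and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE     64u  /* assumption from the post: e5500 cache line size */
#define PREFETCH_AHEAD 4u   /* hypothetical tuning value: lines to fetch ahead */

/* Sum 32-bit words while hinting upcoming cache lines to the CPU. */
uint32_t sum_with_prefetch(const uint32_t *buf, size_t words)
{
    const size_t words_per_line = CACHE_LINE / sizeof(uint32_t);
    const size_t ahead = PREFETCH_AHEAD * words_per_line;
    uint32_t sum = 0;

    for (size_t i = 0; i < words; i++) {
        /* At the start of each line, request a line further ahead,
           but only while that address is still inside the buffer. */
        if (i % words_per_line == 0 && i + ahead < words)
            __builtin_prefetch(&buf[i + ahead], 0, 0);
        sum += buf[i];
    }
    return sum;
}
```

The prefetch distance is something to tune by measurement; too short hides no latency, too long evicts data before it is used.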

Memspeed V0.2:

Write 32bit integer:
--------------------------------------
Block size     16384 Kb   2785.91 MB/s   +62%


Write 64bit integer:
--------------------------------------
Block size     16384 Kb   2040.55 MB/s   +19%


Read 32bit integer:
--------------------------------------
Block size     16384 Kb   1587.62 MB/s  +123%


Read 64bit integer:
--------------------------------------
Block size     16384 Kb   1292.63 MB/s   +80%


Write 32bit float:
--------------------------------------
Block size     16384 Kb   2784.67 MB/s   +63%


Write 64bit float:
--------------------------------------
Block size     16384 Kb   2211.16 MB/s   +30%


Read 32bit float:
--------------------------------------
Block size     16384 Kb   1590.21 MB/s  +123%


Read 64bit float:
--------------------------------------
Block size     16384 Kb   1718.09 MB/s   +33%



Are bcopy(), memset() and memcpy() "inspired" by the Apple PowerPC assembly code? From what I've heard, it's lightning fast.


Edited by geennaam on 2023/2/20 23:35:07


Re: SDL2
Quite a regular


@Raziel
@Capehill

I don't know if this is handled automatically by the device IO interface. But if you use the low-level library function interface, then the application must check the minimum buffer size and the audio mode frequency.

With the low-level function interface you can query the AHIDB_MaxPlaySamples tag with GetAudioAttrsA().
It will return the driver buffer size in sample frames. (A 16bit stereo frame is 4 bytes.)

Your own minimum application buffer size must be calculated according to the following formula:
"application buffer size" >= "driver buffer size" * "sample frequency" / "audiomode frequency"

So if you want to play 16bit stereo 44.1kHz sound with a Sound Blaster Audigy FX through unit 0, where unit 0 is set to 48kHz and the driver buffer is 1024 frames, the minimum application buffer must be:

1024 * 44100 / 48000 = 940.8, so at least 941 frames (3764 bytes)
The same example for a 7.1 32bit frame format would require at least 30112 bytes.
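
The formula and the two byte counts above can be checked with a small C sketch. The function names are hypothetical and not part of AHI; the driver buffer size is whatever AHIDB_MaxPlaySamples returned.

```c
#include <stdint.h>

/* Minimum application buffer in sample frames:
   driver_frames * sample_hz / mode_hz, rounded up to a whole frame. */
uint32_t min_app_buffer_frames(uint32_t driver_frames,
                               uint32_t sample_hz,
                               uint32_t mode_hz)
{
    uint64_t num = (uint64_t)driver_frames * sample_hz;
    return (uint32_t)((num + mode_hz - 1) / mode_hz);
}

/* Size in bytes of `frames` sample frames for a given frame layout. */
uint32_t frames_to_bytes(uint32_t frames,
                         uint32_t channels,
                         uint32_t bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}
```

For the example above, min_app_buffer_frames(1024, 44100, 48000) gives 941 frames, which is 3764 bytes for 16bit stereo (2 channels, 2 bytes) and 30112 bytes for 32bit 7.1 (8 channels, 4 bytes).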



Re: OpenAL-soft for AmigaOS
Quite a regular


It looks like OpenAL supports mono and stereo only. Hopefully someone who is familiar with OpenAL can look at this as well.

bool AHIPlayback::reset()
{
    std::fprintf(stderr, "AHIPlayback::reset\n");

    switch (mDevice->channelsFromFmt())
    {
        case 1:  // mono
            switch (mDevice->FmtType)
            {
                case DevFmtByte:
                    mAhiFmt = AHIST_M8S;
                    break;
                case DevFmtShort:
                    mAhiFmt = AHIST_M16S;
                    break;
                case DevFmtInt:
                    mAhiFmt = AHIST_M32S;
                    break;
                default:
                    mAhiFmt = AHIST_NOTYPE;
                    break;
            }
            break;
        case 2:  // stereo
            switch (mDevice->FmtType)
            {
                case DevFmtByte:
                    mAhiFmt = AHIST_S8S;
                    break;
                case DevFmtShort:
                    mAhiFmt = AHIST_S16S;
                    break;
                case DevFmtInt:
                    mAhiFmt = AHIST_S32S;
                    break;
                default:
                    mAhiFmt = AHIST_NOTYPE;
                    break;
            }
            break;
        default:  // surround?
            mAhiFmt = AHIST_NOTYPE;
            break;
    }


Edited by geennaam on 2023/2/20 15:34:41


Re: SDL2
Quite a regular


@Capehill

Yes, channel mapping is correct. Good work!

I do wonder if the clicking noise is caused by a wrong buffer length. After all, the stereo test works fine with the same tone generator.



Re: SDL2
Quite a regular


@Capehill

Good progress!

The noise that you are referring to sounds like a looping sample click. Basically, the end of the sample doesn't fit the start of the sample.
I don't hear any other noise.

Surround works. Only the channel mapping is off.

testsurround32 output -> soundcard channel

"Front Center" -> Rear Left
"Low Frequency Effect" -> Rear Right
"Back Left" -> Side Left
"Back Right" -> Side Right
"Side Left" -> Center
"Side Right" -> LFE



Re: Porting Death Rally - help needed
Quite a regular


@jabirulo

Runs fine on an X5k + RX 570 + ogles2 renderer.



Re: SDL2
Quite a regular


@Capehill

First of all, I am not an AHI expert. Sniffing through the AHI device source code is still on my todo list.

But this is how I understand AHI so far:

Yes, AHI expects 32bit samples in a 7.1 mode.

According to AHI.h:
#define AHIST_L7_1 (0x00c3000aUL) /* 7.1, 32 bit signed (8xLONG) */

It's the format of the application output buffer itself that determines how many independent output channels are presented to the driver.

After that, AHI does 1:1 mapping of the channels in your data buffer.

If you present a stereo buffer to AHI when a 7.1 audio mode is selected for your application, AHI will do no channel upmixing for you. It simply presents a stereo buffer to the driver, and my driver will play back only on front left and right. The other channels will be muted.

If you present a 7.1 buffer when a 7.1 audio mode is selected, then all 8 channels will be played. The order of the samples in the buffer determines on which speaker each channel is played.
If your application has, for example, just 5.1 data, then do the correct channel mapping in your 7.1 buffer and leave the other two channels empty.

HIFI modes will always present 32bit mixing buffers to the driver.
When you present a 16bit stereo buffer to any HIFI mode, AHI will convert the samples to 32bit and present a 32bit stereo mixing buffer to the driver.

Unfortunately AHI doesn't know a 16bit 7.1 buffer format. So in case of multichannel 16bit sound, I would do an int16->int32 conversion instead of clipping to stereo. It will not cost much processing power:
*output_channel = (int32)*input_channel*65536;
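
Applied to a whole interleaved buffer, the one-liner above could look like the following sketch. The function name is hypothetical and the standard C99 fixed-width types are used instead of the AmigaOS int16/int32 typedefs.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert interleaved 16-bit samples to 32-bit by scaling with 65536,
   so that full-scale 16-bit input maps to (almost) full-scale 32-bit.
   `samples` is the total sample count, i.e. frames * channels. */
void s16_to_s32(const int16_t *in, int32_t *out, size_t samples)
{
    for (size_t i = 0; i < samples; i++)
        out[i] = (int32_t)in[i] * 65536;
}
```

For a 7.1 buffer, samples would be the frame count times 8, and the channel order in the input has to match the order AHI expects in the output.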

Hope this helps.


Edited by geennaam on 2023/2/18 14:58:14


Re: Qt 6 progress
Quite a regular


@afxgroup

I see what you are doing here. And I can do that as well.

So you are suggesting that our tool chain is closed source? Because if it were open source, you wouldn't need to spend time fixing it, right? It would have happened a long time ago.

Yes, AmigaOS4 is in the wrong hands. But an open source AmigaOS4 would mean that within weeks there would be:
-Amigaos4.2
-TheRealAmigaos4.2.1
-IKnowItBetterAmigaOSAllwayOneVersionHigher
-AmigaOS4DoneRight
-OneAmigaOS4toRuleThemAll
-AmigaOS4-<insert your Favorite-ISA>

Because that is what we have become.



Re: Qt 6 progress
Quite a regular


@LiveForIt

OK, I feel addressed here as well.

The problem is that if you don't keep it to yourself, someone will exploit it. This is unfortunately how it works in the Amiga scene.

Secondly, I personally don't believe in open source. Bad code gets copied because the copier doesn't understand half of it. It's IMHO the main reason why AmigaOS systems are unstable. Memory protection is just a gatekeeper for badly coded software.

However, I do believe in quality documentation with examples about how things are supposed to be done correctly.



Re: Porting Death Rally - help needed
Quite a regular


@SinanSam460

Great news. Looking forward to the game. Looks a bit like Micro Machines.



Re: My AmigaOne X5000 twins - I need some help and advice.
Quite a regular


@Gebrochen

My X5020 mainboard was actually bought directly from Amigakit, right after Brexit.

Once shipped from Amigakit, it took only a couple of days to clear customs and arrive at my home. IIRC it was wrapped in foam and bubble wrap, so the outer box didn't close well. But there was no visible damage to the package.



Re: My AmigaOne X5000 twins - I need some help and advice.
Quite a regular


Ouch, I was also seriously considering buying an X5040 mainboard. But a 3k doorstop doesn't really sound appealing. I wonder if those reported performance issues have been solved already.

My X5020 was also shipped as a bare mainboard. It has an unreliable SATA connector and somehow I cannot save the power pref for my RX card. I've also had some issues with the RTC in the past. But this could have been software related.
Other than that, the machine is fine.



Re: Porting Death Rally - help needed
Quite a regular


@m3x

Edit: Isn't this only for e.g. the alignment of variables within structures, and not for mallocs?


Edited by geennaam on 2023/2/15 14:00:38
Edited by geennaam on 2023/2/15 14:01:23


Re: Porting Death Rally - help needed
Quite a regular


@kas1e

Quote:
The problem which we have on x5000, is probably because of
missing 4 opcodes (lfs, lfsu, stfs, stfsu) which need to implement for x5k, but this wasn't done yet.

Do you mean that this misalignment handling code has not been implemented for those four instructions?

As you say, non-aligned allocations will result in performance issues anyway and should therefore be avoided.

It was my understanding that all allocations in OS4 are 32bit aligned by default. So potentially only doubles and uint64 could have alignment issues when you don't force the correct alignment. Or is this only true for IExec->AllocVecTags() calls and not for the standard C malloc()-like calls?

I must admit that I only use AllocVecTags() because it gives me as much control as possible from this abstraction level. As a hardware guy I have trust issues with compilers and OSes.

Looking at the names of the source files already gives me a headache. But forcing the memory allocations to be correctly aligned shouldn't be that hard.
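
One way to force alignment on top of a plain malloc(), sketched in standard C only. This is a generic technique and not what the game or the OS4 C library does; aligned_block and alloc_aligned are hypothetical names. It over-allocates and rounds the pointer up to the requested boundary.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *raw;      /* pointer returned by malloc(); pass this to free() */
    void *aligned;  /* pointer rounded up to the requested alignment */
} aligned_block;

/* Allocate `size` usable bytes aligned to `align`, which must be a
   power of two. On failure both pointers are NULL. */
aligned_block alloc_aligned(size_t size, size_t align)
{
    aligned_block b = { NULL, NULL };

    if (align == 0 || (align & (align - 1)) != 0)
        return b;                       /* not a power of two */

    b.raw = malloc(size + align - 1);   /* room for the worst-case shift */
    if (b.raw) {
        uintptr_t p = (uintptr_t)b.raw;
        b.aligned = (void *)((p + align - 1) & ~(uintptr_t)(align - 1));
    }
    return b;
}
```

The data is accessed through aligned, and raw is kept only so the block can be released with free() later.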



Re: Porting Death Rally - help needed
Quite a regular


@SinanSam460

Is this source code reverse engineered? This would at least explain the weird naming.

I am pretty sure that the problem is elsewhere in the code.

The disassembly shows that *xf points to a 16bit aligned address (0x5B24264E) where it must be 32bit aligned. The pointer itself is stored at a 32bit aligned address (0x5B01823C).

lfd f1,40(r31)      Load f1 (double) from address 0x5B018248
bl 0x7F73668C       Branch and link to function sin()
fmr f12,f1          Copy f1 to f12 (result from sine function)
lis r9,23527        Load immediate shifted (r9 = 0x5BE70000)
lfd f0,-7736(r9)    Load f0 with double from address 0x5BE6E1C8
fmul f0,f12,f0      f0 = f0 * 0.788011
fadd f0,f31,f0      f0 = f0 + 413.0
frsp f0,f0          Double -> float
lwz r9,28(r31)      Load r9 with word from address 0x5B01823C
stfs f0,0(r9)       Store float in f0 to 0x5B24264E


The double loads are 64bit aligned, so that is OK.
The value loaded from (double)s_35e[n].XLocation seems to be 413.0. Strange, because this is supposed to be a __BYTE__.
The NaN is most likely the result of something bogus loaded from 0x5BE6E1C8 -> lfd f0,-7736(r9). This should have been 12.0.
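
The alignment of the addresses quoted in this post can be checked mechanically. A minimal sketch in C, with is_aligned as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` is a multiple of `align` (align must be a power of two). */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With this helper, 0x5B24264E passes the 2-byte check but fails the 4-byte check, 0x5B01823C passes the 4-byte check, and both double load addresses (0x5B018248 and 0x5BE6E1C8) pass the 8-byte check.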

The question remains why there is an alignment issue only on the X5k and sam460, and then specifically on AmigaOS4. Because as I understand it, the MOS version runs fine.
But this is a question for the compiler experts.



Re: Porting Death Rally - help needed
Quite a regular


@SinanSam460

Assuming that the asterisk points to the failing instruction and I am reading the disassembly correctly, then your code is trying to store a single-precision float result (f0) at a non-aligned memory location, 0x5B24264E (r9). Since a double is converted to single before the destination is loaded and the store happens, it is likely *xf or *yf.

Also f0 is NaN. So something is really wrong here.

Can you copy and paste the contents of the disassembly tab of the grim reaper window? It might give a hint where things start to go wrong.


Edited by geennaam on 2023/2/14 23:11:33
Edited by geennaam on 2023/2/14 23:15:03
Edited by geennaam on 2023/2/14 23:20:29
Edited by geennaam on 2023/2/14 23:20:56


Re: Porting Death Rally - help needed
Quite a regular


@SinanSam460

I'm not really much of a coder, so I would not use this kind of construction myself without knowing exactly what the resulting behaviour is.

But are you sure that it is wise to fill a pointer to a float (32bits) with a double (64bits) result? I can imagine that at least the compiler would complain about a missing cast.
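
For what it's worth, C converts a double to float implicitly on such a store, so a compiler will normally stay silent unless extra warnings (for example GCC's -Wfloat-conversion, where the installed GCC supports it) are enabled. A small sketch with a hypothetical helper name that makes the narrowing explicit:

```c
/* Store a double result through a float pointer with an explicit cast.
   Without the cast the same narrowing conversion happens implicitly. */
void store_as_float(float *dst, double value)
{
    *dst = (float)value;
}
```

The cast does not change the generated code; it only documents that the loss of precision is intended.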

The difference in behaviour could come from an unintended compiler decision for this construct on one side versus the FP emulation code for the A1222 on the other.

Furthermore, the issue could be somewhere else as well. Make sure that n is within bounds, for example.

On the positive side: it is good to see that the A1222 actually behaves as intended by the original coders without having a compatible FPU.


Edited by geennaam on 2023/2/14 12:18:24


Re: Get address of deleted logic blocks from Filesystem
Quite a regular


@tonyw

Native LBA is relative these days. NVMe drives can support up to 64 different LBA formats. The drive reports which of the supported LBA formats will perform best based on how it has been formatted. My Samsung 970 EVO 1TB happens to support only one LBA format (512 bytes). It's the OS memory page size that needs to be at least 4k.
A WD BLACK SN770 arrives today. So maybe this modern drive will actually support 4k pages as well.

I am not familiar with filesystems. Where can I find this bitmap?
Does NGFS(2) also store this info in a bitmap?

NVMe will show the maximum capacity of a namespace and the allocated capacity. I will compare this NVMe allocated value with the actual allocation reported by the filesystem. The two should diverge over time without Trim.



