Forums - All Posts - The Amigans website

geennaam

Re: x5000 benchmarks / speed up

Posted on: 2023/2/21 20:03 #421

Quite a regular

Unfortunately, the dcbtl opcode is not recognized by any of the gcc versions in the latest SDK unless I use the compiler switch -mcpu=G5.
But the generated code will of course crash immediately.

Even -mcpu=e5500 doesn't recognize those 64-byte cache-line opcodes.

DCBA is actually still supported by the e5500 (see the e5500 RM). But depending on a bit in the L1 cache control register, it works on half or full cache lines. The new e5500 dcbal opcode always works on full cache lines. But again, it is not recognized by gcc.

Anyways, I am stuck now.

Edited by geennaam on 2023/2/21 20:21:20

geennaam

Re: x5000 benchmarks / speed up

Posted on: 2023/2/20 23:34 #422

Quite a regular

@joerg

I am probably doing something wrong but as soon as the buffers don't fit into CPU cache anymore, the copy performance is very poor.

Tried bcopy(), memcpy() and IExec->CopyMemQuick()

All more or less the same result:


Copy (CopyMemQuick):
--------------------------------------
Block size         1 Kb: 3423.14 MB/s
Block size         2 Kb: 3657.73 MB/s
Block size         4 Kb: 3751.87 MB/s
Block size         8 Kb: 3845.11 MB/s
Block size        16 Kb: 3722.72 MB/s
Block size        32 Kb: 2970.01 MB/s
Block size        64 Kb: 2959.63 MB/s
Block size       128 Kb: 2979.04 MB/s
Block size       256 Kb: 2940.45 MB/s
Block size       512 Kb: 1487.38 MB/s
Block size      1024 Kb: 1313.59 MB/s
Block size      2048 Kb: 541.94 MB/s
Block size      4096 Kb: 501.63 MB/s
Block size      8192 Kb: 496.15 MB/s
Block size     16384 Kb: 496.39 MB/s

I am not doing any DCBT/DCBA because it is my assumption that those functions are optimised already.
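
For anyone who wants to reproduce this kind of table, here is a minimal sketch of a copy benchmark in portable C. It is not the Memspeed source; the helper name copy_mb_per_s, the use of memcpy() instead of IExec->CopyMemQuick(), and the choice of 1 MB = 1,000,000 bytes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from Memspeed): copies `block` bytes `reps`
 * times with memcpy() and returns the throughput in MB/s, where
 * 1 MB = 1,000,000 bytes. Returns a negative value on failure. */
double copy_mb_per_s(size_t block, unsigned reps)
{
    assert(reps > 0);
    if (block == 0)
        return -1.0;

    unsigned char *src = malloc(block);
    unsigned char *dst = malloc(block);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, block);
    memset(dst, 0, block);

    volatile unsigned char sink = 0;   /* keeps the copies observable */
    clock_t start = clock();           /* CPU time, not wall-clock time */
    for (unsigned i = 0; i < reps; i++) {
        src[0] = (unsigned char)i;     /* vary the source on every pass */
        memcpy(dst, src, block);
        sink ^= dst[block - 1];
    }
    clock_t stop = clock();
    (void)sink;

    free(src);
    free(dst);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        seconds = 1e-9;                /* timer too coarse for this run */
    return ((double)block * reps / 1e6) / seconds;
}
```

Calling it with block sizes from 1 KB up to 16 MB, and enough repetitions that each run takes a measurable amount of time, should show the same kind of drop once the two buffers no longer fit into the caches.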

geennaam

Re: x5000 benchmarks / speed up

Posted on: 2023/2/20 23:04 #423

Quite a regular

@joerg

It's now clear to me that unlike modern AMD/Intel CPUs, the NXP PowerPCs lack hardware cache management. Hence the slow transfer speeds with GCC generated code.

I have implemented quick and dirty DCBT/DCBA and I can already see a speedup to DDR3.

(My assumption is that the cache line size of the e5500 is 64 bytes.)
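
As a rough illustration of the read side only, a prefetching loop could look like the sketch below. It is not the Memspeed code. It assumes the 64-byte line size mentioned above, a prefetch distance of four lines picked arbitrarily, and it uses the GCC/Clang builtin __builtin_prefetch (which GCC normally turns into dcbt on PowerPC targets) instead of hand-written assembly. DCBA has no such portable builtin and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE     64  /* assumption: 64-byte e5500 cache lines */
#define PREFETCH_LINES 4   /* assumption: arbitrary prefetch distance */

/* Sums `count` 32-bit words and asks for the cache line a few lines
 * ahead each time a new line is entered. */
uint32_t sum_with_prefetch(const uint32_t *buf, size_t count)
{
    assert(buf != NULL || count == 0);

    const size_t words_per_line = CACHE_LINE / sizeof(uint32_t);
    const size_t ahead = PREFETCH_LINES * words_per_line;
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        /* Only prefetch addresses that are still inside the buffer. */
        if (i % words_per_line == 0 && i + ahead < count)
            __builtin_prefetch(&buf[i + ahead], 0, 0);
        sum += buf[i];
    }
    return sum;
}
```

The best prefetch distance depends on memory latency and on how much work is done per line, so the value here is only a starting point for experiments.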


Memspeed V0.2:

Write 32bit integer:
--------------------------------------
Block size     16384 Kb: 2785.91 MB/s  +62%

Write 64bit integer:
--------------------------------------
Block size     16384 Kb: 2040.55 MB/s  +19%

Read 32bit integer:
--------------------------------------
Block size     16384 Kb: 1587.62 MB/s +123%

Read 64bit integer:
--------------------------------------
Block size     16384 Kb: 1292.63 MB/s  +80%

Write 32bit float:
--------------------------------------
Block size     16384 Kb: 2784.67 MB/s  +63%

Write 64bit float:
--------------------------------------
Block size     16384 Kb: 2211.16 MB/s  +30%

Read 32bit float:
--------------------------------------
Block size     16384 Kb: 1590.21 MB/s +123%

Read 64bit float:
--------------------------------------
Block size     16384 Kb: 1718.09 MB/s  +33%

Are bcopy(), memset() and memcpy() "inspired" by the Apple PowerPC assembly code? From what I've heard, it's lightning fast.

Edited by geennaam on 2023/2/20 23:35:07

geennaam

Re: SDL2

Posted on: 2023/2/20 22:23 #424

Quite a regular

@Raziel
@Capehill

I don't know if this is automatically handled by the device I/O interface. But if you use the low-level library function interface, then the application must check the minimum buffer size and the audio mode frequency.

With the low-level function interface you can query the AHIDB_MaxPlaySamples tag with GetAudioAttrsA().
It will return the driver buffer size in sample frames. (A 16-bit stereo frame is 4 bytes.)

Your own minimum application buffer size must be calculated according to the following formula:
"application buffer size" >= "driver buffer size" * "sample frequency" / "audiomode frequency"

So if you want to play 16-bit stereo 44.1 kHz sound with a Soundblaster Audigy FX through unit 0, where unit 0 is set to 48 kHz and the driver buffer is 1024 frames, the minimum application buffer must be:

1024 * 44100 / 48000 = 940.8, so at least 941 frames (3764 bytes)
The same example for a 7.1 32-bit frame format would require at least 30112 bytes.
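
The formula can be written down as a small helper. This is only a sketch of the arithmetic; the function names min_app_frames and frames_to_bytes are made up for this example and are not part of the AHI API. The driver buffer size would come from AHIDB_MaxPlaySamples as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum application buffer in sample frames, rounded up:
 * ceil(driver_frames * sample_hz / mode_hz). */
uint32_t min_app_frames(uint32_t driver_frames, uint32_t sample_hz,
                        uint32_t mode_hz)
{
    assert(mode_hz > 0);
    uint64_t num = (uint64_t)driver_frames * sample_hz;
    return (uint32_t)((num + mode_hz - 1) / mode_hz);
}

/* Size in bytes of a buffer holding `frames` sample frames. */
uint32_t frames_to_bytes(uint32_t frames, uint32_t channels,
                         uint32_t bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}
```

With the numbers from the example, min_app_frames(1024, 44100, 48000) gives 941 frames, which is 3764 bytes for 16-bit stereo (2 channels, 2 bytes each) and 30112 bytes for 32-bit 7.1 (8 channels, 4 bytes each).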

geennaam

Re: OpenAL-soft for AmigaOS

Posted on: 2023/2/20 15:17 #425

Quite a regular

It looks like OpenAL supports mono and stereo only. Hopefully someone who is familiar with OpenAL can look at this as well.


bool AHIPlayback::reset()
{
    std::fprintf(stderr, "AHIPlayback::reset\n");

    switch (mDevice->channelsFromFmt())
    {
        case 1:  // mono
            switch (mDevice->FmtType)
            {
                case DevFmtByte:
                    mAhiFmt = AHIST_M8S;
                    break;
                case DevFmtShort:
                    mAhiFmt = AHIST_M16S;
                    break;
                case DevFmtInt:
                    mAhiFmt = AHIST_M32S;
                    break;
                default:
                    mAhiFmt = AHIST_NOTYPE;
                    break;
            }
            break;
        case 2:  // stereo
            switch (mDevice->FmtType)
            {
                case DevFmtByte:
                    mAhiFmt = AHIST_S8S;
                    break;
                case DevFmtShort:
                    mAhiFmt = AHIST_S16S;
                    break;
                case DevFmtInt:
                    mAhiFmt = AHIST_S32S;
                    break;
                default:
                    mAhiFmt = AHIST_NOTYPE;
                    break;
            }
            break;
        default: // surround?
            mAhiFmt = AHIST_NOTYPE;
            break;
    }

Edited by geennaam on 2023/2/20 15:34:41

geennaam

Re: SDL2

Posted on: 2023/2/20 7:06 #426

Quite a regular

@Capehill

Yes, channel mapping is correct. Good work!

I do wonder if the clicking noise is caused by a wrong buffer length. After all, the stereo test works fine with the same tone generator.

geennaam

Re: SDL2

Posted on: 2023/2/19 16:20 #427

Quite a regular

@Capehill

Good progress!

The noise that you are referring to sounds like a looping sample click. Basically, the end of the sample doesn't fit the start of the sample.
I don't hear any other noise.

Surround works. Only the channel mapping is off.

testsurround32 output -> soundcard channel

"Front center" -> Rear left
"Low Frequency Effect" -> Rear right
"Back Left" -> Side left
"Back right" -> Side right
"Side Left" -> Center
"Side right" -> LFE

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/19 11:23 #428

Quite a regular

@jabirulo

Runs fine on a X5k + RX 570 + ogles2 renderer

geennaam

Re: SDL2

Posted on: 2023/2/18 14:43 #429

Quite a regular

@Capehill

First of all, I am not an AHI expert. Sniffing through the AHI device source code is still on my to-do list.

But this is how I understand AHI so far:

Yes, AHI expects 32bit samples in a 7.1 mode.

According to AHI.h:
#define AHIST_L7_1 (0x00c3000aUL) /* 7.1, 32 bit signed (8xLONG) */

It's the format of the application output buffer itself that determines how many independent output channels are presented to the driver.

After that, AHI does a 1:1 mapping of the channels in your data buffer.

If you present a stereo buffer to AHI when a 7.1 audio mode is selected for your application, AHI will do no channel upmixing for you. It simply presents a stereo buffer to the driver, and my driver will play back only on front left and right. The other channels will be muted.

If you present a 7.1 buffer when a 7.1 audio mode is selected then all 8 channels will be played. The order of the samples in the buffer will determine on which speaker the channel is played.
If your application has for example just 5.1 data then do the correct channel mapping in your 7.1 buffer and leave the other two channels empty.

HIFI modes will always present 32-bit mixing buffers to the driver.
When you present a 16-bit stereo buffer to any HIFI mode, AHI will upmix the format to 32 bits and present a 32-bit stereo mixing buffer to the driver.

Unfortunately, AHI doesn't know a 16-bit 7.1 buffer format. So in the case of multichannel 16-bit sound, I would do an int16->int32 conversion instead of clipping to stereo. It will not cost much processing power:
*output_channel = (int32)*input_channel*65536;
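
Applied to a whole buffer, that line could look like the sketch below. The function name widen_s16_to_s32 is made up for this example, and it only widens the samples. Placing the channels in the right order inside the 7.1 frame is a separate step that is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widens `samples` signed 16-bit samples to signed 32-bit samples by
 * scaling each value by 65536, so full-scale 16-bit input becomes
 * (nearly) full-scale 32-bit output. */
void widen_s16_to_s32(const int16_t *in, int32_t *out, size_t samples)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < samples; i++)
        out[i] = (int32_t)in[i] * 65536;
}
```

The multiplication cannot overflow: the extreme inputs -32768 and 32767 map to -2147483648 and 2147418112, which both fit in an int32.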

Hope this helps.

Edited by geennaam on 2023/2/18 14:58:14

geennaam

Re: Qt 6 progress

Posted on: 2023/2/16 21:12 #430

Quite a regular

@afxgroup

I see what you are doing here. And I can do that as well.

So you are suggesting that our tool chain is closed source? Because if it was open source, you don't need to spend time fixing it, right? It would have happened a long time ago.

Yes, Amigaos4 is in the wrong hands. But an Open source Amigaos4 would mean that within weeks there will be:
-Amigaos4.2
-TheRealAmigaos4.2.1
-IKnowItBetterAmigaOSAllwayOneVersionHigher
-AmigaOS4DoneRight
-OneAmigaOS4toRuleThemAll
-AmigaOS4-<insert your Favorite-ISA>

Because that is what we have become.

geennaam

Re: Qt 6 progress

Posted on: 2023/2/16 20:38 #431

Quite a regular

@LiveForIt

OK, I feel addressed here as well.

The problem is that if you don't keep it to yourself, someone will exploit it. This is unfortunately how it works in the Amiga scene.

Secondly, I personally don't believe in open source. Bad code gets copied because the copier doesn't understand half of it. It's IMHO the main reason why AmigaOS systems are unstable. Memory protection is just a gatekeeper for badly coded software.

However, I do believe in quality documentation with examples about how things are supposed to be done correctly.

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/15 20:42 #432

Quite a regular

@SinanSam460

Great news. Looking forward to the game. Looks a bit like Micro Machines.

geennaam

Re: My AmigaOne X5000 twins - I need some help and advice.

Posted on: 2023/2/15 14:07 #433

Quite a regular

@Gebrochen

My X5020 mainboard was actually bought directly from Amigakit. Right after brexit.

Once shipped from Amigakit, it took only a couple of days to clear customs and arrive at my home. IIRC it was wrapped in foam and bubble wrap, so the outer box didn't close well. But there was no visible damage to the package.

geennaam

Re: My AmigaOne X5000 twins - I need some help and advice.

Posted on: 2023/2/15 13:37 #434

Quite a regular

Ouch, I was also seriously considering buying an X5040 mainboard. But a 3k doorstop doesn't really sound appealing. I wonder if those reported performance issues have been solved already.

My X5020 was also shipped as a bare mainboard. It has an unreliable SATA connector and somehow I cannot save the power pref for my RX card. I've also had some issues with the RTC in the past. But this could have been software related.
Other than that, the machine is fine.

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/15 12:59 #435

Quite a regular

@m3x

Edit: Isn't this only for e.g. the alignment of variables within structures, and not for mallocs?

Edited by geennaam on 2023/2/15 14:00:38
Edited by geennaam on 2023/2/15 14:01:23

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/15 12:53 #436

Quite a regular

@kas1e

Quote:

The problem which we have on x5000, is probably because of
missing 4 opcodes (lfs, lfsu, stfs, stfsu) which need to implement for x5k, but this wasn't done yet.

Do you mean that this misalignment handling code has not been implemented for those four instructions?

As you say, non-aligned allocations will result in performance issues anyways and should therefore be avoided.

It was my understanding that all allocations in OS4 are 32-bit aligned by default. So potentially only doubles and uint64s could have alignment issues when you don't force the correct alignment. Or is this only true for IExec->AllocVecTags() calls and not for the standard C malloc()-like calls?

I must admit that I only use AllocVecTags() because it gives me as much control as possible at this abstraction level. As a hardware guy I have trust issues with compilers and OSes.

Looking at the names of the source files already gives me a headache. But forcing the memory allocations to be correctly aligned shouldn't be that hard.

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/15 10:23 #437

Quite a regular

@SinanSam460

Is this source code reverse engineered? This would at least explain the weird naming.

I am pretty sure that the problem is elsewhere in the code.

The disassembly shows that *xf points to a 16-bit aligned address (0x5B24264E) where it must be 32-bit aligned. The pointer itself is 32-bit aligned (0x5B01823C).

lfd f1,40(r31)      Load f1 (double) from address 0x5B018248
bl 0x7F73668C       Branch and link to function sin()
fmr f12,f1          Copy f1 to f12 (result from sine function)
lis r9,23527        Load immediate shifted (r9 = 0x5BE70000)
lfd f0,-7736(r9)    Load f0 with double from address 0x5BE6E1C8
fmul f0,f12,f0      f0 = f0 * 0.788011
fadd f0,f31,f0      f0 = f0 + 413.0
frsp f0,f0          Double -> float
lwz r9,28(r31)      Load r9 with word from address 0x5B01823C
stfs f0,0(r9)       Store float in f0 to 0x5B24264E

The double loads are double aligned, so that is ok.
The loaded value from (double)s_35e[n].XLocation seems to be 413.0. Strange because this is supposed to be a __BYTE__
The NaN is most likely the result of something bogus loaded from 0x5BE6E1C8 -> lfd f0,-7736(r9). This should have been 12.0
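
The address arithmetic behind this reading can be checked with a few lines of C. This is only a sketch for verifying the numbers; the helper names lis_plus_disp and is_aligned are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* "lis rD,imm" sets rD = imm << 16. A following D-form load or store
 * such as "lfd f0,disp(rD)" adds a signed 16-bit displacement. */
uint32_t lis_plus_disp(uint16_t lis_imm, int16_t disp)
{
    uint32_t base = (uint32_t)lis_imm << 16;
    return base + (uint32_t)(int32_t)disp;  /* wraps modulo 2^32 */
}

/* Returns 1 if addr is a multiple of `alignment` (a power of two). */
int is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

lis_plus_disp(23527, -7736) gives 0x5BE70000 - 0x1E38 = 0x5BE6E1C8, which matches the lfd address. is_aligned(0x5B24264E, 4) is 0 while is_aligned(0x5B24264E, 2) is 1, so the store target is only 16-bit aligned, and both double loads (0x5B018248 and 0x5BE6E1C8) pass an 8-byte alignment check.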

The question remains why there is only an alignment issue on the X5k and Sam460, and then specifically on AmigaOS4. Because as I understand it, the MOS version runs fine.
But this is a question for the compiler experts.

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/14 22:52 #438

Quite a regular

@SinanSam460

Assuming that the asterisk points to the failing instruction and I am reading the disassembly correctly, then your code is trying to store an SP float result (f0) at a non-aligned memory location 0x5B24264E (r9). Since a double is converted to single prior to the load of the destination and the store, it is likely *xf or *yf.

Also f0 is NaN. So something is really wrong here.

Can you copy and paste the contents of the disassembly tab of the grim reaper window? It might give a hint where things start to go wrong.

Edited by geennaam on 2023/2/14 23:11:33
Edited by geennaam on 2023/2/14 23:15:03
Edited by geennaam on 2023/2/14 23:20:29
Edited by geennaam on 2023/2/14 23:20:56

geennaam

Re: Porting Death Rally - help needed

Posted on: 2023/2/14 12:02 #439

Quite a regular

@SinanSam460

I'm not really much of a coder, so I would not use this kind of construction myself without knowing exactly what the resulting behaviour is.

But are you sure that it is wise to fill a pointer to a float (32 bits) with a double (64 bits) result? I can imagine that at least the compiler would complain about a missing cast.

The difference in behaviour could be the difference between an unintended compiler decision for this construction versus the FP emulation code for the A1222.

Furthermore, the issue could be somewhere else as well. Make sure that n is within bounds, for example.

On the positive side: It is good to see that the A1222 actually behaves as intended by the original coders without having a compatible FPU.

Edited by geennaam on 2023/2/14 12:18:24

geennaam

Re: Get address of deleted logic blocks from Filesystem

Posted on: 2023/2/10 11:42 #440

Quite a regular

@tonyw

Native LBA is relative these days. NVMe drives can support up to 64 different LBA formats. The drive will report which of the supported LBA formats will perform best based on how the drive has been formatted. My Samsung 970 EVO 1TB happens to support only one LBA format (512 bytes). It's the OS memory page size that needs to be at least 4k.
I am receiving a WD BLACK SN770 today. So maybe this modern drive will actually support 4k pages as well.

I am not familiar with filesystems. Where can I find this bitmap?
Does NGFS(2) also store this info in a bitmap?

NVMe will show the maximum capacity of a namespace and the allocated capacity. I will compare this NVMe allocated value with the actual allocation reported by the filesystem. Without Trim, the two should diverge over time.
