Board index » All Posts (joerg)




Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away


@nikitas
The fastest passed-through gfx card so far is an MSI R9 270X:
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2773
About 1/3 (X5000) to 1/2 (X1000) of the fastest results on real hardware.

If you can get such a gfx card very cheaply you may try it, but it's not only the gfx card itself that matters. Different host hardware, different Linux versions and differences in the BIOS/UEFI may make a difference as well. AMD Ryzen CPUs seem to be much faster for QEmu than Intel CPUs, and ARM based CPUs like the Apple M1/M2/M3 may be even faster than Ryzen CPUs, but since Macs don't have PCIe slots they are useless for this.



Re: QEMU GPU vfio-pci pass through


@balaton
Quote:
Considering the RageMem results copying with simple CPU loop should not be slow as that's compiled to host code.
Any CPU access to VRAM over ZorroIII/PCI/PCIe is extremely slow, no matter if real or emulated hardware.
RageMem results are useless, especially if it's used on fake hardware like the SM50x emulation of QEmu (no real VRAM but emulated in DRAM) or the WinUAE uaegfx.card emulation.
You'd have to check for example the WinUAE Voodoo3 emulation instead, where you get the same slow VRAM accesses as on real hardware.

Quote:
The DMA engine of the 460EX is just a memcpy or memmove in the optimised case or line by line copy if the region is not continuous.
A bcopy()/memmove()/memcpy() implementation on the host for the guest's DMA engine is unusable. If the host doesn't use host DMA for guest DMA you'll never get any usable results, no matter if it's a passed-through vfio-pci real gfx card or a virtio-gpu emulated one with the AmigaOS driver Hans is trying to implement.
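
To make the quoted description concrete, here is a minimal sketch of a CPU-based region copy: one memcpy() when the region is contiguous, otherwise a line-by-line copy. It is not QEMU's actual code; copy_region and its parameters are hypothetical names chosen for illustration. This is the kind of host CPU copy the reply above considers too slow compared to real host DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `rows` lines of `row_bytes` bytes each.
 * src_pitch/dst_pitch are the distances in bytes between line starts. */
void copy_region(unsigned char *dst, size_t dst_pitch,
                 const unsigned char *src, size_t src_pitch,
                 size_t row_bytes, size_t rows)
{
    assert(row_bytes <= src_pitch && row_bytes <= dst_pitch);

    if (src_pitch == row_bytes && dst_pitch == row_bytes) {
        /* Contiguous region: a single large copy is enough. */
        memcpy(dst, src, row_bytes * rows);
        return;
    }

    /* Non-contiguous region: copy one line at a time. */
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

Either branch still moves every byte with the CPU, which is the point of the objection.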



Re: QEMU GPU vfio-pci pass through


@balaton
Quote:
I don't know what code path copy from VRAM goes through in QEMU
If it's not a direct VRAM access from application code but using the AmigaOS functions:
Classic Amiga/AmigaOne/Pegasos2 with G2 or G3 CPU: CPU copy loop using the 64 bit FPU registers, very slow.
AmigaOne/Pegasos2 with G4 CPU: CPU copy loop using the 128 bit vector registers, only a little bit better.
Sam4x0/X1000/X5000 (and maybe A1222 too): If the requirements are met (aligned and large enough size) the DMA engine of the CPU is used for the copy, which is much faster. For unaligned and small copies it's the same slow CPU copy loop as on G2, G3 or G4 (X1000).

If the emulation of the 4x0 CPU's DMA engine is using host DMA it should be much faster than AmigaOne/Pegasos2 for copies from/to VRAM, but if the emulation of the 4x0 DMA engine uses host CPU integer/FPU/vector accesses instead there is probably no difference.

In this post I quoted some GfxBench2D results. AFAIK "Copy to VRAM" and "Copy from VRAM" are using direct CPU (integer/FPU/vector) accesses of the VRAM in the benchmark tool and especially when reading from VRAM that's very slow, while "Write Pixel Array" and "Read Pixel Array" are using the AmigaOS functions with DMA instead.
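
The "aligned and large enough" decision described above can be sketched roughly as follows. The alignment and minimum size values are assumptions made up for illustration, since the real requirements are not stated in this post, and choose_copy_path is a hypothetical name, not an AmigaOS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEMO_DMA_ALIGN    16u
#define DEMO_DMA_MIN_SIZE 4096u

typedef enum { COPY_CPU_LOOP, COPY_DMA } copy_path;

/* Pick the DMA engine only for aligned, sufficiently large copies;
 * everything else falls back to the slow CPU copy loop. */
copy_path choose_copy_path(uintptr_t src, uintptr_t dst, size_t len)
{
    int aligned = (src % DEMO_DMA_ALIGN == 0) &&
                  (dst % DEMO_DMA_ALIGN == 0) &&
                  (len % DEMO_DMA_ALIGN == 0);
    int large = len >= DEMO_DMA_MIN_SIZE;

    return (aligned && large) ? COPY_DMA : COPY_CPU_LOOP;
}
```

With these assumed limits, a small or odd-addressed copy always ends up in the CPU loop, no matter how fast the DMA engine is.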



Re: A1222 Freezes when writing to the RAM DISK


@daveyw
Quote:
You shouldn't really need a SWAP partition. AmigaOS can only address 2GB RAM,
At least 4 GB with software using ExtMem like ram-handler, maybe more on the 64 bit systems.
ExtMem uses a small virtual memory window with the virtual address in the first 2 GB, the physical memory address can be anywhere (upper 2 GB on the 32 bit systems, on 64 bit systems it could even be outside of the first 4 GB).
It's similar to the bank switching used on 16 bit CPUs like Z80, 6510, 8086, etc. in the 1980s to access more than 64 KB RAM.
According to the autodocs ExtMem doesn't work with the Pegasos2 kernels. Maybe the A1222 kernel has a similar problem/bug?
I'd try replacing the ram-handler.kmod, either with a much older version, or maybe the current version of a Pegasos2 version of AmigaOS 4.1 in case it includes a special Pegasos2 version without ExtMem support, and check if a version of ram-handler which doesn't use ExtMem works.
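
The bank switching comparison can be illustrated with a small portable simulation: a fixed, small window gives access to a larger backing store, one bank at a time. This is only a sketch of the general idea, not the ExtMem API; all names and sizes are hypothetical.

```c
#include <assert.h>

#define DEMO_WINDOW_SIZE 16u  /* size of the small addressable window */
#define DEMO_BANKS       4u   /* number of banks in the larger backing store */

typedef struct {
    unsigned char backing[DEMO_BANKS][DEMO_WINDOW_SIZE]; /* "physical" memory */
    unsigned bank;                                       /* currently mapped bank */
} demo_banked_mem;

/* Map another bank into the window. */
void demo_select_bank(demo_banked_mem *m, unsigned bank)
{
    assert(bank < DEMO_BANKS);
    m->bank = bank;
}

/* The window always covers the same small range of offsets
 * (0 .. DEMO_WINDOW_SIZE-1), whatever bank is behind it. */
unsigned char *demo_window(demo_banked_mem *m)
{
    return m->backing[m->bank];
}
```

The same window offset refers to different memory depending on the selected bank, so data written to one bank is only visible again after switching back to it.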



Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow


@nikitas
Quote:
U-Boot 2010.06.05 (Jul 08 2018 - 22:45:33)
That's too old. For Radeon RX cards you need the Sam460ex U-Boot version 2015.c (or newer) and may have to use
setenv x86emu 2
saveenv

Some Radeon HD and RX gfx cards only work with the x86 emulator set to 2/medium.



Re: sam460: new uboot update to handle RadeonRX coming soon!


@m3x
Quote:
Remember to put them to the original values when done:

> NVSetVar stdin usbkbd
> NVSetVar stdout serial
"stdout serial" seems to be wrong for the original value, it should be something like "stdout vga" instead (at least in the A1-XE and X5000 U-Boot versions that's the default).



Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow


@Hans
Quote:
First, the driver doesn't use GART on older systems like the pegasos-2, A1-XE, & Sam460 due to lack of memory coherency. I tried manual cache flushing/invalidation, but simply couldn't get it to work reliably. So, all commands are written to VRAM. Therefore, slow VRAM access means slower command submission.
I think I've suggested that several years ago already, but can you explain (again) why using something like
IMMU->SetMemoryAttrs(page_aligned_command_buffer, page_aligned_buffer_size, MEMATTRF_WRITETHROUGH|MEMATTRF_READ_WRITE);
or if that doesn't work either because it's not write-only but has to be (re)read by the CPU without caching as well
IMMU->SetMemoryAttrs(page_aligned_command_buffer, page_aligned_buffer_size, MEMATTRF_CACHEINHIBIT|MEMATTRF_COHERENT|MEMATTRF_GUARDED|MEMATTRF_READ_WRITE);
doesn't work?

Write-through or even cache-inhibited DRAM should be much faster than cache-inhibited VRAM over ZorroIII/PCI/PCIe.
At least in some of my classic Amiga OS4 parts I used write-through mapped memory because it was faster, and much easier to use, than cached memory with manual cache flushing.

Quote:
Second, AmigaOS does a lot of CPU-based rendering. Far too much. This includes things such as drawing text and lines (other than horizontal or vertical). This will frequently involve both reading from and writing to VRAM.
The glyph cache of ft2.library is in DRAM instead of in a BitMap in VRAM??? That would be very bad.
I'm not 100% sure anymore, but I think for the text rendering in OWB I used an 8 bit (alpha only) BitMap as an additional glyph cache, IIRC for the last 256 used glyphs (because of the Unicode support there can be many more). Each char was blitted from the glyph cache to a text line sized 32 bit bitmap, adding the text colour, which was then copied to the window with CompositeTags() for the anti-aliasing.
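
A cache for "the last 256 used glyphs" could be organised roughly like this. The sketch below only shows the bookkeeping (which slot holds which glyph, and which slot to reuse), not the BitMap handling or blitting. It is not the OWB code; the names, the linear search and the simple use counter are assumptions for illustration, and counter wrap-around is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 256u  /* "the last 256 used glyphs" */

typedef struct {
    uint32_t codepoint[DEMO_CACHE_SLOTS];
    uint32_t last_used[DEMO_CACHE_SLOTS]; /* 0 = slot is empty */
    uint32_t clock;                       /* incremented on every lookup */
} demo_glyph_cache;

/* Returns the slot index for `cp`. On a hit *hit is set to 1. On a miss
 * the least recently used (or an empty) slot is reassigned, *hit is set
 * to 0, and the caller would render the glyph into that slot. */
unsigned demo_glyph_lookup(demo_glyph_cache *c, uint32_t cp, int *hit)
{
    unsigned victim = 0;

    c->clock++;
    for (unsigned i = 0; i < DEMO_CACHE_SLOTS; i++) {
        if (c->last_used[i] != 0 && c->codepoint[i] == cp) {
            c->last_used[i] = c->clock;
            *hit = 1;
            return i;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }

    c->codepoint[victim] = cp;
    c->last_used[victim] = c->clock;
    *hit = 0;
    return victim;
}
```

A zero-initialised demo_glyph_cache starts empty; each slot index would correspond to a fixed cell in the 8 bit glyph BitMap.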



Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow


@Hans
Quote:
Ouch! Those VRAM access stats are horrifically bad ...
And that was even the fastest result of a passed-through gfx card on QEmu until now, other results are even worse, for example
https://ftp.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2772
Operation            MiB/s
Copy to VRAM          4.19
Write Pixel Array     4.38
Copy from VRAM        2.24
Read Pixel Array      2.26



Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow


@balaton
Quote:
(Just because compose.task shows up using CPU in your screen shot but I don't know what that does ot if it's related.)
compose.task does the alpha blending with transparency, adding shadows, etc. of the window bitmaps to the screen bitmap.
It's using the 3D features of the GPU and therefore doesn't copy much data around (except on GPUs without 3D support like SM501/2 where CPU based workarounds have to be used), but just sends GPU commands to the gfx card.
The only reason I can think of for that being very slow would be extremely slow access to the gfx card VRAM.
Maybe a general problem with VFIO on QEmu?
For example

https://lists.hdrlab.org.nz/benchmark/ ... 2d/OS/AmigaOS/Result/2348 (X5000/20, Radeon RX Polaris10)
Operation               MiB/s
Copy to VRAM           533.72
Write Pixel Array    1,071.60
Copy from VRAM          40.33
Read Pixel Array       995.02

https://lists.hdrlab.org.nz/benchmark/ ... 2d/OS/AmigaOS/Result/1551 (X1000, Radeon HD 6970)
Operation               MiB/s
Copy to VRAM           438.44
Write Pixel Array    1,415.46
Copy from VRAM          41.77
Read Pixel Array       398.56

https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2660 (Sam460ex, Radeon HD 5450)
Operation               MiB/s
Copy to VRAM           305.30
Write Pixel Array      540.02
Copy from VRAM          59.13
Read Pixel Array        60.30

https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2773 (QEmu on i5, Radeon HD 7800 Series (MSI R9 270x))
Operation               MiB/s
Copy to VRAM            67.15
Write Pixel Array       65.74
Copy from VRAM           6.09
Read Pixel Array         6.16
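
To compare such results directly, the throughput of one system can be expressed as a fraction of another. The sketch below does that for the X5000/20 and QEmu rows quoted above; the struct and function names are hypothetical, only the numbers are taken from the tables.

```c
#include <assert.h>

/* One GfxBench2D VRAM result set, all values in MiB/s. */
typedef struct {
    double copy_to_vram;
    double write_pixel_array;
    double copy_from_vram;
    double read_pixel_array;
} demo_result;

/* Values copied from the tables above. */
const demo_result demo_x5000   = { 533.72, 1071.60, 40.33, 995.02 };
const demo_result demo_qemu_i5 = {  67.15,   65.74,  6.09,   6.16 };

/* Throughput `a` as a fraction of throughput `b`. */
double demo_ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}
```

For example demo_ratio(demo_qemu_i5.copy_to_vram, demo_x5000.copy_to_vram) gives the share of the X5000 "Copy to VRAM" speed that the passed-through card reaches in QEmu.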



Re: QEMU GPU vfio-pci pass through


@MartinW
GfxBench2D results would be interesting, for example compared to

geennaam's MSI R9 270X with QEmu on a Core i5 CPU, 1920×1080@61 (32 bit)
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2773
Overall Score: 5,237.82

and a HD5450 in a Sam460ex, 1024×768@60 (32 bit)
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2660
Overall Score: 1,688.95

Quote:
I haven't tried the R5 230 card because I would think it's even weaker than the 5450

https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2345
seems to be a gfx card with the same core as the R5 230: 1280×1024@60 (32 bit) in an X5000/20, Overall Score: 1,668.38. That's about the same as the HD5450 in the Sam460ex, but since it's using a higher resolution the results are not really comparable.


Edited by joerg on 2024/6/6 6:08:48
Edited by joerg on 2024/6/6 6:09:55


Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow


@nikitas
Quote:
It is true, that a real PegasosII does not have a PCIe slot. Only PCI. My Radeon GPU is plugged into a PCIe slot because my motherboard is relatively new, so it does not have a PCI slot. I don't know if this contributes to the slow performance.
No, the fast RadeonHD cards used are PCIe as well.

Since you get 100% CPU usage even for very simple things, using a tool like http://os4depot.net/?function=showfil ... ity/workbench/tequila.lha or my top may help to check which task(s) are using the CPU.

For the RadeonHD cards with fast results the Pegasos2 firmware was used in addition to BBoot, you could try if that changes anything (may require additional steps, something like "/failsafe" in the firmware prompt, but I don't know anything about the Pegasos2 firmware).



Re: QEMU GPU vfio-pci pass through


@balaton
Quote:
- RadeonHD cards may work but likely need the machine firmware to init the graphics card ...

- RadeonRX cards that have AtomBIOS but the guest firmware cannot run it may work with just -kernel bboot without -bios machine firmware, because the AmigaOS driver may be able to parse AromBIOS and init the card but this may not always or fully work ...
AFAIK all drivers from Hans, Radeon HD and RX, support AtomBIOS.
Only the much older ATIRadeon.chip driver for R100/R200 cards like Radeon 7000 (not HD 7000!), 9000, 9250, etc. requires the firmware to execute the gfx card's x86 BIOS first.
Some other ancient AmigaOS gfx card drivers like 3dfxVoodoo.card work without executing the gfx card x86 BIOS by the firmware as well.

But even if you use a gfx card with exactly the same gfx chip someone has used successfully already, a different BIOS from a different vendor, or even just a different version of the BIOS from the same vendor, may not work.

@MartinW
Quote:
- R5 230 (I'll ignore this one. It was bought to try MOS. That didn't work and it's a very low performance card anyway that I don't think is supported by OS4)
According to https://www.acube-systems.biz/compatibility/compatibility_41.php R5 230 should work.

Quote:
BUT, is it not the case that I would need the V5 HD drivers for this Southern Islands based card?
Not according to https://hdrlab.org.nz/projects/amiga-o ... r-hardware-compatibility/
It's very old and doesn't seem to have been updated for several years, but in the "Radeon R9 Series" part the driver versions used are only 1.2 - 2.7.

From https://wiki.amiga.org/index.php/RadeonHD
Quote:
Oland chipsets (HD R7 240/240D, HD R7 250) require the newer RadeonHD drivers (v1.20, v2.21 or v3.6) exclusively available from the Enhancer Software

I don't know which Radeon HD cards didn't work with the Enhancer Software V3 driver yet but require the separately sold V5 driver instead.
For Radeon RX cards there is no separately sold driver, the current version is included in Enhancer Software.



Re: A1222Plus has a new home


@white
Quote:
An emulated graphics card or a driver for a real video card (PCI passthrough) is always missing.
PCI(e) passthrough doesn't need special drivers, the AmigaOS 4.1 gfx drivers are of course working if you have a compatible gfx card installed, for example geennaam got QEmu working with a passed through Radeon HD gfx card.

For emulated gfx WinUAE is of course much better than QEmu. QEmu only supports 16 bit 2D SM50x emulation with small resolutions, while WinUAE supports partially HW accelerated (DirectX) 2D 32 bit UHD resolutions with uaegfx.card, 3D Voodoo3 emulation with the 3dfxVoodoo.chip driver for 3D software/games, emulation of several other classic Amiga gfx cards as well as OCS/ECS/AGA emulation.



Re: FTP Between OS3 and OS4


@MartinW
AmigaOS specific attributes like the protection bits, user/group ID and comment are stored in the directory entry of the file. It's the same on other OSes as well, for example protection bits and user/group on Unix.
The difference between binary and text mode in FTP is just that the binary mode doesn't modify the file and you get an exact copy of the contents, while text mode replaces <CR><LF> by <LF>, or the other way round, destroying binary files.
But neither mode can copy meta data from the directory entries like protection bits or comments, independent of the OSes used.
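
Why text mode destroys binary files can be seen from a minimal sketch of the conversion: every <CR><LF> pair becomes a single <LF>, whether the bytes were meant as a line ending or not. This is not code from any FTP client; demo_crlf_to_lf is a hypothetical helper written for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every CR LF pair by LF, as a text-mode transfer to an
 * LF-based system would. `out` must have room for `len` bytes.
 * Returns the number of bytes written to `out`. */
size_t demo_crlf_to_lf(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue; /* drop the CR, the LF is copied in the next step */
        out[n++] = in[i];
    }
    return n;
}
```

A binary file that happens to contain the bytes 0x0D 0x0A comes out one byte shorter for each such pair, which is why only binary mode gives an exact copy of the contents.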



Re: Tracing of callhookpkt()/callhook()


@Hypex
Quote:
Regarding SDI, looks like it was a bit too late for the 80's, being dated from this century.
For AmigaOS, 1-3.x/m68k and 4.x/PPC, it's useless, the REG(), ASM, SAVEDS, etc. macros are enough if you want to support both m68k and PPC versions of AmigaOS, or even just different compilers: SAS/C, DICE, StormC, VBCC and GCC for m68k AmigaOS, VBCC and GCC for PPC AmigaOS.
SDI is only required if you additionally want to support AmigaOS incompatible OSes like AROS and/or MorphOS.

Quote:
But what does SDI mean? I read through that project years ago and unless I missed the obvious it doesn't say what it is. I think an acronym is rather pointless if it isn't explained on the first line of the readme.
Standard Developer Interface



Re: Tracing of callhookpkt()/callhook()


@Hypex
Quote:
That's still too complicated. Given a library call is like this without needing to think about it.
That only works because on m68k the AmigaOS library calls are replaced with inline functions, macros or #pragma, depending on the compiler, which replace the standard stack based m68k ABI with register arguments, and a hidden A6 library base argument, for those functions.

Quote:
So it certainly could have been programmed in.

int32 hookFunc(struct Hook *hook, APTR object, APTR message)
#include <SDI_hook.h> /* https://github.com/adtools/SDI/blob/master/SDI_hook.h */
HOOKPROTO(hookFunc, int32, APTR, APTR);
static int32 hookFunc(struct Hook *hook, APTR object, APTR message)
{
   ...
}



Re: Tracing of callhookpkt()/callhook()


@Hypex
Quote:
The 68K ABI convention is also the same way as the native ABI also uses registers in API calls,
No, it doesn't.
Check for example https://m680x0.github.io/doc/abi.html
For the return value a register (d0 for ints, a0 for pointers) is used, but all arguments are on the stack, none in a register.

Quote:
Quote:
But even on m68k most people didn't use that anymore but something like

Except in that case but it can be compiler specific.
It would have been compiler specific if I'd have used something like
int32 hookFunc(__asm("a0") struct Hook *hook, __asm("a2") APTR object, __asm("a1") APTR message)
which only works with GCC, but using the REG() macro instead works at least with SAS/C, DICE, VBCC, StormC and GCC.



Re: Tracing of callhookpkt()/callhook()


@Hypex
On PPC nobody is using an asm stub in h_Entry and a C function in h_SubEntry.
It's not required because on PPC all 3 arguments are passed in registers (r3, r4 and r5) in C code anyway.

On m68k an asm stub in h_Entry can be used which pushes the 3 registers A0, A2 and A1 on the stack and calls a C function in h_SubEntry.
On m68k C functions don't use registers but only the stack for the arguments.

But even on m68k most people didn't use that anymore but something like
ASM SAVEDS int32 hookFunc(REG(a0, struct Hook *hook), REG(a2, APTR object), REG(a1, APTR message))
{
   ...
   return result;
}
in h_Entry.

The EmulateTags() call works in any case, no matter if there is a m68k asm stub in h_Entry which calls a h_SubEntry function with stack arguments, or if h_Entry is the C function using register arguments with the REG() macro.

It's the same for PPC native code, just in case someone uses useless code like
int32 hookFunc(struct Hook *hook, APTR object, APTR message)
{
   ...
   return result;
}

int32 hookStub(struct Hook *hook, APTR object, APTR message)
{
   return hook->h_SubEntry(hook, object, message);
}

struct Hook hook;
hook.h_Entry = hookStub;
hook.h_SubEntry = hookFunc;
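
The point that the stub is redundant when all three arguments are passed the normal C way can be checked with a portable sketch. struct DemoHook below is a simplified stand-in, not the AmigaOS struct Hook, and demo_call_hook is a hypothetical stand-in for what a hook caller such as CallHookPkt() does; both are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct Hook, not the AmigaOS definition. */
struct DemoHook {
    int32_t (*h_Entry)(struct DemoHook *, void *, void *);
    int32_t (*h_SubEntry)(struct DemoHook *, void *, void *);
    void *h_Data;
};

/* The actual hook function: returns the message value plus one. */
int32_t demo_hook_func(struct DemoHook *hook, void *object, void *message)
{
    (void)hook;
    (void)object;
    return *(int32_t *)message + 1;
}

/* Redundant stub: only forwards the same three arguments to h_SubEntry. */
int32_t demo_hook_stub(struct DemoHook *hook, void *object, void *message)
{
    return hook->h_SubEntry(hook, object, message);
}

/* Stand-in for the hook caller: always calls through h_Entry. */
int32_t demo_call_hook(struct DemoHook *hook, void *object, void *message)
{
    return hook->h_Entry(hook, object, message);
}
```

Whether h_Entry points to the stub (with the function in h_SubEntry) or directly to the function, the caller gets the same result; the stub only adds an extra call.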


Edited by joerg on 2024/5/23 6:08:42


Re: developing amiga 68000 clone


@kerravon
Quote:
int c_Write(void *handle, void *buf, int len)
{
int rc;
rc = fwrite(buf, 1, len, handle);
return (rc);
}
That's wrong and won't work in all cases.
dos.library Write() is an unbuffered I/O function, while C library fwrite() is buffered.
If you want to use C fwrite() for dos Write() you'd have to use
int c_Write(void *handle, void *buf, int len)
{
int rc;
rc = fwrite(buf, 1, len, handle);
fflush(handle);
return rc;
}
instead.

Additionally dos.library file handles can't be the same as a C library FILE handle, it's a different abstraction layer.
dos.library BPTR file pointers are more similar to the POSIX int filedes.
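
The difference between buffered and unbuffered writes can be made visible on a POSIX host with a small sketch: after fwrite() alone the data may still sit in the C library buffer, and only after fflush() has it reached the underlying file. The helper names are hypothetical, and fileno()/fstat() are POSIX functions, not part of dos.library or ISO C.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of the underlying file as the OS sees it, ignoring any data
 * still held in the C library buffer of `f`. */
long demo_file_size(FILE *f)
{
    struct stat st;
    int rc = fstat(fileno(f), &st);

    assert(rc == 0);
    (void)rc;
    return (long)st.st_size;
}

/* fwrite() followed by fflush(), so the data is handed to the OS
 * before the function returns, like an unbuffered write. */
int demo_write_flushed(FILE *f, const void *buf, int len)
{
    int rc = (int)fwrite(buf, 1, (size_t)len, f);

    fflush(f);
    return rc;
}
```

On a fully buffered stream (for example after setvbuf() with _IOFBF), demo_file_size() typically does not change after a plain small fwrite(), but does after demo_write_flushed().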
