QEMU GPU vfio-pci pass through
Quite a regular

Information about this is currently spread across different topics, so I created a new topic to gather it in one place. I'll try to summarise what I know.

This needs a Linux host machine with IOMMU support, and the IOMMU must be enabled in the host BIOS/UEFI. Before anything else, read some documentation on how to set up vfio pass-through of GPUs for your Linux distro. There are several guides; does anybody know a good and concise one to link here? Once vfio is set up (the card is isolated and assigned to vfio), you can try booting a Linux or Windows guest VM to check that pass-through works correctly before trying AmigaOS.

You probably need options like these (all devices in the vfio group must be detached from host drivers and passed through to the guest):
-vga none -device vfio-pci,x-vga=on,host=x:y.0 -device vfio-pci,host=x:y.1

If you need a host window for keyboard/mouse input, try adding -device bochs-display, which has no VGA ranges, so it does not conflict with the passed-through card.
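As a minimal sketch, the options above can be generated for a given card. The helper name `vfio_args` and the example address `01:00` are hypothetical; substitute the real bus:slot of your card (as shown by `lspci`) and list every function that has to be passed through.

```python
# Hypothetical helper: build the QEMU options described above for passing
# through every function of one PCI device (e.g. the GPU plus its HDMI audio).
def vfio_args(bus_slot: str, functions: list[int], host_window: bool = False) -> list[str]:
    """Return QEMU arguments for passing through all listed functions.

    bus_slot is the host PCI address without the function number, i.e. the
    "x:y" placeholder in the post. Function 0 gets x-vga=on so the guest
    sees the card's legacy VGA ranges.
    """
    args = ["-vga", "none"]  # no emulated VGA that could clash with the real card
    for fn in functions:
        if fn == 0:
            args += ["-device", f"vfio-pci,x-vga=on,host={bus_slot}.{fn}"]
        else:
            args += ["-device", f"vfio-pci,host={bus_slot}.{fn}"]
    if host_window:
        # bochs-display has no VGA ranges, so it does not conflict with x-vga=on
        args += ["-device", "bochs-display"]
    return args


if __name__ == "__main__":
    print(" ".join(vfio_args("01:00", [0, 1], host_window=True)))
```

The list form can be passed straight to `subprocess.run` after the machine and disk options, which avoids shell quoting problems.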

- Radeon cards before the HD series need fixes in QEMU (at least here and here, and maybe elsewhere), so they won't work currently.

- RadeonHD cards may work, but they likely need the machine firmware to init the graphics card, so you can't use BBoot alone but need to use -bios firmware.rom and boot from there. Also, the pegasos2 version of AmigaOS has kernel bugs (problems with 64-bit BARs and incomplete IRQ setup) that BBoot can work around, so on pegasos2 you need both: -bios pegasos2.rom to init the card, and then 'boot hd:0 bboot.fth' to start BBoot from within SmartFirmware to patch the BARs and IRQ setup. For this you need to copy bboot, bboot.fth and Kickstart.zip to the boot partition. Other machines don't need firmware + BBoot, only their firmware.

- RadeonRX cards with an AtomBIOS that the guest firmware cannot run may work with just -kernel bboot (without -bios machine firmware), because the AmigaOS driver may be able to parse the AtomBIOS and init the card, but this may not always work, or may not work fully. This needs more experimenting and debugging.
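The three cases above can be summarised as a small decision helper. This is a hypothetical sketch: `boot_plan`, the family labels and the returned strings are illustrative only; the QEMU options and the SmartFirmware command are the ones quoted in the list, and the advice is exactly as tentative as the list itself.

```python
# Hypothetical summary of the three cases above; the family labels and the
# returned strings are illustrative, not QEMU or AmigaOS identifiers.
def boot_plan(family: str, machine: str = "pegasos2") -> str:
    """Return the boot approach suggested above for a passed-through Radeon."""
    if family == "pre-HD":
        # Needs QEMU fixes first, so not expected to work currently.
        return "unsupported"
    if family == "HD":
        if machine == "pegasos2":
            # Firmware inits the card; BBoot then patches BARs and IRQ setup.
            return "-bios pegasos2.rom, then 'boot hd:0 bboot.fth'"
        # Other machines only need their own firmware.
        return f"-bios <{machine} firmware>"
    if family == "RX":
        # The AmigaOS driver may init the card itself from AtomBIOS (experimental).
        return "-kernel bboot (no -bios)"
    raise ValueError(f"unknown card family: {family!r}")


for fam in ("pre-HD", "HD", "RX"):
    print(f"{fam}: {boot_plan(fam)}")
```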

So far I got reports about these cards working for pass-through:
RadeonHD 7000 from this post
RadeonHD R9 270x from this post
RadeonRX 550 worked but slow; I'll add more details in a separate post below

So far all of the reports were with pegasos2, and speed issues were mentioned, which may be related to using USB (such as usb-storage for a ufat shared directory).


Edited by balaton on 2024/6/3 23:14:43
Edited by balaton on 2024/6/3 23:41:57
Edited by balaton on 2024/6/3 23:44:34
Edited by balaton on 2024/6/3 23:45:08
Edited by balaton on 2024/6/3 23:48:03
Edited by balaton on 2024/6/4 0:09:18
Re: QEMU GPU vfio-pci pass through
Quite a regular

I've got a report of RadeonRX 550 vfio-pci pass-through with the pegasos2 machine. The pegasos2.rom could not run the card ROM (the same way it cannot run the QEMU vgabios) and crashed on it, so it could not boot, but AmigaOS seems to boot without the firmware with just -kernel bboot, and the AmigaOS RadeonRX driver then inits the card. This seemed to work but resulted in slow performance (even without using USB). I'm not sure if that's because of some init normally done by the firmware that the RadeonRX driver does not do, or some other issue with pegasos2, its emulation or the host setup. I've got a debug log of this here, in case @Hans can spot anything in it, or somebody can get a similar log from a real machine with the same card so we could compare them.

Re: QEMU GPU vfio-pci pass through
Not too shy to talk

@balaton - thanks for starting this thread as a bit of a summary!

I have a feeling I have the perfect storm of non-working hardware, but I'm not giving up just yet. Realistically, I can't remember enough of where I was with it last year to even get as far as I got before. I need to read back through the mega-thread to remind myself.

First of all, I am doing this for fun only. Obviously I now have X5040 as well.

From my initial attempts this evening and yesterday, I have the following cards:
- HD5450
- R9 270
- RX550
- R5 230 (I'll ignore this one. It was bought to try MOS. That didn't work and it's a very low performance card anyway that I don't think is supported by OS4)

The 5450 I can get to boot using bboot only. The OS4 startup sound completes and nothing seems to lockup but all I get out of OS4 is a blank screen entitled "Workbench". Nothing more. This feels a lot like a software setup issue on the OS4 side. Regardless, this is a very weak card that gets super hot. I'm not too concerned if this one works or not.

The RX550 (again bboot plus Kickstart.zip) won't boot at all with the v2.12 (public version) RX driver. Just a blank screen. I do have a test driver that Hans sent me privately last year. I don't really know if I'm meant to discuss it? It also doesn't work but does get a little further. With either one, I think you only get one shot and would have to reboot the host each time for the card to fully reset. That's my very hazy memory from last year anyway.

So on the face of it, the R9 270 probably shows the most promise, but for now it hard-locks the host. This is where I need to read back through the mega-thread. BUT, isn't it the case that I would need the V5 HD drivers for this Southern Islands based card? If I'm correct, then it's also a dead end for me, as it wouldn't really be worth buying them just to play about and see how well it works.

But I don't rule out the possibility of something in the future. A little ITX system might be a nice hobby project.


Amiga x5040 ı 16GB ı RX580
GB-A1000 060@100,
A1200 PiStorm32-Lite CM4
Re: QEMU GPU vfio-pci pass through
Just can't stay away

@balaton
Quote:
- RadeonHD cards may work but likely need the machine firmware to init the graphics card ...

- RadeonRX cards that have AtomBIOS but the guest firmware cannot run it may work with just -kernel bboot without -bios machine firmware, because the AmigaOS driver may be able to parse AtomBIOS and init the card but this may not always or fully work ...
AFAIK all drivers from Hans, Radeon HD and RX, support AtomBIOS.
Only the much older ATIRadeon.chip driver for R100/R200 cards like the Radeon 7000 (not HD 7000!), 9000, 9250, etc. requires the firmware to have executed the gfx card's x86 BIOS first.
Some other ancient AmigaOS gfx card drivers, like 3dfxVoodoo.card, also work without the firmware executing the gfx card's x86 BIOS.

But even if you use a gfx card with exactly the same gfx chip that someone has already used successfully, a different BIOS from a different vendor, or even just a different version of the BIOS from the same vendor, may not work.

@MartinW
Quote:
- R5 230 (I'll ignore this one. It was bought to try MOS. That didn't work and it's a very low performance card anyway that I don't think is supported by OS4)
According to https://www.acube-systems.biz/compatibility/compatibility_41.php R5 230 should work.

Quote:
BUT, is it not the case that I would need the V5 HD drivers for this Southern Islands based card?
Not according to https://hdrlab.org.nz/projects/amiga-o ... r-hardware-compatibility/
It's very old and doesn't seem to have been updated for several years, but in the "Radeon R9 Series" part the driver versions used are only 1.2 - 2.7.

From https://wiki.amiga.org/index.php/RadeonHD
Quote:
Oland chipsets (HD R7 240/240D, HD R7 250) require the newer RadeonHD drivers (v1.20, v2.21 or v3.6) exclusively available from the Enhancer Software

I don't know which Radeon HD cards don't work with the Enhancer Software V3 driver and require the separately sold V5 driver instead.
For Radeon RX cards there is no separately sold driver, the current version is included in Enhancer Software.

Re: QEMU GPU vfio-pci pass through
Not too shy to talk

Just a not-so-small update.

I managed to get the HD5450 card to what appeared to be fully working this evening by using the pegasos bios and starting bboot from the pegasos firmware. Unless you really want 32bit graphics however, there isn't really much advantage to running GPU passthrough with this card over the SM501 emulated card. It's really a low end "desktop only" kind of card and the minute you hit anything like gaming you're not really any better off. I mean, it does work, and you do get 1080p at 32bit, but it's struggling by then. I didn't try video - I never remember to!

Worth noting that I never really felt the CPU was struggling, but I didn't measure it. The system I'm using currently is an old one too; I believe it's my Intel i7 6700K machine, which I think is about 8 years old now.

Another thing of note is that the Pegasos firmware emulation does crash before it fully inits the card, but you can carry on and it still seems to work. This is the same for all the cards. I had no luck with bboot alone. I also had to force PCI bus 0; bus 1 refused to work. But I do know that on my newer Ryzen-based system bus 1 worked and bus 0 didn't. So it looks like experimentation is required, and YMMV.

My R9 270 is a different matter. I've tried every combination I can think of, and while I can boot it and Workbench will load, it only takes a mouse click or two before the guest completely hangs. So it looks like there's a clash there; my guess would be the IRQ, but I'm wildly throwing that guess in and it could just as easily be anything. I haven't dug any deeper yet; that's a task for another night.

I haven't tried the R5 230 card because I would think it's even weaker than the 5450, but if anyone really wants me to, I can without too much bother.

I also haven't been back to the RX550 yet. I do know that I wasn't able to get that one to work at all. Maybe I'll revisit it.

Bit of a chicken and egg situation here - if things worked nicely, I'd really be considering building a blazing fast and portable mini ITX system, but I don't have the knowhow to help make things work I don't think.

I appreciate this is all a bit rambling and not very structured. If anyone wants me to test anything specific just say. The beauty of this system is it's my workbench PC and works happily whether I'm messing about with Qemu or not. I could do with getting MacOS booting on it again at some point but that shouldn't interfere with this.

PS: I only have access to the Classic, Pegasos and X5000 versions of OS4, and I need to limit what I'm spending here a bit, so I don't really want to be buying any more versions.


Re: QEMU GPU vfio-pci pass through
Just can't stay away

@MartinW
GfxBench2D results would be interesting, for example compared to

geennaam's MSI R9 270X with QEmu on a Core i5 CPU, 1920×1080@61 (32 bit)
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2773
Overall Score: 5,237.82

and a HD5450 in a Sam460ex, 1024×768@60 (32 bit)
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2660
Overall Score: 1,688.95

Quote:
I haven't tried the R5 230 card because I would think it's even weaker than the 5450

https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2345
seems to be a gfx card with the same core as R5 230, 1280×1024@60 (32 bit) in a X5000/20, Overall Score: 1,668.38, about the same as the HD5450 in the Sam460ex, but since it's using a higher resolution the results are not really comparable.
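The resolution gap between these runs can be put in numbers with plain pixel counts. This only shows how much larger each screen is; it says nothing about how GfxBench2D scores actually scale with resolution.

```python
# Screen sizes of the three GfxBench2D runs quoted above.
RUNS = {
    "R9 270X, QEMU": (1920, 1080),
    "HD5450, Sam460ex": (1024, 768),
    "R5 230-class, X5000/20": (1280, 1024),
}

base_px = 1024 * 768  # the HD5450 run used as the reference

for name, (w, h) in RUNS.items():
    px = w * h
    # A ratio above 1 means a bigger screen than the reference run.
    print(f"{name:24} {w}x{h} = {px:>9,} px ({px / base_px:.2f}x)")
```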


Edited by joerg on 2024/6/6 6:08:48
Edited by joerg on 2024/6/6 6:09:55
Re: QEMU GPU vfio-pci pass through
Not too shy to talk

I’ll try and do that. Will be interesting. Obviously I can’t get scores for the R9 card since it hangs the system but I can do the 5450 and also the R5. I actually have the R9 card in my X5040 at the moment. When I looked up the specs I realised it’s potentially a better card than an RX550 but it retains the soft boot capability so I’m taking it for a spin to see what it’s like.


Re: QEMU GPU vfio-pci pass through
Home away from home

@balaton

The log you shared has nothing of note. However, Joerg has spotted something that could explain it: very slow VRAM access. See his post in the other thread, (link).

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: QEMU GPU vfio-pci pass through
Not too shy to talk

Just did this one before I start pulling machines apart again:
http://hdrlab.org.nz/benchmark/gfxbench2d/OS/AmigaOS/Result/2785

That was my R9 270X in my X5040, but note that I'm not using the v5 driver which means no GART (I believe) so don't expect lightning results. Even the pointer doesn't feel desperately smooth. Looking at other scores, my results look terrible however, even for that driver!

Copy from RAM to VRAM:
Transfer size: 16327680 bytes
Src: 0x553ab000  Dest: 0x83487400
copy32:           78.831 MiB/s (took 0.197528 seconds)
copy64:           78.794 MiB/s (took 0.197620 seconds)
copy64f:         134.954 MiB/s (took 0.115382 seconds)
copy64x2:         78.821 MiB/s (took 0.197553 seconds)
copy64fx2:       134.875 MiB/s (took 0.115450 seconds)
copy64fx2PF:     134.954 MiB/s (took 0.115382 seconds)
copy64fx4PF:     134.957 MiB/s (took 0.115380 seconds)
useMemcpy:        78.803 MiB/s (took 0.197597 seconds)
useExecCopyMem:   78.815 MiB/s (took 0.197567 seconds)
copyToVRAM:      134.917 MiB/s (took 0.115414 seconds)
WritePixelArray: 134.882 MiB/s (took 0.115444 seconds)

Copy from VRAM to RAM:
Transfer size: 16327680 bytes
Src: 0x83487400  Dest: 0x553ab000
copy32:           13.310 MiB/s (took 1.169859 seconds)
copy64:           12.870 MiB/s (took 1.209849 seconds)
copy64f:          20.292 MiB/s (took 0.767344 seconds)
useMemcpy:        12.566 MiB/s (took 1.239176 seconds)
useExecCopyMem:   12.531 MiB/s (took 1.242603 seconds)
copyFromVRAM:     23.175 MiB/s (took 0.671892 seconds)
ReadPixelArray:   22.941 MiB/s (took 0.678747 seconds)
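The MiB/s figures in the log can be reproduced from the transfer size and the timings, which also makes the write/read asymmetry explicit. This assumes GfxBench2D's MiB means 2^20 bytes, which matches the logged numbers.

```python
TRANSFER_BYTES = 16_327_680  # "Transfer size" from the log


def mib_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in MiB/s, where 1 MiB = 2**20 bytes."""
    return nbytes / 2**20 / seconds


to_vram = mib_per_s(TRANSFER_BYTES, 0.115414)    # copyToVRAM timing
from_vram = mib_per_s(TRANSFER_BYTES, 0.671892)  # copyFromVRAM timing
print(f"to VRAM {to_vram:.3f} MiB/s, from VRAM {from_vram:.3f} MiB/s, "
      f"writes are {to_vram / from_vram:.1f}x faster than reads")
```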


[UPDATE] For some reason I can't get the 5450 GPU passthrough to be even as stable as it was a few days ago. I distinctly remember playing a few games the other day before reporting it working but today as soon as I try to do anything more than a few mouse clicks the entire guest VM locks up. I'll try again soon but if I can't get anything stable then of course I can't supply any benchmarks.

This is all somewhat strange. It's not the same machine as I did the bulk of my testing on last year (that one is now a gaming PC) but I certainly had this working "well enough". I probably really should make notes!


Edited by MartinW on 2024/6/7 16:55:40

Re: QEMU GPU vfio-pci pass through
Quite a regular

@Hans
I don't know what code path a copy from VRAM goes through in QEMU, so I also don't know where to look for why it could be slow. I guess it could be found with some profiling, or by somebody who knows how vfio-pci works. Also, is there a similar test for Linux? Then we could check whether running a Linux guest with the same passed-through card reproduces the issue, to confirm it's not AmigaOS specific. Maybe x11perf has some test for this?

Re: QEMU GPU vfio-pci pass through
Just can't stay away

@balaton
Quote:
I don't know what code path copy from VRAM goes through in QEMU
If it's not a direct VRAM access from application code but using the AmigaOS functions:
Classic Amiga/AmigaOne/Pegasos2 with G2 or G3 CPU: CPU copy loop using the 64-bit FPU registers, very slow.
AmigaOne/Pegasos2 with G4 CPU: CPU copy loop using the 128-bit vector registers, only a little bit better.
Sam4x0/X1000/X5000 (and maybe A1222 too): if the requirements are met (aligned and large enough size), the DMA engine of the CPU is used for the copy, which is much faster; for unaligned and small copies it's the same slow CPU copy loop as on G2, G3 or G4 (X1000).

If the emulation of the 4x0 CPUs DMA engine is using host DMA it should be much faster than AmigaOne/Pegasos2 for copies from/to VRAM, but if the emulation of the 4x0 DMA engine uses host CPU integer/FPU/vector accesses instead there is probably no difference.

In this post I quoted some GfxBench2D results. AFAIK "Copy to VRAM" and "Copy from VRAM" are using direct CPU (integer/FPU/vector) accesses of the VRAM in the benchmark tool and especially when reading from VRAM that's very slow, while "Write Pixel Array" and "Read Pixel Array" are using the AmigaOS functions with DMA instead.

Re: QEMU GPU vfio-pci pass through
Quite a regular

@joerg
Considering the RageMem results copying with simple CPU loop should not be slow as that's compiled to host code. If AmigaOS or the driver tries to do some tricks which might be faster on real CPU it might be a problem. Or accessing passed through memory may cause IO region reads which would need to exit the JIT compiled code and that could be slower.

The DMA engine of the 460EX is just a memcpy or memmove in the optimised case, or a line-by-line copy if the region is not contiguous. It is implemented in hw/ppc/ppc440_uc.c, so adding some logging there could show whether it is used at all in this case. But I think it does not work on sam460ex at the moment, and it's also slow on the other machines, where there's no DMA engine, so this isn't the problem.
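A simplified Python model of the behaviour described above, not the actual code in hw/ppc/ppc440_uc.c: guest memory is a `bytearray`, and the function name `dma_copy` and its `line_len`/stride parameters are hypothetical stand-ins for however a non-contiguous transfer is really described to the engine.

```python
def dma_copy(mem: bytearray, src: int, dst: int, line_len: int, lines: int,
             src_stride: int, dst_stride: int) -> None:
    """Copy `lines` rows of `line_len` bytes from src to dst inside `mem`.

    Strides are the distance in bytes between the starts of consecutive rows.
    """
    if src_stride == line_len and dst_stride == line_len:
        # Optimised case: both regions are contiguous, so one block copy.
        # The right-hand slice is a copy, so overlap behaves like memmove.
        n = line_len * lines
        mem[dst:dst + n] = mem[src:src + n]
        return
    # Non-contiguous case: copy row by row.
    for i in range(lines):
        s = src + i * src_stride
        d = dst + i * dst_stride
        mem[d:d + line_len] = mem[s:s + line_len]


buf = bytearray(range(16))
dma_copy(buf, src=0, dst=8, line_len=2, lines=2, src_stride=4, dst_stride=2)
print(list(buf[8:12]))
```

Either way the host CPU does the copying, which is the point being debated in the following posts.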

At least we know (or can find out having the source) what x11perf does so testing with that on a Linux guest may be easier than guessing what AmigaOS does.

Re: QEMU GPU vfio-pci pass through
Just can't stay away

@balaton
Quote:
Considering the RageMem results copying with simple CPU loop should not be slow as that's compiled to host code.
Any CPU access to VRAM over ZorroIII/PCI/PCIe is extremely slow, no matter if real or emulated hardware.
RageMem results are useless, especially if it's used on fake hardware like the SM50x emulation of QEmu (no real VRAM but emulated in DRAM) or the WinUAE uaegfx.card emulation.
You'd have to check for example the WinUAE Voodoo3 emulation instead, where you get the same slow VRAM accesses as on real hardware.

Quote:
The DMA engine of the 460EX is just a memcpy or memmove in the optimised case or line by line copy if the region is not continuous.
A bcopy()/memmove()/memcpy() implementation on the host for the guest's DMA engine is unusable; if the host doesn't use host DMA for guest DMA, you'll never get usable results, no matter whether it's a passed-through vfio-pci real gfx card or an emulated virtio-gpu with the AmigaOS driver Hans is trying to implement.

Re: QEMU GPU vfio-pci pass through
Quite a regular

@joerg
Quote:
Any CPU access to VRAM over ZorroIII/PCI/PCIe is extremely slow, no matter if real or emulated hardware.

It does not seem so on real hardware from test results (it's slower than RAM access but not as slow as we get on QEMU) and also should not be true on emulated hardware where VRAM is just regular RAM.

Quote:
RageMem results are useless, especially if it's used on fake hardware like the SM50x emulation of QEmu (no real VRAM but emulated in DRAM) or the WinUAE uaegfx.card emulation.
You'd have to check for example the WinUAE Voodoo3 emulation instead, where you get the same slow VRAM accesses as on real hardware.

What RageMem results show is that simple access without tricks is the fastest whereas using things that may be faster on real CPU can be slower on QEMU. The Voodoo3 is also emulated VRAM so if it's slower then it's some other problem and it's not related to QEMU at all so not much to check on that. WinUAE is a different emulator so this says nothing about passed through card with QEMU.

Quote:
A bcopy()/memmove()/memcpy() implementation on the host for the guest's DMA engine is unusable, if the host doesn't use host DMA for guest DMA you'll never get any usable results, no matter if it's a passed-through vfio-pci real gfx card or a virtio-gpu emulated one with an AmigaOS driver Hans is trying to implement.

The virtio-gpu is also an emulated card on the guest side, and AFAIK Hans's driver does not use direct access to VRAM, as that's not yet fully supported in current QEMU versions; there are still patches for that waiting to be merged. What we don't know yet is where the speed is lost, so without finding that out it's all just wild guesses. Instead of guessing, it would be better to investigate and find the bottleneck. It could be anywhere in AmigaOS, the driver, QEMU or the host, and we don't know what the AmigaOS and driver parts are doing, so it's hard to guess. Trying X11 on a Linux guest with QEMU and the passed-through card instead, where everything is open source so we know what is used, could help find out whether the problem is on the QEMU and host side or on the guest side. Then we'd know where to look further to find where the speed is lost.

DMA just frees the guest CPU while the transfer is done but probably does not make the VRAM access faster. It still has to go through the PCI bus so I think the problem is not the PCI bus here but something else somewhere either in QEMU or the guest side.

Re: QEMU GPU vfio-pci pass through
Just can't stay away

@balaton
Quote:
DMA just frees the guest CPU while the transfer is done but probably does not make the VRAM access faster.
Wrong, DMA copies from/to VRAM are usually 10-100 times faster than CPU copies from/to VRAM.

On the X5000 it's about 25 times faster: https://lists.hdrlab.org.nz/benchmark/ ... 2d/OS/AmigaOS/Result/2348
Copy from VRAM (CPU) 40.33 MiB/s
Read Pixel Array (DMA) 995.02 MiB/s
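The "about 25 times" follows directly from the two figures above. As a quick check, with an added illustrative assumption (reading back one 1920×1080 frame at 4 bytes per pixel) to show what the gap means per frame:

```python
CPU_READ_MIB_S = 40.33   # "Copy from VRAM" (CPU) in the X5000 result above
DMA_READ_MIB_S = 995.02  # "Read Pixel Array" (DMA) in the same result

speedup = DMA_READ_MIB_S / CPU_READ_MIB_S

# Illustrative assumption: one 1920x1080 frame at 4 bytes per pixel.
frame_mib = 1920 * 1080 * 4 / 2**20

cpu_ms = frame_mib / CPU_READ_MIB_S * 1000
dma_ms = frame_mib / DMA_READ_MIB_S * 1000
print(f"speed-up {speedup:.1f}x; frame read {cpu_ms:.0f} ms (CPU) vs {dma_ms:.1f} ms (DMA)")
```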

Re: QEMU GPU vfio-pci pass through
Quite a regular

@joerg
Quote:
Wrong, DMA copies from/to VRAM are usually 10-100 times faster than CPU copies from/to VRAM.

The PCI bus is the same speed if it's accessed by DMA or CPU. The difference is that the CPU is free to do something else when DMA is used while it's busy when it has to push through data but the speed is probably limited by the PCI bus not the CPU speed. Unless it's a very slow CPU but we're talking about the host here which should be much faster even with direct CPU access. So I think speed is limited even before it reaches the host and not by host CPU accessing card's VRAM.

Re: QEMU GPU vfio-pci pass through
Just can't stay away

@balaton
I've updated my previous post. There is a huge difference between CPU and DMA accesses.

Re: QEMU GPU vfio-pci pass through
Not too shy to talk

Not sure how much of the following is still relevant. I tried to post it earlier but the forum wouldn’t let me:

I’m finding it slow with real hardware in x5040 too. I put my RX580 back in and I’m seeing very low read speeds even though the card generally performs well. I don’t really want to muddy the water by posting non qemu figures here.


Re: QEMU GPU vfio-pci pass through
Just popping in

PCIe access by the CPU is slow. You can test it in Linux by running X11 with the "vesa" driver (instead of "nvidia" or whatever). Long ago it seemed easier to quickly switch the X11 driver for tests; now it seems more difficult. You may have to do "init 3" (maybe "init 2" on some distributions?) to get out of the display manager, and you may have to rmmod the nvidia* (or whatever) kernel modules used by the normal X11 driver.

You can modify "xorg.conf" and save it as "xorgvesa.conf": in the Device section, replace the driver "nvidia" (or whatever) with "vesa", and also add

Option "ShadowFB" "0"

Otherwise framebuffer reads don't happen in real VRAM, but in a shadow buffer in RAM.
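This edit can be scripted. A minimal sketch, assuming a simple xorg.conf with a single `Section "Device"` block; the function name `make_vesa_conf` is hypothetical, and an existing ShadowFB option in that section would not be removed, so check the result by hand.

```python
import re


def make_vesa_conf(conf: str) -> str:
    """Return a copy of xorg.conf text using the vesa driver with ShadowFB off."""
    out = []
    in_device = False
    for line in conf.splitlines():
        key = line.strip().lower()
        if key.startswith("section") and '"device"' in key:
            in_device = True
        elif in_device and key.startswith("driver"):
            # Replace the first quoted value, e.g. "nvidia" -> "vesa".
            line = re.sub(r'"[^"]*"', '"vesa"', line, count=1)
        elif in_device and key == "endsection":
            # Disable the shadow buffer so reads hit real VRAM, then close the section.
            out.append('    Option "ShadowFB" "0"')
            in_device = False
        out.append(line)
    return "\n".join(out) + "\n"


sample = 'Section "Device"\n    Identifier "Card0"\n    Driver "nvidia"\nEndSection\n'
print(make_vesa_conf(sample))
```

For example, `Path("xorgvesa.conf").write_text(make_vesa_conf(Path("xorg.conf").read_text()))` with `pathlib.Path` writes the new file next to the original.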

Then you can

startx -- -xf86config xorgvesa.conf

To test VRAM reads, use "x11perf -shmget500". On a 32-bit screen (4 bytes per pixel), the number of results per second it shows equals the millions of bytes read per second, because 500x500x4 = 1000000.
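The conversion above as a small helper, with one extra step that may matter when comparing against the GfxBench2D numbers in this thread: the x11perf-derived figure is in decimal megabytes (10^6 bytes), while GfxBench2D reports MiB (2^20 bytes). The 4 bytes per pixel assumes a 32-bit screen, as in the post, and the 40 ops/s value is a made-up example, not a measurement.

```python
# Convert an x11perf -shmget500 rate (operations per second) into read bandwidth.
# Each operation reads a 500x500 area; bytes_per_pixel=4 assumes a 32-bit screen.
def shmget500_bytes_per_s(ops_per_s: float, bytes_per_pixel: int = 4) -> float:
    return ops_per_s * 500 * 500 * bytes_per_pixel


def bytes_to_mib(nbytes: float) -> float:
    """Convert bytes to MiB (2**20 bytes), the unit GfxBench2D reports."""
    return nbytes / 2**20


rate = 40.0  # hypothetical x11perf result, not a measured value
bw = shmget500_bytes_per_s(rate)
print(f"{rate} ops/s = {bw / 1e6:.1f} MB/s = {bytes_to_mib(bw):.1f} MiB/s")
```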

Re: QEMU GPU vfio-pci pass through
Home away from home

@balaton

Quote:
The PCI bus is the same speed if it's accessed by DMA or CPU. The difference is that the CPU is free to do something else when DMA is used while it's busy when it has to push through data but the speed is probably limited by the PCI bus not the CPU speed. Unless it's a very slow CPU but we're talking about the host here which should be much faster even with direct CPU access. So I think speed is limited even before it reaches the host and not by host CPU accessing card's VRAM.

The clock speed doesn't change, but the amount of overhead does. DMA transfers tend to happen in large blocks, so the per-packet overhead is small. CPU transfers tend to go in small blocks, so the overhead becomes more significant. This gets really bad when transferring 64-bits or less. The worst case would be reading a single byte, which requires two packets: the request, and response. I remember seeing a graph showing very large overhead at small transfer block sizes. Unfortunately, I can't find it right now.

Some PCIe controllers can merge multiple consecutive transfers into larger blocks, provided that the accesses happen close enough together (and in the correct order).

I don't think that the overhead could fully explain some of the absolutely abysmal results, though. Okay, it might if the host machine is stuck at PCIe v1, but I assume that the host machines all have PCIe3 or newer. Radeon HD 78xx series cards are PCIe3 (lower end Southern Islands are PCIe2).
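A rough model of the overhead argument, under stated assumptions: the per-lane rates follow from the PCIe 1.x/2.x/3.x transfer rates and line encodings, while the 24-byte per-packet overhead is an assumed illustrative value (real overhead depends on header size, framing and optional ECRC). The model also ignores request packets and round-trip latency, so it only shows how payload size alone affects link efficiency.

```python
# Per-lane data rate in bytes/s after line encoding (one direction).
LANE_BYTES_PER_S = {
    1: 2.5e9 * 8 / 10 / 8,     # PCIe 1.x: 2.5 GT/s, 8b/10b   -> 250 MB/s
    2: 5.0e9 * 8 / 10 / 8,     # PCIe 2.x: 5 GT/s, 8b/10b     -> 500 MB/s
    3: 8.0e9 * 128 / 130 / 8,  # PCIe 3.x: 8 GT/s, 128b/130b  -> ~985 MB/s
}

# ASSUMPTION: fixed per-packet overhead (header, framing, CRC), illustrative only.
PACKET_OVERHEAD_BYTES = 24


def efficiency(payload: int, overhead: int = PACKET_OVERHEAD_BYTES) -> float:
    """Fraction of link bytes carrying data when each packet holds `payload` bytes."""
    return payload / (payload + overhead)


def effective_mb_s(gen: int, lanes: int, payload: int) -> float:
    """Upper bound on data throughput in MB/s, ignoring latency and request packets."""
    return LANE_BYTES_PER_S[gen] * lanes * efficiency(payload) / 1e6


for payload in (1, 8, 16, 128, 256):
    print(f"{payload:>4} B/packet: {efficiency(payload):6.1%} efficient, "
          f"PCIe3 x16 <= {effective_mb_s(3, 16, payload):8.0f} MB/s, "
          f"PCIe1 x16 <= {effective_mb_s(1, 16, payload):7.0f} MB/s")
```

Even the worst case here (1-byte packets on PCIe 1.x x16) leaves far more than the 12-23 MiB/s read speeds seen in QEMU, which is consistent with the point that overhead alone does not explain those results.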

Hans
