Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@Georg

Quote:
You mentioned MicroDelay() and it could be caused by it as it will be some kind of busy loop checking some powerpc timer register. If qemu emulation of it is not very precise (may depend on host or even host (kernel) configuration = there may be difference between running Linux distribution A vs distribution B) then this will slow things down as it will cause the delay to last (possibly much) longer than expected.

Could be tested with a little AOS4 program which for example calls MicroDelay(10) 100000 times in a loop. Should complete in 1 second. If it takes (much) longer -> problem.

Yes, that would be worth testing. Libauto automatically opens the timer.device, so linking with -lauto will set up ITimer. Then something like this should work (NOTE: untested):

#include <proto/timer.h>
#include <stdio.h>

// Link with -lauto so that timer.device is opened and ITimer is set up automatically.
int main(int argc, const char **argv) {
    unsigned usDelay = 1;
    unsigned count = 1000000;
    printf("Calling MicroDelay(%u) %u times\n", usDelay, count);
    for(unsigned i = 0; i < count; ++i) {
        ITimer->MicroDelay(usDelay);
    }
    // Ideal total time: usDelay * count microseconds.
    printf("Done! This should have taken %.2f seconds. How long did it actually take?\n", ((double)usDelay * count) / 1000000.0);
    return 0;
}
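For comparison, the same expected-versus-actual check can be tried on a Linux host to see what a busy-wait delay normally costs there. This is a hypothetical POSIX analogue, not AmigaOS code: `busy_delay_us()`, `time_delay_loop()` and `report_delay_loop()` are illustrative names, and the loop polls `clock_gettime(CLOCK_MONOTONIC)` where MicroDelay() would presumably poll a PowerPC timer register. Because the loop only exits once the clock says enough time has passed, a clock source that is slow to read stretches every single call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Current time of the host's monotonic clock, in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical busy-wait delay: spin until the clock has advanced by at
   least `us` microseconds. Every clock read adds to the real duration. */
void busy_delay_us(uint32_t us)
{
    uint64_t start = now_ns();
    while (now_ns() - start < (uint64_t)us * 1000u) {
        /* spin */
    }
}

/* Call busy_delay_us(us) `count` times and return the elapsed
   wall-clock time in seconds. */
double time_delay_loop(uint32_t us, uint32_t count)
{
    uint64_t begin = now_ns();
    for (uint32_t i = 0; i < count; ++i)
        busy_delay_us(us);
    return (double)(now_ns() - begin) / 1e9;
}

/* Print the ideal and the measured duration side by side. */
void report_delay_loop(uint32_t us, uint32_t count)
{
    double expected = (double)us * count / 1e6;
    double actual = time_delay_loop(us, count);
    printf("%u x %u us: expected %.3f s, took %.3f s (%.2fx)\n",
           (unsigned)count, (unsigned)us, expected, actual,
           expected > 0.0 ? actual / expected : 0.0);
}
```

For example, `report_delay_loop(10, 100000)` mirrors the MicroDelay(10) x 100000 case; on an otherwise idle host the ratio should come out close to 1.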


Edited by Hans on 2024/7/2 12:36:29
Join the Kea Campus - upgrade your skills; support my work; enjoy the Amiga corner.
https://keasigmadelta.com/ - see more of my work
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@Hans
Nikitas did not have a USB vfat device in the command line, but may try removing the network (though I think he also tried that and it didn't help). On pegasos2 all these may share the same interrupt, so maybe there's still an issue with them after all the patches and the level-sensitive setting in BBoot? Anyway, trying to reproduce with Linux could rule all of this out and check whether the problem is only with how AmigaOS does things or independent of that.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@Hans
I can test it on a real Peg2 and on QEMU's Peg2, but in the snippet you posted usSize is undeclared and usDelay = ; has no value.

Join us to improve dopus5!
AmigaOS4 on youtube
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@kas1e
#include <proto/timer.h>
#include <stdio.h>

int main(int argc, const char **argv) {
    uint32 usDelay = 1;
    uint32 count = 1000000;
    printf("Calling MicroDelay(%lu) %lu times\n", usDelay, count);
    for(uint32 i = 0; i < count; ++i) {
        ITimer->MicroDelay(usDelay);
    }
    printf("Done! This should have taken %f seconds. How long did it actually take?\n", ((double)usDelay * count) / 1000000.0);
}

But it may be better to use for example usDelay = 10 and count = 100000, or usDelay = 100 and count = 10000.
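A likely reason for preferring the longer delays (an assumption on my part, the post doesn't spell it out) is that each MicroDelay() call carries a fixed cost on top of the requested delay, such as the call through the ITimer interface and the timer reads. At 1 µs that cost is a large share of every call; at 10 or 100 µs it is amortised. The hypothetical helper below turns a measured run time into an average per-call overhead, so runs with different parameters can be compared.

```c
#include <stdint.h>

/* Hypothetical helper: average extra time per call, in microseconds,
   given the measured wall-clock time (in seconds) for `count` calls that
   each requested `delay_us` microseconds. Returns 0 if count is 0. */
double overhead_per_call_us(double measured_s, uint32_t delay_us, uint32_t count)
{
    if (count == 0)
        return 0.0;
    double ideal_s = (double)delay_us * count / 1e6;
    return (measured_s - ideal_s) * 1e6 / count;
}
```

With illustrative numbers: if 1,000,000 calls of MicroDelay(1) took 1.5 s, `overhead_per_call_us(1.5, 1, 1000000)` gives 0.5 µs per call. The same 0.5 µs added to 100,000 calls of MicroDelay(10) would only stretch a 1 s run to 1.05 s, so the longer delay gives a cleaner reading of the timer itself.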

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Hans

Quote:
I just got a reminder of some of geennaam's older discoveries. According to him, his Radeon R9 270x worked well provided that he didn't share part of his hard-drive with AmigaOS as a USB drive, and he also had to shut down ethernet. With either of those enabled, he got massive slowdown.


Okay, so do I have to remove the ethernet from the QEMU command only, or should I also shut down the ethernet on the host? Regarding the USB drive, I have a secondary real SSD drive on "/dev/sdb" that I use in the QEMU command. Is this OK? For the mouse/keyboard, I use bochs-display. Is this OK, too?

@balaton
Indeed, virsh makes it more complex. So, I will create a new QEMU setup for this when I have time.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Only remove it from the QEMU command. It was said that if the guest has a USB disk, as with the vfat shared folder, then it ran slower with vfio for some reason. I don't know if @geennaam ever talked about a network card. It does not (or should not) matter what you have on the host; the theory is that having other PCI devices in the guest, like USB or a network card, may interfere with interrupts from the graphics card. Now they are on a different bus and we had several patches to fix this, but who knows. This was with a RadeonHD card and used pci.1, so it's different from what you've tried, but we have no better idea at the moment. So remove all -device usb-* and -device rtl8139 options from the QEMU command line and see if that changes anything.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@Hans
Quote:
I also noticed that everyone using VFIO, is using QEmu in KVM mode, which means that the guest OS can execute code on the host CPU directly instead of via emulation. I found nothing about VFIO usage with the TCG based emulator. Looks like we're in uncharted territory.

Most of the people who do this want to play Windows games in a VM while running Linux on the host. So they want the most performance and use KVM and vfio. This may mean that using it with TCG is not tested that much, and with PPC not at all, but that does not mean it should not work. Of course we're in uncharted territory: not many people run AmigaOS on QEMU and even fewer have tried vfio GPU passthrough, so it's not something that was tested and known to work. Some people tried it before for MacOS but gave up, because there the firmware is needed to run the FCode ROM of the Mac graphics card (or a suitable ROM for a PC card) for MacOS to even recognise the card, but QEMU's OpenBIOS can't run FCode ROMs and real Mac ROMs don't run with QEMU. (I had patches to fix both of these, but they aren't upstream, so one can only experiment with it by applying patches from different places, and only a few people even tried. Somebody once managed to get a Rage128Pro working, but I don't know if it was usable.)

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@kas1e
I've corrected the code.

@nikitas
Quote:
Okay, so do I have to remove the ethernet from the QEMU command only, or should I also shut down the ethernet on the host? Regarding the USB drive, I have a secondary real SSD drive on "/dev/sdb" that I use in the QEMU command. Is this OK? For the mouse/keyboard, I use bochs-display. Is this OK, too?

Remove it from the QEmu command line, and use your Radeon R7 240 for testing (geennaam said that it had no effect on his RX 5x0 cards).

I have no idea about using the secondary real SSD drive, or the bochs-display. If you can boot to AmigaOS without them, then try removing both from the QEmu args.

@balaton
VFIO is obviously working with TCG. I was hoping to get some idea of what the overhead was when used with TCG instead of KVM, and maybe some tips on what to try.

Hans

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton
Quote:
balaton wrote: @Georg
To help testing, could you please share your Linux kernel options and xorg.config to show how to set up vesafb and the x11perf command again so others can reproduce that test without having to find out the right config?


Could be wrong, but I don't think the x11 "vesa" driver needs any special Linux kernel options. There's another X11 driver "fbdev" which does use that Linux kernel framebuffer stuff.

In theory, to use the "vesa" driver it's just a matter of editing xorg.conf (in /etc/X11) (or saving a modified version wherever you want), finding the "Device" section in there and editing it to say:

Driver "vesa"
Option "ShadowFB" "0"

Many years ago that was enough. But nowadays if you try to start X11 (startx -- -xf86config myxorg.conf) it may fail, and the log (/var/log/Xorg.0.log) says "vesa: Ignoring device with a bound kernel driver". That seems to be because the kernel modules of the normal gfx card (in my case "nvidia") are still loaded in memory.

So what I do here is first log out of the desktop, use CTRL ALT F1 to switch to a virtual console, run "init 3" to get rid of the X11 (KDE) display manager, then "lsmod | grep nvidia", then "rmmod" the modules (you need to find the right order, i.e. which ones to remove first, otherwise it says "module is in use by ...") and then "startx -- -xf86config myxorg.conf". For some reason the screen here first appears somewhat broken (don't know if it's just the monitor), roughly zoomed, as if with a wrong modulo, so I also have to switch back and forth with CTRL ALT F1 -> CTRL ALT F7 a few times, and then it displays fine.

If the thing is slow and you see flickering mouse sprite (because of disabled shadow framebuffer) in front of gfx updates (like "glxgears" window) it worked.
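The "bound kernel driver" that the vesa driver complains about can also be checked without lspci: sysfs has a `driver` symlink under /sys/bus/pci/devices/<address>/ that points at the bound driver (lspci -k reports the same thing as "Kernel driver in use"). A small C sketch, with a hypothetical function name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy the name of the kernel driver bound to PCI
   device `addr` (for example "0000:01:00.0") into `out`. Returns 0 on
   success, or -1 if the device does not exist or has no bound driver. */
int pci_bound_driver(const char *addr, char *out, size_t outlen)
{
    char link[256];
    char target[4096];

    snprintf(link, sizeof link, "/sys/bus/pci/devices/%s/driver", addr);

    /* readlink() does not NUL-terminate, so leave room and add it. */
    ssize_t n = readlink(link, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';

    /* The link target ends in .../drivers/<name>; keep the last part. */
    const char *name = strrchr(target, '/');
    name = name ? name + 1 : target;
    snprintf(out, outlen, "%s", name);
    return 0;
}
```

If this reports nvidia (or amdgpu) for the card that X11 should drive with vesa, that module still has to be unloaded first.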

Google how to disable "compositing" on your desktop. There may be a shortcut key for it. To verify that it's disabled, run "xcalc" or "xclock" from a terminal. Press CTRL+Z to freeze the program. Then drag its window off the screen and back in. If this creates gfx trash or gfx disappearing (like text/numbers), then it worked. (This happens because the program is frozen and cannot update/refresh areas of the window which became hidden and then visible again. With the compositor enabled this does not happen, because the window contents are backed up in their own pixmaps (bitmaps) and don't get lost when dragged out of view or behind things.)

x11perf -shmput500
x11perf -shmget500

It's unlikely that it is not running in a 4 bytes per pixel screenmode (so that you can interpret x11perf results/sec as million_bytes/sec), but if you want to check, look at whether "xdpyinfo" says "32" for "bitmap unit". Though I'm not 100% sure that really reflects the bytes per pixel (I don't know or remember why, but the AROS hosted X11 driver even creates a dummy test XImage and then picks the bytes per pixel from it).
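The arithmetic behind reading the results as million bytes per second: -shmput500 and -shmget500 transfer a 500x500 pixel image per repetition, and at 4 bytes per pixel that is 500 × 500 × 4 = 1,000,000 bytes. The hypothetical helper below is just that multiplication, generalised to other image sizes or depths.

```c
/* Bytes per second moved by an x11perf image test, given its repetitions
   per second, the image size in pixels and the bytes per pixel. */
double x11perf_bytes_per_sec(double reps_per_sec, unsigned width,
                             unsigned height, unsigned bytes_per_pixel)
{
    return reps_per_sec * (double)width * height * bytes_per_pixel;
}
```

So `x11perf_bytes_per_sec(reps, 500, 500, 4) / 1e6` equals `reps` itself, while a 2 bytes per pixel screen would halve the figure for the same result.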

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Hans

Quote:
Remove it from the QEmu command line, and use your Radeon R7 240 for testing


No, I tried all the possible combinations, and it didn't go any faster. The only thing that maybe helped a little was removing bochs-display and using Evdev for USB devices.

Also, the command:
cpufreq-set -g performance

Made a visible difference. But just a bit.

I also tried a funny thing using a real vga-to-vga on a small old monitor I found. I got the error "Couldn't create screen mode." I think this monitor supports 640x480.


Edited by nikitas on 2024/7/3 7:05:53
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@nikitas

It's a pity that disabling the ethernet & other things made no difference.

Could you try to compile and run the code I gave in my post above? Remember to link with -lauto.

Edit: It should compile with gcc -o MicroDelayTest MicroDelayTest.c -lauto

It'll let us know if there's a problem with MicroDelay() or not.

Hans

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Hans

Yes, of course. I'll try it with both GPUs, and I'll get to you with the results.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Hans

Running this script with:
- R7 240 attached, took about: 24 seconds.
- RadeonRX 550 attached, took about: 1.50 or 2 seconds
- RadeonRX 550 attached with Screenmode --> Enable Interrupts = Checked, took about: 1.0 or 1.5 seconds
- RadeonRX 550 attached with Screenmode --> Enable Interrupts = Checked and Ethernet attached and using bochs-display, took about: 1.0 or 1.5 seconds (same as the test above)

When I enable interrupts in Screenmode, the system seems to run slower overall, though this test appears to execute faster.

(Nobody can switch hardware on the fly and run the test faster than me )


Edited by nikitas on 2024/7/3 5:10:10
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton @Hans

Before running the QEMU process, the command sudo lspci -vv -s 0000:01:00.0 shows this:
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7) (prog-if 00 [VGA controller])
    Subsystem: Sapphire Technology Limited Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X]
    Control: I/O Mem BusMaster SpecCycle MemWINV VGASnoop ParErr Stepping SERR FastB2B DisINTx-
    Status: Cap 66MHz UDF FastB2B ParErr DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 255
    IOMMU group: 16
    Region 0: Memory at 60e0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at 60f0000000 (64-bit, prefetchable) [size=2M]
    Region 4: I/O ports at 7000 [disabled] [size=256]
    Region 5: Memory at 85f00000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at 85f40000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 8GT/s, Width x8
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [200 v1] Physical Resizable BAR
        BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB
    Capabilities: [270 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [2b0 v1] Address Translation Service (ATS)
        ATSCap:    Invalidate Queue Depth: 00
        ATSCtl:    Enable-, Smallest Translation Unit: 00
    Capabilities: [2c0 v1] Page Request Interface (PRI)
        PRICtl: Enable- Reset-
        PRISta: RF- UPRGI- Stopped+
        Page Request Capacity: 00000020, Page Request Allocation: 00000000
    Capabilities: [2d0 v1] Process Address Space ID (PASID)
        PASIDCap: Exec+ Priv+, Max PASID Width: 10
        PASIDCtl: Enable- Exec- Priv-
    Capabilities: [320 v1] Latency Tolerance Reporting
        Max snoop latency: 15728640ns
        Max no snoop latency: 15728640ns
    Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap:    MFVC- ACS-, Next Function: 1
        ARICtl:    MFVC- ACS-, Function Group: 0
    Capabilities: [370 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=0us PortTPowerOnTime=170us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=184320ns
        L1SubCtl2: T_PwrOn=170us
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu



After running the command:
sudo cpufreq-set -g performance &&
sudo taskset -c 4 /home/niki/qemu/build/qemu-system-ppc \
-machine pegasos2 \
-m 2G \
-kernel /home/niki/qmiga-PigasosII/bboot -initrd /home/niki/qmiga-PigasosII/Kickstart.zip \
-rtc base=localtime \
-drive if=none,id=DH0,file=/dev/sda,format=raw -device ide-hd,drive=DH0 \
-device vfio-pci,id=RadeonRX550-VGAController,host=0000:01:00.0,x-vga=on,bus=pci.0 \
-device vfio-pci,id=RadeonRX550-AudioController,host=0000:01:00.1,bus=pci.0 \
-device bochs-display \
-device rtl8139,netdev=ETH0 -netdev user,id=ETH0 \
-vga none \
-serial stdio \
-d guest_errors,unimp


The command sudo lspci -vv -s 0000:01:00.0 shows this:
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7) (prog-if 00 [VGA controller])
    Subsystem: Sapphire Technology Limited Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X]
    Control: I/O Mem BusMaster SpecCycle MemWINV VGASnoop ParErr Stepping SERR FastB2B DisINTx-
    Status: Cap 66MHz UDF FastB2B ParErr DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    IOMMU group: 16
    Region 0: Memory at 60e0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at 60f0000000 (64-bit, prefetchable) [size=2M]
    Region 4: I/O ports at 7000 [size=256]
    Region 5: Memory at 85f00000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at 85f40000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s (downgraded), Width x8
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn+
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [200 v1] Physical Resizable BAR
        BAR 0: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB
    Capabilities: [270 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [2b0 v1] Address Translation Service (ATS)
        ATSCap:    Invalidate Queue Depth: 00
        ATSCtl:    Enable-, Smallest Translation Unit: 00
    Capabilities: [2c0 v1] Page Request Interface (PRI)
        PRICtl: Enable- Reset-
        PRISta: RF- UPRGI- Stopped+
        Page Request Capacity: 00000020, Page Request Allocation: 00000000
    Capabilities: [2d0 v1] Process Address Space ID (PASID)
        PASIDCap: Exec+ Priv+, Max PASID Width: 10
        PASIDCtl: Enable- Exec- Priv-
    Capabilities: [320 v1] Latency Tolerance Reporting
        Max snoop latency: 15728640ns
        Max no snoop latency: 15728640ns
    Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap:    MFVC- ACS-, Next Function: 1
        ARICtl:    MFVC- ACS-, Function Group: 0
    Capabilities: [370 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=0us PortTPowerOnTime=170us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=184320ns
        L1SubCtl2: T_PwrOn=170us
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu



Before:
LnkSta:    Speed 8GT/s, Width x8

After:
LnkSta:    Speed 2.5GT/s (downgraded), Width x8
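For scale, the theoretical per-direction bandwidth of the two link states can be worked out from the transfer rate, the lane count and the line encoding (8b/10b at 2.5 and 5 GT/s, 128b/130b from 8 GT/s on). The sketch below ignores packet and protocol overhead, so it is an upper bound only; whether the lower link speed matters here depends on how much data actually crosses the bus.

```c
/* Theoretical one-direction payload bandwidth of a PCIe link in MB/s
   (10^6 bytes per second). Uses 8b/10b encoding below 8 GT/s and
   128b/130b from 8 GT/s upwards; TLP/DLLP overhead is ignored. */
double pcie_link_mb_per_s(double gt_per_s, unsigned lanes)
{
    double efficiency = (gt_per_s < 8.0) ? 8.0 / 10.0 : 128.0 / 130.0;
    double bytes_per_lane = gt_per_s * 1e9 * efficiency / 8.0;
    return bytes_per_lane * lanes / 1e6;
}
```

This gives about 2000 MB/s for the downgraded 2.5 GT/s x8 link and about 7877 MB/s for 8 GT/s x8.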


GRUB Configuration:
GRUB_CMDLINE_LINUX="default_hugepagesz=2MB intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 rd.driver.pre=vfio-pci rd.driver.blacklist=amdgpu modprobe.blacklist=amdgpu vfio-pci.disable_idle_d3=1 isolcpus=4 nohz_full=4 rcu_nocbs=4 irqaffinity=0-3,5,7 pcie_aspm=off pcie_port_pm=off"


Is this a problem, or is it expected?


Edited by nikitas on 2024/7/3 10:13:03
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@nikitas

For the science:

// gcc MicroDelay.c -Wall -O3 -lauto

#include <proto/timer.h>

#include <stdio.h>

static void DoMicroDelay(uint32 microseconds, uint32 count)
{
    struct TimeVal a, b;

    ITimer->GetSysTime(&a);

    for (uint32 i = 0; i < count; i++) {
        ITimer->MicroDelay(microseconds);
    }

    ITimer->GetSysTime(&b);

    double duration = (b.Seconds * 1000000 + b.Microseconds) -
                      (a.Seconds * 1000000 + a.Microseconds);

    printf("%lu * %lu microseconds took %f seconds\n", count, microseconds, duration / 1000000.0);
}

int main()
{
    DoMicroDelay(1, 1000000);
    DoMicroDelay(10, 100000);
    DoMicroDelay(100, 10000);
    DoMicroDelay(1000, 1000);

    return 0;
}
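A note on the duration calculation: for a real system time, Seconds * 1000000 does not fit in 32 bits, so both sums wrap. Because the subtraction is done in the same unsigned 32-bit arithmetic, the difference still comes out right as long as the measured interval stays below 2^32 µs (about 71.6 minutes), which is plenty here. The portable sketch below uses a hypothetical struct that mirrors TimeVal's Seconds/Microseconds fields (it is not the AmigaOS declaration) to show both that form and a 64-bit one that doesn't rely on the wrap-around.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Seconds/Microseconds pair filled in by
   GetSysTime(); not the AmigaOS declaration. */
struct sec_usec {
    uint32_t seconds;
    uint32_t microseconds;
};

/* Same shape as the test program: the 32-bit products wrap, but the
   unsigned difference is still exact modulo 2^32 microseconds. */
double elapsed_seconds_u32(struct sec_usec a, struct sec_usec b)
{
    uint32_t diff_us = (b.seconds * 1000000u + b.microseconds) -
                       (a.seconds * 1000000u + a.microseconds);
    return diff_us / 1e6;
}

/* 64-bit variant: no wrap-around for any realistic interval. */
double elapsed_seconds_u64(struct sec_usec a, struct sec_usec b)
{
    int64_t a_us = (int64_t)a.seconds * 1000000 + a.microseconds;
    int64_t b_us = (int64_t)b.seconds * 1000000 + b.microseconds;
    return (double)(b_us - a_us) / 1e6;
}
```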

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@nikitas

Quote:
Is this a problem, or is it expected?

Not a problem, but also not expected.

Quote:
Running this script with:
- R7 240 attached, took about: 24 seconds.
- RadeonRX 550 attached, took about: 1.50 or 2 seconds
- RadeonRX 550 attached with Screenmode --> Enable Interrupts = Checked, took about: 1.0 or 1.5 seconds
- RadeonRX 550 attached with Screenmode --> Enable Interrupts = Checked and Ethernet attached and using bochs-display, took about: 1.0 or 1.5 seconds (same as the test above)

When I enable interrupts on Screemode, the systems seem to run slower overall, though this test appears to execute faster.


Your RX 550 results aren't too far off, but the R7 240 result is 24x slower than it should be. I didn't expect there to be a dramatic difference depending on which graphics card is plugged in. That doesn't make sense. It does confirm that MicroDelay() can indeed be a problem, although it's not necessarily the cause of the massive graphics slow-down.

Hans

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Capehill
Quote:
For the science:

I'll do it and inform you.

@Hans
Quote:
Looks like we're in uncharted territory.

I'm the poor canary flying into the mineral mine tunnel to see if toxic gas exists further inside. Let's see if I die...

I even cleaned the connectors and the PCIe slot with 90% isopropyl alcohol. I found the (hidden) NVMe SSD, removed it, and placed it in another slot away from the CPU (in case it was using shared PCIe lanes, as I had read). What else can somebody do, I wonder.


Could it be a problem that I don't do single-GPU passthrough? I use the integrated GPU for my host and pass the RX550 through with QEMU/VFIO, with a second monitor plugged in for the guest OS.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Here's some more motivation on what we might be able to achieve:
[image]


While QEMU uses one thread for the vCPU, it is multithreaded and has another thread for other tasks and maybe also an IO thread. So confining it to a single CPU core may not be a good idea. What if you drop all the tweaks of isolating CPU cores with taskset and setting IRQ affinity, and just run QEMU normally? The host OS should be able to schedule the threads on its own.

Using other cards on the host should not interfere as long as they are in different vfio groups. Did you check the vfio groups and establish that the graphics card you're passing through is in its own group (with its sound function) and that you pass all devices in that group? Also, re-reading @geennaam's experiment in the long thread, he used multifunction=on for the graphics function to create it as a multifunction device, since the sound part sits at the same device ID as another function of the card. I don't think that matters, but I have no better idea now.
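To check the groups, Linux exposes them under /sys/kernel/iommu_groups/<group>/devices/, with one entry per PCI address. A minimal C sketch of walking that tree follows; the function name is hypothetical, and readdir() returns entries in no particular order. The lspci output earlier in this thread reports group 16 for 0000:01:00.0, and a listing like this shows what else shares that group.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <stdio.h>

/* Print every device in every IOMMU group under `root` (on Linux this is
   normally "/sys/kernel/iommu_groups") and return how many devices were
   found, or -1 if `root` cannot be opened. */
int list_iommu_groups(const char *root)
{
    DIR *groups = opendir(root);
    if (groups == NULL)
        return -1;

    int total = 0;
    struct dirent *g;
    while ((g = readdir(groups)) != NULL) {
        if (g->d_name[0] == '.')
            continue; /* skip "." and ".." */

        char path[4096];
        snprintf(path, sizeof path, "%s/%s/devices", root, g->d_name);
        DIR *devs = opendir(path);
        if (devs == NULL)
            continue;

        struct dirent *d;
        while ((d = readdir(devs)) != NULL) {
            if (d->d_name[0] == '.')
                continue;
            printf("IOMMU group %s: %s\n", g->d_name, d->d_name);
            total++;
        }
        closedir(devs);
    }
    closedir(groups);
    return total;
}
```

Calling `list_iommu_groups("/sys/kernel/iommu_groups")` on the host should list 0000:01:00.0 under group 16; ideally only the card's VGA and audio functions appear in that group.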

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@Hans
Maybe the HD card could work if MicroDelay didn't have a problem with that card. This might mean these cards are slow for different reasons. It's also possible that the problem with the HD card is not actually in MicroDelay: could it be that something is disabling multitasking, so the test runs slower even though each delay returns in time? What could do that with only the HD card but not the RX card, and how could that be confirmed? Maybe snoopy can log calls to Disable/Forbid and show whether there are more of these with the HD card than with the RX card?

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
If MicroDelay shows more or less the expected results with one gfx card but not with another (with otherwise the same config), then it's more likely that the problem is not MicroDelay but something else. Like maybe tons of interrupts happening with one gfx card but not the other?

I would try repeating the test with the slow gfx card, but with the test loop surrounded by Disable()/Enable() (if that makes it fast, try Forbid()/Permit()). If MicroDelay is just a busy loop, which is likely, it should still work in the disabled state. You might have to use a watch and check the time it takes yourself, as AOS timer.device may behave wrongly (long disabled state, timer register overflows, whatever).






Powered by XOOPS 2.0 © 2001-2023 The XOOPS Project