Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@nikitas
The fastest passed-through gfx card so far is an MSI R9 270X:
https://www.hdrlab.org.nz/benchmark/gf ... 2d/OS/AmigaOS/Result/2773
About 1/3 (X5000) to 1/2 (X1000) of the fastest results on real hardware.

If you can get such a gfx card very cheap you may try it, but it's not only the gfx card itself. Different host hardware, different Linux versions and differences in the BIOS/UEFI may all make a difference as well. AMD Ryzen CPUs seem to be much faster for QEMU than Intel CPUs. ARM-based CPUs like the Apple M1/M2/M3 may be even faster than Ryzen CPUs, but since Macs don't have PCIe slots they are useless for this.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Since we don't know what causes the issue, we also can't tell if another card would work better. That's why I opened the thread to gather test results, to see which problems are common regardless of host or card and which are specific to a host/card/machine. If you can use the Linux perf tool with QEMU, you could try profiling the GfxBench tests that run slowly to see which parts of QEMU are called a lot. I don't know a good tutorial on that, but there should be some docs on profiling on Linux with perf.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton
Not that I have much experience in Linux profiling, but I will try it.

@joerg
Yes, the Apple M2 for example is much faster with Qemu-PegasosII-AOS4.1. There might be a PCIe solution for Macs, the eGPU devices, but they are very expensive, around 800 Euros. I'd rather get a real PPC machine.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

Regarding perf profiling, I don't know if you meant something like this:


1) I start Qemu on isolated core 14.

taskset -c 14 qemu-system-ppc \
-machine pegasos2 \
-m 2G \
-kernel bboot -initrd Kickstart.zip \
-rtc base=localtime \
-drive if=none,id=DH0,file=amigahdd-System.img,format=raw -device ide-hd,drive=DH0 \
-drive if=none,id=DH1,file=amigahdd-Work.img,format=raw  -device ide-hd,drive=DH1 \
-device rtl8139,netdev=ETH0 -netdev user,id=ETH0 \
-device vfio-pci,host=04:00.0,bus=pci.0,x-vga=on \
-device vfio-pci,host=04:00.1,bus=pci.0 \
-device bochs-display \
-vga none \
-serial stdio \
-d guest_errors,unimp


2) I started GfxBench2D on AOS4 guest

3) Started perf mem using:

sudo perf mem record --cpu=14


4) GfxBench2D reaches test 17 of 54. And here I stopped the perf mem process.

5) Converted perf.data to perf.data.txt using:

sudo perf script > perf.data.txt


A small sample of the resulting 7 MB file:

qemu-system-ppc    3084 [014]   307.311576:      26865         cpu_atom/mem-stores/P:  ffffffffa5467150 sched_clock+0x10 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.315621:      21426         cpu_atom/mem-stores/P:      7b8c764a1ab2 pthread_mutex_unlock@@GLIBC_2.2.5+0x52 (/usr/lib/x86_64-linux-gnu/libc.so.6)
 
qemu-system-ppc    3084 [014]   307.315886:      21426         cpu_atom/mem-stores/P:      7b8c2ccff74b [unknown] (/tmp/perf-3081.map)
 
qemu-system-ppc    3081 [014]   307.317646:      19949         cpu_atom/mem-stores/P:      7b8c76c9c7c6 [unknown] (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.8000.0)
 
qemu-system-ppc    3084 [014]   307.320145:      16060         cpu_atom/mem-stores/P:      5d4271f5dab4 dcbz_common.isra.0+0x64 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.321686:      14512         cpu_atom/mem-stores/P:  ffffffffa596f342 signalfd_poll+0x72 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.323710:      12180         cpu_atom/mem-stores/P:  ffffffffa5405134 __switch_to_asm+0x34 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.324404:      12180         cpu_atom/mem-stores/P:  ffffffffa5584328 __update_load_avg_se+0xd8 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.325995:      10274         cpu_atom/mem-stores/P:      5d427205d200 do_ld4_mmu+0x0 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.327769:       8506         cpu_atom/mem-stores/P:  ffffffffa62bbb8b ____sys_recvmsg+0x6b ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.329555:       7106         cpu_atom/mem-stores/P:      7b8c76589420 __memset_avx2_unaligned_erms+0x20 (/usr/lib/x86_64-linux-gnu/libc.so.6)
 
qemu-system-ppc    3081 [014]   307.329797:       6343         cpu_atom/mem-stores/P:      7b8c6f6c484c [unknown] (/usr/lib/x86_64-linux-gnu/libdbus-1.so.3.32.4)
 
qemu-system-ppc    3084 [014]   307.330404:       6370         cpu_atom/mem-stores/P:  ffffffffa6626584 sched_clock_noinstr+0x4 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.331819:       5641         cpu_atom/mem-stores/P:      5d42720147d7 flatview_read+0x87 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.331830:       5062         cpu_atom/mem-stores/P:  ffffffffa5469f08 os_xsave+0x38 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.332287:      19857         cpu_atom/mem-stores/P:  ffffffffa5467150 sched_clock+0x10 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.332341:      18734         cpu_atom/mem-stores/P:  ffffffffa58f9f25 fput+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.333967:      25054         cpu_atom/mem-stores/P:      5d427205c136 probe_access_internal+0xf6 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.337424:      18004         cpu_atom/mem-stores/P:  ffffffffa5919988 do_sys_poll+0x48 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.338404:      18004         cpu_atom/mem-stores/P:  ffffffffa55710d3 update_cfs_group+0x3 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.341940:      13674         cpu_atom/mem-stores/P:  ffffffffa54d6520 switch_mm_irqs_off+0x10 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.343822:      12427         cpu_atom/mem-stores/P:      5d427205f82e probe_access+0x1e (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3084 [014]   307.344404:      11081         cpu_atom/mem-stores/P:  ffffffffa556a7b5 update_load_avg+0x675 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.346003:       9473         cpu_atom/mem-stores/P:  ffffffffa591977a do_poll.constprop.0+0x20a ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.348020:       7881         cpu_atom/mem-stores/P:      7b8c76c9d21e [unknown] (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.8000.0)
 
qemu-system-ppc    3084 [014]   307.348221:       7881         cpu_atom/mem-stores/P:      7b8c2f04f62d [unknown] (/tmp/perf-3081.map)
 
qemu-system-ppc    3081 [014]   307.350048:       7580         cpu_atom/mem-stores/P:      5d42720147a2 flatview_read+0x52 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3084 [014]   307.350300:       6763         cpu_atom/mem-stores/P:      5d427205c070 probe_access_internal+0x30 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.352070:       6385         cpu_atom/mem-stores/P:  ffffffffa55f5a95 syscall_exit_to_user_mode_prepare+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.352082:       5701         cpu_atom/mem-stores/P:  ffffffffa5571a08 dequeue_entity+0x128 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.353412:      18304         cpu_atom/mem-stores/P:      5d4272229db4 timerlist_deadline_ns+0x4 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.354102:      16447         cpu_atom/mem-stores/P:      7b8c75436fb0 [unknown] (/usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
 
qemu-system-ppc    3081 [014]   307.356132:      14063         cpu_atom/mem-stores/P:  ffffffffa5860eba rmqueue+0x81a ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.358157:      12767         cpu_atom/mem-stores/P:  ffffffffa5971665 eventfd_poll+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.360178:      11627         cpu_atom/mem-stores/P:  ffffffffa5467150 sched_clock+0x10 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.362201:      10631         cpu_atom/mem-stores/P:  ffffffffa6642225 _raw_spin_unlock_irqrestore+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.363669:       9760         cpu_atom/mem-stores/P:      7b8c2f04f471 [unknown] (/tmp/perf-3081.map)
 
qemu-system-ppc    3081 [014]   307.364237:       8749         cpu_atom/mem-stores/P:  ffffffffa5971665 eventfd_poll+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.366253:       7584         cpu_atom/mem-stores/P:  ffffffffa5eddd01 tty_poll+0x31 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.366266:       7584         cpu_atom/mem-stores/P:  ffffffffa557717c pick_next_task_fair+0x8c ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.368281:      21293         cpu_atom/mem-stores/P:  ffffffffa5919678 do_poll.constprop.0+0x108 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.369762:      19089         cpu_atom/mem-stores/P:      5d4272011bc3 address_space_translate_for_iotlb+0x23 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.372333:      15775         cpu_atom/mem-stores/P:  ffffffffa5ee3f70 n_tty_poll+0x10 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.374364:      14261         cpu_atom/mem-stores/P:      5d42722230ab qemu_lockcnt_cmpxchg_or_wait+0x22b (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.376388:      12932         cpu_atom/mem-stores/P:  ffffffffa591977d do_poll.constprop.0+0x20d ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.377412:      11781         cpu_atom/mem-stores/P:      7b8c76c9a7db [unknown] (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.8000.0)
 
qemu-system-ppc    3081 [014]   307.379434:       9792         cpu_atom/mem-stores/P:  ffffffffa58f9f25 fput+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.379951:       9792         cpu_atom/mem-stores/P:      5d427205c136 probe_access_internal+0xf6 (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.381470:       8484         cpu_atom/mem-stores/P:  ffffffffa662a1c8 ct_kernel_exit_state+0x8 ([kernel.kallsyms])
 
qemu-system-ppc    3081 [014]   307.381613:       7599         cpu_atom/mem-stores/P:      7b8c764a0138 pthread_mutex_lock@@GLIBC_2.2.5+0x158 (/usr/lib/x86_64-linux-gnu/libc.so.6)
 
qemu-system-ppc    3084 [014]   307.381648:       8308         cpu_atom/mem-stores/P:      7b8c764a0031 pthread_mutex_lock@@GLIBC_2.2.5+0x51 (/usr/lib/x86_64-linux-gnu/libc.so.6)
 
qemu-system-ppc    3081 [014]   307.381662:      14821         cpu_atom/mem-stores/P:      5d42722230ab qemu_lockcnt_cmpxchg_or_wait+0x22b (/usr/local/bin/qemu-system-ppc)
 
qemu-system-ppc    3081 [014]   307.383489:      40886         cpu_atom/mem-stores/P:      7b8c76c93b24 g_source_ref+0x4 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.8000.0)
 
qemu-system-ppc    3081 [014]   307.387538:      29160         cpu_atom/mem-stores/P:  ffffffffa5971665 eventfd_poll+0x5 ([kernel.kallsyms])
 
qemu-system-ppc    3084 [014]   307.391589:      23192         cpu_atom/mem-stores/P:  ffffffffa555cb2e sched_core_idle_cpu+0xe ([kernel.kallsyms])


If this is roughly the right process, here is the full output:

https://file.io/5Algg1ZdFo53
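
The raw perf script output above is one event per line, which is hard to read at this size. As a rough first look, the lines can be counted per symbol and binary with awk. This is only a sketch: it assumes the column layout shown in the sample (symbol in field 8, binary in field 9) and that process names contain no spaces. The function name summarize_perf_script is made up for illustration.

```shell
#!/bin/sh
# summarize_perf_script [FILE...]
# Counts perf script lines per (symbol, binary) pair and prints the most frequent first.
# Assumes the field layout of the sample above; other perf versions or options may differ.
summarize_perf_script() {
    awk 'NF >= 9 {
        sym = $8
        sub(/\+0x[0-9a-f]+$/, "", sym)   # drop the offset inside the function
        dso = $9
        gsub(/[()]/, "", dso)            # strip the parentheses around the binary name
        count[sym " " dso]++
    }
    END {
        for (k in count) print count[k], k
    }' "$@" | sort -k1,1nr -k2
}
```

For example, `summarize_perf_script perf.data.txt | head -n 20` would list the 20 most frequently sampled entries. Note that this counts samples only and ignores the weight column.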

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@joerg

Quote:
I think I've suggested that several years ago already, but can you explain (again) why using something like
IMMU->SetMemoryAttrs(page_aligned_command_buffer, page_aligned_buffer_size, MEMATTRF_WRITETHROUGH|MEMATTRF_READ_WRITE);

or if that doesn't work either because it's not write-only but has to be (re)read by the CPU without caching as well
IMMU->SetMemoryAttrs(page_aligned_command_buffer, page_aligned_buffer_size, MEMATTRF_CACHEINHIBIT|MEMATTRF_COHERENT|MEMATTRF_GUARDED|MEMATTRF_READ_WRITE);
doesn't work?

Write-through or even cache-inhibited DRAM should be much faster than cache-inhibited VRAM over ZorroIII/PCI/PCIe.
At least in some of my classic Amiga OS4 parts I used write-through mapped memory because it was faster, and much easier to use, than cached memory with manual cache flushing.


The challenge is in guaranteeing that the data is there before the GPU touches it. With the command stream, one byte wrong will likely result in a hung GPU.

It's been a while, so I can't remember everything that I've tried any more. I do remember that MEMATTRF_COHERENT does *nothing* on the Sam460, despite what CPU docs may say. I think I tried using MEMATTRF_WRITETHROUGH. If I did, then it didn't work. There's still a window when the GPU could read the wrong data.

What's really annoying is that a cache flush/invalidate followed by a sync instruction should guarantee that the data has reached RAM. Yet the GPU still locked up.

Mind you, when I worked on this I wasn't able to test it directly myself. I had to send the test versions to a beta tester who would email me back the results. That's not the easiest way to debug an issue.

Quote:
The glyph cache of ft2.library is in DRAM instead of in a BitMap in VRAM??? That would be very bad.
I'm not 100% sure anymore, but I think for the text rendering in OWB I used an 8-bit (alpha-only) BitMap as an additional glyph cache, IIRC for the last 256 used glyphs (because of the Unicode support there can be many more). Each char was blitted from the glyph cache to a text-line-sized 32-bit bitmap, adding the text colour, which was then copied to the window with CompositeTags() for the anti-aliasing.

Unless something has changed, the graphics library's Text() function is CPU rendered.

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
More useful may be:
perf record -sz --call-graph=lbr
or similar. If your CPU does not support LBR you may need to compile with debug enabled, but that may slow things down further, so I'm not sure how useful those results would be.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

The results while running the command above:

https://file.io/3TvU0POHsIyX

It's again up to test 16/54.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Use -p or --pid= with perf record to record only qemu-system-ppc events. Start it before running the read tests and stop it afterwards. We only need the read tests that show slow performance, but you can do a separate profile of the whole test run for comparison. Then generate a report with perf report -g, because perf script only shows the raw events, which are not very useful, and I don't know if that output can be converted or used for reports.
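
The steps above can be sketched as a small shell helper. This is an illustration under assumptions, not a tested recipe for this setup: the function names, the file name and the fixed recording window are made up, and --call-graph=lbr is simply carried over from the earlier suggestion. With DRY_RUN=1 the commands are only printed, so they can be reviewed before anything runs under sudo.

```shell
#!/bin/sh
# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_qemu PID SECONDS DATAFILE
# Records only the given process for SECONDS, then prints a call-graph report as text.
profile_qemu() {
    pid=$1
    seconds=$2
    data=$3
    # -p limits recording to one process; the sleep command bounds the recording window.
    run sudo perf record --call-graph=lbr -p "$pid" -o "$data" -- sleep "$seconds" || return 1
    # --stdio writes the parsed report to standard output so it can be redirected to a file.
    run sudo perf report -g --stdio -i "$data"
}
```

A possible use, assuming a single running qemu-system-ppc process and that 60 seconds covers the slow read tests, would be `profile_qemu "$(pgrep -o -x qemu-system-ppc)" 60 qemu.perf.data > qemu.perf.report.txt`, started just before the read tests begin.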

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

Ok, so I ran:

sudo perf record -sz --call-graph=lbr -p 8149


Just before starting the GfxBench2D test.

Then I stopped perf when it reached test 18/54.

And then converted to a readable file. Decompressed file size is 600MB.


https://file.io/fx37WyGXd878

FYI: I had compiled Qemu with --enable-lto.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

I don't know if I can export

sudo perf report -g


as a file. Also, I cannot upload the original perf.data to file.io probably due to its binary type.


Sending you an image of what I'm seeing.

https://ibb.co/Czt3f7x

I observed a call to the helper_raise_exception_err function, if it means something.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
I can't use perf script output because all other perf commands to analyse the trace need the binary perf.data and I don't know how to convert it back to that format. Either upload the perf.data or the output of perf report -g

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

Ok, I sent it via email.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Thanks, but I can't open it because it says it's a newer format than my perf version supports. I'll need to get a newer version, or I think perf report -g --stdio might work to export the parsed report.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

Yes, because for some reason... I upgraded the Linux Kernel to the latest 6.9.3:

https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.9.3.tar.xz

And then I compiled perf from its sources, like:

cd linux-6.9.3/tools/perf
make


Now I ran perf with --stdio and exported the output to a file.

It's not nicely formatted, but quite OK.
I don't know if it is the whole report (765 KB).

I sent it via email.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
This is the whole report, but the function names listed in it don't make sense, so maybe --call-graph=lbr does not work on your CPU or maybe the --enable-lto option interferes. You could try compiling QEMU with default options and no extra optimisation, and also try other --call-graph options with perf record.

From the picture you've posted in #30 nothing seems to stand out. Most of the time is spent running JITed guest code, so it's not slowed down somewhere in QEMU itself. I don't see what's below the JIT block that takes ~40%, though; you could try expanding it with the e key and take another picture of that. The helper_raise_exception_err call means some exception is being raised. You can use the 'info irq' command in the QEMU monitor to see which exception is raised frequently. The numbers are defined in qemu/target/ppc/cpu.h.

It might still help to test with Linux guest and x11perf to check if this is specific to AmigaOS or happens with any guest. There is some documentation here on how to run Linux on these machines.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

At first, I was using a logical core that corresponded to an Intel E-Core instead of P-Core.

I recompiled Qemu with:

../configure --enable-gcrypt --enable-modules --enable-module-upgrades --enable-vhost-user-blk-server --enable-libusb


I did the entire process again and I sent you the perf report via email. Don't know if it makes any more sense now.

I read that the 13th Gen Intel® Core™ i5-13400 CPU supports LBR.

I will try a Linux PPC distro with Qemu when I have time.
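
On the E-core/P-core point: the cpu_atom event name in the earlier perf mem output is the E-core PMU, so a check like the following may help when picking a core to isolate. It is only a sketch. The sysfs paths are an assumption about hybrid Intel hosts with a recent kernel and will not exist elsewhere, and both function names are made up for illustration.

```shell
#!/bin/sh
# cpu_in_list CPU LIST
# Succeeds if CPU is contained in a Linux cpulist string such as "0-11" or "0,2,4-7".
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    for range in $2; do
        lo=${range%-*}   # part before the dash (or the whole item if there is no dash)
        hi=${range#*-}   # part after the dash (or the whole item if there is no dash)
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# core_type CPU
# Prints cpu_core (P-core), cpu_atom (E-core) or unknown for the given logical CPU.
# Assumption: the hybrid PMUs expose their CPU lists under /sys/devices/<pmu>/cpus.
core_type() {
    for pmu in cpu_core cpu_atom; do
        f=/sys/devices/$pmu/cpus
        if [ -r "$f" ] && cpu_in_list "$1" "$(cat "$f")"; then
            echo "$pmu"
            return 0
        fi
    done
    echo "unknown"
}
```

Running `core_type 14` before choosing the taskset core would then show whether that logical CPU is listed as a P-core or an E-core on the host.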

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
This last one now has usable info. I'm not sure I understand it completely, but it looks like one of two things. Either there are a lot of exceptions for some reason, so check with 'info irq' in the QEMU monitor to see which exceptions are raised. Or there's a translation block (TB) that runs slowly for some reason, but I don't know how to find out what that TB does. There are QEMU options to log these, but maybe there's a better way to debug it that I don't know. If we can't figure out from the exceptions what happens, we may need to check what the slow TB is doing to find out where it slows down, and may need to ask on the QEMU list how to do that.

About Linux/X11 reports, there is some info in the other thread, so you could also report results from Linux testing there.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

This time, while GfxBench2D was running its tests, I was typing those commands.

I don't know if they provide some more insights to you:

(Especially, I was typing "info irq" several times during GfxBench2D first tests)

info irq --> https://ibb.co/fD2f8P7
info mtree pt.1 --> https://ibb.co/gPSX2TW
info mtree pt.2 --> https://ibb.co/XV9rfxg
info mtree pt.3 --> https://ibb.co/vm7J2h4
info pci --> https://ibb.co/D8WCZ5m

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
There seems to be no excessive number of interrupts, so it's not slowed down by that, and the profile showed that most time was spent in a TB that probably accesses some VRAM. I don't know a good way to find what's in that TB. Maybe you can try 'perf mem record' as you did at first. I could not parse the results you sent then, so this time use 'perf mem report' as we did with the perf record output; maybe that shows something.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
Hello, @balaton

I ran the perf mem command using the isolated core 13 like this:
sudo perf mem record -sz --call-graph=lbr -c 13


I generated reports with those commands:
sudo perf mem report --stdio > perf-mem.data.txt


And:
sudo perf inject -i perf.data --jit -o perf.data.jitted -f

sudo perf report -i ./perf.data.jitted --stdio > per-mem.data.jitted.txt


Is this what you asked for?

Sent them via email.
