Forums - All Posts - The Amigans website

Board index » All Posts (Georg)

Georg

Re: qemu 200% host CPU usage at idle?

Posted on: 7/16 7:11 #1

Just popping in

@joerg Quote:

joerg wrote: @balaton
The vcpu can only be stopped by QEmu itself, for example when accessing the emulated TimeBase TBU/TBL registers in the AmigaOS MicroDelay() function.

I don't think anything gets stopped during such accesses. I think QEMU just generates code that jumps to what they call "helper" functions.

Georg

Re: qemu 200% host CPU usage at idle?

Posted on: 7/13 10:46 #2

Just popping in

Quote:

Hans wrote:
I just tried the CPUTemp docky on os4depot, and the idle.task is still eating up all free CPU time. Looks like there's no easy way to disable it.

It may be that AOS4 requires an idle task (one that is always TS_READY) if it otherwise doesn't know how to handle the case where no task is ready to run. AOS3, by contrast, handles that case in exec/Dispatch(): it loops until there is a ready task, with a "stop" instruction inside the loop to sleep until an interrupt happens, which can cause some task to become ready.

If not, "Disable(), Remove(FindTask("idle_task_name")), Enable()" should disable it (without really killing it).

Or write your own idle task with a priority 1 higher than the system one. In your idle task, have a loop that executes whatever PPC instruction(s) cause the cpu to go to sleep on qemu (maybe the emulated cpu handles such instructions even for cpus that don't have them in real life).
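To illustrate why an idle task one priority step above the system one would be picked first: in classic Exec the ready list is kept sorted by priority and the dispatcher takes the first entry. Below is a toy sketch in plain C, not AOS4 code. The struct, the function names and the priority values are all made up for illustration (I don't know the real priority of the system idle task), and the actual sleep instruction is left out because it depends on the emulated cpu.

```c
#include <stddef.h>

/* Toy stand-in for an Exec-style task node; NOT the real struct Task. */
struct toy_task {
    struct toy_task *next;
    int              pri;   /* higher value = runs first */
    const char      *name;
};

/* Insert t behind every task whose priority is >= its own, so the list
   stays sorted from highest to lowest priority (FIFO among equals). */
void toy_enqueue(struct toy_task **head, struct toy_task *t)
{
    struct toy_task **link = head;

    while (*link != NULL && (*link)->pri >= t->pri)
        link = &(*link)->next;
    t->next = *link;
    *link = t;
}

/* The toy dispatcher simply takes the first ready task (NULL if none). */
struct toy_task *toy_pick_next(struct toy_task *head)
{
    return head;
}
```

If both idle tasks are always ready, the one enqueued with the higher priority ends up in front, so the busy-looping system one would never get to run.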

Georg

Re: qemu 200% host CPU usage at idle?

Posted on: 7/12 19:56 #3

Just popping in

@LiveForIt

If the guest OS does something (like "stop" on 68k, or "hlt" on x86) which lets a real cpu know that it is supposed to sleep (until interrupt happens), then an emulation of that cpu can know, too.

Here MorphOS in qemu (-machine mac99) shows about 8 .. 9 % qemu cpu usage on host with htop.

AROS Sam PPC in qemu (-machine sam460ex) shows about 5 .. 6 % qemu cpu usage on host with htop.

(old version of qemu 6.1.94)

AROS Sam PPC seems to be doing this to go idle:


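/* Presumably: set MSR_POW and MSR_EE to request the power-saving wait state with external interrupts enabled, so that an interrupt can wake the cpu */
wrmsr(rdmsr() | MSR_POW | MSR_EE);
/* Synchronize so that the MSR change has taken effect before going on */
__asm__ __volatile__("sync; isync;");
/* Back from sleep: clear MSR[EE] again (external interrupts disabled) */
__asm__ __volatile__("wrteei 0");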

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/7 7:07 #4

Just popping in

Is the slowness still there if you pass through the slow gfx card without actually using it in AOS? Maybe you need to move its driver somewhere out of its directory ("DEVS:Monitors"?) to prevent it from being loaded.

If not, what if you then start the gfx driver manually (double click driver icon?).

What if you then change the wb screenmode to use that gfx card?

What if you then change the screenmode back to something else (not that gfx card) again?

Georg

Re: A1222eth vs. p1022eth driver

Posted on: 7/6 11:15 #5

Just popping in

Disassembly looks like it may be operations on some Exec List. Second one maybe AddTail(). So it could be missing list protection (disable, sem, mutex, whatever). Or other list errors (double add, remove node which is not in list, node freed but still in list, ...)

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/4 15:51 #6

Just popping in

@Hans Quote:

Hans wrote: @nikitas
I'm shocked that the graphics card had any impact on MicroDelay(), because the graphics card is NOT involved in that function.

It's not known if it's MicroDelay() specifically or if "with that gfx card" any other task which does something (heavy, like some other kind of benchmark, or calculation, or even just a compilation of some code) would see the same slowdown.

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/3 15:48 #7

Just popping in

If MicroDelay shows more or less expected results with one gfx card, but not another gfx card (with otherwise same config) then it's more likely that problem is not MicroDelay, but something else. Like maybe tons of interrupts happening with one gfx card, but not the other?

I would try repeating the test with slow gfx card, but test loop changed to be surrounded by Disable()/Enable() (if that makes it fast, try Forbid()/Permit()). If microdelay is just a busy loop - which is likely - it should still work in disabled state. You might have to use a watch and check time it takes yourself, as AOS timer.device may behave wrong (long disabled state, timer register overflows, whatever).

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/2 20:41 #8

Just popping in

@balaton Quote:

balaton wrote: @Georg
To help testing, could you please share your Linux kernel options and xorg.config to show how to set up vesafb and the x11perf command again so others can reproduce that test without having to find out the right config?

Could be wrong, but I don't think the x11 "vesa" driver needs any special Linux kernel options. There's another X11 driver "fbdev" which does use that Linux kernel framebuffer stuff.

In theory, to use the "vesa" driver it's just a matter of editing xorg.conf (in /etc/X11) (or saving a modified version wherever you want): look for the "Device" section in there and edit it to say:

Driver "vesa"
Option "ShadowFB" "0"

Many years ago that was enough. But nowadays if you try to start X11 (startx -- -xf86config myxorg.conf) it may fail and the log (/var/log/Xorg.0.log) says "vesa: Ignoring device with a bound kernel driver". That seems to be because the kernel modules of the normal gfx card (in my case "nvidia") are still in memory.

So what I do here is first log out of the desktop, use CTRL ALT F1 to switch to a virtual console, run "init 3" to get rid of the X11 (KDE) display manager, then "lsmod | grep nvidia", then "rmmod" the modules (you need to find the right order, i.e. which ones to remove first, otherwise it says "module is in use by ...") and then "startx -- -xf86config myxorg.conf". For some reason the screen first appears somewhat broken here (don't know if it's just the monitor), ~zoomed, ~like_wrong_modulo, so I also have to do some CTRL ALT F1 -> CTRL ALT F7 back and forth switching and then it displays fine.

If the thing is slow and you see flickering mouse sprite (because of disabled shadow framebuffer) in front of gfx updates (like "glxgears" window) it worked.

Google how to disable "compositing" on your desktop. There may be some shortcut key for it. To verify that it's disabled run "xcalc" or "xclock" from a terminal. Press CTRL+Z to freeze the program. Then drag its window out of the screen and back in. If this creates gfx trash or gfx disappearing (like text/numbers) then it worked. (This happens because the program is frozen and cannot update/refresh areas of the window which became hidden and then visible again. With an enabled compositor this does not happen, because the window contents are backed up in their own pixmaps=bitmaps and the contents don't get lost when dragged out of view or behind things.)

x11perf -shmput500
x11perf -shmget500

It's unlikely that it is not running in a 4 byte per pixel screenmode (so that you can interpret x11perf results/sec as million_bytes/sec) but if you want to check then look if "xdpyinfo" says "32" for "bitmap unit". Though I'm not 100 % sure that really reflects the "bytes per pixel". (Don't know or remember why, but the AROS hosted X11 driver even creates a dummy test XImage and then picks the bytes per pixel from it.)

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/2 7:37 #9

Just popping in

@Hans Quote:

Hans wrote: @nikitas

That certainly would cause some slowdown, although it cannot explain the difference between geennaam & nikitas' results.

You mentioned MicroDelay() and it could be caused by it as it will be some kind of busy loop checking some powerpc timer register. If qemu emulation of it is not very precise (may depend on host or even host (kernel) configuration = there may be difference between running Linux distribution A vs distribution B) then this will slow things down as it will cause the delay to last (possibly much) longer than expected.

Could be tested with a little AOS4 program which for example calls MicroDelay(10) 100000 times in a loop. Should complete in 1 second. If it takes (much) longer -> problem.
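The arithmetic behind that test, as a small plain C sketch: 100000 calls of a 10 microsecond delay should add up to 1000000 microseconds = 1 second (loop overhead ignored). The function names are made up for illustration. The measured time would have to come from the AOS4 test program itself (or from a stopwatch).

```c
#include <stdint.h>

/* Expected total time, in microseconds, of calling a delay of delay_us
   microseconds `iterations` times (loop overhead ignored). */
uint64_t expected_total_us(uint32_t delay_us, uint32_t iterations)
{
    return (uint64_t)delay_us * iterations;
}

/* How many times slower than expected a measured run was.
   Returns 0.0 if the expected time is zero. */
double slowdown_factor(double measured_seconds,
                       uint32_t delay_us, uint32_t iterations)
{
    double expected_seconds =
        (double)expected_total_us(delay_us, iterations) / 1e6;

    if (expected_seconds <= 0.0)
        return 0.0;
    return measured_seconds / expected_seconds;
}
```

A factor close to 1 would mean the emulated timer behaves as expected; a factor well above 1 would point to the delay lasting longer than it should.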

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 7/1 14:33 #10

Just popping in

Maybe some of the speed loss is really just normal because it has to go through software mmu emulation. Can't compare with UAE/AOS Classic where you usually don't have that at all.

https://airbus-seclab.github.io/qemu_blog/tcg_p3.html

Qemu might have to generate and later execute really quite a lot of code for each memory access!?

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 6/28 15:55 #11

Just popping in

Quote:

VRAM access

Btw, if the guest has access to the VRAM of the pass-through gfx card => the qemu process has to have access, too. Maybe if you know the (real) address of the VRAM you can compile a small vram benchmark function into qemu. Then run qemu with gdb, break into gdb, "jump my_vram_benchmark_function". That would show what kind of transfer rates native code (instead of emulated, jit-ed PPC code) gets with such a pass-through gfx card.
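A rough sketch of what such a read benchmark function could look like, in plain C. Everything here is an assumption: the parameter list is made up, and I don't know where in the qemu process the VRAM of the pass-through card ends up mapped, so the base address has to be supplied by whoever calls it. It uses clock() (cpu time), which is coarse but good enough for a busy read loop. With parameters like these it would probably be easier to invoke it from gdb with "call" than with "jump".

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Reads `bytes` bytes starting at `base`, 64 bits at a time, `passes` times.
   Returns million bytes per second of cpu time, or 0.0 if the run was too
   short for clock() to measure. The sum of all words read is stored in
   *checksum_out (if not NULL). */
double my_vram_benchmark_function(const volatile uint64_t *base,
                                  size_t bytes, unsigned passes,
                                  uint64_t *checksum_out)
{
    size_t   words = bytes / sizeof(uint64_t);
    uint64_t sum   = 0;
    clock_t  start = clock();

    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += base[i];   /* volatile: every read is really performed */

    clock_t end = clock();

    if (checksum_out != NULL)
        *checksum_out = sum;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)words * sizeof(uint64_t) * passes) / seconds / 1e6;
}
```

Running the same function once on a normal RAM buffer and once on the VRAM mapping would give a direct native-code comparison.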

Georg

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow

Posted on: 6/21 7:54 #12

Just popping in

@Hans Quote:

Hans wrote:

I've also toyed with the idea of using the vblanking interrupt as a signal to flush.

In AROS hosted on Linux (runs like a normal Linux program, does Exec task switching by poking signal context) the gfx driver uses X11 pixmaps and X11 windows for screens and friend bitmaps of screens (the gfx system does not rely on direct bitmap or framebuffer access). So for example a graphics.library/RectFill() may end up as XFillRectangle() into a X11 pixmap. A BltBitMap() may end up as XCopyArea().

X11 input events (mouse move, key press) are checked/handled at INTB_VERTB intervals and that's where XFlush() is called.

Georg

Re: QEMU GPU vfio-pci pass through

Posted on: 6/15 14:00 #13

Just popping in

@Hans

4 byte per pixel (xdpyinfo says "depth: 32 planes"). So 500x500*4=1 million meaning results per sec = million bytes per second.

Georg

Re: QEMU GPU vfio-pci pass through

Posted on: 6/14 18:34 #14

Just popping in

@Hans Quote:

Hans wrote: @all

So has anyone run any RAM<->VRAM benchmarks for the host machine yet? It would be useful to know what the CPU and DMA performance is supposed to look like.

Hans

Pretty old computer here: (ASRock Z97 Pro3, i5-4590 3.3 GHz, 16 GB RAM, Nvidia GeForce RTX 2060, OpenSuse Leap 15.4 64 Bit):


Nvidia binary driver:

     read (  6850.0/sec): ShmGetImage 500x500 square
     write(  3860.0/sec): ShmPutImage 500x500 square

Vesa driver with shadowfb disabled:

     read (    21.4/sec): ShmGetImage 500x500 square
     write(   321.0/sec): ShmPutImage 500x500 square

Vesa driver with shadowfb enabled (default):

     read ( 14500.0/sec): ShmGetImage 500x500 square
     write( 13600.0/sec): ShmPutImage 500x500 square

SysBench memory benchmark says read (7896.88 MiB/sec) and write (6098.10 MiB/sec).

Looking through sources a bit it may be that X11 Vesa driver uses memcpy() for GetImage (~ ReadPixelArray) and PutImage (~ WritePixelArray) so it might be that it copies 64 bit at a time, not just 32 bit (~ 1 pixel) as one might expect.

I haven't checked, but with shadowfb enabled it may be that not all writes/puts (~ WritePixelArray) get copied to real vram; instead, updates from shadowfb to real fb may only happen in intervals (like 1 frame).

Georg

Re: QEMU GPU vfio-pci pass through

Posted on: 6/11 7:27 #15

Just popping in

@Hans Quote:

Hans wrote: @balaton

Does anyone have a suitable Linux test tool for CPU & DMA RAM<->VRAM copy speeds? It would be very useful to see those results.

The X11 GetImage and ShmGetImage functions are the equivalent of ReadPixelArray in AOS land.

The X11 PutImage and ShmPutImage functions are the equivalent of WritePixelArray in AOS land.

"Shm" means shared memory, so memory does not need to be copied between X11 client app and X11 server.

This can be benchmarked in Linux with "x11perf -getimage500", "x11perf -shmget500", "x11perf -putimage500", "x11perf -shmput500".

With standard X11 gfx driver (like nvidia) this will use some kind of DMA.

Using "vesa" x11 driver this should use cpu only. But as said in a previous post, this will by default likely use a shadow buffer in RAM to avoid slow reads from VRAM. So you should set ShadowFB to 0 in X11 config if you want read results from VRAM. And disable compositor (there might be a shortcut key for that) so that the putimage calls write directly to screen/vram, instead of a window pixmap which in a second step gets composited to screen/vram.

Edited by Georg on 2024/6/11 8:32:17

Georg

Re: QEMU GPU vfio-pci pass through

Posted on: 6/9 19:10 #16

Just popping in

PCIE access with cpu is slow. You can test it in Linux by running X11 with "vesa" driver (instead of "nvidia" or whatever). Long ago it seems it was easier to quickly switch X11 driver for tests. Now it seems more difficult. You may have to do "init 3" (maybe "init 2" on some distributions??) to get out of display managers. And you may have to rmmod nvidia* (or whatever) kernel modules used by normally used X11 driver.

You can modify "xorg.conf" and save it as "xorgvesa.conf" after changing the driver in the Device section from "nvidia" (or whatever) to "vesa" and also adding

Option "ShadowFB" "0"

Otherwise framebuffer reads don't happen in real VRAM, but in a shadow buffer in RAM.

Then you can

startx -- -xf86config xorgvesa.conf

To test VRAM reads use "x11perf -shmget500". The results per second it shows - on a 32 bit screen (4 bytes per pixel) - should equal the million bytes per second, because 500x500x4=1000000.
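The same conversion as a small plain C sketch, in case the screen is not 4 bytes per pixel or a different x11perf image size is used. The function names are made up for illustration.

```c
/* Bytes moved by one x11perf image operation of the given size. */
unsigned long image_bytes(unsigned width, unsigned height,
                          unsigned bytes_per_pixel)
{
    return (unsigned long)width * height * bytes_per_pixel;
}

/* Million bytes per second for an x11perf "N/sec" result. */
double x11perf_mb_per_sec(double ops_per_sec, unsigned width,
                          unsigned height, unsigned bytes_per_pixel)
{
    return ops_per_sec
         * (double)image_bytes(width, height, bytes_per_pixel) / 1e6;
}
```

For 500x500 at 4 bytes per pixel the factor is exactly 1, so the x11perf number can be read as million bytes per second directly. On a 2 bytes per pixel screen the same number would mean only half as many bytes.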

Georg

Re: Qemu Pegasos II interrupts issue

Posted on: 4/20 16:30 #17

Just popping in

@joerg

Quote:

Maybe in some other exec implementations

I just tried a little test in (maybe old) AOS4 for WinUAE. While in Forbid() state it counts the number of nodes in sysbase->taskready 10000 times (the count is consistent with what is shown by ranger or scout). Nothing else. Then it sleeps a bit and repeats (until ctrl-c is pressed). Most of the time the 10000 results are the same. Every now and then they are not!

Georg

Re: Qemu Pegasos II interrupts issue

Posted on: 4/20 7:50 #18

Just popping in

@joerg Quote:

Quote:
Also the version history in the readme says "Use Forbid() instead of Disable() when reading task lists" which is wrong/bug.
No.

Task lists can change during interrupt. For example a Signal() in interrupt (for example a replied io request, like timer) can move a task from taskwait list to taskready list.
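A toy simulation in plain C of why Forbid() is not enough here: Forbid() only stops task switches, interrupts still run and may relink list nodes in the middle of a list walk. This is not Exec code. The structs and function names are made up, the "interrupt" is just a function call injected at a chosen step of the walk, and the "disabled" flag stands in for Disable()/Enable() by deferring that call.

```c
#include <stddef.h>

struct toy_node { struct toy_node *next; };

struct toy_sched {
    struct toy_node *ready;     /* stand-in for the TaskReady list        */
    struct toy_node *wait;      /* stand-in for the TaskWait list         */
    int              disabled;  /* nonzero while "Disable()" is in effect */
    int              pending;   /* an interrupt arrived while disabled    */
};

/* What a Signal() from an interrupt may do: move the first waiting
   task to the end of the ready list. */
static void toy_signal_from_interrupt(struct toy_sched *s)
{
    struct toy_node  *t = s->wait;
    struct toy_node **link = &s->ready;

    if (t == NULL)
        return;
    s->wait = t->next;
    t->next = NULL;
    while (*link != NULL)
        link = &(*link)->next;
    *link = t;
}

/* An interrupt arrives: it runs at once unless interrupts are disabled. */
void toy_raise_interrupt(struct toy_sched *s)
{
    if (s->disabled)
        s->pending = 1;
    else
        toy_signal_from_interrupt(s);
}

void toy_disable(struct toy_sched *s) { s->disabled = 1; }

void toy_enable(struct toy_sched *s)
{
    s->disabled = 0;
    if (s->pending) {
        s->pending = 0;
        toy_signal_from_interrupt(s);
    }
}

/* Count the ready tasks. An interrupt is raised after irq_after nodes
   have been visited (0 = no interrupt during this walk). */
int toy_count_ready(struct toy_sched *s, int irq_after)
{
    int n = 0;

    for (struct toy_node *t = s->ready; t != NULL; t = t->next) {
        n++;
        if (n == irq_after)
            toy_raise_interrupt(s);
    }
    return n;
}
```

With two ready and two waiting tasks, a walk without the "disabled" flag that gets interrupted after the first node reports three nodes although the list had two when the walk started. With the flag set, the count always matches the list as it was at the start, and the move only happens at toy_enable().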

Georg

Re: Qemu Pegasos II interrupts issue

Posted on: 4/19 19:24 #19

Just popping in

@joerg Quote:

joerg wrote: @balaton

One of the best tools for checking CPU usage on AmigaOS4 should be Tequila (source), because a few parts of it are based on my 20 years old "top" tool

Btw, how much overhead does the tool itself cause if it relies on a quite big number of (soft) interrupts per second (default: docs say 5000, source says 999)? How expensive are (soft) interrupts in AOS4? Maybe not quite as expensive as task switches (in case they don't need to save fpu state) - you were talking about how much speed up there is in filesystems if one avoids AOS3 style app task <-> filesystem task switches.

Also the version history in the readme says "Use Forbid() instead of Disable() when reading task lists" which is wrong/bug.

Georg

Re: Qemu Pegasos II interrupts issue

Posted on: 4/11 20:27 #20

Just popping in

@Hans Quote:

Hans wrote: @all

Looking into this a bit more, it looks like the problem might be triggered when an interrupt from the VirtioGPU device comes in while the OS is still busy calling interrupt handlers for another device.

Another idea: maybe the OS assumes (and possibly did something to make sure) that interrupts don't nest (get called during another interrupt), or maybe at least don't nest if it's the same interrupt (number). And this, for whatever reason, is ignored by Qemu.

Can anyone test on real hw if the OS prevents nesting interrupts (maybe just if the interrupt number is the same)?
