All Posts (Georg)

Re: QEMU GPU vfio-pci pass through
Just popping in


@Hans

4 bytes per pixel (xdpyinfo says "depth: 32 planes"). So 500x500x4 = 1 million bytes per image, meaning the results per second equal millions of bytes per second.
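Here is a minimal C sketch of that arithmetic. It assumes every x11perf "500" get/put operation transfers exactly width x height x bytes_per_pixel bytes. The function names are made up for illustration:

```c
#include <assert.h>

/* Convert an x11perf rate (operations per second) for a get/put image test
   into bytes per second. Assumes each operation transfers exactly
   width * height * bytes_per_pixel bytes. */
static double x11perf_bytes_per_sec(double ops_per_sec, unsigned width,
                                    unsigned height, unsigned bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    double bytes_per_op = (double)width * height * bytes_per_pixel;
    return ops_per_sec * bytes_per_op;
}

/* Same value in MiB/s, e.g. for comparison with SysBench-style numbers. */
static double bytes_to_mib(double bytes)
{
    return bytes / (1024.0 * 1024.0);
}
```

For example, a result of 3860.0/sec on a 4 bytes per pixel screen works out to 3.86e9 bytes/s, which is roughly 3681 MiB/s.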


Re: QEMU GPU vfio-pci pass through
Just popping in


@Hans

Quote:
Hans wrote:
@all

So has anyone run any RAM<->VRAM benchmarks for the host machine yet? It would be useful to know what the CPU and DMA performance is supposed to look like.

Hans


Pretty old computer here (ASRock Z97 Pro3, i5-4590 3.3 GHz, 16 GB RAM, Nvidia GeForce RTX 2060, OpenSuse Leap 15.4 64 Bit):

Nvidia binary driver:
     read  (  6850.0/sec): ShmGetImage 500x500 square
     write (  3860.0/sec): ShmPutImage 500x500 square

Vesa driver with shadowfb disabled:
     read  (    21.4/sec): ShmGetImage 500x500 square
     write (   321.0/sec): ShmPutImage 500x500 square

Vesa driver with shadowfb enabled (default):
     read  ( 14500.0/sec): ShmGetImage 500x500 square
     write ( 13600.0/sec): ShmPutImage 500x500 square


SysBench memory benchmark says read (7896.88 MiB/sec) and write (6098.10 MiB/sec).

Looking through the sources a bit, it may be that the X11 Vesa driver uses memcpy() for GetImage (~ ReadPixelArray) and PutImage (~ WritePixelArray), so it might copy 64 bits at a time, not just the 32 bits (~ 1 pixel) one might expect.
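To make the 32 bit vs 64 bit point concrete, here is a sketch of two ways a PutImage/WritePixelArray-style rectangle copy could be written. It is not taken from the Vesa driver sources. How wide the accesses inside memcpy() really are depends on the C library and CPU. Strides are in pixels, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-pixel copy: one 32 bit load and one 32 bit store per pixel, which is
   what one might naively expect a PutImage/WritePixelArray routine to do. */
static void copy_rect_per_pixel(uint32_t *dst, size_t dst_stride,
                                const uint32_t *src, size_t src_stride,
                                size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}

/* Row-wise memcpy(): the C library may move 64 bits (or more) per access,
   which matters when every access has to cross the PCIe bus. */
static void copy_rect_memcpy(uint32_t *dst, size_t dst_stride,
                             const uint32_t *src, size_t src_stride,
                             size_t width, size_t height)
{
    assert(width <= dst_stride && width <= src_stride);
    for (size_t y = 0; y < height; y++)
        memcpy(&dst[y * dst_stride], &src[y * src_stride],
               width * sizeof(uint32_t));
}
```

Both produce the same pixels. The only difference is how many (slow, PCIe-crossing) accesses the CPU needs when the source or destination is VRAM.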

I haven't checked, but with shadowfb enabled it may be that not all writes/puts (~ WritePixelArray) get copied to the real VRAM. Instead, updates from the shadowfb to the real fb may only happen at intervals (like once per frame).
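Here is a sketch of the shadow framebuffer idea as I understand it. It is not the actual Vesa driver code: the names are hypothetical, and a plain array stands in for VRAM. Drawing and reading only touch the RAM copy, and a periodic flush copies the rows that changed since the last flush:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FB_W 64
#define FB_H 48

struct shadow_fb {
    uint32_t shadow[FB_H][FB_W];   /* fast RAM copy, also serves all reads */
    uint32_t vram[FB_H][FB_W];     /* stands in for the real framebuffer */
    bool dirty[FB_H];              /* rows written since the last flush */
};

static void shadow_put_pixel(struct shadow_fb *fb, int x, int y, uint32_t c)
{
    assert(x >= 0 && x < FB_W && y >= 0 && y < FB_H);
    fb->shadow[y][x] = c;
    fb->dirty[y] = true;
}

static uint32_t shadow_get_pixel(const struct shadow_fb *fb, int x, int y)
{
    return fb->shadow[y][x];       /* reads never touch VRAM */
}

/* Called periodically (e.g. once per frame). Returns rows copied. */
static int shadow_flush(struct shadow_fb *fb)
{
    int rows = 0;
    for (int y = 0; y < FB_H; y++) {
        if (!fb->dirty[y])
            continue;
        memcpy(fb->vram[y], fb->shadow[y], sizeof fb->vram[y]);
        fb->dirty[y] = false;
        rows++;
    }
    return rows;
}
```

With this scheme, 100 writes to the same row cost a single row copy at flush time. That would be one way a put benchmark could report more than plain memory bandwidth seems to allow.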


Re: QEMU GPU vfio-pci pass through
Just popping in


@Hans

Quote:
Hans wrote:
@balaton

Does anyone have a suitable Linux test tool for CPU & DMA RAM<->VRAM copy speeds? It would be very useful to see those results.


The X11 GetImage and ShmGetImage functions are the equivalent of ReadPixelArray in AOS land.

The X11 PutImage and ShmPutImage functions are the equivalent of WritePixelArray in AOS land.

"Shm" means shared memory, so the image data does not need to be copied between the X11 client app and the X11 server.

This can be benchmarked in Linux with "x11perf -getimage500", "x11perf -shmget500", "x11perf -putimage500" and "x11perf -shmput500".

With a standard X11 gfx driver (like nvidia) this will use some kind of DMA.

With the "vesa" X11 driver this should use the CPU only. But as said in a previous post, by default it will likely use a shadow buffer in RAM to avoid slow reads from VRAM. So you should set ShadowFB to 0 in the X11 config if you want read results from VRAM. Also disable the compositor (there might be a shortcut key for that), so that the PutImage calls write directly to the screen/VRAM instead of to a window pixmap that gets composited to the screen/VRAM in a second step.


Edited by Georg on 2024/6/11 8:32:17

Re: QEMU GPU vfio-pci pass through
Just popping in


PCIe access with the CPU is slow. You can test it in Linux by running X11 with the "vesa" driver (instead of "nvidia" or whatever). Long ago it seems it was easier to quickly switch the X11 driver for tests; now it seems more difficult. You may have to do "init 3" (maybe "init 2" on some distributions??) to get out of the display manager. You may also have to rmmod the nvidia* (or whatever) kernel modules used by the normal X11 driver.

You can copy "xorg.conf" to "xorgvesa.conf". In its Device section, replace the driver "nvidia" (or whatever) with "vesa", and also add

Option "ShadowFB" "0"

Otherwise framebuffer reads don't happen in real VRAM, but in a shadow buffer in RAM.

Then you can

startx -- -xf86config xorgvesa.conf

To test VRAM reads, use "x11perf -shmget500". On a 32 bit screen (4 bytes per pixel), the results per second it shows should equal millions of bytes per second, because 500x500x4 = 1000000.


Re: Qemu Pegasos II interrupts issue
Just popping in


@joerg

Quote:

Maybe in some other exec implementations


I just tried a little test in (maybe old) AOS4 for WinUAE. While in Forbid() state, it counts the number of nodes in SysBase->TaskReady 10000 times (the count is consistent with what Ranger or Scout show). It does nothing else. Then it sleeps a bit and repeats (until Ctrl-C is pressed). Most of the time the 10000 results are the same. But every now and then they are not!
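Here is a portable sketch of the counting part of such a test. The struct Node/MiniList types are stand-ins for Exec's lists (real Exec keeps the head and tail sentinels inside struct List), so this compiles anywhere. On AmigaOS, the sampling loop would instead run between Forbid() and Permit() on SysBase->TaskReady:

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-ins for Exec's struct Node / struct List. */
struct Node { struct Node *ln_Succ, *ln_Pred; };
struct MiniList { struct Node head, tail; };

static void list_init(struct MiniList *l)
{
    l->head.ln_Succ = &l->tail;
    l->head.ln_Pred = NULL;
    l->tail.ln_Succ = NULL;
    l->tail.ln_Pred = &l->head;
}

static void add_tail(struct MiniList *l, struct Node *n)
{
    n->ln_Succ = &l->tail;
    n->ln_Pred = l->tail.ln_Pred;
    l->tail.ln_Pred->ln_Succ = n;
    l->tail.ln_Pred = n;
}

/* Exec-style traversal: stop at the sentinel whose ln_Succ is NULL.
   volatile keeps the compiler from merging the repeated counts, since
   the whole point of the test is that the list may change underneath. */
static int count_nodes(const volatile struct MiniList *l)
{
    int n = 0;
    for (const volatile struct Node *it = l->head.ln_Succ; it->ln_Succ;
         it = it->ln_Succ)
        n++;
    return n;
}

/* Take `samples` counts back to back and return how many differ from the
   first one. A non-zero result means the list changed during sampling. */
static int count_mismatches(const volatile struct MiniList *l, int samples)
{
    int first = count_nodes(l), bad = 0;
    for (int i = 1; i < samples; i++)
        if (count_nodes(l) != first)
            bad++;
    return bad;
}
```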


Re: Qemu Pegasos II interrupts issue
Just popping in


@joerg

Quote:


Quote:
Also the version history in the readme says "Use Forbid() instead of Disable() when reading task lists" which is wrong/bug.
No.


Task lists can change during an interrupt. For example, a Signal() in an interrupt (e.g. for a replied IO request, like a timer request) can move a task from the TaskWait list to the TaskReady list.


Re: Qemu Pegasos II interrupts issue
Just popping in


@joerg

Quote:
joerg wrote:
@balaton

One of the best tools for checking CPU usage on AmigaOS4 should be Tequila (source), because a few parts of it are based on my 20 years old "top" tool


Btw, how much overhead does the tool itself cause if it relies on quite a large number of (soft) interrupts per second (default: the docs say 5000, the source says 999)? How expensive are (soft) interrupts in AOS4? Maybe not quite as expensive as task switches (in case they don't need to save the FPU state). You were talking about how much speedup filesystems get if they avoid AOS3-style app task <-> filesystem task switches.

Also, the version history in the readme says "Use Forbid() instead of Disable() when reading task lists", which is wrong/a bug.


Re: Qemu Pegasos II interrupts issue
Just popping in


@Hans

Quote:
Hans wrote:
@all

Looking into this a bit more, it looks like the problem might be triggered when an interrupt from the VirtioGPU device comes in while the OS is still busy calling interrupt handlers for another device.


Another idea: maybe the OS assumes (and possibly did something to make sure) that interrupts don't nest (i.e. don't get called during another interrupt), or at least don't nest if it's the same interrupt (number). For whatever reason, QEMU ignores this.

Can anyone test on real hw whether the OS prevents nested interrupts (maybe only if the interrupt number is the same)?
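For such a test, a hypothetical re-entry counter could be placed at the start and end of a test interrupt handler. This is only a sketch with made-up names. The plain volatile counters assume a single CPU and are a rough debug aid only: a nested entry that lands exactly between the load and store of depth can be missed.

```c
#include <assert.h>

/* Hypothetical nesting probe. Call probe_enter() first thing in the
   handler and probe_leave() just before it returns; a normal task can
   then print the counters (e.g. with kprintf()). */
struct nest_probe {
    volatile int depth;            /* handler invocations currently active */
    volatile int max_depth;        /* deepest nesting seen so far */
    volatile unsigned nested_hits; /* entries while already inside */
};

static void probe_enter(struct nest_probe *p)
{
    int d = p->depth + 1;
    p->depth = d;
    if (d > 1)
        p->nested_hits++;          /* re-entered before the last call ended */
    if (d > p->max_depth)
        p->max_depth = d;
}

static void probe_leave(struct nest_probe *p)
{
    p->depth = p->depth - 1;
}
```

On real hw, a max_depth of 1 after many interrupts would suggest the OS does not let the handler nest.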


Re: Qemu Pegasos II interrupts issue
Just popping in


@Hans

Quote:
Hans wrote:
@balaton

It doesn't help that the ISR is cleared by reading.


It seems you can use gdb with QEMU. Then, if you know the address of the ISR (print it out), it may be possible to do something like this in gdb (a plain "commands" applies to the watchpoint just set):

watch *(int *)0x12345678
commands
printf "ISR is now %d\n", *(int *)0x12345678
cont
end


Re: Qemu Pegasos II interrupts issue
Just popping in


@Hans

Quote:
Hans wrote:
@all

hangs


Is everything completely frozen, or are other things (not directly or indirectly using or affected by gfx) still running in the system? For example, you could create a (high-pri) task that kprintf()s something to the debug output every second.


Re: Qemu Pegasos II interrupts issue
Just popping in


Quote:
joerg wrote:
@Georg

That way broken software removing a node more than once from a list doesn't cause "funny things" like on AROS and AmigaOS <= 3.9, on AmigaOS 4.x you get a DSI exception with stack trace for such bugs instead.


It would surprise me if everything in AOS4 (like the speed-critical kernel/Exec parts) always went through the official functions (like Remove()), where there is one. Even the oldest 68k Exec sometimes did things by hand or through macros to avoid a function call. Does AOS4 really never do that??


Re: Qemu Pegasos II interrupts issue
Just popping in


@balaton

Quote:
balaton wrote:
@Hans
You said interrupt handlers are in a list so you could add a handler as the first element that always returns false but logs the interrupt, then you should see when the handler chain is called. If your driver later in the chain does not get an interrupt but you get the log from the first handler in the chain then something inbetween swallows the interrupt.


Another, maybe obscure-sounding, possibility is that the interrupt handler somehow disappears from the system's interrupt handler list. It sounds crazy and depends on how AOS4 does certain things internally. But I remember early AROS native x86 versions having a bug where Exec tasks (not interrupts) could mysteriously disappear from the system (no longer in the TaskReady list, the TaskWait list, or ThisTask).

Suppose OS4, during interrupt handling, temporarily modifies the interrupt handler list (maybe temporarily removing the handler it is going to call). If something unexpected then happens (nested interrupts?) that it thinks it has protected against, things like Remove() on an already removed list node can cause funny things (like the disappearing tasks in AROS).
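Here is a small portable model of why a second Remove() on an already removed node can make another node disappear. The list types are stand-ins for Exec's (separate head/tail sentinels instead of Exec's combined struct List header). remove_node() unlinks the same way Exec's Remove() does, without clearing the removed node's pointers:

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-ins for Exec's struct Node / struct List. */
struct Node { struct Node *ln_Succ, *ln_Pred; };
struct MiniList { struct Node head, tail; };

static void list_init(struct MiniList *l)
{
    l->head.ln_Succ = &l->tail;
    l->head.ln_Pred = NULL;
    l->tail.ln_Succ = NULL;
    l->tail.ln_Pred = &l->head;
}

/* Link n in after pred (like Exec's Insert() with a predecessor). */
static void insert_after(struct Node *n, struct Node *pred)
{
    n->ln_Succ = pred->ln_Succ;
    n->ln_Pred = pred;
    pred->ln_Succ->ln_Pred = n;
    pred->ln_Succ = n;
}

/* Unlink n like Exec's Remove(). n's own pointers are left as they are,
   so a second remove_node(n) is not detected. */
static void remove_node(struct Node *n)
{
    assert(n->ln_Pred != NULL && n->ln_Succ != NULL);
    n->ln_Pred->ln_Succ = n->ln_Succ;
    n->ln_Succ->ln_Pred = n->ln_Pred;
}

/* Count nodes from the header until the tail sentinel (ln_Succ == NULL). */
static int count_nodes(const struct MiniList *l)
{
    int n = 0;
    for (const struct Node *it = l->head.ln_Succ; it->ln_Succ; it = it->ln_Succ)
        n++;
    return n;
}
```

The test builds the list a b c and removes b. It then inserts d after a, giving a d c. A stale second remove_node(&b) sets a->ln_Succ back to c, because b still points at a and c. Node d still points into the list, so nothing crashes, but it is no longer reachable from the header.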

I'm no AOS4 coder, but to debug this you could periodically check the system interrupt handler list to see whether your interrupt node is still in there. If you don't know the address of the system interrupt handler list (maybe in AOS4 it's hidden, not in ExecBase), it should be possible to find it like this:

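struct Node *node;
struct List *list;

Disable();
/* Walk backwards from our own interrupt node. The first node's ln_Pred
   points at the list header, and the header's ln_Pred (lh_Tail) is
   always NULL, so the loop stops at the header. */
node = (struct Node *)interrupt;
while (node->ln_Pred)
{
    node = node->ln_Pred;
}
list = (struct List *)node;
Enable();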
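Here is the same walk-back trick plus the periodic "is my node still there?" check, as a portable sketch. struct MiniList is a stand-in whose head sentinel has ln_Pred == NULL, playing the role of the Exec list header. On AmigaOS, both functions would be called between Disable() and Enable():

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Exec's lists: the head sentinel comes first and has
   ln_Pred == NULL, like the Exec list header. */
struct Node { struct Node *ln_Succ, *ln_Pred; };
struct MiniList { struct Node head, tail; };

static void list_init(struct MiniList *l)
{
    l->head.ln_Succ = &l->tail;
    l->head.ln_Pred = NULL;
    l->tail.ln_Succ = NULL;
    l->tail.ln_Pred = &l->head;
}

static void add_tail(struct MiniList *l, struct Node *n)
{
    n->ln_Succ = &l->tail;
    n->ln_Pred = l->tail.ln_Pred;
    l->tail.ln_Pred->ln_Succ = n;
    l->tail.ln_Pred = n;
}

/* Walk ln_Pred back to the header. head is the first member of
   struct MiniList, so the pointer converts back to the list. */
static struct MiniList *find_list(struct Node *node)
{
    assert(node != NULL);
    while (node->ln_Pred)
        node = node->ln_Pred;
    return (struct MiniList *)node;
}

/* Forward walk from the header: is node still linked in? */
static int node_in_list(const struct MiniList *l, const struct Node *node)
{
    for (const struct Node *it = l->head.ln_Succ; it->ln_Succ; it = it->ln_Succ)
        if (it == node)
            return 1;
    return 0;
}
```

Walking back from a node that has already been removed still reaches the header through its stale ln_Pred pointers. So finding the list that way does not prove the node is still in it; the membership check has to walk forward from the header.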


Re: Sam460 which file systems can be used
Just popping in


@smarkusg

Quote:

I may not agree with it, but you are the creator of SFS and owner of the rights to it, and I accept what you wrote and your decision.


SFS was originally created by John Hendrikx. He released the source code, and other people used these sources to bug-fix, enhance and port it (i.e. also to other systems like MOS or AROS, not just AOS4). Anyone can theoretically (re)make/build/port their own SFS if interested. But I don't know how much time it would take, or how well the fixes/changes/enhancements in AOS4 SFS are documented, in case there are "critical" ones you don't want to lose.


Re: Sam460 which file systems can be used
Just popping in


Instead of a (QEMU-emulated) hard disk, did anyone try whether SFS also fails when creating/mounting an SFS disk image file inside AOS4 (using something like "diskimage.device")? Maybe SFS doesn't like something about QEMU-emulated hard disks.


Re: AmigaOS 4 68K emulation options
Just popping in


@LiveForIt

Quote:
LiveForIt wrote:
@Georg

do you have any actual example of anything that breaking because of this?


The simplest example is probably WFLG_RMBTRAP. The docs say:

---
WFLG_RMBTRAP is an exception to most fields in Intuition structures
because it is legal for an application to directly modify this flag.
Note that this change must take place as an atomic operation so that
Exec cannot perform a task switch in the middle of the change. If you
are unsure your compiler will do this, use a Forbid()/Permit() pair to
prevent a task switch.
---

Most 68k programs using this will have an "or.l" and "and.l", or a "bset/bclr", in the executable to change this flag, as these are atomic on 68k. The 68k emulators of AOS4 and MOS almost certainly will not execute/emulate these atomically, so such programs can misbehave.

For example, here is what can theoretically happen. Intuition (the input.device task) gets activated in the middle of the program's operation (preempting the program task) and wants to change some flag in the window itself, like WFLG_WINDOWACTIVE. This change may be lost/swallowed:

68k program:

"or.l #WFLG_RMBTRAP,win_Flags(a0)"

but emulated as ~:

"load win_Flags, ppcreg"
"or WFLG_RMBTRAP, ppcreg"
"store ppcreg, win_Flags"

If Intuition (the input.device task) preempts the program task between the "load" and the "store", the later store will overwrite/undo whatever Intuition changed in the flags in the meantime.
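The lost update can be reproduced deterministically in C by splitting the OR into the three emulated steps and running the "Intuition" change in between. The flag values are illustrative stand-ins (check intuition/intuition.h for the real ones). C11 atomic_fetch_or() is used only to show the indivisible alternative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for WFLG_RMBTRAP and WFLG_WINDOWACTIVE. */
#define DEMO_RMBTRAP      0x00010000u
#define DEMO_WINDOWACTIVE 0x00002000u

/* The "other task" (Intuition) setting a flag in the same word. */
static void intuition_sets_active(uint32_t *flags)
{
    *flags |= DEMO_WINDOWACTIVE;
}

/* "or.l" split into load / or / store, as an emulator might translate it.
   preempt (if not NULL) runs between the load and the store to model a
   task switch at exactly that point. */
static void emulated_or(uint32_t *flags, uint32_t bits,
                        void (*preempt)(uint32_t *))
{
    uint32_t reg = *flags;   /* load  win_Flags, ppcreg */
    reg |= bits;             /* or    WFLG_RMBTRAP, ppcreg */
    if (preempt)
        preempt(flags);      /* Intuition runs here */
    *flags = reg;            /* store ppcreg, win_Flags: overwrites it */
}

/* With a truly atomic OR, the other task's update can only land before
   or after it, never in the middle, so both bits survive. */
static uint32_t atomic_or_demo(void)
{
    _Atomic uint32_t flags = 0;
    atomic_fetch_or(&flags, DEMO_WINDOWACTIVE);  /* Intuition, first */
    atomic_fetch_or(&flags, DEMO_RMBTRAP);       /* program */
    return atomic_load(&flags);
}
```

With the preemption in place, only DEMO_RMBTRAP remains set, and the WINDOWACTIVE change is gone.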

Quote:

it should run just fine. Portable C/C++ code compiled for 680x0 programs written in C/C++ will depend on semaphores or forbid / permit, to make struct or data atomic.

Only stuff optimized with assembler you can have problems with atomic instructions. atomic Instructions do not work on a data structure only one address at the time.


Maybe older versions of browsers/OWB/WebKit were different (and also relied less on multiple tasks/threads). But nowadays C++ programs (I'm no C++ coder), and the C++ standard libs/features(!?), seem to be full of things using std::atomic, lockless algorithms or whatever. And I doubt that the Amiga ports of gcc handle all this internally using Forbid()/Permit().
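To illustrate the kind of lockless code meant here, here is a minimal C11 lock-free stack push (a Treiber stack), with hypothetical names. It is only correct if the compare-and-swap is truly atomic with respect to other tasks. An emulator would have to guarantee exactly that for whatever 68k instruction sequence the compiler uses for it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
    struct item *next;
    int value;
};

static _Atomic(struct item *) top = NULL;

/* Lock-free push: no Forbid(), no semaphore. Correctness rests entirely
   on atomic_compare_exchange_weak() being indivisible. */
static void push(struct item *it)
{
    struct item *old = atomic_load(&top);
    do {
        it->next = old;  /* link to the top we last saw */
    } while (!atomic_compare_exchange_weak(&top, &old, it));
    /* On failure, old was reloaded with the current top; retry. */
}
```

If the compare and the store were emulated as separate steps, two tasks could both "win" and one pushed item would be lost. That is the same kind of problem as the WFLG_RMBTRAP example, just harder to spot.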


Re: AmigaOS 4 68K emulation options
Just popping in


@joerg

Quote:
joerg wrote:
@eliyahu
Any "system friendly" software will work on any m68k CPU, on any gfx hardware (OCS/ECS/AGA/gfx card), no mater if real or emulated..


No, because (correct me if I'm wrong) the 68k emu in AOS4 or MOS, for speed reasons, does not make sure that instructions that are atomic on real 68k hw are atomic in emulation as well.

Software will work, or appear to work, because there isn't much system-friendly Amiga software that depends in a hard way on atomic operations (for example, by crashing if atomicity is not guaranteed).

But I guess, for example, a 68k-compiled WebKit-based web browser would not run fine in AOS4 (or MOS) 68k emulation, despite being "system friendly" software. Or would it?


Re: gcc 9 and 10
Just popping in


Quote:

- elf.library can't load .text, .plt and .rodata into the same memory space at once because of the m68k cross calls in .rodata


I don't know much about this stuff and didn't read all the posts, but could it, if it knew there are no m68k cross calls in .rodata? What about having some ~flag or ~symbol in the ELF file, or in the section, or wherever? It would tell elf.library: "hey, I know you normally can't do that, but in this case you are allowed to put this .rodata section in the same memory space as .text and .plt".


Re: What the fastest possible x64 emulation way of OS4 today ?
Just popping in


@joerg

Quote:

Additionally to per task CPU usage my "top" tool displays the number of IExec->SuperState() calls per second, i.e. the number of switches from user to supervisor mode and back per second, and if that's a very high number there is definitely a problem (maybe not QEmu related, can be kernel or AmigaOS hardware drivers, or even user software, related as well).


There are cases where you can get an unexpectedly high number of task switches in AOS or its clones even if there's no problem (assuming problem = bug). For example, multiple tasks may use the same semaphore a lot (it could be some memory allocation lock, or something gfx-related like OwnBlitter()), and ~"bad luck" then causes the semaphore to get into what I call a "ping-pong state".

If you ran a test program like this:

for (;;)
{
    ObtainSem(sem);
    dosomething();
    print(taskname);
    ReleaseSem(sem);
}


two times, you would expect/hope to get output like this:

task1
task1
task1
task1
...
task2
task2
task2
...


But sooner or later it will get into the ping-pong state, and then the output will be:

task1
task2
task1
task2
task1
task2
task1
task2
...


This happens when one of the test programs gets preempted while holding the semaphore. From then on, at every ReleaseSem() the other task will always be in the wait queue. That causes a switch to the other task as soon as the current task calls ObtainSem() again.
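Here is a toy simulation of the ping-pong effect. It is not real Exec code. Each task's loop is split into three steps, and on release the semaphore hands ownership straight to the waiting task (as Exec semaphores do, as far as I know). Scheduling is simplified to "switch when the running task sleeps or its timeslice ends". One forced preemption while task 1 holds the semaphore is enough to turn "1111..." into "1212...":

```c
#include <assert.h>

/* Two equal-priority tasks, each running
     for (;;) { ObtainSem(sem); dosomething(); print(taskname); ReleaseSem(sem); } */
struct sim {
    int owner;       /* task holding the semaphore, -1 if free */
    int waiter;      /* task queued on the semaphore, -1 if none */
    int pc[2];       /* next step: 0 = obtain, 1 = work + print, 2 = release */
    int blocked[2];  /* 1 while waiting for the semaphore */
    int current;     /* task that is running */
};

/* Run one step of the current task. Appends '1' or '2' to out when the
   task prints its name. Returns 1 if the task had to go to sleep. */
static int step(struct sim *s, char *out, int *len)
{
    int t = s->current;
    switch (s->pc[t]) {
    case 0:                           /* ObtainSem() */
        if (s->owner == -1) {
            s->owner = t;
            s->pc[t] = 1;
            return 0;
        }
        s->waiter = t;                /* queue up and sleep */
        s->blocked[t] = 1;
        return 1;
    case 1:                           /* dosomething(); print(taskname) */
        out[(*len)++] = (char)('1' + t);
        s->pc[t] = 2;
        return 0;
    default:                          /* ReleaseSem() */
        s->owner = s->waiter;         /* hand over, or -1 if nobody waits */
        if (s->waiter != -1) {
            s->pc[s->waiter] = 1;     /* its ObtainSem() has returned */
            s->blocked[s->waiter] = 0;
            s->waiter = -1;
        }
        s->pc[t] = 0;
        return 0;
    }
}

/* Run `steps` steps. At step `preempt_at` the current task's timeslice
   ends (-1 = never). The printed sequence ends up in out. */
static void run(int steps, int preempt_at, char *out)
{
    struct sim s = { -1, -1, {0, 0}, {0, 0}, 0 };
    int len = 0;
    for (int i = 0; i < steps; i++) {
        int slept = step(&s, out, &len);
        int other = 1 - s.current;
        assert(!(s.blocked[0] && s.blocked[1]));  /* no deadlock */
        if ((slept || i == preempt_at) && !s.blocked[other])
            s.current = other;
    }
    out[len] = '\0';
}
```

run(30, -1, out) prints only task 1 ("1111111111"). run(30, 9, out) preempts task 1 right after it obtained the semaphore and yields "1111212121": from then on, every ReleaseSem() hands the semaphore to the other task.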


Edited by Georg on 2023/7/26 14:13:39

Re: What the fastest possible x64 emulation way of OS4 today ?
Just popping in


@Hypex

Quote:

...

01947B7C: 4E800020 blr
01947B80: 5469103B rlwinm. r9,r3,2,0,29
01947B84: 7C832378 mr r3,r4
01947B88: 4D820020 beqlr-
>01947B8C: 8069000C lwz r3,12(r9)
01947B90: 4E800020 blr
01947B94: 9421FFE0 stwu r1,-32(r1)
01947B98: 7CA802A6 mflr r5
01947B9C: 90A10024 stw r5,36(r1)
01947BA0: 93A10014 stw r29,20(r1)

...

It destroys r3 by moving r4 into it to test but r3 should have been 0x2963703B before that. And then this:



I don't really know or want to know PPC asm (it's sooo ugly), but after a bit of googling: maybe the "beqlr" tests r9. The "." suffix makes rlwinm update the condition register (CR0), while the "mr" instruction has no such suffix, so the condition register does not change after that.

Quote:

It falls through and executes the next instruction.


Theoretically, you can't be sure that the execution sequence was as shown in the disassembly above (rlwinm -> mr -> beqlr -> lwz -> crash). It could also (even if unlikely) have been a jump from somewhere else to 01947B8C (the lwz instruction).
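The disassembly can be read back into C like this. This is a sketch: the function name and the mem array standing in for memory are hypothetical, and the load uses host byte order instead of PowerPC big-endian. It shows that r3 = r4 is simply the return value for the r9 == 0 case, with the branch decided by the CR0 flags that rlwinm. set:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* rlwinm. r9,r3,2,0,29 -> r9 = rotl(r3, 2) & 0xFFFFFFFC, CR0 set from r9
   mr      r3,r4        -> r3 = r4 (no "." so CR0 is left alone)
   beqlr-               -> return r3 (== r4) if r9 was 0
   lwz     r3,12(r9)    -> otherwise return the word at r9 + 12 */
static uint32_t disasm_in_c(uint32_t r3, uint32_t r4, const uint8_t *mem)
{
    uint32_t rot = (r3 << 2) | (r3 >> 30);  /* rotate left by 2 */
    uint32_t r9 = rot & 0xFFFFFFFCu;        /* mask bits 0..29 (IBM numbering) */
    if (r9 == 0)
        return r4;                          /* beqlr- taken */
    uint32_t word;
    memcpy(&word, mem + r9 + 12, sizeof word);
    return word;
}
```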


Re: What the fastest possible x64 emulation way of OS4 today ?
Just popping in


@joerg

Quote:
joerg wrote:
@kas1e

... but it doesn't disable the interrupt ...


Maybe there's something wrong with IRQ disabling/enabling. For example, QEMU might ~"deliver"/~"execute" passthrough IRQs even after the guest OS has done what it normally does to put the computer/hw/IRQ controller/whatever into a state that prevents that (on real hw).





Powered by XOOPS 2.0 © 2001-2023 The XOOPS Project