PPC ASM memcpy?
Not too shy to talk
Hello

Does anyone have a ready-to-use, highly optimized memcpy routine for PPC?

Something that does memcpy(APTR src, APTR dst, ULONG size), with the arguments passed in registers, in the most efficient way.
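Note that the signature asked for takes the source first, the same order as exec.library's CopyMem(), whereas the C standard memcpy() takes the destination first. A minimal portable sketch of such a wrapper is below. `copy_src_dst` is a hypothetical name, and `APTR`/`ULONG` are stand-in typedefs so the snippet compiles outside AmigaOS (on AmigaOS they come from <exec/types.h>).

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the AmigaOS types; on AmigaOS include <exec/types.h> instead. */
typedef void *APTR;
typedef uint32_t ULONG;

/* Hypothetical wrapper: source first (CopyMem() order), forwarding to the
   standard memcpy(), whose argument order is (dst, src, size). */
static void copy_src_dst(APTR src, APTR dst, ULONG size)
{
    memcpy(dst, src, size);
}
```

Mixing up the two argument orders is an easy way to silently copy in the wrong direction, so it is worth settling on one convention before tuning the copy itself.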

Thanks

Re: PPC ASM memcpy?
Supreme Council
The newlib memory copy functions are optimised for the host hardware as best they can be. You might squeeze a bit more, but they are well tried and tested.

Simon

Comments made in any post are personal opinion, and are in no-way representative of any commercial entity unless specifically stated as such.
----
http://codebench.co.uk
Re: PPC ASM memcpy?
Home away from home
@thellier


Something like this AltiVec memcpy routine:

https://github.com/cmassiot/vlc-broadc ... /modules/altivec/memcpy.c

I guess a 64-bit memcpy using doubles might be fast, if unrolled by 4.
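The idea of a 64-bit copy using doubles, unrolled by 4, can be sketched in C as below. This is an illustration under stated assumptions, not a tuned routine: `copy_doubles_unrolled` and `alias_double` are hypothetical names, both buffers are assumed 8-byte aligned and non-overlapping, and `__may_alias__` is a GCC/Clang extension. It relies on the FPU moving 64-bit values unchanged, which holds for PPC's lfd/stfd but not for every FPU, and it is not a good fit for the e500's non-standard FPU mentioned later in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A double that may alias other types (GCC/Clang extension), so reading
   byte buffers through it does not break strict-aliasing optimisations. */
typedef double __attribute__((__may_alias__)) alias_double;

/* Sketch: copy n bytes using 64-bit double loads/stores, unrolled by 4.
   Assumes src and dst are both 8-byte aligned and do not overlap.
   On PPC, lfd/stfd move the 64 bits unchanged; this is NOT safe on FPUs
   that alter bit patterns on load/store (e.g. x87 with signalling NaNs). */
static void copy_doubles_unrolled(void *dst, const void *src, size_t n)
{
    alias_double *d = (alias_double *)dst;
    const alias_double *s = (const alias_double *)src;
    size_t blocks = n / 32;  /* 4 doubles = 32 bytes per iteration */

    while (blocks--) {
        /* Issue all four loads before the stores so their latencies
           can overlap. */
        double a = s[0], b = s[1], c = s[2], e = s[3];
        d[0] = a;
        d[1] = b;
        d[2] = c;
        d[3] = e;
        s += 4;
        d += 4;
    }

    /* Remaining 0..31 bytes: finish with a plain memcpy(). */
    memcpy(d, s, n % 32);
}
```

Loading all four values before storing any of them is the usual reason for unrolling: it gives the CPU independent work to overlap instead of a strict load/store/load/store chain.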

Is interleaving integer operations with FPU operations faster on PPC?

(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.
Re: PPC ASM memcpy?
Home away from home
@thellier

I have no ready-to-use routines to share, sorry.

If you're copying within RAM, then exec.library's memory copy function (CopyMem()) should already be optimal for whatever machine you're on. Consequently, memcpy() should be too.

If you're transferring to/from VRAM, then WritePixelArray()/ReadPixelArray() are the best option (they will even use DMA on platforms where DMA routines are available).

If you really want to embed it in your code, then things get complicated. On AltiVec machines, using AltiVec is best (in a cache-aligned manner); using doubles is optimal on most others, except for the e500 core (Tabor/A1222), which has a non-standard FPU. There are also cache instructions that can help boost performance, but which one to use depends on the CPU (e.g., dcbz on 32-bit CPUs and dcbzl on 64-bit CPUs).
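The "cache-aligned" part mostly comes down to splitting a copy into an unaligned head, a body made of whole destination cache lines, and a tail. A minimal sketch follows, under loudly stated assumptions: a 32-byte cache line (`CACHE_LINE` is a hypothetical constant; check your CPU), a cacheable destination, and a CPU whose dcbz clears exactly `CACHE_LINE` bytes. The dcbz hint is only compiled in when the hypothetical `USE_DCBZ` macro is defined on a 32-bit PPC target. dcbz must not be used on cache-inhibited memory such as VRAM, and on 64-bit CPUs the dcbzl remark above applies instead. The body uses memcpy() only as a placeholder for the inner copy (doubles or AltiVec in a real routine).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32  /* assumed line size; differs between PPC cores */

/* Sketch: copy with the destination body handled one cache line at a time. */
static void copy_cache_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: bytes needed to bring dst up to a cache-line boundary. */
    size_t head = (CACHE_LINE - ((uintptr_t)d % CACHE_LINE)) % CACHE_LINE;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: whole destination cache lines. */
    while (n >= CACHE_LINE) {
#if defined(USE_DCBZ) && defined(__powerpc__) && !defined(__powerpc64__)
        /* Zero-allocate the destination line so it is not fetched from
           memory first; it is fully overwritten just below. Cacheable
           RAM only, never VRAM. */
        __asm__ volatile("dcbz 0,%0" : : "r"(d) : "memory");
#endif
        memcpy(d, s, CACHE_LINE);  /* placeholder for a doubles/AltiVec copy */
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    /* Tail: whatever is left over. */
    memcpy(d, s, n);
}
```

The split matters because dcbz is only safe when the whole line is about to be overwritten, which is exactly what the body loop guarantees.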

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: PPC ASM memcpy?
Not too shy to talk
OK, thanks for the answers.

> 64bit memcpy using doubles
> The newlib memory copy functions
I will try both and time them.

Do you know what the theoretical "write to VRAM" speed is?

> WritePixelArray [...] will even use DMA
Interesting.
Could it be used to copy ANY memory area from RAM to a location in VRAM, if the destination pointer is wrapped in a RastPort/BitMap with the same pixel format?

Re: PPC ASM memcpy?
Not too shy to talk
@thellier

Did you notice low performance on memcpy or CopyMem? Does it concern a specific board? If so, which one?

In the past, I ran benchmarks on memory operations, and hand-written functions were rarely optimal (or they were better on some models but worse on others).

Re: PPC ASM memcpy?
Home away from home
@thellier

Quote:
Do you know what the theoretical "write to VRAM" speed is?

GfxBench2D will give you that speed for multiple methods (incl. WritePixelArray()).

Quote:
Could it be used to copy ANY memory area from RAM to a location in VRAM, if the destination pointer is wrapped in a RastPort/BitMap with the same pixel format?

Not sure exactly what you're asking. WritePixelArray() writes to a RastPort. If both the source and destination are bitmaps, then use BltBitMap(). You should make the one in RAM "userprivate", with the same bytes-per-row as the one in VRAM. See the CompositeYUVBlitStream example.

Hans

Re: PPC ASM memcpy?
Not too shy to talk
@corto
My concern was to write to a VBO in GPU VRAM in the most efficient way.
> Did you notice low performance
Not especially. I was just hoping there was a "most efficient memcpy in PPC asm", so I could rule out that part of my program as a cause of any slowness.

In fact, the C library's memcpy() already does that, so I got my answer.

@Hans
I meant something like this:
Src is in RAM
Dst is a VBO in GPU VRAM
If I set up a (fake) RastPort/BitMap that points to Dst,
can WritePixelArray copy the data with DMA?
I mean, if the pixel formats are the same in Src and Dst (array=RGBA, bitmap=RGBA), WritePixelArray should copy the data transparently, without changing it, no?

Just an old hacker idea...

Re: PPC ASM memcpy?
Home away from home
@thellier
Quote:
I meant something like this:
Src is in RAM
Dst is a VBO in GPU VRAM
If I set up a (fake) RastPort/BitMap that points to Dst,
can WritePixelArray copy the data with DMA?

I doubt it, because you're actually copying to a shadow buffer in RAM. The driver then converts between big and little endian while copying the data into VRAM.
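To make the conversion step concrete: with a big-endian CPU and a GPU that expects little-endian 32-bit values, each 32-bit word has its bytes reversed on the way into VRAM. The sketch below is not the driver's code, only an illustration assuming 32-bit words; `copy_be32_to_le32` is a hypothetical name. It assembles each word byte by byte, so it behaves the same regardless of the host's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: copy 32-bit words from a big-endian buffer (as the
   PPC CPU writes them) into a little-endian buffer (as a little-endian GPU
   reads them), reversing the byte order of each word. */
static void copy_be32_to_le32(unsigned char *dst, const unsigned char *src,
                              size_t words)
{
    for (size_t i = 0; i < words; i++) {
        /* Decode the big-endian word... */
        uint32_t v = ((uint32_t)src[4 * i] << 24) |
                     ((uint32_t)src[4 * i + 1] << 16) |
                     ((uint32_t)src[4 * i + 2] << 8) |
                     (uint32_t)src[4 * i + 3];
        /* ...and re-encode it little-endian. */
        dst[4 * i]     = (unsigned char)v;
        dst[4 * i + 1] = (unsigned char)(v >> 8);
        dst[4 * i + 2] = (unsigned char)(v >> 16);
        dst[4 * i + 3] = (unsigned char)(v >> 24);
    }
}
```

On PPC the same per-word swap can also be done with the byte-reversing load/store instructions (lwbrx/stwbrx), so the conversion does not have to cost shifts and masks.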

I'm not sure WritePixelArray() would use DMA for RAM=>RAM copies. Added to that, it's a pretty nasty hack that'll stop working the moment the graphics library's bitmap structure changes. Best treat the bitmap internals like a black box...

Using memcpy() should be fine. Theoretically that could be DMA accelerated too, but I'm not sure if that's done on any of our platforms.

Quote:
I mean, if the pixel formats are the same in Src and Dst (array=RGBA, bitmap=RGBA), WritePixelArray should copy the data transparently, without changing it, no?

Yes, WritePixelArray() does a direct byte-for-byte copy when the formats match.

Hans

Re: PPC ASM memcpy?
Not too shy to talk
@thellier Using asm is not always the solution (in fact, it rarely is). Before focusing on optimization, it's better to measure performance and then identify the problems.
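One way to act on that advice is a small timing harness that runs each candidate copy routine over the same buffers. The sketch below uses only standard C (clock() from <time.h>); `time_copy`, `byte_copy` and `copy_fn` are hypothetical names, and the naive byte loop is just an example baseline. Any routine with the memcpy() signature can be plugged in. Results are only indicative: clock() has coarse resolution, and an optimising compiler may turn the byte loop into a memcpy() call, so inspect the generated code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Naive byte-by-byte copy, used only as a baseline for comparison. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Time `reps` copies of `size` bytes with `fn`. Returns seconds of
   processor time, or a negative value on allocation or copy failure. */
static double time_copy(copy_fn fn, size_t size, int reps)
{
    unsigned char *src = malloc(size), *dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, size);
    clock_t end = clock();

    /* Use the result so the copies are not treated as dead code,
       and reject a routine that copied incorrectly. */
    int ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    return ok ? (double)(end - start) / CLOCKS_PER_SEC : -1.0;
}
```

For example, comparing time_copy(memcpy, 1 << 20, 200) with time_copy(byte_copy, 1 << 20, 200) gives a rough relative figure; the same call with a hand-written routine shows whether it is worth keeping on a given board.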

Re: PPC ASM memcpy?
Just can't stay away
@Hans

Quote:

Using memcpy() should be fine. Theoretically that could be DMA accelerated too, but I'm not sure if that's done on any of our platforms.


Both CopyMem() and CopyMemQuick() use DMA on Sam440 and Sam460 for larger memory copies, and since memcpy() is just a wrapper for CopyMem(), it does too.

Re: PPC ASM memcpy?
Not too shy to talk
@salass00
Thanks for the info
