PCC ASM memcpy ?

PCC ASM memcpy ?

Posted on: 2018/4/20 9:22 #1

Not too shy to talk

Hello

Does anyone have a ready-to-use, highly optimized memcpy routine for PPC?

Something that does memcpy(APTR src, APTR dst, ULONG size) with ASM registers in the most efficient way.

Thanks

Re: PCC ASM memcpy ?

Posted on: 2018/4/20 15:58 #2

Supreme Council

The newlib memory copy functions are optimised for the host hardware as best they can be. You might squeeze a bit more, but they are well tried and tested.

Simon

Comments made in any post are personal opinion, and are in no-way representative of any commercial entity unless specifically stated as such.
----
http://codebench.co.uk

Re: PCC ASM memcpy ?

Posted on: 2018/4/20 18:32 #3

Home away from home

@thellier

Something like an AltiVec memcpy routine:

https://github.com/cmassiot/vlc-broadc ... /modules/altivec/memcpy.c

I guess a 64-bit memcpy using doubles might be fast if unrolled by 4.
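For illustration only, the "doubles, unrolled by 4" idea could look roughly like this in plain C. The function name copy_doubles_x4 is made up for this sketch, and it assumes 8-byte-aligned, non-overlapping buffers. It is not a tuned routine, and nothing here says it beats the library memcpy() on any given machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the thread: copies n bytes from src to dst
 * using 8-byte double loads/stores, unrolled four times (32 bytes per pass).
 * Assumes both pointers are 8-byte aligned and the regions do not overlap. */
static void copy_doubles_x4(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & 7u) == 0 && ((uintptr_t)src & 7u) == 0);

    double *d = (double *)dst;
    const double *s = (const double *)src;
    size_t blocks = n / 32;

    while (blocks--) {
        /* Issue the four loads before the four stores. */
        double a = s[0], b = s[1], c = s[2], e = s[3];
        d[0] = a; d[1] = b; d[2] = c; d[3] = e;
        s += 4;
        d += 4;
    }

    /* Let the library routine handle the remaining 0..31 bytes. */
    memcpy(d, s, n % 32);
}
```

A call such as copy_doubles_x4(dst, src, size) would then stand in for memcpy(dst, src, size) wherever the alignment assumption holds.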

Is interleaving integer operations with FPU ops faster on PPC?

(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.

Re: PCC ASM memcpy ?

Posted on: 2018/4/23 1:33 #4

Home away from home

@thellier

I have no ready-to-use routines to share, sorry.

If you're copying within RAM, then the exec.library's mem copy function should already be optimal for whatever machine you're on. Consequently, memcpy() should be too.

If you're transferring to/from VRAM, then WritePixelArray()/ReadPixelArray() are the best option (will even use DMA on platforms where DMA routines are available).

If you really want to embed it in your code, then things get complicated. On altivec machines, using altivec is best (in a cache-aligned manner), using doubles is optimal on most others, except for the e500 core (Tabor/A1222), which has a non-standard FPU. Then there are cache instructions that can help boost performance, but which one you should use depends on which CPU (e.g., dcbz on 32-bit CPUs, and dcbzl on 64-bit CPUs).
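As a rough sketch of the cache-instruction point: dcbz can be issued on each destination cache line just before that line is completely overwritten, so the CPU does not have to read the old contents from RAM first. The names CACHE_LINE_BYTES, prepare_line and copy_with_line_hint are invented for this example, and the 32-byte line size is an assumption that must match the actual CPU. With the wrong line size dcbz will wipe data that has already been copied. The destination must also be ordinary cacheable RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-byte cache lines. This must match the real CPU, because
 * dcbz clears one whole cache line, whatever its size. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical helper: tells the CPU a whole destination line is about to be
 * overwritten, so it need not be fetched from RAM first. No-op elsewhere. */
static inline void prepare_line(void *line)
{
#if defined(__PPC__) || defined(__powerpc__)
    __asm__ volatile ("dcbz 0,%0" : : "r"(line) : "memory");
#else
    (void)line;
#endif
}

/* Copies n bytes. dst must be line-aligned, cacheable RAM that does not
 * overlap src. */
static void copy_with_line_hint(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst % CACHE_LINE_BYTES) == 0);

    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n >= CACHE_LINE_BYTES) {
        prepare_line(d);                 /* line is zeroed in cache, not read */
        memcpy(d, s, CACHE_LINE_BYTES);  /* then fully overwritten */
        d += CACHE_LINE_BYTES;
        s += CACHE_LINE_BYTES;
        n -= CACHE_LINE_BYTES;
    }
    memcpy(d, s, n);                     /* partial last line: no dcbz */
}
```

The per-CPU line size and instruction choice is exactly the part that makes a one-size-fits-all routine hard to write.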

Hans

Join Kea Campus' Amiga Corner and support Amiga content creation
https://keasigmadelta.com/ - see more of my work

Re: PCC ASM memcpy ?

Posted on: 2018/4/23 8:35 #5

Not too shy to talk

OK, thanks for the answers.

> 64bit memcpy using doubles
> The newlib memory copy functions
I will try both and time them.

Do you know what the theoretical "write to VRAM" speed is?

> WritePixelArray [...] will even use DMA
Interesting
Could it be used to copy ANY memory area from RAM to a location in VRAM, if the destination pointer is wrapped in a RastPort/bitmap with the same pixel format?

Re: PCC ASM memcpy ?

Posted on: 2018/4/23 13:50 #6

Not too shy to talk

@thellier

Did you notice low performance on memcpy or CopyMem? Does it concern a specific board? If so, which one?

In the past, I ran benchmarks on memory operations, and hand-written functions were rarely optimal (or they were better on some models but worse on others).
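A minimal timing harness for that kind of comparison might look like the following. The names copy_fn, byte_loop_copy and copy_mb_per_sec are invented for this sketch, and it relies on the standard clock() call, which has coarse resolution, so the buffer size and repeat count need to be large enough to give a measurable run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by every copy routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Naive byte loop, included only as something to compare memcpy() against. */
static void *byte_loop_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Hypothetical helper: runs fn `reps` times and returns throughput in MB/s,
 * or 0.0 if the run was too short for clock() to resolve. */
static double copy_mb_per_sec(copy_fn fn, void *dst, const void *src,
                              size_t n, int reps)
{
    assert(reps > 0);

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t ticks = clock() - start;

    if (ticks <= 0)
        return 0.0;

    double seconds = (double)ticks / CLOCKS_PER_SEC;
    return ((double)n * reps) / (1024.0 * 1024.0) / seconds;
}
```

Calling copy_mb_per_sec(memcpy, dst, src, size, reps) and copy_mb_per_sec(byte_loop_copy, dst, src, size, reps) with the same buffers gives two figures to compare on each board.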

Re: PCC ASM memcpy ?

Posted on: 2018/4/24 3:57 #7

Home away from home

@thellier

Quote:

Do you know what the theoretical "write to VRAM" speed is?

GfxBench2D will give you that speed for multiple methods (incl. WritePixelArray()).

Quote:

Could it be used to copy ANY memory area from RAM to a location in VRAM, if the destination pointer is wrapped in a RastPort/bitmap with the same pixel format?

Not sure exactly what you're asking. WritePixelArray() writes to a rastport. If both the source and destination are bitmaps, then use BltBitMap(). You should make the one in RAM "userprivate," with the same bytes-per-row as the one in VRAM. See CompositeYUVBlitStream example here.
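To show why the bytes-per-row value matters, here is a generic C sketch (not graphics.library code, and copy_rows is a made-up name). When both buffers use the same bytes-per-row and the rows have no padding, the whole image is one contiguous block. Otherwise the copy has to step through the rows one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `rows` rows of `row_bytes` payload bytes each.
 * dst_pitch and src_pitch are the bytes-per-row of each buffer, which may
 * include padding beyond row_bytes. */
static void copy_rows(unsigned char *dst, size_t dst_pitch,
                      const unsigned char *src, size_t src_pitch,
                      size_t row_bytes, size_t rows)
{
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    if (dst_pitch == src_pitch && row_bytes == src_pitch) {
        /* Same layout and no padding: one contiguous copy. */
        memcpy(dst, src, row_bytes * rows);
        return;
    }

    /* Different layouts: copy the payload of each row separately. */
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```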

Hans

Re: PCC ASM memcpy ?

Posted on: 2018/4/24 8:52 #8

Not too shy to talk

@corto
My concern was writing to a VBO in GPU VRAM in the most efficient way.
>Did you notice low performance
Not especially: I was just hoping that "the most efficient memcpy in PPC asm" existed, so I could rule out that part of my program as a cause of any slowness.

In fact, clib's memcpy does that, so I got my answer.

@Hans
I meant something like this:
Src is in RAM
Dst is a VBO in GPU VRAM
If I set up a (fake) rastport/bitmap that points to Dst,
can WritePixelArray then copy the data with DMA?
I mean, if the pixel formats are the same in Src and Dst (array=RGBA, bitmap=RGBA), WritePixelArray should copy transparently with no changes to the data, no?

Just an old hacker idea...

Re: PCC ASM memcpy ?

Posted on: 2018/4/25 0:31 #9

Home away from home

@thellier
Quote:

I meant something like this:
Src is in RAM
Dst is a VBO in GPU VRAM
If I set up a (fake) rastport/bitmap that points to Dst,
can WritePixelArray then copy the data with DMA?

I doubt it, because you're actually copying to a shadow buffer in RAM. The driver then converts between big and little endian while copying the data into VRAM.
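As a generic illustration of an endian-converting copy (not the driver's actual code), a 32-bit byte-swapping copy can be sketched as below. The names swap32 and copy_swap32 are invented, and the 32-bit word size is an assumption. The real unit of conversion would depend on the data format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverses the byte order of one 32-bit word. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical helper: copies `count` 32-bit words, swapping each one. */
static void copy_swap32(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++)
        dst[i] = swap32(src[i]);
}
```

Because each word is transformed on the way through, such a copy cannot be replaced by a plain block transfer of the source bytes.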

I'm not sure WritePixelArray() would use DMA for RAM=>RAM copies. Added to that, it's a pretty nasty hack that'll stop working the moment the graphics library's bitmap structure changes. Best treat the bitmap internals like a black box...

Using memcpy() should be fine. Theoretically that could be DMA accelerated too, but I'm not sure if that's done on any of our platforms.

Quote:

I mean, if the pixel formats are the same in Src and Dst (array=RGBA, bitmap=RGBA), WritePixelArray should copy transparently with no changes to the data, no?

Yes, WritePixelArray() does a direct byte-for-byte copy when the formats match.

Hans

Re: PCC ASM memcpy ?

Posted on: 2018/4/26 8:07 #10

Not too shy to talk

@thellier Using asm is not always (and in fact rarely) the solution. Before focusing on optimization, it's better to measure performance and then identify the problems.

Re: PCC ASM memcpy ?

Posted on: 2018/4/26 9:57 #11

Just can't stay away

@Hans

Quote:

Using memcpy() should be fine. Theoretically that could be DMA accelerated too, but I'm not sure if that's done on any of our platforms.

Both CopyMem() and CopyMemQuick() use DMA on Sam440 and Sam460 for larger memory copies, and since memcpy() is just a wrapper for CopyMem(), it does so too.

Re: PCC ASM memcpy ?

Posted on: 2018/4/26 11:47 #12

Not too shy to talk

@salass00
Thanks for the info
