The better solution is to let gcc do the work for you. Put the function you want to optimise in a separate file, say "myfunction.c", then ask gcc to compile it to assembly with the -S option. You will get a myfunction.s that you can edit to optimise the code; then build and link the project with this modified myfunction.s as a source file.
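As a rough sketch of that workflow, assuming a file that holds nothing but the routine to tune (sum_squares and main.c are placeholder names, not something from this thread):

```c
/* myfunction.c: only the routine you want to hand-tune lives here.
 * sum_squares is a placeholder, not a function from this thread.
 *
 *   gcc -O2 -S myfunction.c            writes myfunction.s
 *   (edit myfunction.s by hand)
 *   gcc -o myprog main.c myfunction.s  assembles and links the edited file
 */
double sum_squares(const double *v, unsigned int n)
{
    double acc = 0.0;

    for (unsigned int i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}
```

The generated myfunction.s already contains whatever prologue and epilogue the compiler thinks the function needs, so it is also a handy reference when writing assembly from scratch.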
I'm trying to rewrite a simple Mandelbrot function from StormAsm into a GCC-compatible one. It's still untested because I get a DSI error due to the missing prolog and epilog instruction sequences. In StormAsm I just inserted "prolog" and "epilog" at the start and end of the function and everything worked very well; sadly this syntax seems to be incompatible with GCC.
Can anyone help me find the right stack/register preservation instructions? The function is hand-optimized to be compatible with PowerPC CPUs from the 604e to the G5, and all embedded cores too.
I'm also not sure about this line of code:
lfd %f8,0(Radius) #MaxDist = Radius
So I'd like to know whether this is correct too.
######################################################################
# Written by Dino Papararo 15-Jan-2020
#
# FUNCTION
#
# MandelPPC -- perform Z = Z^n + C iteration.
#
# SYNOPSIS
#
# int MandelPPC (long Iterations,double Cre,double Cim)
#
#
# This function tests whether a point belongs to the Mandelbrot set
#
# Optimized for pipelines of PPC processors.
#
# r3:Iterations r4:Power r5:Temp
# f1:Cre f2:Cim f3:Zr f4:Zi f5:Tmp1/Zr2 f6:Tmp2/Zi2 f7:Dist f8:MaxDist
#######################################################################
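For comparison, here is a plain C sketch of what such a routine might compute. It is not the author's code: it assumes n = 2, an escape test of |Z|^2 > 4.0, and a return value of "iterations left when the point escapes, 0 if it never does". The name mandel_ref is made up.

```c
/* Reference sketch only. Assumptions: n = 2, escape when |Z|^2 > 4.0,
 * and the return convention described above. Variable names mirror the
 * register comments in the header (Zr, Zi, Zr2, Zi2, MaxDist). */
int mandel_ref(long iterations, double cre, double cim)
{
    double zr = 0.0, zi = 0.0;
    const double max_dist = 4.0;

    while (iterations > 0) {
        double zr2 = zr * zr;
        double zi2 = zi * zi;

        if (zr2 + zi2 > max_dist)   /* Dist > MaxDist: point escaped */
            break;
        zi = 2.0 * zr * zi + cim;   /* Im(Z^2 + C) */
        zr = zr2 - zi2 + cre;       /* Re(Z^2 + C) */
        iterations--;
    }
    return (int)iterations;
}
```

A version like this is useful mainly as something to check the assembly against: feed both the same points and compare the results.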
When using bl you need to save and restore the link register, or the blr won't return to the correct point in the program (kas1e gets away without it because of the call to exit, which never returns).
Basically you need to do:
asm_func:
mflr %r0 # move lr to r0
stwu %r1,-16(%r1) # allocate stack frame
stw %r0,20(%r1) # save lr
lis %r3,.msg@ha #
la %r3,.msg@l(%r3) # printf("aaaa");
bl printf #
lwz %r0,20(%r1) # retrieve lr
addi %r1,%r1,16 # free stack frame
mtlr %r0 # move r0 to lr
blr
Since the function call is at the end and the stack is not used you could also get away with just:
asm_func:
lis %r3,.msg@ha #
la %r3,.msg@l(%r3) # printf("aaaa");
b printf #
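If you want to see what the compiler itself does in this situation, one option is to write the same thing in C and compile it with -S. A minimal sketch (asm_func_c is a made-up name):

```c
#include <stdio.h>

/* C counterpart of asm_func above. The printf call is the last thing the
 * function does, so an optimizing compiler is allowed to replace the
 * call-and-return pair with a plain branch (a tail call). Whether it
 * actually does depends on the compiler version and flags. */
int asm_func_c(void)
{
    return printf("aaaa");
}
```

Comparing the -O0 and -O2 output of this file shows both variants: the full frame with the link register saved, and (if the compiler chooses to) the short form with no frame at all.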
Thanks a lot, your help was useful in pointing me the right way. I have just rewritten my function in a GCC-compatible way without needing to allocate and free any stack space or save any particular register: I used only volatile registers and optimized the scheduling to get some instruction cycles for free.
So I'm pasting my results here, in case they are useful as an example to someone else. There were some differences between the old StormAsm and the GCC (AS) assembler, e.g. in constant declarations and the absence of proper automatic prolog and epilog routines, among other things. It was also not simple to declare the 4.0 float constant because of its IEEE representation. The loop was unrolled two times to gain speed, and some code was reorganized to get the best speed without breaking compatibility with all PowerPC CPUs. If someone has suggestions, they are welcome. The next step is to make a Tabor-specific version of FlashMandel based on SPE; meanwhile I'll go with a small update. Now it's mostly compatible with MorphOS too via OS4EMU.
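On the 4.0 constant: a small C helper can print or check the raw IEEE 754 bit pattern before it is typed into the assembly source. This is only a sketch, and double_bits and float_bits are made-up names.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 bit pattern of a double.
 * For 4.0 this is 0x4010000000000000 (sign 0, exponent 1023 + 2). */
uint64_t double_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Return the raw IEEE 754 bit pattern of a float.
 * For 4.0f this is 0x40800000 (sign 0, exponent 127 + 2). */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On big-endian PowerPC the double would then be stored as the two words 0x40100000 and 0x00000000, in that order. With GNU as, a .double 4.0 directive should produce the same eight bytes, so the hex form is only needed if the assembler in use lacks it.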
####################################################################
# Written by Dino Papararo 15-Jan-2020
#
# FUNCTION
#
# MandelPPC -- perform Z = Z^n + C iteration.
#
# SYNOPSIS
#
# unsigned int MandelnPPC (long Iterations,double Cre,double Cim)
#
#
# This function tests whether a point belongs to the Mandelbrot set.
# Hand-optimized for PowerPC processors.
#
# r3:Iterations r4:Power r5:Temp
# f1:Cre f2:Cim f3:Zr f4:Zi f5:Zr2 f6:Zi2 f7:Dist f8:MaxDist
####################################################################
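To illustrate the "unrolled two times" idea in C rather than assembly, here is a sketch of a loop that performs two iteration steps per pass. It is not the author's code: it assumes n = 2, an escape test of |Z|^2 > 4.0, and a return value of "iterations left when the point escapes, 0 if it never does". The name mandel_unrolled2 is made up.

```c
/* Sketch of a loop unrolled two times: two Z = Z^2 + C steps per pass,
 * so the loop counter is updated and tested half as often.
 * Assumptions: n = 2, escape when |Z|^2 > 4.0, return = iterations left. */
unsigned int mandel_unrolled2(long iterations, double cre, double cim)
{
    double zr = 0.0, zi = 0.0, zr2, zi2;
    const double max_dist = 4.0;

    while (iterations >= 2) {
        /* first step */
        zr2 = zr * zr;
        zi2 = zi * zi;
        if (zr2 + zi2 > max_dist)
            return (unsigned int)iterations;
        zi = 2.0 * zr * zi + cim;
        zr = zr2 - zi2 + cre;

        /* second step */
        zr2 = zr * zr;
        zi2 = zi * zi;
        if (zr2 + zi2 > max_dist)
            return (unsigned int)(iterations - 1);
        zi = 2.0 * zr * zi + cim;
        zr = zr2 - zi2 + cre;

        iterations -= 2;
    }

    /* leftover step when the iteration count is odd */
    if (iterations == 1 && zr * zr + zi * zi > max_dist)
        return 1;
    return 0;
}
```

In hand-written assembly the gain comes less from the saved branch than from the extra room to interleave independent floating-point instructions from the two steps.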