I've noticed that the speed of BltMaskBitMapRastPort() seems to depend on how much of the mask is transparent versus opaque. This seems to imply the CPU is doing work that the graphics card should be doing.
I'm assuming that the problem is how I'm allocating the mask bitmap. Can anyone tell me how I *should* be allocating the mask? At the moment I'm using AllocRaster() to allocate bitmap->Planes[0], where the width & height are the same as the corresponding (main) bitmap.
I suppose it might be how I'm allocating the main bitmap. I'm using AllocBitMap() with BMF_DISPLAYABLE and BMF_MINPLANES, with the window's bitmap as its friend.
ChrisH wrote: I've noticed that the speed of BltMaskBitMapRastPort() seems to depend on how much of the mask is transparent versus opaque. This seems to imply the CPU is doing work that the graphics card should be doing.
Does it? By what logic?
Whichever is used to do the blit, copying data will always take longer than not copying data, so I'd expect a difference whether it's the GPU or the CPU. (Maybe I'm expecting wrong, though.)
I'm not sure about allocating the mask. The only examples I've got to hand are old bits of AWeb code which, whilst they work, may not follow modern best practice; they use AllocVec to allocate the mask.
But you should be allocating the bitmap as a friend bitmap, so that sounds right to me.
ChrisH wrote: I've noticed that the speed of BltMaskBitMapRastPort() seems to depend on how much of the mask is transparent versus opaque. This seems to imply the CPU is doing work that the graphics card should be doing.
BltMaskBitMapRastPort() is indeed software rendered. I suggest that you replace the mask with an alpha bitmap, and use CompositeTags().
@Hans Thanks for the explanation. It's a shame that BltMaskBitMapRastPort() doesn't (or can't?) have its common case optimised to use the graphics card.
OK, I've (finally) looked into CompositeTagList(), and it seems relatively easy to use. The requirement for the 'mask' to be the alpha-channel in a bitmap seems to be why OS4 can't use CompositeTagList() to h/w accelerate BltMaskBitMapRastPort()...
... but I'm puzzled as to why CompositeTagList() isn't used to h/w accelerate BitMapScale(). At first glance it would appear trivial! (I'm going to have a go at doing that now.)
ChrisH wrote: ... but I'm puzzled as to why CompositeTagList() isn't used to h/w accelerate BitMapScale(). At first glance it would appear trivial! (I'm going to have a go at doing that now.)
It is trivial, and gives a massive speed boost. Same for BltBitMapTags with SrcAlpha. It seems a little ridiculous that these don't automatically use CompositeTags when available.
As a result I have to do something like: if the graphics.library version >= 53 then CompositeTags() else BitMapScale()
@Chris I've written a simple wrapper (UNFINISHED - DO NOT USE!) for PortablE, but I have found that OFTEN (not always) the results are horizontally offset by 1 pixel compared to the real BitMapScale():
PROC bitMapScale(bitScaleArgs:PTR TO bitscaleargs)
	DEF result
	result := CompositeTagList(COMPOSITE_SRC, bitScaleArgs.srcbitmap, bitScaleArgs.destbitmap, [
		COMPTAG_SRCX,       bitScaleArgs.srcx,
		COMPTAG_SRCY,       bitScaleArgs.srcy,
		COMPTAG_SRCWIDTH,   bitScaleArgs.srcwidth,
		COMPTAG_SRCHEIGHT,  bitScaleArgs.srcheight,
		COMPTAG_DESTX,      bitScaleArgs.destx,
		COMPTAG_DESTY,      bitScaleArgs.desty,
		COMPTAG_DESTWIDTH,  bitScaleArgs.destwidth,
		COMPTAG_DESTHEIGHT, bitScaleArgs.destheight,
		COMPTAG_OFFSETX,    bitScaleArgs.destx,
		COMPTAG_OFFSETY,    bitScaleArgs.desty,
		COMPTAG_SCALEX,     bitScaleArgs.xdestfactor * COMP_FIX_ONE / bitScaleArgs.xsrcfactor,
		COMPTAG_SCALEY,     bitScaleArgs.ydestfactor * COMP_FIX_ONE / bitScaleArgs.ysrcfactor,
		COMPTAG_FLAGS,      COMPFLAG_IGNOREDESTALPHA,
		TAG_END]:tagitem)
	IF result <> COMPERR_SUCCESS THEN Throw("BUG", 'pAmigaGraphics; bitMapScale(); CompositeTagList() failed')
ENDPROC
The reason I can tell it is offset is that I am scaling the bitmap using this (CompositeTagList) but the mask using the real BitMapScale (because CompositeTagList doesn't like scaling masks), and sometimes the masked area shows on one edge.
I am scaling by exactly *2, so differences in the scaling algorithm should not be an issue.
And even forcing the software implementation of CompositeTagList makes no difference! (Although as I am currently scaling non-displayable bitmaps, it would probably be using software anyway.)
Do you have any suggestions?
As an aside, I need to add a fall-back for when CompositeTagList() fails (say due to incompatible bitmap formats, e.g. when scaling masks).
@Chris COMPTAG_SCALEX & Y receive exactly 0x20000, and thus should scale exactly by *2 (as mentioned earlier).
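For reference, the fixed-point arithmetic behind that 0x20000 can be checked in isolation. This is only a sketch in plain C (not PortablE), and it assumes the 16.16 format implied by COMP_FIX_ONE, i.e. 1.0 == 0x10000. FIX_ONE_16_16 and ratio_to_fix() are hypothetical names made up for the example; they are not part of graphics.library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16.16 fixed point, where 1.0 is 0x10000. This is a local
   stand-in for COMP_FIX_ONE so the sketch compiles without Amiga headers. */
#define FIX_ONE_16_16 0x00010000

/* Hypothetical helper: convert a BitMapScale()-style ratio
   (dest_factor / src_factor) into a 16.16 fixed-point scale factor. */
uint32_t ratio_to_fix(uint16_t dest_factor, uint16_t src_factor)
{
    assert(src_factor != 0);
    /* 64-bit intermediate so the multiply cannot overflow; the result
       always fits in 32 bits because dest_factor is at most 0xFFFF.
       Integer division truncates any fractional remainder. */
    return (uint32_t)(((uint64_t)dest_factor * FIX_ONE_16_16) / src_factor);
}
```

With a destination factor of 2 and a source factor of 1 this gives exactly 0x20000, so nothing is lost to rounding in the *2 case. A ratio such as 1/3 is truncated to 0x5555, which is one place where results might differ for other scale factors.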
BTW, I can "work around" the differences with BitMapScale() by getting CompositeTagList() to scale the masks as well, but that requires copying them to Video memory (which isn't super fast IIRC) and then copying the scaled result back from Video memory (which is horrendously slow). So this isn't a real solution :( .
Unless I can find a real fix, it is not possible to reliably emulate BitMapScale() using CompositeTagList(). Which may be why OS4 devs haven't already done it.
ChrisH wrote: @Chris BTW, I can "work around" the differences with BitMapScale() by getting CompositeTagList() to scale the masks as well, but that requires copying them to Video memory (which isn't super fast IIRC) and then copying the scaled result back from Video memory (which is horrendously slow). So this isn't a real solution :( .
I don't understand what you're trying to do. CompositeTags() can use an alpha mask that is the same size as the source bitmap (i.e., the source alpha mask, as opposed to the destination alpha mask), so why are you rescaling the alpha mask? Also, why do you need to copy the scaled alpha mask back to main memory? That implies that you're doing more software rendering.
@Hans I was simply trying to emulate BitMapScale(), rather than rewrite my program to use an (8-bit) Alpha channel instead of a 1-bit mask! (That kind of rewrite was supposed to come later, after I got more comfortable with CompositeTagList().)
As far as I can tell, 1-bit masks must be stored in non-video RAM, because otherwise the result of BltMaskBitMapRastPort() tends to be garbage. (This makes some kind of sense when the CPU is doing the masking.)
BTW, do you know if CompositeTagList()'s COMPTAG_SrcAlphaMask tag will accept an 8-bit bitmap (as a valid Alpha channel), rather than requiring a whole 32-bit bitmap (where 24 bits will be ignored)? (That would greatly simplify my switch from 1-bit masks to 8-bit alpha channels.)
ChrisH wrote: BTW, do you know if CompositeTagList()'s COMPTAG_SrcAlphaMask tag will accept an 8-bit bitmap (as a valid Alpha channel), rather than requiring a whole 32-bit bitmap (where 24 bits will be ignored)? (That would greatly simplify my switch from 1-bit masks to 8-bit alpha channels.)
If the bitmap has pixel format RGBFB_ALPHA8 then I would assume that it does.
RGBFB_CLUT should work too, but RGBFB_ALPHA8 is the correct pixel format for alpha masks.
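For anyone making the same switch from a 1-bit mask to 8-bit alpha, the conversion itself is simple. Here is a rough sketch in plain C, under these assumptions: the mask is a single bitplane with rows padded to a 16-bit word (the rounding the RASSIZE() macro uses), the most significant bit of each byte is the leftmost pixel, and a set bit means opaque. plane_bytes_per_row() and mask_to_alpha8() are hypothetical helpers, and copying the result into an actual RGBFB_ALPHA8 bitmap is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per row of one bitplane, padded up to a 16-bit word boundary. */
size_t plane_bytes_per_row(size_t width)
{
    return ((width + 15) >> 4) << 1;
}

/* Expand a 1-bit mask plane into one alpha byte per pixel:
   set bit -> 0xFF (opaque), clear bit -> 0x00 (transparent).
   alpha must hold height * alpha_stride bytes, with alpha_stride >= width. */
void mask_to_alpha8(const uint8_t *mask, size_t width, size_t height,
                    uint8_t *alpha, size_t alpha_stride)
{
    size_t mask_stride = plane_bytes_per_row(width);
    assert(mask != NULL && alpha != NULL && alpha_stride >= width);

    for (size_t y = 0; y < height; y++) {
        const uint8_t *src = mask + y * mask_stride;
        uint8_t *dst = alpha + y * alpha_stride;
        for (size_t x = 0; x < width; x++) {
            /* Bit 7 of each byte is the leftmost of its eight pixels. */
            uint8_t bit = (uint8_t)(src[x >> 3] & (0x80u >> (x & 7)));
            dst[x] = bit ? 0xFF : 0x00;
        }
    }
}
```

The alpha buffer uses its own stride so it can be written straight into a locked 8-bit bitmap whose bytes-per-row is larger than the width.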
@Hans & salass00 Thanks for the info... but in the end, for the sake of compatibility with uh "other OSes", I've decided to keep the alpha channel as part of the bitmap. I guess it might be slightly faster as well!