Since Hans is the developer of the RadeonHD driver and surely has access to a beta system, I guess he knows that the new driver and FE won't speed up anything.
Hence he brought up this thread.
I have seen the Prometheus trailer being used as a baseline. How far on average is that from being perfect? This difference from baseline to perfect could be the stated improvement required for a bounty to be deemed successful.
The Prometheus trailer IS the baseline. It displays at 23.976 fps, but for most films you need 25 fps or even 30 fps. These tests are also without sound, so you need to add another couple of fps to allow for audio decoding, plus another one for windowed mode.
What about waiting the release of Final Edition and see how FE + RadeonHD 2.4 perform together before taking a decision for this bounty? Or is it a known fact that it won't increase the video framerates and it's only decoding that needs improvement?
As Severin said, his results are with FE + RadeonHD 2.4, and yes, it most certainly does make a difference.
Faster decoding would improve performance even more. Hence this thread...
@Raziel Quote:
Since Hans is the developer of the RadeonHD driver and surely has access to a beta system, I guess he knows that the new driver and FE won't speed up anything.
As I said above, the FE + new driver most certainly does speed things up. That was the whole point of implementing composited video!
@ddni Quote:
@Hans
Thanks.
Yes, I am aware of the benchmark. What I meant was how much of an improvement is expected / acceptable to justify the expense?
I have seen the Prometheus trailer being used as a baseline. How far on average is that from being perfect? This difference from baseline to perfect could be the stated improvement required for a bounty to be deemed successful.
Severin's tests (with FE + RadeonHD 2.4) show that it runs at 18 fps with the loop filter enabled, which would need a 33%+ improvement to hit the needed 24 fps. At this stage we have no idea how much improvement we could expect if the missing AltiVec code were added, so it would be unrealistic to expect or demand a 33% improvement. Don't forget that the H.264 codec in ffmpeg is already partially AltiVec optimized. You can't expect feanor to do all the work and have his payment contingent on something he can't control.
Severin also tested with the loop filter disabled (which is faster at the expense of quality) and got 22 fps. That would need a 10%+ improvement, but I still couldn't give you a realistic estimate of how much improvement we could expect.
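As a quick sanity check on those two targets (plain awk arithmetic only; 24 fps taken as the goal, 18 and 22 fps as Severin's measured rates):

```shell
# Percentage speedup needed to reach 24 fps from each measured rate.
awk 'BEGIN {
  printf "loop filter on  (18 fps): %.1f%% needed\n", (24 / 18 - 1) * 100
  printf "loop filter off (22 fps): %.1f%% needed\n", (24 / 22 - 1) * 100
}'
```

Note the exact figure for the 22 fps case is 9.1%, which the "10%+" above rounds up.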
It really is a bit of a gamble. The added altivec code will make a difference, but we have no idea how much.
Compared to other architectures, the ffmpeg project is missing AltiVec-specific code that would improve performance. I will donate if a bounty is created, both to obtain such code and to read how the work was done (I would like to learn the approach used in this kind of work).
But are we sure this is the main bottleneck in H264 decoding?
I did some tests, not with mplayer but with ffmpeg directly, because we need to focus on what we want to measure (video decoding). Keeping the mplayer layer would make things heavier and more complicated.
When talking about optimization, we must have a benchmark that is:
- measurable
- reproducible
- configurable (easy to set/unset AltiVec)
I extracted a 20-second sequence of the Prometheus video and measured only the decoding (which takes 100% of the CPU), on my Mac Mini under Debian 7.
So the first conclusion is that with AltiVec, the video takes almost 22 seconds to decode, compared to 29 seconds without (a 24% speedup). Note that ffmpeg has an option to enable/disable the use of AltiVec!
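The toggle can be sketched like this (clip.mp4 is a placeholder for the extracted 20 s sequence, and the benchmark lines assume an ffmpeg build with AltiVec support, so they are shown as comments; the live part only re-derives the speedup figure from the measured times):

```shell
# The two benchmark runs (need the actual clip and an AltiVec-capable build):
#   ffmpeg -cpuflags 0       -benchmark -i clip.mp4 -f null -   # AltiVec off
#   ffmpeg -cpuflags altivec -benchmark -i clip.mp4 -f null -   # AltiVec on
# Speedup implied by the measured wall times (29 s scalar vs 22 s AltiVec):
awk 'BEGIN { printf "speedup: %.0f%%\n", (29 - 22) / 29 * 100 }'
```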
Then, I used the Linux perf tool in various ways.
If we want to control improvements, we have to set references and describe a process. I don't mean a complex protocol.
Results here show statistically where the time is spent (again, with and without Altivec enabled):
Quote:
sudo perf record -a ./ffmpeg_g -cpuflags altivec -benchmark -i ~/Videos/Prometheus-1080p-20s.mp4 -f null /dev/null
sudo perf report --stdio
How detailed are the perf tool's reports? I assume that ff_h264_decode_mb_cabac calls lots of other functions. Can it list how much it spends in each? Perhaps a video that's decoded in 22-29 seconds is too short to build up enough statistical data for more detail.
It would be great if we could get similar profiler reports for ffmpeg on an x86. If it's detailed enough then we could gain insights into which functions are SSE optimised, and how much of a difference they make.
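For per-callee detail, one possible invocation (a sketch on my part, reusing the paths and ffmpeg_g binary from the earlier command; call-graph quality depends on the kernel and on how ffmpeg was built):

```shell
# Record with call-graph data (-g), then view the report folded by caller/callee.
sudo perf record -g ./ffmpeg_g -cpuflags altivec -benchmark \
    -i ~/Videos/Prometheus-1080p-20s.mp4 -f null /dev/null
sudo perf report -g graph --stdio    # expands the callees of each hot function
```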
Thanks! At the moment, the results give a subset of time-consuming functions (to optimize):
- ff_h264_decode_mb_cabac
- get_cabac
- hl_decode_mb_simple_8
- loop_filter
- fill_decode_caches
The next step is to check which of these could be AltiVec optimized.
About the performance monitor, note that there are specific counters for AltiVec that could be used to find AltiVec-specific problems, notably whether a vector unit "is waiting for an operand".
Other events could also be checked, like L2 cache misses, DTLB misses, ...
I looked at the get_cabac functions. I'm not sure they could be AltiVec optimized, but they have asm versions on ARM and x86. It rather looks like there is register pressure there, and those asm versions tend to be branchless.
@Hans Not sure perf is able to list the time spent in each subfunction, but it can provide a call graph. You're right, the chosen duration is maybe too short for accurate results, but it is OK for finding the functions that cause penalties.
@feanor What machines do you own?
I've started to look at the generated code of get_cabac ...
Quote:
You're right, the chosen duration is maybe too short for accurate results but it is ok to find functions that cause penalty.
Okay. If you look for longer videos, do realize that CABAC is one of many options in H.264; not all H.264 videos use it. So you'll have to look for videos that were encoded with similar settings. I'm not sure what you could use to check the encoding settings, though.
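One possible way (a suggestion, not something tested here): ffprobe can report the H.264 profile, and Baseline profile streams never use CABAC, while Main/High ones usually do (MediaInfo reports the CABAC flag directly). clip.mp4 below is a placeholder:

```shell
# Print codec and profile of the first video stream of a candidate file.
ffprobe -v error -select_streams v:0 \
        -show_entries stream=codec_name,profile \
        -of default=noprint_wrappers=1 clip.mp4
```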
Interestingly, the Wikipedia article about CABAC says that it's hard to parallelize and vectorize.
I don't think it's a good idea for me to handle the bounty personally.
Kickstarter isn't a bounty system. It's more of a micro-investment system, where people can invest in a project (i.e., you get the money once it has been funded, and not after the project is done).
I only suggested it because we're not getting anywhere with setting up a bounty. There are disagreements over which bounty website to use, and some want the bounty to come with minimum performance guarantees that you simply can't provide.
I know; I've created one in the past. The thing is that this case is not a full-blown vectorization effort that would justify a Kickstarter project. That is, we're not really vectorizing a full codec (like x265, for example), but only adding a few optimizations where needed to get the extra % of performance.
Also, with a Kickstarter there is always the risk of never gathering the necessary funds, or of never receiving them even when gathered. I think a bounty is safer, as you can always request your money back if the bounty fails (at least I've done that before). As a suggestion, here is a possible workflow on Bountysource:
1) fork ffmpeg on GitHub
2) attach it to Bountysource, by installing the relevant plugin on the GitHub project (one can log in to Bountysource with their GitHub account)
3) create the particular tickets on GitHub (they appear automatically on Bountysource)
4) I can state that I will work on this particular ticket (or more) for the requested amount
5) people can donate to these particular tickets
6) when done, I can post the patch inside the ticket itself, or even do a pull request from my own tree, and claim the bounty
7) people can review it and, when happy, accept the claim
That's it. I've already done this before on Bountysource (I did the VSX port of Eigen, which was coincidentally an IBM bounty on Bountysource; I was already 70% done when I discovered that, so it was easy for me :)
I'd never heard of bountysource before, but it sounds like a good system. Whoever creates the bounty should bear in mind the 10% withdrawal fee, though.
Since we're speaking about mplayer, why not enlarge the bounty a bit, if possible, to get extended AltiVec optimization across the whole code base where possible?
And maybe feanor would be interested in getting a full AltiVec-capable AmigaOS4 machine (maybe an X1000) in exchange for his work, if A-Eon itself offered a discount to encourage it?
I don't know how much money would be needed for that, but maybe we can get a new developer that way, and what a developer! :D
Thank you for your kind words, I really appreciate it, but I have to be modest. I consider myself a decent developer, with enough skills to cope with many tasks (not all; kernel/firmware stuff is where I draw the line, I just don't have the necessary experience). Maybe my 'advantage' is that I *really* like SIMD stuff in general, enough to have devoted literally thousands of hours to it. Now, with regard to getting a new Amiga system in exchange for my services: as much as I would like that, I'm pretty sure my wife would object, as that doesn't really help to feed my family, as I'm sure many of you already know. :)
I already have too many projects I'm working on for free, like Debian/armhf, Eigen (where I maintain both the NEON and AltiVec/VSX ports), a new portable SIMD library (yes, I know there are many, but I'm working on some features not found in others, at least I hope it works out that way), and a SIMD book (that of course covers AltiVec), and I have a day job on the side (that unfortunately does not involve SIMD in the slightest). If I'm going to justify allocating yet more hours out of my already pressed schedule, then I'll have to make sure it helps my family; otherwise I'm just not going to do it.
Regarding mplayer itself, I think it's prudent to take it in small steps. Let's see if and how this works out, whether it satisfies the requirements people have set, whether there is more room for improvement, etc. Usually there is *always* room for more optimization, but the result has to justify the effort.