@tommysammy Great! Good work! Thanks for your effort. Just one question: why is feanor paid in advance? Was/is this part of the deal? Did I miss something (written in this thread or on the fundraiser page)?
edited: Just to be clear, there is no rush on my part, I've already sent tommysammy the receipt as promised, but I agree that it would appear as bad manners on my behalf to expect to get paid wholly for such a small project in advance (for a bigger/longer project, I would definitely request an advance payment of 20-30%, but this is too small/short a project for that).
In any case, expect progress updates here.
Edited by feanor on 2015/1/8 22:00:29 Edited by feanor on 2015/1/8 22:18:57
First I added -fno-inline to the compile flags so that inlined functions also appear in the perf reports. From the list, I found one that at first looked trivial to optimize and wanted to do it just to see the impact, if any. It proved to be non-trivial, mostly because of alignment (if the data were aligned it would be 3x faster, but it wasn't).
So, with the perf run:
$ sudo perf record -a ./ffmpeg_g -cpuflags altivec -benchmark -i Prometheus\ -\ Trailer.mp4 -f null /dev/null
...
Running time/fps didn't change; both runs took 121-125 secs. To be honest, I didn't expect a big change as the function doesn't get called that often, so nothing to get excited about just yet, but testing with perf I was able to measure its share of the samples going from
0.65% ffmpeg_g ffmpeg_g [.] write16x4
to
0.50% ffmpeg_g ffmpeg_g [.] write16x4
This took a total of 4 hours so far (I'm excluding the initial setup/code traversal). I have some better candidates to work on, so a 2nd update will come soon.
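To put the two perf lines above in perspective, here is a rough back-of-the-envelope sketch. It assumes the perf percentage can be read as a share of the total run time (perf record -a samples the whole system, so this is only an approximation), and it uses the lower end of the 121-125 sec range. The helper name is made up for illustration.

```python
def relative_reduction(before, after):
    """Fractional drop when going from `before` to `after`."""
    return (before - after) / before

# Figures taken from the perf report lines in the post.
share_before = 0.65  # % of samples in write16x4 before the change
share_after = 0.50   # % of samples in write16x4 after the change

# Assumption: lower end of the reported 121-125 sec running time.
total_secs = 121.0

# How much cheaper the function itself became.
func_drop = relative_reduction(share_before, share_after)

# Rough wall-clock saving, treating the perf share as a share of run time.
saved_secs = total_secs * (share_before - share_after) / 100.0

print(f"write16x4 share reduced by {func_drop:.1%}")
print(f"estimated saving: {saved_secs:.2f} s out of {total_secs:.0f} s")
```

So the function itself got roughly 23% cheaper, but since it only accounted for 0.65% of the samples to begin with, the saving is around 0.2 sec, which is well inside the 121-125 sec noise between runs.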
The code tag takes the same space no matter how much code is in it. So a post of 4 lines might look inefficient, but a post of 400 will get scroll bars and keep the main text readable (in theory).
Looking forward to reading about your further work.
Optimized some of the pred16x16 functions and some others, and I managed to get an extra fps (up to 14 on my G4@1Ghz) :) Total runtime from 121secs to ~114s.
These patches have not been committed yet, want to clean them up a bit first, but I double and triple checked them and they produce correct results. Will probably commit the remaining patches during the weekend.
Still have a lot of stuff left to do (and most likely I will do that past the bounty hours, just because I like coding Altivec :). For that matter, I've spent ~18 hours so far.
In my todo list: AAC, AC3, FLAC, and in general optimizations wherever I see room for them.
First, I'll submit patches to ffmpeg (actually to libav as well, since they're similar projects and this codebase is more or less common). Afterwards, and I assume they will be accepted, maybe with minor fixes/modifications, they'll propagate to the respective projects that use them. In this particular case, the OS4 projects might want to pull the patches directly and not wait for upstream.
Therefore, after your commits, it will be up to MickJT (our ffmpeg expert) to rebuild it, and then to LiveForIt/Kas1e (our MPlayer experts) to rebuild MPlayer afterwards, right?
Optimized some of the pred16x16 functions and some others, and I managed to get an extra fps (up to 14 on my G4@1Ghz) :) Total runtime from 121secs to ~114s.
Nice. So, just under 6% speed up on your machine. Are you testing just the decoder with the display output disabled?
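For anyone who wants to check the percentage, a small sketch of the arithmetic using the 121 sec and ~114 sec figures quoted above. The two function names are just illustrative. Note that "time saved" and "throughput gained" are slightly different numbers for the same pair of runs.

```python
def runtime_reduction(old_secs, new_secs):
    """Fraction of the original run time that was saved."""
    return (old_secs - new_secs) / old_secs

def throughput_gain(old_secs, new_secs):
    """Fractional increase in work done per second (e.g. fps)."""
    return old_secs / new_secs - 1.0

# Total runtimes reported in the thread (the second one is approximate).
old_secs, new_secs = 121.0, 114.0

print(f"run time reduced by {runtime_reduction(old_secs, new_secs):.1%}")
print(f"throughput up by    {throughput_gain(old_secs, new_secs):.1%}")
```

The run time drops by just under 6%, while the frame rate goes up by a bit over 6%, which fits with gaining about one fps at around 13-14 fps.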