Login
Username:

Password:

Remember me



Lost Password?

Register now!

Sections

Who's Online
47 user(s) are online (26 user(s) are browsing Forums)

Members: 1
Guests: 46

VooDoo, more...

Headlines

 
  Register To Post

Profiling on amigaos4
Home away from home
Home away from home


See User information
To not deral apitrace-glsnoop topic, let's continue there.

So, questions is : what we have today to do normal profiling of our apps ?

There was "hieronymus" on os4depot. It wasn't feature rich, but it was a start. Through it wasn't working on many machines, and work only on AmigaOne (G3/G4), Sam440 and Sam460. Also started from os4 FE, it didn't works Sam460ex. As it also no more in develop we probably can forget about for now.

Another one, which was of big hope for me, its gcc's "-pg" which create profiling stat which can be then read by "gprof". But sadly, our gcc binaries which compiled with -pg , simple crashed. I ask SBA about, he say that never worry about and never looked at , so it can be easy bug to fix, or some thing not implemented, but whatever is currently it didn't works. For sure, it can be good idea to have a look at this, maybe it will be not _that_ hard to fix.

More of it we have "gprof" ported already, so something seems wrong with gcc part when we build it with -pg , and that should be fixable for sure.



Also, lately Capehill create a simple CPU profiler, called "tequila" : https://github.com/capehill/tequila. That one i tried to use now to profile some game, and that what i have:


Processing symbol data (max addresses 2970)...
297/2970
594
/2970
891
/2970
1188
/2970
1485
/2970
1782
/2970
2079
/2970
2376
/2970
2673
/2970
Found 14 unique 
and 2898 non-zero symbols
Sorting symbols
...

  
Sample %      Count                                  Symbol name (module + function)
     
63.66       1845                                                   Not available 
     14.35        416                                                    kernel 
(null)
     
12.87        373                                       newlib.library.kmod (null)
      
3.80        110                                             RadeonHD.chip (null)
      
3.69        107                                                  SetPatch (null)
      
0.45         13                                     graphics.library.kmod (null)
      
0.28          8                        CURRDIR:PCI_SoundCardHornet.driver (null)
      
0.28          8                                              NGFileSystem (null)
      
0.21          6                           LIBS:Warp3DNova/W3DN_SI.library (null)
      
0.14          4                                       LIBS:locale.library (null)
      
0.10          3                                         timer.device.kmod (null)
      
0.10          3                                       LIBS:ogles2.library (null)
      
0.03          1                                                   AmiDock (null)
      
0.03          1                           CLASSES:gadgets/clicktab.gadget (null)

Sorted by module:

  
Sample %      Count                                               Function in module 'Not available'
     
63.66       1845                                                                 

  Sample 
%      Count                                               Function in module 'kernel'
     
14.35        416                                                           (null)

  
Sample %      Count                                               Function in module 'newlib.library.kmod'
     
12.87        373                                                           (null)

  
Sample %      Count                                               Function in module 'RadeonHD.chip'
      
3.80        110                                                           (null)

  
Sample %      Count                                               Function in module 'SetPatch'
      
3.69        107                                                           (null)

  
Sample %      Count                                               Function in module 'graphics.library.kmod'
      
0.45         13                                                           (null)

  
Sample %      Count                                               Function in module 'CURRDIR:PCI_SoundCardHornet.driver'
      
0.28          8                                                           (null)

  
Sample %      Count                                               Function in module 'NGFileSystem'
      
0.28          8                                                           (null)

  
Sample %      Count                                               Function in module 'LIBS:Warp3DNova/W3DN_SI.library'
      
0.21          6                                                           (null)

  
Sample %      Count                                               Function in module 'LIBS:locale.library'
      
0.14          4                                                           (null)

  
Sample %      Count                                               Function in module 'timer.device.kmod'
      
0.10          3                                                           (null)

  
Sample %      Count                                               Function in module 'LIBS:ogles2.library'
      
0.10          3                                                           (null)

  
Sample %      Count                                               Function in module 'AmiDock'
      
0.03          1                                                           (null)

  
Sample %      Count                                               Function in module 'CLASSES:gadgets/clicktab.gadget'
      
0.03          1                                                           (null)


I also tried to build that binary with -gstabs (in hope that those "Not available" symbols will showups, but nope).

At least, it for sure show that 15% spend by kernel and 12% by newlib.library, while another massive time waster are unknown.

But it works as can be seen (tested that on x5000).







Edited by kas1e on 2019/12/15 19:25:21
Go to top
Re: Profiling on amigaos4
Just can't stay away
Just can't stay away


See User information
@kas1e

I'm not sure how reliable is Tequila's way to fetch the instruction pointer of the current task (it seems to work in some cases).

But in many other real world cases the symbol lookup fails and it's puzzling. If the symbol was valid, was it removed later or why it cannot be looked up. I would have expected at the least the "module" part to show up.

Go to top
Re: Profiling on amigaos4
Home away from home
Home away from home


See User information
@Capehill
That profiling was about loading part (which i trying to understand why it take 40-50 seconds vs 5sec on win32).


But then, i tried to profile actual gameplay of the game via glsnoop. And what i found, that during plaing, while my CPU hit 100%, warp3dnova and ogles2 take only 40% of it. For example ogles2's drawelements() take most of time : 32% (that include warp3dnovas drawlements (10%), bufferunlock(9%), sumbit() and other stuff).

That mean that another 60% is taken up by CPU itself. I also do run "tequalia" when i actually play, and this time it show me that:

Processing symbol data (max addresses 2970)...
297/2970
594
/2970
891
/2970
1188
/2970
1485
/2970
1782
/2970
2079
/2970
Found 14 unique 
and 2049 non-zero symbols
Sorting symbols
...

  
Sample %      Count                                  Symbol name (module + function)
     
63.59       1303                                                   Not available 
     15.91        326                                     graphics
.library.kmod (null)
     
13.23        271                                                    kernel (null)
      
3.27         67                                             RadeonHD.chip (null)
      
1.51         31                           LIBS:Warp3DNova/W3DN_SI.library (null)
      
1.17         24                                       LIBS:ogles2.library (null)
      
0.54         11                                                   Tequila (null)
      
0.39          8                                       newlib.library.kmod (null)
      
0.10          2                                    intuition.library.kmod (null)
      
0.10          2                                                   AmiDock (null)
      
0.05          1   System:Utilities/Dockies/topbar/RamDock/RAMDock.docky FONTNAME
      0.05          1                                             RadeonRX
.chip (null)
      
0.05          1                                          dos.library.kmod (null)
      
0.05          1                                         massstorage.usbfd (null)

Sorted by module:

  
Sample %      Count                                               Function in module 'Not available'
     
63.59       1303                                                                 

  Sample 
%      Count                                               Function in module 'graphics.library.kmod'
     
15.91        326                                                           (null)

  
Sample %      Count                                               Function in module 'kernel'
     
13.23        271                                                           (null)

  
Sample %      Count                                               Function in module 'RadeonHD.chip'
      
3.27         67                                                           (null)

  
Sample %      Count                                               Function in module 'LIBS:Warp3DNova/W3DN_SI.library'
      
1.51         31                                                           (null)

  
Sample %      Count                                               Function in module 'LIBS:ogles2.library'
      
1.17         24                                                           (null)

  
Sample %      Count                                               Function in module 'Tequila'
      
0.54         11                                                           (null)

  
Sample %      Count                                               Function in module 'newlib.library.kmod'
      
0.39          8                                                           (null)

  
Sample %      Count                                               Function in module 'intuition.library.kmod'
      
0.10          2                                                           (null)

  
Sample %      Count                                               Function in module 'AmiDock'
      
0.10          2                                                           (null)

  
Sample %      Count                                               Function in module 'System:Utilities/Dockies/topbar/RamDock/RAMDock.docky'
      
0.05          1                                                         FONTNAME

  Sample 
%      Count                                               Function in module 'RadeonRX.chip'
      
0.05          1                                                           (null)

  
Sample %      Count                                               Function in module 'dos.library.kmod'
      
0.05          1                                                           (null)

  
Sample %      Count                                               Function in module 'massstorage.usbfd'
      
0.05          1                                                           (null)


See, there now something strange : glsnoop says that warp3dnova/ogles2 take about 50% of time, but tequila says that warp3dnova take only 1.51% and ogles 1.17% ?

Go to top
Re: Profiling on amigaos4
Not too shy to talk
Not too shy to talk


See User information
@kas1e

Quote:

So, questions is : what we have today to do normal profiling of our apps ?

There was "hieronymus" on os4depot. It wasn't feature rich, but it was a start. Through it wasn't working on many machines, and work only on AmigaOne (G3/G4), Sam440 and Sam460. Also started from os4 FE, it didn't works Sam460ex. As it also no more in develop we probably can forget about for now.


Hieronymus is not as advanced as I would like but don't consider it dead yet

I have unreleased improvements that I would have to publish in a new version, to make it working in an alternative mode without machine restriction. And other things.
I was not aware about problems with Sam460.

Go to top
Re: Profiling on amigaos4
Quite a regular
Quite a regular


See User information

Go to top
Re: Profiling on amigaos4
Just can't stay away
Just can't stay away


See User information
@kas1e

If you use the SAMPLES parameter, you can get more accuracy. Default is 99 per second.

Quote:

glsnoop says that warp3dnova/ogles2 take about 50% of time, but tequila says that warp3dnova take only 1.51% and ogles 1.17% ?


Right. So those percents reported by glSnoop are "wrapped" times: a function starts and ends, its duration is recorded. They include everything else that is called by OGLES2 or Nova (for example graphics.library).

With Tequila we can see graphics.library taking 15%. That could be blit functions for display update, and it's likely that this is belongs to that 50% total that glSnoop reports.

Go to top
Re: Profiling on amigaos4
Home away from home
Home away from home


See User information
@Capehill
Quote:

Right. So those percents reported by glSnoop are "wrapped" times: a function starts and ends, its duration is recorded. They include everything else that is called by OGLES2 or Nova (for example graphics.library).

With Tequila we can see graphics.library taking 15%. That could be blit functions for display update, and it's likely that this is belongs to that 50% total that glSnoop reports.


How intersting .. I mean, if graphics.library take up to 15% from all warp3dnova and ogles2 stuff its quite a lot ! Probabaly that mean that DMA support for x5000 in graphics.library will help pretty much, even without DMA for ram->vram memory accesses.

Is there any way how we can know what functions of graphics.library are involved ? Will be indeed interesting to know all kind of functions involved in some game, so to see full picture of what is un-optimized and take up the time most.

And yeah, using SAMPLES of 10000 make it all be more accurate about processes being involved, but output still all those "60% Not available", ad all functions in modules are (null).

As side note i still think about that "-pg" switch in gcc and ability to fix it somehow. But i am not sure, if without debug build of graphics.library, we can know what functions is used in (and does not matter if it tequila or gcc with -pg) ?

Go to top
Re: Profiling on amigaos4
Not too shy to talk
Not too shy to talk


See User information
@Spectre660

Quote:

@corto

https://www.amigans.net/modules/xforum ... id=105477#forumpost105477


Oops! I'm so sorry ... I really did not remember. So I did not keep my promise ... I was too far too long ... but after an house, my third child and several job changes, I start again to have some free time

Go to top
Re: Profiling on amigaos4
Home away from home
Home away from home


See User information
@Capehill
From code of tequila i not very much understand how it all works, but is it going some oldschool method like we get time at begining of fnction, then at end, calcualte delte, and put that to log ?

Go to top
Re: Profiling on amigaos4
Just can't stay away
Just can't stay away


See User information
@kas1e

1) It's based on a timer interrupt. When interrupt triggers, it read the current task address from ExecBase. That seems to be enough to check which task is using the CPU.

2) For function profiling, in addition to the first step, it tries to get the Link Register (see elfspec_ppc.pdf) from task's stack, to know where in the memory task was running when interrupt came.

3) Finally there is an attempt to query the symbol names based on the collected addresses (last 30 seconds).



Go to top
Re: Profiling on amigaos4
Just can't stay away
Just can't stay away


See User information
@corto

I just tried to use hieronymus on my Sam460 and it crashes when I run it on my 4.1 Final Edition or beta installations, but works fine under 4.1 Update 6.

The crash AFAICT is because of a NULL elf handle being passed to IElf->CloseElf() from:
hieronymus:Bosch_StopAcquisition()+0x3fc (section 1 @ 0xBA8)

Go to top
Re: Profiling on amigaos4
Just can't stay away
Just can't stay away


See User information
@salass00

Quote:

The crash AFAICT is because of a NULL elf handle being passed to IElf->CloseElf() from:
hieronymus:Bosch_StopAcquisition()+0x3fc (section 1 @ 0xBA8)


Not quite that simple unfortunately. I added a NULL pointer check in CloseElf() but it just lead to a different DSI (again caused by a NULL pointer access) in a different function, but still from a call to IElf->CloseElfTags() from hieronymus:Bosch_StopAcquisition().

Go to top
Re: Profiling on amigaos4
Home away from home
Home away from home


See User information
@All
There are new verson of hieronymus on os4depot (upload query for now) !

Go to top
Re: Profiling on amigaos4
Site Builder
Site Builder


See User information
@corto
Thank you so much for the new version of hieronymus.
I played with it a little bit today, and I will do that again tomorrow.

I wanted to ask you, when you run hieronymus with profile, how can you set the exit time to be more than 30sec?

Quote:
(what requires to have compiled the object with debug option).

Also, in the readme file you mention the above. By that you mean to compile apps with -gstabs? Is that correct?

Go to top
Re: Profiling on amigaos4
Not too shy to talk
Not too shy to talk


See User information
@kas1e
Thanks for the news. Here the file still does not appear in the recent files.

@walkero
I hope you will enjoy this new version!
You can change the default sampling duration with the option "duration", for example "duration=60" for one minute (and you have the possibility to stop in that time with CTRL-C). I may have to clarify the command "help" where the information appears.
About debug option, it seems that -g is enough to access function symbols.

@salass0
I just see your old comment. I know more details about the crash in CloseElfTags(). I will create a test file and see if I can fix it or create a Bugzilla ticket.

@all
This release comes after a huge rework of the tool. Reporting is improved and among new features, A1222/X5000 support is added (at the moment, limited to OS4 beta users due to a change in the kernel) ... and finally, after several approaches, I found a solution to make profiling working on X1000 (not the best implementation but seems to do the job though).

Go to top
Re: Profiling on amigaos4
Not too shy to talk
Not too shy to talk


See User information
Note that after 3 days, Hieronymus is still in the Uploads page

Do I have something to do? What happens?

Go to top
Re: Profiling on amigaos4
Home away from home
Home away from home


See User information
@Corto
Probably you forgot a password you use with previous upload, so it can't be automatically overwritten. When i have such kind of problems i wrote directly to Orgin.

Go to top
Re: Profiling on amigaos4
Site Builder
Site Builder


See User information
@corto
Thank you so much for this release and for your reply.
I was using the duration argument without the equals sign, i.e. duration 60
Yeah, a clarification in the help would be useful.

The old version had an argument to specify the binary. If I recall correctly that filtered the results to be only for the specified one? Do you plan to add something like that in the new version, or it is there and I missed it?

Quote:
Note that after 3 days, Hieronymus is still in the Uploads page


Check your email account (and spams) for a confirmation email.

Go to top
Re: Profiling on amigaos4
Not too shy to talk
Not too shy to talk


See User information
@kas1e
When I submitted it, that seemed to be good ... anyway, I sent a message to orgin.

@walkero
Thanks for the idea about spam, I checked.
About the option to track a single program, this feature was not in the previous version ... even if I think about it for a long time. Now that I started to move using the performance monitor by default, it should not be so difficult, the mechanism is already used for the gcc profiling. For other platforms, I suppose I will have to be creative, one more time (like for the current X1000 support).
That's one of the feature for a future version, in my todo list.

Go to top

  Register To Post

 




Currently Active Users Viewing This Thread: 1 ( 0 members and 1 Anonymous Users )




Powered by XOOPS 2.0 © 2001-2016 The XOOPS Project