Login
Username:

Password:

Remember me



Lost Password?

Register now!

Sections

Who's Online
15 user(s) are online (4 user(s) are browsing Forums)

Members: 1
Guests: 14

NinjaCyborg, more...

Support us!

Headlines

 
  Register To Post  

clib2 vs newlib perfomance issues
Home away from home
Home away from home


See User information
@All

We recently discovered that clib2 exhibits significantly slower Console I/O and File I/O performance compared to newlib for several functions. Specifically, functions like printf(), fwrite(), puts(), fopen()/fclose(), fputs(), sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.

For example, running this test case:

#include <stdio.h>

int main(void) {
    for (
int i 01000000i++) {
        
printf("%3d %3d %3d\n"2551280);
    }
    return 
0;
}


When executed with output redirection (test > output), the clib2 build takes 21 seconds, while the newlib build completes in 2.5 seconds.

To investigate, I first used gprof for profiling. It reported only ~2.5 seconds of execution time, leaving ~18 seconds unaccounted for. I then rebuilt clib2’s libc.a with -pg to include library functions in the profile. The gprof output still showed ~2.5 seconds for both the binary and libc.a functions (similar to newlib’s total runtime), but the additional ~18 seconds in clib2’s execution remained unexplained:

Each sample counts as 0.01 seconds.
  %   
cumulative   self              self     total
 time   seconds   seconds    calls   s
/call   s/call  name
 32.88      0.96     0.96  1000000     0.00     0.00  vfprintf
 15.75      1.42     0.46                             arg_init_ctor
 13.01      1.80     0.38                             __udivdi3
  6.51      1.99     0.19  4000000     0.00     0.00  __check_abort
  3.08      2.08     0.09  1000000     0.00     0.00  __flush
  3.08      2.17     0.09  1000000     0.00     0.00  printf
  2.74      2.25     0.08  1000000     0.00     0.00  __flush_iob_write_buffer
  2.40      2.32     0.07  1000000     0.00     0.00  __get_file_descriptor
  2.05      2.38     0.06                             __grow_iob_table
  1.37      2.42     0.04  1000000     0.00     0.00  __fd_hook_entry
  0.68      2.44     0.02  1000000     0.00     0.00  __fputc_check
  0.68      2.46     0.02  1000000     0.00     0.00  __iob_hook_entry
  0.68      2.48     0.02        1     0.02     1.58  main
  0.00      2.48     0.00        2     0.00     0.00  profil
  0.00      2.48     0.00        1     0.00     0.00  __exit_trap_trigger
  0.00      2.48     0.00        1     0.00     0.00  
exit


Next, I used Hieronymus for system-wide profiling, which revealed that most of the time (~18-19 seconds) is spent in the kernel. I then tested with an unstripped kernel compiled with -gstabs to include debug symbols. While Hieronymus identified some kernel functions (e.g., _impl_Supervisor, J_Write), the majority of the ~18-19 seconds was still attributed to <symbols not found>:

%  | Program                                 
------------------------------------------------
 
54.8 SYS:Kickstart/kernel
 31.7 
SYS:Kickstart/SmartFilesystem
 12.8 
notfound
  0.4 
SYS:Kickstart/dos.library.kmod
  0.1 
SYS:Kickstart/graphics.library.kmod
  0.1 
DEVS:USB/fd/hid.usbfd
  0.0 
SYS:Kickstart/intuition.library.kmod
  0.0 
LIBS:bsdsocket.library
  0.0 
L:appdir-handler
  0.0 
SYS:Kickstart/uhci.usbhcd
  0.0 
SYS:Kickstart/timer.device.kmod

Report by 
function (sorted by decreasing percents spent in each function):

   %  | Function                                 | 
Program
------------------------------------------------------------------------------------------
 
34.7 | <symbols not found>                      | SYS:Kickstart/kernel
 31.7 
| <symbols not found>                      | SYS:Kickstart/SmartFilesystem
 15.7 
_impl_Supervisor                         SYS:Kickstart/kernel
 12.8 
| <symbols not found>                      | notfound
  0.8 
Internal_MemCpy.part.0                   SYS:Kickstart/kernel
  0.4 
_impl_Permit                             SYS:Kickstart/kernel
  0.4 
btpool_get_small                         SYS:Kickstart/kernel
  0.3 
_impl_SetSignal                          SYS:Kickstart/kernel
  0.2 
_impl_FreePooled                         SYS:Kickstart/kernel
  0.2 
Internal_MemCpy                          SYS:Kickstart/kernel
  0.2 
_impl_FindTask                           SYS:Kickstart/kernel
  0.2 
_impl_Disable                            SYS:Kickstart/kernel
  0.2 
_impl_ReleaseSemaphore                   SYS:Kickstart/kernel
  0.2 
__handle_page_list                       SYS:Kickstart/kernel
  0.2 
Internal_BZero                           SYS:Kickstart/kernel
  0.2 
_impl_AttemptSemaphoreWithSignal         SYS:Kickstart/kernel
  0.2 
btpool_free                              SYS:Kickstart/kernel
  0.2 
btpool_free_chunk_nomerge                SYS:Kickstart/kernel
  0.2 
J_Write                                  SYS:Kickstart/dos.library.kmod
  0.1 
_impl_AllocPooled                        SYS:Kickstart/kernel
  0.1 
_impl_Enable                             SYS:Kickstart/kernel
  0.1 
getProcess                               SYS:Kickstart/dos.library.kmod
  0.1 
btpool_small_alloc.part.0                SYS:Kickstart/kernel
  0.1 
HAL_GetPageAttrs                         SYS:Kickstart/kernel
  0.1 
btpool_merge_front                       SYS:Kickstart/kernel
  0.1 
_impl_Forbid                             SYS:Kickstart/kernel
  0.1 
| <symbols not found>                      | SYS:Kickstart/graphics.library.kmod
  0.1 
| <symbols not found>                      | DEVS:USB/fd/hid.usbfd



Do these unresolved kernel symbols maybe represent console I/O driver calls (e.g., ConsoleDevice_Write) or something else? If so, while Hieronimous output show time spend in Kernel, and not in dos.library for example, because of dos.library fast do kernel calls and so it catches as "kernel" ?

Anyway, the next thing i do seeing how it all happens, is use Snoopy, and simple catch Write() calls of dos.library, and .. suprise, clib2 doing the same calls as in printf : 12 bytes per call. But checking the same via newlib, i can see it do 32768 bytes per call ! So seems that clib2 is in line-buffered mode, while newlib is full-buffered ..

I tried then that with clib2:

#include <stdio.h>
int main(void) {
    
setvbuf(stdoutNULL_IOFBF32768); // 32КБ буфер, как newlib
    
for (int i 01000000i++) {
        
printf("%3d %3d %3d\n"2551280);
    }
    return 
0;
}


It start to be better, 10 seconds instead of 21, but still, far aways from newlib's 2.5 second. What else can be involved in ?


Edited by kas1e on 2025/6/19 4:58:42
Join us to improve dopus5!
AmigaOS4 on youtube
Go to top
Re: clib2 vs newlib perfomance issues
Home away from home
Home away from home


See User information
@kas1e
Quote:
Specifically, functions like printf(), [...], sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.
All printf() like functions are actually only a single function, just called with different args, at least in newlib: vfprintf().

Quote:
It start to be better, 10 seconds instead of 21, but still, far aways from newlib's 2.5 second. What else can be involved in ?
The clib2 implementations of the I/O functions are simply crap, except for completely re-implementing them for clib4 from scratch there isn't much you can do.
Using the high level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without much (if any) changes, but IIRC the low level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.


Edited by joerg on 2025/6/20 16:26:29
Edited by joerg on 2025/6/20 16:27:51
Go to top
Re: clib2 vs newlib perfomance issues
Home away from home
Home away from home


See User information
@joerg
Quote:

All printf() like functions are actually only a single function, just called with different args, at least in newlib: vfprintf().


Yeah, after gprof'ing different versions they all call vfprintf() (at least clib2 for sure, so then newlib too then if you says so).

Quote:

The clib2 implementations of the I/O functions are simply crap, except for completely re-implementing them for clib4 from scratch there isn't much you can do.
Using the high level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without much (if any) changes, but IIRC the low level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.


File IO also suck quite much :) See the table i did some days ago testing about 30 different functions in clib2, clib4 and newlib:

https://github.com/AmigaLabs/clib4/issues/276

Join us to improve dopus5!
AmigaOS4 on youtube
Go to top

  Register To Post

 




Currently Active Users Viewing This Thread: 1 ( 0 members and 1 Anonymous Users )




Powered by XOOPS 2.0 © 2001-2024 The XOOPS Project