clib2 vs newlib performance issues
@All

We recently discovered that clib2 exhibits significantly slower Console I/O and File I/O performance compared to newlib for several functions. Specifically, functions like printf(), fwrite(), puts(), fopen()/fclose(), fputs(), sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.

For example, running this test case:

#include <stdio.h>

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        printf("%3d %3d %3d\n", 255, 128, 0);
    }
    return 0;
}


When executed with output redirection (test > output), the clib2 build takes 21 seconds, while the newlib build completes in 2.5 seconds.

To investigate, I first used gprof for profiling. It reported only ~2.5 seconds of execution time, leaving ~18 seconds unaccounted for. I then rebuilt clib2’s libc.a with -pg to include library functions in the profile. The gprof output still showed ~2.5 seconds for both the binary and libc.a functions (similar to newlib’s total runtime), but the additional ~18 seconds in clib2’s execution remained unexplained:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 32.88      0.96     0.96  1000000     0.00     0.00  vfprintf
 15.75      1.42     0.46                             arg_init_ctor
 13.01      1.80     0.38                             __udivdi3
  6.51      1.99     0.19  4000000     0.00     0.00  __check_abort
  3.08      2.08     0.09  1000000     0.00     0.00  __flush
  3.08      2.17     0.09  1000000     0.00     0.00  printf
  2.74      2.25     0.08  1000000     0.00     0.00  __flush_iob_write_buffer
  2.40      2.32     0.07  1000000     0.00     0.00  __get_file_descriptor
  2.05      2.38     0.06                             __grow_iob_table
  1.37      2.42     0.04  1000000     0.00     0.00  __fd_hook_entry
  0.68      2.44     0.02  1000000     0.00     0.00  __fputc_check
  0.68      2.46     0.02  1000000     0.00     0.00  __iob_hook_entry
  0.68      2.48     0.02        1     0.02     1.58  main
  0.00      2.48     0.00        2     0.00     0.00  profil
  0.00      2.48     0.00        1     0.00     0.00  __exit_trap_trigger
  0.00      2.48     0.00        1     0.00     0.00  exit


Next, I used Hieronymus for system-wide profiling, which revealed that most of the time (~18-19 seconds) is spent in the kernel. I then tested with an unstripped kernel compiled with -gstabs to include debug symbols. While Hieronymus identified some kernel functions (e.g., _impl_Supervisor, J_Write), the majority of the ~18-19 seconds was still attributed to <symbols not found>:

  %  | Program
------------------------------------------------
54.8 | SYS:Kickstart/kernel
31.7 | SYS:Kickstart/SmartFilesystem
12.8 | notfound
 0.4 | SYS:Kickstart/dos.library.kmod
 0.1 | SYS:Kickstart/graphics.library.kmod
 0.1 | DEVS:USB/fd/hid.usbfd
 0.0 | SYS:Kickstart/intuition.library.kmod
 0.0 | LIBS:bsdsocket.library
 0.0 | L:appdir-handler
 0.0 | SYS:Kickstart/uhci.usbhcd
 0.0 | SYS:Kickstart/timer.device.kmod

Report by function (sorted by decreasing percents spent in each function):

  %  | Function                                 | Program
------------------------------------------------------------------------------------------
34.7 | <symbols not found>                      | SYS:Kickstart/kernel
31.7 | <symbols not found>                      | SYS:Kickstart/SmartFilesystem
15.7 | _impl_Supervisor                         | SYS:Kickstart/kernel
12.8 | <symbols not found>                      | notfound
 0.8 | Internal_MemCpy.part.0                   | SYS:Kickstart/kernel
 0.4 | _impl_Permit                             | SYS:Kickstart/kernel
 0.4 | btpool_get_small                         | SYS:Kickstart/kernel
 0.3 | _impl_SetSignal                          | SYS:Kickstart/kernel
 0.2 | _impl_FreePooled                         | SYS:Kickstart/kernel
 0.2 | Internal_MemCpy                          | SYS:Kickstart/kernel
 0.2 | _impl_FindTask                           | SYS:Kickstart/kernel
 0.2 | _impl_Disable                            | SYS:Kickstart/kernel
 0.2 | _impl_ReleaseSemaphore                   | SYS:Kickstart/kernel
 0.2 | __handle_page_list                       | SYS:Kickstart/kernel
 0.2 | Internal_BZero                           | SYS:Kickstart/kernel
 0.2 | _impl_AttemptSemaphoreWithSignal         | SYS:Kickstart/kernel
 0.2 | btpool_free                              | SYS:Kickstart/kernel
 0.2 | btpool_free_chunk_nomerge                | SYS:Kickstart/kernel
 0.2 | J_Write                                  | SYS:Kickstart/dos.library.kmod
 0.1 | _impl_AllocPooled                        | SYS:Kickstart/kernel
 0.1 | _impl_Enable                             | SYS:Kickstart/kernel
 0.1 | getProcess                               | SYS:Kickstart/dos.library.kmod
 0.1 | btpool_small_alloc.part.0                | SYS:Kickstart/kernel
 0.1 | HAL_GetPageAttrs                         | SYS:Kickstart/kernel
 0.1 | btpool_merge_front                       | SYS:Kickstart/kernel
 0.1 | _impl_Forbid                             | SYS:Kickstart/kernel
 0.1 | <symbols not found>                      | SYS:Kickstart/graphics.library.kmod
 0.1 | <symbols not found>                      | DEVS:USB/fd/hid.usbfd



Do these unresolved kernel symbols perhaps represent console I/O driver calls (e.g., ConsoleDevice_Write), or something else? And if so, why does the Hieronymus output show the time being spent in the kernel rather than in dos.library, for example? Is it because dos.library quickly hands off to kernel calls, so the samples are counted as "kernel"?

Anyway, the next thing I did to see what actually happens was to use Snoopy and simply capture the dos.library Write() calls, and... surprise: clib2 issues one Write() per printf() call, 12 bytes per call. Checking the same with newlib, I can see it writes 32768 bytes per call! So it seems that clib2 is in line-buffered mode, while newlib is fully buffered.

I then tried this with clib2:

#include <stdio.h>

int main(void) {
    setvbuf(stdout, NULL, _IOFBF, 32768); // 32 KB buffer, like newlib
    for (int i = 0; i < 1000000; i++) {
        printf("%3d %3d %3d\n", 255, 128, 0);
    }
    return 0;
}


It got better: 10 seconds instead of 21, but that is still far from newlib's 2.5 seconds. What else could be involved?


Edited by kas1e on 2025/6/19 4:58:42
Join us to improve dopus5!
AmigaOS4 on youtube
Re: clib2 vs newlib performance issues
@kas1e
Quote:
Specifically, functions like printf(), [...], sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.
All printf()-like functions are actually just a single function called with different arguments, at least in newlib: vfprintf().

Quote:
It got better: 10 seconds instead of 21, but that is still far from newlib's 2.5 seconds. What else could be involved?
The clib2 implementations of the I/O functions are simply crap; short of completely re-implementing them from scratch for clib4, there isn't much you can do.
Using the high-level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without many (if any) changes, but IIRC the low-level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.


Edited by joerg on 2025/6/20 16:26:29
Edited by joerg on 2025/6/20 16:27:51
Re: clib2 vs newlib performance issues
@joerg
Quote:

All printf()-like functions are actually just a single function called with different arguments, at least in newlib: vfprintf().


Yeah, after gprof'ing the different variants, they all call vfprintf() (in clib2 for sure, and so in newlib too, if you say so).

Quote:

The clib2 implementations of the I/O functions are simply crap; short of completely re-implementing them from scratch for clib4, there isn't much you can do.
Using the high-level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without many (if any) changes, but IIRC the low-level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.


File I/O also sucks quite a lot :) See the table I made a few days ago, testing about 30 different functions in clib2, clib4 and newlib:

https://github.com/AmigaLabs/clib4/issues/276

Re: clib2 vs newlib performance issues
@kas1e

Presumably this also affects 68k clib2. Is clib4 available for 68k/OS3? If not, can any fixes be backported to clib2?

Re: clib2 vs newlib performance issues
@Chris
clib4 is definitely OS4-only: it was started with the intention of dropping 68k support, to make maintenance easier and to allow the use of OS4-specific functions and everything related to them. Probably some things can be backported (only Andrea knows). You can check the sources if you are interested: https://github.com/AmigaLabs/clib4 (the development branch is the most active). But the slow I/O is still there, and by slow I mean damn slow :) Console I/O, file I/O, and even string formatting are slow.

Re: clib2 vs newlib performance issues
@joerg
Quote:

Just browsed the clib4 sources a bit, and there is still a lot, way too much, crap from clib2 remaining.
Very simple example:
https://github.com/AmigaLabs/clib4/blob/master/library/stdio/lock.c
Just replacing the old, and probably only used for the AmigaOS 1.x-3.x compatibility of clib2, semaphores (based on Forbid()/Permit()!) by OS4 mutexes (based on atomic increment/decrement instructions) should result in a noticeable speed improvement.


Andrea tried to switch it all to mutexes: https://github.com/AmigaLabs/clib4/commits/mutexes/ It changed nothing :) And I really mean it: not a single speed-up. fwrite(), printf(), puts() and sprintf() are just as slow.

But what we did find is that if we comment out, for example in fwrite.c, the check_abort and lock/unlock calls, we gain a bit: the fwrite test case took 13 s, and with check_abort/lock/unlock commented out it takes 8 s. Of course that is next to nothing, as newlib takes 0.5 s on the same test, but it is still something to think about.

Re: clib2 vs newlib performance issues
@kas1e

Quote:

#include <stdio.h>

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        printf("%3d %3d %3d\n", 255, 128, 0);
    }
    return 0;
}



What results do you get if you change the printf line to

printf("255 128   0\n");


And what if you change it to:

puts("255 128   0");


And what if you redirect the output to a different place, like a file in RAM: or to NIL:?

Re: clib2 vs newlib performance issues
@George
Quote:

And what if you redirect the output to a different place, like a file in RAM: or to NIL:?


I was doing all the tests in RAM: with >1 redirect, but I retested, and whether it is redirected to a file or to NIL:, it is all the same for the original test case: ~19 seconds.

Then tried "255 128 0" with "\n": 13 seconds
Then tried "255 128 0" without "\n": 13 seconds too

Re: clib2 vs newlib performance issues
@kas1e
As I wrote several times already, the I/O implementations, and several other parts, of clib2 are simply crap. IIRC even the libnix code was much better, at least for the quite limited parts it supports.

You only have 2 options:
- Re-implement all clib4 parts from scratch and remove any code from clib4, not only the I/O related parts, which were based on clib2 sources.
- Throw away the whole junk and restart from scratch, porting a usable C library like newlib, uClib, AROS C library, etc., (a glibc port is next to impossible, a NetBSD libc port way too much work, and my OS4 port of ixemul incomplete and too complicated to install for casual users...), instead.

If you have a snprintf() benchmark, compare the clib2, clib4, newlib, libnix, ixemul and IUtility->SNPrintf() results... Maybe additionally vclib (like libnix, rather limited) using VBCC.
Except for clib4, which didn't exist yet, I had support for all 4 C libraries in my gcc specs file, and I used VBCC, or at least vlink, which at that time was much better than the binutils ld, for a lot of my AmigaOS software.

Re: clib2 vs newlib performance issues
@joerg
Quote:

You only have 2 options:
- Re-implement all clib4 parts from scratch and remove any code from clib4, not only the I/O related parts, which were based on clib2 sources.


That is probably the only way to go.


Quote:

- Throw away the whole junk and restart from scratch, porting a usable C library like newlib, uClib, AROS C library, etc., (a glibc port is next to impossible, a NetBSD libc port way too much work, and my OS4 port of ixemul incomplete and too complicated to install for casual users...), instead.


Of those, only newlib and uClib are realistic candidates, because the AROS one will surely have more bugs, it has the same shortage of developers (just a few, as on OS4), and it will surely be full of ifdefs of every sort because it supports so many different platforms. We got rid of clib2 precisely so that we no longer have the OS3/MOS ifdefs that no one uses anymore. But then, it's too late anyway to change course for a complete rewrite: Andrea has already spent a year or two on clib4, so he surely won't start anything new, and he currently has no time to deal with replacing the FILE * crap from clib2.

On the other hand, maybe someone would be willing, on a paid basis, to replace the FILE * crap in clib4 with a proper implementation from newlib or uClib?
