We recently discovered that clib2 exhibits significantly slower Console I/O and File I/O performance compared to newlib for several functions. Specifically, functions like printf(), fwrite(), puts(), fopen()/fclose(), fputs(), sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.
For example, running this test case:
#include <stdio.h>

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        printf("%3d %3d %3d\n", 255, 128, 0);
    }
    return 0;
}
When executed with output redirection (test > output), the clib2 build takes 21 seconds, while the newlib build completes in 2.5 seconds.
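For repeatable numbers, the same loop can be timed from inside the program. This is only a minimal sketch: time_printf_loop() is a hypothetical helper name, not something from clib2 or newlib, and clock() may behave differently between C libraries (it may not include time spent outside the process), so treat the result as a rough figure.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (an assumption, not part of any C library):
 * writes `count` copies of the test line to `out` and returns the
 * time reported by clock(), converted to seconds. */
static double time_printf_loop(FILE *out, long count)
{
    assert(out != NULL);

    clock_t start = clock();
    for (long i = 0; i < count; i++) {
        fprintf(out, "%3d %3d %3d\n", 255, 128, 0);
    }
    fflush(out); /* include the final flush in the measurement */
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_printf_loop(stdout, 1000000) and printing the result to stderr keeps the measurement out of the redirected output.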
To investigate, I first used gprof for profiling. It reported only ~2.5 seconds of execution time, leaving ~18 seconds unaccounted for. I then rebuilt clib2’s libc.a with -pg to include library functions in the profile. The gprof output still showed ~2.5 seconds for both the binary and libc.a functions (similar to newlib’s total runtime), but the additional ~18 seconds in clib2’s execution remained unexplained:
Next, I used Hieronymus for system-wide profiling, which revealed that most of the time (~18-19 seconds) is spent in the kernel. I then tested with an unstripped kernel compiled with -gstabs to include debug symbols. While Hieronymus identified some kernel functions (e.g., _impl_Supervisor, J_Write), the majority of the ~18-19 seconds was still attributed to <symbols not found>:
Could these unresolved kernel symbols be console I/O driver calls (e.g., ConsoleDevice_Write), or something else? If so, why does the Hieronymus output show the time as spent in the kernel and not in dos.library, for example? Is it because dos.library calls straight into the kernel, so the time is attributed to "kernel"?
Anyway, the next thing I did to see what actually happens was to use Snoopy and simply catch the Write() calls of dos.library, and, surprise: clib2 issues one Write() per printf() call, 12 bytes per call. Checking the same with newlib, I can see it writes 32768 bytes per call! So it seems that clib2 is in line-buffered mode, while newlib is fully buffered.
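To put numbers on that difference: the test writes 1,000,000 lines of 12 bytes each, so 12,000,000 bytes in total. A small sketch of the arithmetic (both function names are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* Length in bytes of one formatted test line; snprintf() with a
 * NULL buffer and size 0 only computes the length (C99). */
static int test_line_length(void)
{
    return snprintf(NULL, 0, "%3d %3d %3d\n", 255, 128, 0);
}

/* Number of write calls needed to move total_bytes when each call
 * carries at most bytes_per_call bytes (rounded up). */
static long write_calls_needed(long total_bytes, long bytes_per_call)
{
    assert(bytes_per_call > 0);
    return (total_bytes + bytes_per_call - 1) / bytes_per_call;
}
```

With 12 bytes per call that is 1,000,000 Write() calls; with 32768 bytes per call it is 367. Each call has a fixed cost, which fits the time showing up outside the program itself.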
I then tried this with clib2:
#include <stdio.h>

int main(void) {
    setvbuf(stdout, NULL, _IOFBF, 32768); // 32 KB buffer, like newlib
    for (int i = 0; i < 1000000; i++) {
        printf("%3d %3d %3d\n", 255, 128, 0);
    }
    return 0;
}
That improved things: 10 seconds instead of 21, but it is still far from newlib's 2.5 seconds. What else could be involved?
Specifically, functions like printf(), [...], sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.
All printf() like functions are actually only a single function, just called with different args, at least in newlib: vfprintf().
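As an illustration of that pattern, here is a minimal sketch of printf-style wrappers that only collect their variable arguments and hand them to the v*printf() functions. The my_ names are hypothetical, and vsnprintf() stands in for whatever a given library does internally for string targets; this is not newlib's actual source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical fprintf()-style wrapper: all formatting work is done
 * by vfprintf(), the wrapper only forwards the argument list. */
static int my_fprintf(FILE *out, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL);

    va_list args;
    va_start(args, fmt);
    int written = vfprintf(out, fmt, args);
    va_end(args);
    return written;
}

/* Hypothetical snprintf()-style wrapper: same idea, forwarding to
 * vsnprintf() (assumed stand-in for the library's shared core). */
static int my_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    assert(fmt != NULL);

    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}
```

If the shared formatting core is slow, every wrapper built on it is slow as well, which would fit printf(), sprintf() and snprintf() all showing the same slowdown.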
Quote:
That improved things: 10 seconds instead of 21, but it is still far from newlib's 2.5 seconds. What else could be involved?
The clib2 implementations of the I/O functions are simply crap, except for completely re-implementing them for clib4 from scratch there isn't much you can do. Using the high level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without much (if any) changes, but IIRC the low level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.
Edited by joerg on 2025/6/20 16:26:29 Edited by joerg on 2025/6/20 16:27:51
All printf() like functions are actually only a single function, just called with different args, at least in newlib: vfprintf().
Yeah, after gprof'ing the different variants, they all call vfprintf() (in clib2 for sure, and so in newlib too, if you say so).
Quote:
The clib2 implementations of the I/O functions are simply crap, except for completely re-implementing them for clib4 from scratch there isn't much you can do. Using the high level (C: fread(), fwrite(), vfprintf(), etc.) newlib I/O functions for clib4 should work without much (if any) changes, but IIRC the low level (POSIX: read(), write(), etc.) clib2 functions aren't very good either.
File I/O is also quite bad :) See the table I made some days ago, testing about 30 different functions in clib2, clib4 and newlib:
@Chris clib4 is definitely OS4-only: it was started with the idea of ditching 68k support to make maintenance easier and to use OS4-specific functions and everything related. Some things can probably be backported (only Andrea knows). You can check the sources if you are interested: https://github.com/AmigaLabs/clib4 (the development branch is the most active). But the slow I/O is still there, and by slow I mean damn slow :) Console I/O, file I/O, and even string formatting are slow.
Just browsed the clib4 sources a bit, and there is still a lot, way too much, crap from clib2 remaining. Very simple example: https://github.com/AmigaLabs/clib4/blob/master/library/stdio/lock.c Just replacing the old, and probably only used for the AmigaOS 1.x-3.x compatibility of clib2, semaphores (based on Forbid()/Permit()!) by OS4 mutexes (based on atomic increment/decrement instructions) should result in a noticeable speed improvement.
Andrea tried switching everything to mutexes: https://github.com/AmigaLabs/clib4/commits/mutexes/ It changed nothing :) And I really mean it: not a single speed-up. fwrite(), printf(), puts() and sprintf() are all just as slow.
But what we did find is that if we comment out, for example, check_abort and lock/unlock in fwrite.c, we gain a bit: the fwrite test case took 13 s before and 8 s with check_abort/lock/unlock commented out. That is still next to nothing, as newlib takes 0.5 s on the same test, but it is something to think about.
And what if you redirect the output somewhere else, like a file in RAM: or to NIL:?
I run all tests from RAM: with a >1 redirect, but I retested, and whether the output is redirected to a file or to NIL:, it is all the same for the original test case: ~19 seconds.
Then I tried "255 128 0" with "\n": 13 seconds.
Then I tried "255 128 0" without "\n": 13 seconds too.
@kas1e As I wrote several times already the I/O implementations, and several other parts, of clib2 are simply crap. IIRC even the libnix code was much better, at least for the quite limited parts it supports.
You only have 2 options:
- Re-implement all clib4 parts from scratch and remove any code from clib4, not only the I/O related parts, which were based on clib2 sources.
- Throw away the whole junk and restart from scratch, porting a usable C library like newlib, uClib, AROS C library, etc., (a glibc port is next to impossible, a NetBSD libc port way too much work, and my OS4 port of ixemul incomplete and too complicated to install for casual users...), instead.
If you have a snprintf() benchmark, compare clib2, clib4, newlib, libnix, ixemul and IUtility->SNPrintf() results... Maybe additionally vclib (like libnix rather limited) using VBCC. Except for clib4, which didn't exist yet, I had support for all 4 C libraries in my gcc specs file, and I used VBCC, or at least vlink, which at that time was much better than the binutils ld, for a lot of my AmigaOS software.
You only have 2 options: - Re-implement all clib4 parts from scratch and remove any code from clib4, not only the I/O related parts, which were based on clib2 sources.
That is probably the only way to go.
Quote:
- Throw away the whole junk and restart from scratch, porting a usable C library like newlib, uClib, AROS C library, etc., (a glibc port is next to impossible, a NetBSD libc port way too much work, and my OS4 port of ixemul incomplete and too complicated to install for casual users...), instead.
Of those, only newlib and uClib are real candidates: the AROS one will surely have more bugs, it has the same lack of developers (just a few, as on OS4), and it will surely be full of ifdefs of every sort because it supports too many different platforms. We got rid of clib2 precisely so as not to have OS3/MOS ifdefs that no one uses anymore. But it is too late anyway to change course for a complete rewrite: Andrea has already spent a year or two on clib4, so he will certainly not start anything new, and he currently has no time to deal with replacing the clib2 FILE * crap.
On the other hand, maybe someone would be willing, on a paid basis, to replace the FILE * crap in clib4 with a proper newlib or uClib one?