@All
We recently discovered that clib2 exhibits significantly slower Console I/O and File I/O performance compared to newlib for several functions. Specifically, functions like printf(), fwrite(), puts(), fopen()/fclose(), fputs(), sprintf(), and snprintf() (those we measured) perform up to 10x slower in clib2 than in newlib.
For example, running this test case:
#include <stdio.h>
int main(void) {
for (int i = 0; i < 1000000; i++) {
printf("%3d %3d %3d\n", 255, 128, 0);
}
return 0;
}
When executed with output redirection (test > output), the clib2 build takes 21 seconds, while the newlib build completes in 2.5 seconds.
To investigate, I first used gprof for profiling. It reported only ~2.5 seconds of execution time, leaving ~18 seconds unaccounted for. I then rebuilt clib2’s libc.a with -pg to include library functions in the profile. The gprof output still showed ~2.5 seconds for both the binary and libc.a functions (similar to newlib’s total runtime), but the additional ~18 seconds in clib2’s execution remained unexplained:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
32.88 0.96 0.96 1000000 0.00 0.00 vfprintf
15.75 1.42 0.46 arg_init_ctor
13.01 1.80 0.38 __udivdi3
6.51 1.99 0.19 4000000 0.00 0.00 __check_abort
3.08 2.08 0.09 1000000 0.00 0.00 __flush
3.08 2.17 0.09 1000000 0.00 0.00 printf
2.74 2.25 0.08 1000000 0.00 0.00 __flush_iob_write_buffer
2.40 2.32 0.07 1000000 0.00 0.00 __get_file_descriptor
2.05 2.38 0.06 __grow_iob_table
1.37 2.42 0.04 1000000 0.00 0.00 __fd_hook_entry
0.68 2.44 0.02 1000000 0.00 0.00 __fputc_check
0.68 2.46 0.02 1000000 0.00 0.00 __iob_hook_entry
0.68 2.48 0.02 1 0.02 1.58 main
0.00 2.48 0.00 2 0.00 0.00 profil
0.00 2.48 0.00 1 0.00 0.00 __exit_trap_trigger
0.00 2.48 0.00 1 0.00 0.00 exit
Next, I used Hieronymus for system-wide profiling, which revealed that most of the time (~18-19 seconds) is spent in the kernel. I then tested with an unstripped kernel compiled with -gstabs to include debug symbols. While Hieronymus identified some kernel functions (e.g., _impl_Supervisor, J_Write), the majority of the ~18-19 seconds was still attributed to <symbols not found>:
% | Program
------------------------------------------------
54.8 | SYS:Kickstart/kernel
31.7 | SYS:Kickstart/SmartFilesystem
12.8 | notfound
0.4 | SYS:Kickstart/dos.library.kmod
0.1 | SYS:Kickstart/graphics.library.kmod
0.1 | DEVS:USB/fd/hid.usbfd
0.0 | SYS:Kickstart/intuition.library.kmod
0.0 | LIBS:bsdsocket.library
0.0 | L:appdir-handler
0.0 | SYS:Kickstart/uhci.usbhcd
0.0 | SYS:Kickstart/timer.device.kmod
Report by function (sorted by decreasing percents spent in each function):
% | Function | Program
------------------------------------------------------------------------------------------
34.7 | <symbols not found> | SYS:Kickstart/kernel
31.7 | <symbols not found> | SYS:Kickstart/SmartFilesystem
15.7 | _impl_Supervisor | SYS:Kickstart/kernel
12.8 | <symbols not found> | notfound
0.8 | Internal_MemCpy.part.0 | SYS:Kickstart/kernel
0.4 | _impl_Permit | SYS:Kickstart/kernel
0.4 | btpool_get_small | SYS:Kickstart/kernel
0.3 | _impl_SetSignal | SYS:Kickstart/kernel
0.2 | _impl_FreePooled | SYS:Kickstart/kernel
0.2 | Internal_MemCpy | SYS:Kickstart/kernel
0.2 | _impl_FindTask | SYS:Kickstart/kernel
0.2 | _impl_Disable | SYS:Kickstart/kernel
0.2 | _impl_ReleaseSemaphore | SYS:Kickstart/kernel
0.2 | __handle_page_list | SYS:Kickstart/kernel
0.2 | Internal_BZero | SYS:Kickstart/kernel
0.2 | _impl_AttemptSemaphoreWithSignal | SYS:Kickstart/kernel
0.2 | btpool_free | SYS:Kickstart/kernel
0.2 | btpool_free_chunk_nomerge | SYS:Kickstart/kernel
0.2 | J_Write | SYS:Kickstart/dos.library.kmod
0.1 | _impl_AllocPooled | SYS:Kickstart/kernel
0.1 | _impl_Enable | SYS:Kickstart/kernel
0.1 | getProcess | SYS:Kickstart/dos.library.kmod
0.1 | btpool_small_alloc.part.0 | SYS:Kickstart/kernel
0.1 | HAL_GetPageAttrs | SYS:Kickstart/kernel
0.1 | btpool_merge_front | SYS:Kickstart/kernel
0.1 | _impl_Forbid | SYS:Kickstart/kernel
0.1 | <symbols not found> | SYS:Kickstart/graphics.library.kmod
0.1 | <symbols not found> | DEVS:USB/fd/hid.usbfd
Do these unresolved kernel symbols maybe represent console I/O driver calls (e.g., ConsoleDevice_Write) or something else? If so, while Hieronimous output show time spend in Kernel, and not in dos.library for example, because of dos.library fast do kernel calls and so it catches as "kernel" ?
Anyway, the next thing i do seeing how it all happens, is use Snoopy, and simple catch Write() calls of dos.library, and .. suprise, clib2 doing the same calls as in printf : 12 bytes per call. But checking the same via newlib, i can see it do 32768 bytes per call ! So seems that clib2 is in line-buffered mode, while newlib is full-buffered ..
I tried then that with clib2:
#include <stdio.h>
int main(void) {
setvbuf(stdout, NULL, _IOFBF, 32768); // 32КБ буфер, как newlib
for (int i = 0; i < 1000000; i++) {
printf("%3d %3d %3d\n", 255, 128, 0);
}
return 0;
}
It start to be better, 10 seconds instead of 21, but still, far aways from newlib's 2.5 second. What else can be involved in ?
Edited by kas1e on 2025/6/19 4:58:42