I don't remember exactly, but wasn't there some issue on the X1000 where the MAC addresses were all the same? I'm sure I remember something of that sort, I just don't know if it was the X1000.
No idea about the X1000, but the X5000 has, or at least had, the same issue: in the default U-Boot env variables on the MicroSD card delivered with it, the variables for the MAC addresses were set. All systems using this default therefore use the same two MAC addresses, probably the ones of an X5000 belonging to a U-Boot developer. If the MAC address env variables are set (and saved), the values from the env variables are used. Only if they are not set/deleted does the firmware read the MAC address from the hardware, which is, or at least should be, different for each system, and store it in the U-Boot MAC env variables again.
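For reference, checking and clearing those variables from the U-Boot prompt looks roughly like this (`ethaddr`/`eth1addr` are the conventional variable names; the exact names and whether `env delete` is available depend on the U-Boot version and board config, so treat this as a sketch):

```text
=> printenv ethaddr eth1addr    (show the current MAC env variables)
=> env delete ethaddr eth1addr  (older U-Boot: "setenv ethaddr" with no value)
=> saveenv                      (persist, so the firmware re-reads the HW MAC)
=> reset
```

After the reset the firmware should read the per-system hardware MAC and store it back into the env variables, as described above.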
But it's only a problem if you have >= 2 X1000s or >= 2 X5000s in the same LAN.
Using the AOS4 version of the (maybe undocumented) PA_CALL/PA_FASTCALL instead of PA_SOFTINT for the timer reply port would work better.
I don't know what AROS PA_CALL/PA_FASTCALL are, or what the difference between the two is, but in AmigaOS 4.x it's the same as in AmigaOS 1.x-3.9, except that on AmigaOS 4.x it has to be a PPC-native function, not emulated m68k code: a single, undocumented but implemented method.
I'm not sure anymore, but I think the value required for it is (the same as) PF_ACTION.
Are gfx drivers in AOS4 already loaded/running when S:startup-sequence is executed?
Of course they are, required for example to display the "Early Startup Menu" on the gfx card. But it's limited to a special "BootVGA" screen mode, IIRC 800x600. It's the same when booting without executing S:Startup-Sequence, from the early startup menu, or with some special keys.
Only after the OS is completely loaded, after starting the DEVS:Monitors drivers, etc., do you have access to FHD, 4K, 8K, etc. (depending on gfx card and monitor hardware) screen modes.
There is also a "DRIVER_OVERVIEW.txt" explaining how it works in brief.
Also there is a "linux_ref" dir: it contains the Linux source code of the drivers for everything supported on the X1000 (including our network driver), and the necessary parts from the platform support, so you have no need to search for anything anywhere -- everything is there.
Please check this, maybe some of you will immediately find what is wrong, or will have some ideas, etc.
Thanks a lot!
ps. And thanks to Derffs for the device skeleton code, that helps a lot!
EDIT: Check the latest version again (the one on GitHub). I tested it with my stress tool, which does the following: it opens multiple TCP connections and receives data as fast as possible (on the server side I send bunches of AAAAAA when the stress tool connects from the X1000), so we force the driver's RX ring to wrap many times per second. Now the first run after boot passes the 300-second run fine, then I immediately run it a second time, and:
[stress] =====================================================
[stress] PA6T-1682M RX ring-wrap stress tester
[stress] =====================================================
[stress] Server : 192.168.0.144:9999
[stress] Connections: 8
[stress] Duration : 300 seconds
[stress] Ring wrap : every 64 frames
[stress] Connection 0 open (fd=0)
[stress] Connection 1 open (fd=1)
[stress] Connection 2 open (fd=2)
[stress] Connection 3 open (fd=3)
[stress] Connection 4 open (fd=4)
[stress] Connection 5 open (fd=5)
[stress] Connection 6 open (fd=6)
[stress] Connection 7 open (fd=7)
[stress] 8/8 connections open. Starting receive...
Sometimes I don't even need to transfer a big amount; I can just run it, hit Ctrl+C, and repeat a few times, and if I'm lucky (or not lucky) I can get a lockup right after 5-6 Ctrl+Cs. So it's not the amount of transferred data or the time, it just happens at any time.
@All Some bug-hunt progress: until today I did all tests on a 100 Mbit/s cable. So those lockups were not very easy or fast for me to reproduce: I usually had to run the stress tool for some time (200-300 seconds, sometimes more, sometimes less), sometimes just 10-20 runs/Ctrl+Cs were enough, but most of the time I had to wait to reproduce it. Speed was ~13 MB/s, fully stressed to the maximum.
Now, when I switch to cables that handle 1 Gbit/s fine, I get lockups IMMEDIATELY, every time I try to run the stress tests. Switching back to the 100 Mbit/s cable, the lockup happens less quickly and takes longer. Back to 1 Gbit/s again: immediate.
Does this give us any clue? Only that something overflows?
EDIT: Also, checking the Linux pasemi platform code, I found arch/powerpc/platforms/pasemi/setup.c, where they configure a bunch of SoC-internal debug registers as part of their MCE handler setup. As we don't have the datasheet, it's unknown how many debug registers are present in the SoC, but what we know for sure is that Linux (this setup.c) uncovers 8 of them:
So I added a diagnostic bit to the driver which, on every IRQ entry, reads all 8 of these SoC debug registers plus the DMA operational status registers and writes them into RAM that survives a reboot after the lockup, so in CFE I can do "d 0xXXXXXX" to read what they were just before the death. Result: all 8 debug registers are completely zero before the death. WTF!
kas1e wrote:@All Now, when I switch to cables that handle 1 Gbit/s fine, I get lockups IMMEDIATELY, every time I try to run the stress tests. Switching back to the 100 Mbit/s cable, the lockup happens less quickly and takes longer. Back to 1 Gbit/s again: immediate.
Does this give us any clue? Only that something overflows?
Are you just changing the cable, or the physical ports? I.e. when you say you're switching the cable, do you mean you're switching between cat 5e, cat 6e etc., or changing from a 100 Mbit link partner to a Gigabit link partner?
BTW, have you tried doing a ping flood from another machine to the X1000?
@ncafferkey Just the cable, in the same ports, i.e. an older cable vs. Cat 6, etc. So it means that when I have better speed, the ring wraps (probably) happen faster => it dies faster.
As for the ICMP flood: currently I've tried only a TCP one (i.e. I send big buffers from the PC to the X1000 and receive them on the X1000 to use the speed at maximum). I count just RX buffers, and now, for the sake of fewer ring wraps, I test with RX_RING_SIZE=2048 (same lockups anyway), so 2048 packets per wrap.
So... I was about to try "ping", only to realize that of course it's nearly impossible to reach the same ring-wrap speed that way (minutes for just one ring wrap). I started thinking about making a flooder to flood the X1000 from the PC, then realized I can just do a UDP flood (after all, the driver doesn't care about the IP protocol, it's all Ethernet packets). With a UDP flood no TX happens, the X1000 needs to send nothing back, etc., so if it crashes we can rule out more and more.
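A minimal UDP flooder along these lines can be sketched in a few lines of C (a sketch, not the actual tool: the 1472-byte payload fills a 1500-byte MTU frame after the 20-byte IP and 8-byte UDP headers; the target IP/port are placeholders):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PAYLOAD 1472  /* 1500-byte MTU minus 20 (IP header) and 8 (UDP header) */

/* Send 'count' max-size UDP datagrams to ip:port as fast as possible.
   Returns the number of datagrams handed to the kernel, or -1 on error. */
static int udp_flood(const char *ip, int port, int count)
{
    char buf[PAYLOAD];
    struct sockaddr_in dst;
    int sent = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(buf, 'A', sizeof(buf));
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    while (sent < count) {
        if (sendto(fd, buf, sizeof(buf), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) != (ssize_t)sizeof(buf))
            break;  /* stop on the first send error instead of spinning */
        sent++;
    }
    close(fd);
    return sent;
}
```

Run it in a loop (or several instances) on the PC side to saturate the link toward the X1000; as noted above, the receiving driver only sees Ethernet frames, so no TX path is exercised.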
EDIT: OK, tested: a pure UDP sender, packets of 1472 bytes, nothing else. When the connection is 100 Mbit/s it's harder to reproduce compared with rx_stress (which receives TCP), but it still locks up. When I switch to the 1 Gbit cable, this UDP flooder locks up the X1000 immediately too.
At least now we can completely rule out the TX path.
EDIT2: And I've completely ruled out Roadshow now: I just commented out CopyToBuff and did the same UDP flood (with a static MAC for the X1000 in the ARP table on the PC, so it can still flood our driver): lockup.
@all I just created a simple test case which sets up 512 RX buffers with a ring size of 2048 (just to have a lot of room), using just RX channel #1 for the tests.
In this test case all I do is minimal initialisation of the MAC/DMA and allocation/configuration of the RX interface, RX channel and RX buffers. No interrupts, nothing.
Then, in the main() loop, all I do is blindly set the owner bit (so recycling of slots happens and the DMA can reuse the slots again) and update the INCR registers.
I mean I don't reply, don't copy, just receive and throw away.
I.e.:
1. Write zeros to rx_ring[idx] (4 uint64 writes per descriptor)
2. Write buffer pointers to rx_buf_ring[]
3. Write the INCR registers via wr_dma()
And that's all. This alone is enough to lock up immediately at 1 Gbit speed. At 100 Mbit/s it takes a bit longer, but it locks up too. And it is surely the same lockup I have in the driver, the same look of a bus deadlock.
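For illustration, the recycle pattern described above looks roughly like this as a host-side simulation (made-up names: the `OWNER_BIT` position, the descriptor layout, `wr_dma()` and the fake buffer addresses are stand-ins, not the real PA6T definitions; the real test case writes an MMIO INCR register instead of the stub counter):

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8                      /* small ring for the simulation   */
#define OWNER_BIT (1ULL << 63)           /* stand-in for the real O-bit     */

static uint64_t rx_ring[RING_SIZE][4];   /* 4 x uint64 per descriptor       */
static uint64_t rx_buf_ring[RING_SIZE];  /* buffer "physical" addresses     */
static unsigned sw_index;                /* next slot to recycle            */
static unsigned incr_writes;             /* stub for the INCR MMIO register */

/* Stub standing in for the MMIO write (wr_dma) the real test case does. */
static void wr_dma(unsigned count) { incr_writes += count; }

/* Fake "hardware": DMA finished a frame in slot i and cleared the O-bit. */
static void dma_complete(unsigned i) { rx_ring[i][0] &= ~OWNER_BIT; }

static void init_ring(void)
{
    unsigned i;
    for (i = 0; i < RING_SIZE; i++)
        rx_ring[i][0] = OWNER_BIT;       /* every slot starts owned by DMA  */
    sw_index = 0;
}

/* Recycle every slot the DMA has finished with: zero the descriptor,
   restore the buffer pointer, hand the slot back by setting the O-bit
   last, then bump INCR by the number of returned slots. */
static unsigned recycle_rx(void)
{
    unsigned n = 0;
    while (!(rx_ring[sw_index][0] & OWNER_BIT)) {
        memset(rx_ring[sw_index], 0, sizeof(rx_ring[sw_index]));
        rx_buf_ring[sw_index] = 0x1000 + sw_index * 0x800; /* fake address */
        rx_ring[sw_index][0] |= OWNER_BIT;   /* DMA owns the slot again    */
        sw_index = (sw_index + 1) % RING_SIZE;
        n++;
    }
    wr_dma(n);
    return n;
}
```

The only point here is the ordering: zero the descriptor, restore the buffer pointer, set the owner bit last, then tell the engine how many slots were returned.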
It looks like if the DMA is working (just receiving anything) and the CPU is writing (or reading) the rings at the same time, we get a deadlock.
Here is the code of this simple test case, maybe some of you will find obvious issues there:
@kas1e You clearly should not overwrite (or maybe even read) a DMA buffer that is being used by the chip. So you should wait for an interrupt from the chip telling you it's finished with the DMA for a buffer, and only then overwrite that buffer, without touching the others the chip might still use.

There should be some synchronisation between the chip and the OS driver: when a buffer is passed to the chip, the driver should not touch it until the chip signals back that it's done. I guess it signals back via an interrupt, but I don't know how the chip knows which buffers are free for DMA; in any case you should not mark a buffer as free until you've got the data from it and won't touch it again.

Could it be that the driver passed the pointer to an RX buffer to the OS, then set it to be reused, but the OS later tried to access the data from it while the chip also tried to receive new data into it?
@kas1e Are you still using the random number 5 for the DMA channel, or are you allocating one with dma.resource now? Or did you ask the X1000 dma.resource author(s) which one might be reserved for the X1000 NIC on AmigaOS? No idea about the X1000, but on most systems with DMA engines you can disable the DMA-accelerated IExec->CopyMem[Quick]() optimizations with some "os4_commandline" option. Even if that doesn't prevent anything else, SATA or USB for example, from using the same DMA channel you are using without allocating it, it might help a little bit.
@kas1e I have a question: does your X1000 driver share code with your virtio-net.device driver? The symptoms are identical to those of virtio-net.device; maybe you just didn't notice them before when using the network card in the X1000 in 100Base-TX Full Duplex mode. If so, it might be easier to diagnose this on QEMU than on the X1000 (it's a shame to wear out that hardware with constant reboots).
@balaton That applies if I use interrupts, which I don't for now, I'm just polling with the CPU. And I also don't read any buffers; I don't read anything. The DMA is enabled and receives on its own; I only check the descriptors and skip them all completely, just doing the recycle so the DMA keeps filling them in. And I even check the O-bit now before touching a descriptor (just a read of the descriptor), so I only clear slots the DMA has finished writing.
@joerg Quote:
Are you still using the random number 5 for the DMA channel, or are you allocating one with dma.resource now?
For the RX interface itself I use 5, because that's hardcoded by the hardware. For the RX channel I currently use #1 (the one my test tool shows is not used in the system after boot; the only one in use after the system boots with pasemi_dma.resource is TX channel 0).
Quote:
Or did you ask the X1000 dma.resource author(s) which one might be reserved for the X1000 NIC on AmigaOS?
If you mean the interface, then it's RX interface 5: that's written clearly in the Linux sources, and that's what I found with my simple test tool too. But if you mean the RX channel: there are no hardware limits there, I can use any (currently #1).
I just tried a random RX channel, #44: same lockup. Then channel #56: same. So that's not it, for sure.
Quote:
No idea about the X1000, but on most systems with DMA engines you can disable the DMA-accelerated IExec->CopyMem[Quick]() optimizations with some "os4_commandline" option. Even if that doesn't prevent anything else, SATA or USB for example, from using the same DMA channel you are using without allocating it, it might help a little bit.
Will check ..
@smarkusg Quote:
@kas1e I have a question: does your X1000 driver share code with your virtio-net.device driver?
No, not even as a driver (though this test case does configure the DMA/MAC the same way the driver does, for test purposes)...
@All I think my next steps will be:
1) Install 32-bit Linux, rebuild their driver myself, and do the same UDP flood to see how it behaves. If all is OK, then I can dump all the register states, etc., to compare them one by one.
Was there ever a 32-bit Linux for the X1000, or were they all 64-bit from the beginning? (So as to be as close as possible to us.)
2) If Linux is fine (at least it should be, as I'm told by those who use it on the X1000), then I can go the heavy way: write a driver (test code for reading the UDP flood from the PC) for CFE. They already have a minimal TFTP driver, and I assume it kind of works, so I will just reset it in my init code and go from there. There will be no MMU, nothing, just bare-bare-metal: if it crashes under the UDP flood, then it's a hardware or bus-level issue; if it survives, then it's something about how AmigaOS sets up memory/MMU/PCI.
Sounds logical?
Btw, it's interesting what I found now; it can be completely unrelated, but who knows: I just tested Derf's Vulkan 1.3 examples (they use ogles2.library, which uses Warp3DNova, which uses DMA and stuff, of course). I didn't run any of my test code or drivers, but one of the examples locked up the machine while rendering, in exactly the same way as my DMA tests. I rebooted and repeated: no lockup. Of course that can be just pure luck and randomness, but...
Surround the "for(;;)" loop with Forbid()/Permit() (or even Disable()/Enable()).
And see if the debug output stops or not after some time. You will not be able to Ctrl-C break the loop (unfortunately "btst #6,$bfe001" may not work...), so you'll always have to reboot.
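In AmigaOS 4 style that suggestion looks like this (a sketch only; with Forbid() the loop keeps running but no other task is scheduled, with Disable() interrupts are blocked as well):

```c
IExec->Forbid();        /* or IExec->Disable() for the stronger variant */
for (;;) {
    /* ... poll descriptors / recycle slots, exactly as before ... */
}
/* never reached: Ctrl-C can't break in, so a reboot is the only way out */
IExec->Permit();        /* or IExec->Enable() */
```

If the lockup still occurs inside Forbid()/Disable(), that rules out interference from other tasks or interrupt handlers during the ring accesses.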
@Georg Thanks, I'll try that out today, but for now I want to simply make a CFE-based test case (so no AmigaOS 4, no Linux, pure hardware, pure bare-metal CFE, almost real-mode code). So far I have this:
CFE> boot -elf 192.168.0.144:cfe_hello.elf
Loader:elf Filesys:tftp Dev:eth0 File:192.168.0.144:cfe_hello.elf Options:(null)
Loading: 0x0000000000100000/200 0x00000000001000C8/12 0x00000000001000D4/4 Entry at 0x0000000000100000
Closing network.
Starting program at 0x0000000000100000
[RUN!]
WE IN CFE
[EXCP]*** program exit status = 0
CFE>
I.e. I can already write to the UART from CFE, I found the address from which code can run, i.e. all the basics. Now I'll try to cook the DMA test into it, and we will see. At least I can rule out EVERYTHING now.
@kas1e You can test under CFE, but it may be simpler to port your dma_test.c to Linux and run it there after blacklisting the pasemi driver, so nothing touches the network interface. If you now don't overwrite buffers before the DMA-finished bit is set, then it might be something hardware related (testing with Linux should check that), or maybe you can't even read DMA memory while the DMA is accessing it (to check the ready bit), so you really need to wait for the interrupt. Or it could always be some AmigaOS kernel bug, as was the case on pegasos2 as well.
@balaton With Linux it may not be that easy: as far as I'm aware, Linux doesn't support the RadeonHD (or maybe just not in 3D but OK in 2D, or maybe some simple framebuffer mode works, and that will be enough; will see later)...
But then, I got a working test case for CFE now and tested it: damn, same lockup! That means something is completely wrong with my test case, because graphics.library surely uses the PA6T DMA engine for something (I just don't know for what), and it doesn't lock up that fast.
Next, I will try Linux now, in the hope it works on the RadeonHD at least in some framebuffer mode (CFE worked, at least); that's the only way to be 100% sure. They surely do something that I don't... Of course I'm now trying a polling version (without interrupts), but it should work anyway: all I do is read a descriptor and write minimally to the registers, the data is not touched at all. And in the test case under CFE I can see the Ethernet start receiving packets, one by one, and then bam: it stops randomly, for no reason, and is locked up.
@kas1e I don't see why you need the RadeonHD in Linux. As long as you get a command prompt you can run a simple test. In fact you should not have a full desktop, just the booted kernel, and even that without the network driver, to make sure nothing else interferes. (Or if you want to test the Linux driver, you can do that from the command line too.) Did you upload the CFE test case somewhere to have a look? If you can identify the condition where it locks up then maybe you can fix it. If you think it's when the receive ring wraps around, how about not trying to recycle buffers, but waiting until it finishes one round, stopping the DMA, reallocating the buffers, then starting the DMA again? That way it should never wrap, so if the wrap is when it locks up, it should not happen.
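The no-wrap scheme suggested here could be structured like this (pure pseudocode; every function name is hypothetical, standing in for the corresponding register pokes in the real test case):

```c
/* Pseudocode for a no-wrap RX loop: let the ring fill exactly once,
   then stop, refill and restart, so the wrap case never occurs. */
for (;;) {
    wait_until_last_slot_done();   /* poll the O-bit of slot RING_SIZE-1  */
    stop_rx_dma();                 /* disable the channel, wait for idle  */
    refill_all_buffers();          /* reallocate/reset every RX buffer    */
    rewrite_all_descriptors();     /* O-bit set, buffer pointers restored */
    start_rx_dma();                /* re-enable the channel at slot 0     */
}
```

If the lockup disappears with this scheme, the wrap handling is the prime suspect; if it still locks up, the problem is in the concurrent CPU/DMA ring access itself.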