And the point is there are no preconditions. It is purely random in time: no matter what I try, it can happen fast or not so fast. All I can say for now is:
Sustained concurrent DMA writes + CPU access to ring memory causes a hard internal bus deadlock, with no exception, no error register, no info in debug registers (all clean), nothing. The CPU completely dies: not stuck in a loop, not in an exception handler. Three independent watchdogs (VERTB, CPU timer, IRQ counters) all stop simultaneously, the serial port also becomes inaccessible: a full bus lockup.
In the test case interrupts are not involved, AmigaOS4/Roadshow are not involved; it is the same from pure CFE. Descriptor parsing is not involved: blind recycling (no data reading, no validation) locks up identically. The DMA channel number doesn't matter (1, 44 and 56 were tried, all the same). TX channels are not involved; a single RX channel is enough.
But what is certain is that continuous DMA + CPU ring access is REQUIRED to crash: when DMA fills the ring and stops (no recycling), no crash. When the CPU doesn't touch the ring (NO_POLL), no crash. Only when both run simultaneously does it lock up.
The only pattern is that fewer packets per second = longer before lockup, but that could just be luck and randomness.
Quote:
I don't see why you need RadeonHD in Linux. As long as you get a command prompt you can run a simple test. In fact you should not have a full desktop, just the kernel booted, and even that without a network driver, to make sure nothing else interferes. (Or if you want to test the Linux driver you can do that from the command line too.)
Yes, just a framebuffer and a command prompt is enough. I'm just fighting now to find any distro I can boot from USB on the X1000. All I will do is set up the network and then do a UDP flood from the PC, as I do now for the AOS4 version and the CFE version. Whether it dies the same way or not will clear up a lot. If it survives, I can just debug it to death and simply mimic the same.
You have 3 different indexes (idx, buf_idx, rx_next_buf).
Are you sure this logic is all correct? That you don't end up having two (or more) identical rx_buf_phys pointers that are active (~DMA'ing) in the rx_buf_ring at the same time?
@Georg The indices are confusing because of the different naming: it's two variables for one thing. buf_idx and br_idx always hold the same value because NUM_RX_BUFS == RX_BUF_RING_SIZE == 2048.
@Georg Oh, I missed that, will check later, but what I want to say now:
@All I installed some "Void" PPC Linux on the X1000 and ran a UDP flood against the onboard network over a 1Gb link: millions of packets were handled (I can see with "ip -s link show" that they are received and counted) and no lockup!
So, it indeed works under Linux. For today that is certainly enough: it proved that the hardware CAN work, which means the fault is in something I do. OK, for AmigaOS4 there can be all sorts of bugs with Roadshow and the kernel, but the CFE version locking up means it is my own issue; it should work if it works under Linux. I just need to find what they do differently, how they handle it all.
Quote:
And the point is there are no preconditions. It is purely random in time. No matter what I try, it can happen fast or not so fast. All I can say for now is:
Sustained concurrent DMA writes + CPU access to ring memory causes a hard internal bus deadlock, ... But what is certain is that continuous DMA + CPU ring access is REQUIRED to crash: when DMA fills the ring and stops (no recycling), no crash. When the CPU doesn't touch the ring (NO_POLL), no crash. Only when both run simultaneously does it lock up.
There you have it. As I said before, it's likely that you can't touch memory that is being used by DMA. It looks like you can't even read it. (That makes sense, because if the network chip is accessing some memory, the CPU can't access the same area. I don't know if it's just the same address or a whole page.)

So in your driver you must make sure that if you pass a pointer to the network interface for DMA, nothing touches that buffer until the DMA is finished; you will have to manage these buffers to avoid concurrent use by the network chip and the CPU. This seems to explain the test case: you just run DMA and then concurrently read the buffers with the CPU. If you're unlucky, both are about the same speed, so they race around the ring until the CPU read reaches the buffer that is being DMA'd into.

So you have to keep off buffers that are active in DMA and only replace one when the DMA has either finished or stopped, so the card and the CPU never access the same buffer simultaneously. This is what I said before: you need some kind of synchronisation between the network chip and CPU access, and I think that's what the interrupt is for. The chip raises an interrupt when it has finished DMA, and only then may you touch the buffer from the CPU. If the chip continues DMA it might wrap around before you finish accessing the buffer, so either stop DMA until then, or have another buffer ready and quickly replace the pointer in the ring so the DMA will use the new buffer.
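The ownership handshake described above can be sketched as a tiny self-contained model. Everything here is illustrative (the names OWNER_DMA/OWNER_CPU, dma_fill, cpu_poll are invented for the sketch, not the real hardware interface): a slot is touched only by whichever side currently owns it, and the ownership flip is the only synchronisation point.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

enum owner { OWNER_DMA, OWNER_CPU };

struct rx_desc {
    enum owner owner;   /* who may touch this slot right now */
    int        len;     /* packet length written by the "hardware" */
};

static struct rx_desc ring[RING_SIZE];  /* all slots start owned by DMA */
static unsigned dma_idx;                /* next slot the hardware writes */
static unsigned cpu_idx;                /* next slot the CPU inspects */

/* Simulated hardware: write one packet, then hand the slot to the CPU. */
static int dma_fill(int len)
{
    struct rx_desc *d = &ring[dma_idx % RING_SIZE];
    if (d->owner != OWNER_DMA)
        return 0;             /* ring full: hardware must stop, not overwrite */
    d->len   = len;
    d->owner = OWNER_CPU;     /* ownership flip = the synchronisation point */
    dma_idx++;
    return 1;
}

/* CPU poll loop: only touch slots the DMA engine has released. */
static int cpu_poll(void)
{
    int handled = 0;
    while (ring[cpu_idx % RING_SIZE].owner == OWNER_CPU) {
        struct rx_desc *d = &ring[cpu_idx % RING_SIZE];
        /* ... process d->len bytes here ... */
        d->len   = 0;
        d->owner = OWNER_DMA; /* recycle: give the slot back to the hardware */
        cpu_idx++;
        handled++;
    }
    return handled;
}
```

With this discipline the CPU never reads a slot while the "hardware" is still writing it, and the hardware stalls instead of overrunning the CPU, which is exactly the behaviour the real chip's interrupt (or a polled status bit) is supposed to give you.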
1). Installed Linux and realized that the network driver is built into the kernel, so I had to build my own kernel with the network driver built as a module, so I can insmod/lsmod/rmmod it at run time, and do the same with my test ones.
2). Built the stock Linux driver myself to confirm that it 100% doesn't lock up: the same UDP flood for many, many minutes didn't kill it, everything fine on the 1G link.
3). Created my own test-case driver, which just does basic init, DOESN'T USE IRQs AT ALL, only polling, no TX channel at all; it simply lets the DMA do its thing and only processes recycling. But, and this is a big but: it does so via the Linux functions dma_map_single, dma_alloc_coherent, dev_alloc_skb, i.e. the stuff which handles IOMMU translation, cache coherency and memory allocation.
So this version also works: no lockup, no freezes, and I can flood as much as I want without problems.
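For reference, the pattern those Linux helpers give you looks roughly like this. This is a hedged sketch of the usual Linux DMA-API usage for an RX path, not the actual driver code; RING_BYTES, BUF_SIZE and the variables are placeholders, and error handling is omitted:

```c
/* Descriptor ring: coherent memory that CPU and device can both see
 * without explicit cache maintenance. */
ring = dma_alloc_coherent(dev, RING_BYTES, &ring_phys, GFP_KERNEL);

/* Per-packet buffers: streaming mappings with explicit ownership. */
skb      = dev_alloc_skb(BUF_SIZE);
buf_phys = dma_map_single(dev, skb->data, BUF_SIZE, DMA_FROM_DEVICE);
/* ... the hardware DMAs into buf_phys; the CPU must not touch
 *     skb->data while the mapping is live ... */
dma_unmap_single(dev, buf_phys, BUF_SIZE, DMA_FROM_DEVICE);
/* only after unmapping may the CPU safely read the packet data */
```

The important part is that map/unmap is an ownership transfer, exactly the handshake a bare-metal driver has to reproduce by hand with cache flushes and invalidates.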
Now I will simply replace the functions, step by step, with my own inlines until I find what's wrong. It will be a surprise if it keeps working anyway!
Quote:
but: by the Linux functions dma_map_single, dma_alloc_coherent, dev_alloc_skb, i.e. the stuff which handles IOMMU translation, cache coherency and memory allocation..
Btw, your allocations with AllocVecTags(): you use "AVT_Alignment, 64". The docs say there's also "AVT_PhysicalAlignment" ("mainly used for DMA drivers").
How does this work later when doing IMMU->SetMemoryAttrs on memory which is only 64-byte aligned, if the MMU works per page? What if someone else has memory in the same page and changes the memory attributes too?
Maybe try changing your driver (not "dma_test.c", before checking the confusing-index thing I mentioned) to use allocations with page-size alignment.
Yup, that is also my takeaway from doing DMA in the USB drivers. You must align your DMA buffers to the MMU cache-line size. If input buffers are not aligned you need to bounce-buffer them (at least partially).
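A minimal user-space illustration of the page-alignment point (POSIX is assumed here just for the sketch; on AmigaOS the equivalent would be the AllocVecTags() alignment tags discussed above): a page-aligned allocation guarantees that per-page operations like changing memory attributes affect only your buffer, never an unrelated neighbour sharing the same page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate a buffer whose start is aligned to a full MMU page, so that
 * per-page operations (memory attributes, cache control) cover only
 * this buffer and nothing else that happens to live in the same page. */
static void *alloc_page_aligned(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;
    if (page <= 0 || posix_memalign(&p, (size_t)page, bytes) != 0)
        return NULL;
    return p;
}
```

A 64-byte-aligned allocation gives no such guarantee: the rest of its page is usually occupied by other allocations, so flipping page attributes on it silently changes their memory type too.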
And also, yes, do not access memory set up for DMA. It *will* get you into trouble. Setting up for DMA flushes/invalidates the CPU caches, and if you access the memory afterwards from the CPU you undo that preparation, leading to the DMA either using values you didn't expect (the values you expect are still in the CPU cache) or to you reading back data you didn't expect (you're reading data from the CPU cache, not what the DMA placed into the memory).
@Georg, graff Thanks, I will make a note of this for later, but for now I seem to have found what was wrong. Before some users confirm it is working now, I will hold the bananas and the explanation of what it was :) What is for sure is that for 4 hours now I have been downloading stuff over the 1Gb link from everywhere: from the internet, from the local network, doing stress tests with TCP, etc. Yes, I can't get the full 1Gb download speed even from the local network, but I get 27MB/s, which is about ~22-23% of 1Gb speed, and which I think I know how to improve; but that is also Roadshow, our CPU, etc., etc. But first things first: let it just keep working now, for the mighty almighty boing balls!
Replying to myself: while the network still works (writing here from the X1000), I've had a stalled download from the Debian homepage. Edit: having restarted the download as a new file, for now it's downloading again. No lockups so far (fingers crossed). Edit 2: confirmed network stalls on some downloads, yet I can browse without issue. Edit 3: aaand, finally locked up. I went off to do something else and when I returned it was locked.
@Ami603 Could it be Odyssey acting up, or Debian's site? (I hope.) Worth trying the same stalled files with wget just to confirm. If there is an issue we will need some pattern, but that's for later. My X1000 can lock up even without the network, so how can we tell now whether it's the driver or anything else (after being online that long)?
IMHO it's worth checking without Odyssey, for example (many wget instances in parallel, etc.).
Hi @Kas1e, the previous version caused a complete system freeze immediately upon starting a 600MB file transfer via FTP. I am happy to report that the latest version has resolved this issue and is performing very well.
I conducted a performance comparison between the pa6_eth driver and the rtl8169 card using the PFTP utility to transfer a 600MB file. The results are as follows:
rtl8169: average speed 21M CPS, peak speed 24M CPS
pa6_eth: average speed 36M CPS, peak speed 39M CPS
I also performed these two tests: a few minutes of casual browsing with IBrowse and exploring remote folders mounted via smb2fs; both felt much more responsive.
Not exhaustively, but I did some speed tests using curl, browsing with Odyssey, and file transfers over the network to my NAS/Mac.
I noticed that during transfers my dock did not refresh. Also, if I download something on my other machine, I can see the traffic on the PA6TETH in the net dockie!
Quote:
I remark that during transfer, my dock did not refresh.
You mean the dock stops working? And that is only with the pa6t driver, but it works with the Realtek driver?
Quote:
Also, If I download something on my other machine, I can see the traffic on the PA6TETH in the net dockie!
Can you explain more, please: do you mean on the local network the driver monitors whole-network traffic like ARP and co? And in that situation the dock works? Couldn't it be that you messed with the dock's settings?
Anyway, can you share which dock you use and with which params, so I can play with it?