I guess what Hans meant with "Looks like the driver's PEX bridge code needs updating." is bridge code in his Radeon HD/RX drivers, not something in the AmigaOS kernel.
Yes, it seems you are right: I just searched the binaries of the debug versions of the RadeonRX and RadeonHD drivers, and found these strings in both of them:
Enabling blind prefetch on the PEX 8112 bridge
Cannot enable blind prefetch for pci:%ld.%ld,%ld, because this device doesn't support it
And the same for 8111.
So it seems the RadeonRX/HD drivers do some configuration magic when they detect an 8112 or 8111 bridge, but judging by the gfxbench results for the 8112, something looks wrong there. How else can we explain such a difference from the Pericom one, which runs with its default configuration and is much faster in the copy from RAM to VRAM tests, and 2 times faster in the copy from VRAM to RAM tests?
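To illustrate the kind of detection those strings hint at, here is a minimal sketch of classifying a bridge by its PCI vendor/device ID. It is not the driver's code: the enum, the function names and the constant names are made up for illustration, and the IDs (PLX 10B5:8111/8112, Pericom 12D8:E111) are taken from the public PCI ID list, so they should be checked against what DumpBridge reports.

```c
#include <stdint.h>

/* Vendor/device IDs as found in the public PCI ID list (assumption:
 * verify against what DumpBridge reports for the actual card). */
#define VENDOR_PLX          0x10B5u
#define DEVICE_PEX8111      0x8111u
#define DEVICE_PEX8112      0x8112u
#define VENDOR_PERICOM      0x12D8u
#define DEVICE_PI7C9X111SL  0xE111u

typedef enum {
    BRIDGE_UNKNOWN = 0,
    BRIDGE_PEX8111,
    BRIDGE_PEX8112,
    BRIDGE_PI7C9X111SL
} BridgeKind;

/* Hypothetical helper: map a vendor/device pair to a known bridge type. */
BridgeKind classify_bridge(uint16_t vendor, uint16_t device)
{
    if (vendor == VENDOR_PLX) {
        if (device == DEVICE_PEX8111) return BRIDGE_PEX8111;
        if (device == DEVICE_PEX8112) return BRIDGE_PEX8112;
    } else if (vendor == VENDOR_PERICOM && device == DEVICE_PI7C9X111SL) {
        return BRIDGE_PI7C9X111SL;
    }
    return BRIDGE_UNKNOWN;
}

/* Only the PLX parts show up in the driver strings as having
 * blind-prefetch handling. */
int has_plx_prefetch_quirk(BridgeKind kind)
{
    return kind == BRIDGE_PEX8111 || kind == BRIDGE_PEX8112;
}
```

With this split, any bridge-specific setup (blind prefetch for PLX, something else for Pericom) could hang off the returned `BridgeKind` instead of being scattered through the probe code.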
P.S. Looking at the datasheet for the Pericom one, I found this:
Maybe, if the default settings of the Pericom bridge aren't good enough, we can speed it up as well. At least the copy from RAM to VRAM is more or less OK already and won't be noticeable in everyday use, but the copy from VRAM to RAM needs improving for sure, at least to get to some sane level.
@everyone Does anyone have a clue how to check whether Debian for Pegasos2 supports GART for the old Radeon9250? What we need to know is whether Linux enables full memory coherence (which GART needs). If it does, we can check how it is done and then do the same to make it work on OS4 via the bridge too (which would later help QEMU users as well).
As of now, the Radeon drivers running on the Peg2 via the bridge say:
Quote:
RadeonRX (4): GART: num cpu pages 786432, num gpu pages 786432 (3072 MB)
RadeonRX (0): Platform doesn't have full memory coherence, Disabling GART.
And then they redirect GART allocations to VRAM.
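As a quick sanity check of the numbers in that log line, the page count and the reported size agree if the GART uses 4 KiB pages. The page size is my assumption (the log doesn't state it), and the function name is just for illustration:

```c
#include <stdint.h>

/* Size in MiB of a GART that maps num_pages pages of page_size bytes each.
 * The 4096-byte page size used in the example is an assumption, not
 * something the driver log states. */
uint64_t gart_size_mib(uint64_t num_pages, uint64_t page_size)
{
    return (num_pages * page_size) / (1024u * 1024u);
}
```

`gart_size_mib(786432, 4096)` gives 3072, which matches the "3072 MB" in the driver output.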
So we need to know how to enable full memory coherence on Pegasos2 so that GART will work. While it will be of no help with ReadPixelArray() or WritePixelArray(), it will still improve things in some areas.
If anyone can point out how to do it, or at least how to check whether the latest Debian for Pegasos2 supports full memory coherence with the old Radeon9250, that would be helpful.
@all For a better understanding, here is how the gfxbench results look for now (for all images, hit "open image in new tab" to see them full size):
Copy To/From Ram/Vram:
FillRect:
Blit ones:
Composite ones:
As can be seen, with the current state of things (i.e. defaults and the first working version) we can conclude that:
1). The Pericom bridge with the PI7C9X111SL chipset performs about 50% better in almost all operations than the PLX one with the PEX8112-AA66BI F chipset.
That may be caused by the default register settings and by the fact that the Pericom chipset reports "Fast back-to-back capable" in PCI_Status (found with the DumpBridge tool), which the PLX one doesn't. It remains to be seen whether both bridges can be configured better (from the RadeonHD/RX driver code) for better results. As of now, it seems the Radeon drivers can detect at least the PLX 8112 bridge, but prefetching isn't set up correctly.
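For reference, "Fast back-to-back capable" is a single bit in the standard 16-bit PCI status register (config offset 0x06, bit 7). A minimal sketch of decoding it from a raw register value follows; the macro and function names are mine, and how the value is read from config space on OS4 is left out:

```c
#include <stdint.h>

#define PCI_STATUS_OFFSET         0x06    /* 16-bit status register */
#define PCI_STATUS_66MHZ_CAPABLE  0x0020u /* bit 5 */
#define PCI_STATUS_FAST_B2B       0x0080u /* bit 7 */

/* Returns non-zero if the device reports fast back-to-back capability. */
int pci_fast_b2b_capable(uint16_t status)
{
    return (status & PCI_STATUS_FAST_B2B) != 0;
}

/* Returns non-zero if the device reports 66 MHz capability. */
int pci_66mhz_capable(uint16_t status)
{
    return (status & PCI_STATUS_66MHZ_CAPABLE) != 0;
}
```

Note that a capability bit only says what the device could do; whether fast back-to-back is actually used depends on the enable bit in the command register and on the other devices on the bus.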
2). The copy speed from VRAM back to RAM is awful on both bridges. It's about 20 times (!) slower than with the Radeon9250. Yes, the Pericom one is again 2 times better, but still too slow.
3). We can also see that all small operations, such as 16x16, 32x32, 64x64 and 128x128, are slower than on the older Radeon9250. And _very_ _very_ much slower. It was like this for RadeonRX even in comparison with RadeonHD.
4). On the other hand, all operations starting from 256x256 in some cases, but in general starting from 512x512, are much faster, and get a lot faster with each doubling of size. 1024x1024 sometimes gives a x10 boost in some operations.
All of that can explain why, while we currently have no problems with, for example, VA video playback, or Spencer giving 30-40 FPS on maximum details, we still suffer from small micro-pauses when sliding the icons in Workbench windows via the scrollbar (reads from VRAM to RAM probably play a role there).
Another thing is that we currently don't have GART enabled on Pegasos2 either, because as of now the drivers say they can't enable full memory coherence, and without that GART can't work. If it did work, it would give us another boost. Not in WritePixelArray/ReadPixelArray of course (for those only configuring the bridges may help), but in all the other operations.
So! How can we check whether full memory coherence is enabled in Linux on Pegasos2 with the Radeon9250? :) If we can find out that full memory coherence is possible on the Peg2, we can have GART there, and it will help QEMU users too.
In general it looks like it's faster on the very old R100/R200 Radeon gfx cards than on the HD/RX ones, maybe because other OSes also used CPU-based accesses instead of GART with those ancient gfx cards, and they were better optimized for that than the newer gfx cards.
@joerg Yep, it looks like this is a problem of the RadeonHD/RX cards, but at least 40 MiB/s is not 2 MiB/s as with the bridges :) And that is probably not an issue of using 33 MHz PCI, but of the bridge configuration? It is just too slow.
It would also be interesting to know whether copying from VRAM is as awfully slow on other systems (Windows, Linux) as it is on ours. And the same for 16x16, 32x32, 64x64 and 128x128 copies? I.e. will the Radeon9250 be faster in those operations than RadeonHD/RX even on Windows?
The Accelerated Graphics Port (AGP) architecture in a Linux system is designed to provide a high-speed point-to-point channel for attaching a video card to a computer's motherboard, primarily to assist in the acceleration of 3D graphics. Here's an overview of the architecture:
### 1. **AGP Overview**
AGP is a dedicated interface between the video card and the motherboard, allowing the graphics card to directly access the system memory. This is crucial for efficient rendering of graphics as it allows textures and other graphic data to be transferred quickly.
### 2. **Key Components of AGP in Linux**
#### 2.1 AGP Bridge
The AGP bridge is a hardware interface that connects the AGP slot on the motherboard to the system’s northbridge. It manages the communication between the CPU, system memory, and the AGP card.
#### 2.2 Graphics Address Remapping Table (GART)
GART is a crucial feature of AGP that allows the video card to directly access the system memory. It handles the mapping of the AGP address space to the physical address space, enabling the video card to use system memory as texture memory.
#### 2.3 AGP Driver
In Linux, the AGP driver is responsible for initializing and managing the AGP bridge. The main AGP driver is typically located at `/usr/src/linux/drivers/char/agp/`. It interacts with the hardware to set up the AGP bridge, configure the aperture, and manage the GART.
#### 2.4 Kernel Modules
Several kernel modules are involved in AGP support in Linux:
- `agpgart`: This is the core AGP support module.
- `intel-agp`, `nvidia-agp`, etc.: These are chipset-specific AGP drivers that handle specific hardware configurations.
### 3. **AGP Architecture in Linux**
#### 3.1 Initialization
- **Kernel Configuration**: The AGP support is enabled in the kernel configuration (`CONFIG_AGP=y`).
- **Module Loading**: The `agpgart` module and appropriate chipset-specific module (e.g., `intel-agp`) are loaded.
#### 3.2 GART Setup
- **Aperture Configuration**: The AGP driver configures the aperture size and base address.
- **Mapping**: GART maps the AGP address space to the physical system memory, allowing the video card to access large blocks of memory efficiently.
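The mapping step can be pictured as a simple page-table lookup. The following is only a toy model under stated assumptions (4 KiB pages, a flat array of physical page addresses, made-up names); real GART tables are hardware-specific and usually carry flag bits in each entry:

```c
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Translate an address inside the aperture to a physical address.
 * table[i] holds the physical base of the page backing aperture page i.
 * Returns 0 on success, -1 if addr is outside the aperture. */
int gart_translate(const uint64_t *table, size_t entries,
                   uint64_t aperture_base, uint64_t addr, uint64_t *phys)
{
    if (addr < aperture_base)
        return -1;

    uint64_t offset = addr - aperture_base;
    uint64_t page = offset / GART_PAGE_SIZE;
    if (page >= entries)
        return -1;

    *phys = table[page] + (offset % GART_PAGE_SIZE);
    return 0;
}
```

The point of the table is that the GPU sees one contiguous aperture, while the backing pages in system memory can be scattered anywhere.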
#### 3.3 Memory Management
- **Page Tables**: The AGP driver maintains page tables to keep track of memory mappings.
- **Caching and Coherence**: The driver ensures memory coherence by managing cache consistency between the CPU and the GPU.
#### 3.4 Communication
- **DMA Transfers**: The AGP interface allows Direct Memory Access (DMA) transfers, enabling the video card to read and write system memory without CPU intervention.
- **Interrupt Handling**: The AGP driver handles interrupts generated by the AGP device, facilitating efficient communication between the CPU and the GPU.
### 4. **AGP Driver Components**
#### 4.1 Core AGP Driver (`agpgart`)
- Initializes the AGP bridge.
- Configures the GART.
- Manages memory mappings and aperture.
#### 4.2 Chipset-Specific Drivers
- Handle chipset-specific initialization and configuration.
- Examples include `intel-agp`, `amd-agp`, `nvidia-agp`.
#### 4.3 DRM (Direct Rendering Manager)
- Works in conjunction with the AGP driver to provide direct rendering capabilities.
- Manages GPU resources and provides an interface for user-space applications to interact with the GPU.
### 5. **AGP in Xorg**
- **AGP Support in Xorg**: The Xorg server utilizes AGP for efficient rendering. The Xorg logs will show AGP card detection and initialization details.
- **DRI (Direct Rendering Infrastructure)**: Enables direct access to the graphics hardware under the X Window System in a safe and efficient manner.
### 6. **Typical Workflow**
1. **Kernel Boot**: During the boot process, the kernel initializes the AGP subsystem.
2. **Module Loading**: The `agpgart` and chipset-specific AGP modules are loaded.
3. **GART Configuration**: The AGP driver configures the GART, setting up the aperture and mappings.
4. **Xorg Initialization**: When Xorg starts, it initializes the AGP card and configures the rendering pipeline.
5. **Rendering**: Applications utilize the AGP interface via the Direct Rendering Manager to render graphics efficiently.
### 7. **Troubleshooting**
- **dmesg and Xorg Logs**: Checking these logs can provide insight into any issues during AGP initialization or usage.
- **Kernel Configuration**: Ensuring that the kernel is correctly configured for AGP support is crucial.
- **Module Compatibility**: Ensuring that the correct AGP and chipset-specific modules are loaded.
### Conclusion
AGP architecture in a Linux system is a complex interaction between hardware and software components designed to provide efficient graphics rendering. Understanding the role of each component and how they interact can help in configuring and troubleshooting AGP-related issues.
Here's a diagram to reflect the architecture of AGP in a Linux system:
1. **User Space**:
- **Xorg Server**: The Xorg server is responsible for handling graphical rendering in user space.
- **Direct Rendering Infrastructure (DRI)**: Part of Xorg that allows direct access to the graphics hardware, facilitating efficient rendering.
2. **Kernel Space**:
- **DRM Subsystem**: The Direct Rendering Manager handles GPU resource management and provides an interface for user-space applications to interact with the GPU.
- **AGP Driver (agpgart)**: The core driver that initializes and manages the AGP bridge and GART.
- **Chipset-Specific AGP Drivers**: These drivers handle the initialization and management of AGP for specific chipsets (e.g., `intel-agp`, `amd-agp`, `nvidia-agp`).
- **GART (Graphics Address Remapping Table)**: Handles the mapping of AGP address space to physical memory, allowing the GPU to access system memory.
3. **Hardware**:
- **AGP Bridge**: The hardware interface that connects the AGP slot on the motherboard to the system’s northbridge, managing communication between the CPU, system memory, and the AGP card.
- **System Memory**: The main memory of the computer that the GPU can access via the GART.
- **AGP Graphics Card**: The video card that utilizes the AGP interface for high-speed communication with the system memory (e.g., Radeon 9250).
This diagram provides a visual representation of the interaction between different components involved in AGP architecture in a Linux system, from user space applications down to the hardware level.
Sure, here's a detailed guide on what to look for in each step:
1. **Check Kernel Configuration**:
- **Command**: `grep -i 'agp\|drm\|gart\|radeon' /boot/config-$(uname -r)`
- **What to Look For**:
- `CONFIG_AGP=y`
- `CONFIG_DRM=y`
- `CONFIG_DRM_RADEON=y`
- `CONFIG_AGP_AMD64=y` and `CONFIG_GART_IOMMU=y` are x86-64 only options, so they will not appear in a PowerPC kernel such as the one for Pegasos II.
- These entries (set to `y` or `m`) indicate that AGP (which GART is part of) and Radeon support are built for the kernel.
2. **Verify Kernel Modules**:
- **Command**: `lsmod | grep 'agp\|radeon'`
- **What to Look For**:
- `agpgart`
- `radeon`
- `drm`
- `drm_kms_helper`
- These modules being loaded indicates that AGP support and the Radeon DRM driver are active.
3. **Inspect dmesg Output**:
- **Command**: `dmesg | grep -i 'gart\|radeon'`
- **What to Look For**:
- Lines indicating the initialization of the AGP bridge:
```
[ 1.234567] agpgart: Detected AMD GART aperture
[ 1.234567] agpgart: AGP aperture is 256M @ 0xd0000000
```
- Lines showing the Radeon driver initialization:
```
[ 2.345678] [drm] radeon kernel modesetting enabled.
[ 2.345678] [drm] GART: num cpu pages 65536, num gpu pages 65536
```
- Look for any errors or warnings related to AGP or Radeon.
4. **Review Xorg Configuration**:
- **Command**: `grep -i 'gart\|radeon' /var/log/Xorg.0.log`
- **What to Look For**:
- Entries indicating the use of AGP:
```
(II) RADEON(0): AGP card detected
(II) RADEON(0): AGP 4x mode enabled
```
- Entries showing successful initialization of the Radeon driver:
```
(II) RADEON(0): Direct rendering enabled
(II) RADEON(0): GART: num cpu pages 65536, num gpu pages 65536
```
5. **Check Radeon DRM Driver Documentation**:
- **Location**: `/usr/src/linux/Documentation/gpu`
- **What to Look For**:
- Look for documentation files such as `radeon.rst`, `drm-mm.rst`, and others that mention GART support and memory coherence.
- Specifically, any notes on enabling AGP GART or specific configuration options.
6. **Test Memory Coherence**:
- This step requires custom code or tools. Typically, you would look for tools that can benchmark memory access patterns to verify coherence.
- **Example tool**: `memtester`
- **Command**: `memtester 1024 5` (where `1024` is the amount of memory to test in MB, and `5` is the number of iterations)
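- **What to Look For**: Successful completion without errors only shows that the RAM itself is sound. `memtester` does not exercise device DMA, so on its own it cannot confirm CPU/GPU memory coherence.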
7. **Refer to Source Code**:
- **Location**: The Radeon driver source code in the Linux kernel, usually found in the `drivers/gpu/drm/radeon/` directory.
- **What to Look For**:
- Look for function calls and definitions related to GART initialization and memory coherence.
- Specifically, functions like `radeon_gart_init`, `radeon_gart_bind`, and `radeon_gart_unbind`.
### Example Outputs
Here are examples of what the outputs might look like:
**dmesg Output**
```plaintext
[ 1.234567] agpgart: Detected AMD GART aperture
[ 1.234567] agpgart: AGP aperture is 256M @ 0xd0000000
[ 2.345678] [drm] radeon kernel modesetting enabled.
[ 2.345678] [drm] GART: num cpu pages 65536, num gpu pages 65536
```
**Xorg.0.log**
```plaintext
(II) RADEON(0): AGP card detected
(II) RADEON(0): AGP 4x mode enabled
(II) RADEON(0): Direct rendering enabled
(II) RADEON(0): GART: num cpu pages 65536, num gpu pages 65536
```
By following these steps and looking for these specific entries, you can determine if GART is supported and if full memory coherence is enabled for the Radeon 9250 on Debian for Pegasos II.
Here's an overview of what the code might look like at each stage or level of the AGP architecture in a Linux system:
### 1. **User Space: Xorg Server and Direct Rendering Infrastructure**
In user space, the Xorg server and Direct Rendering Infrastructure (DRI) interact with the AGP driver via ioctl calls and user-space libraries like libdrm.
#### Xorg Server Initialization
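```c
// Sample code to open the DRM device and query it via libdrm
#include <fcntl.h>       // open, O_RDWR, O_CLOEXEC
#include <stdio.h>       // printf, perror
#include <unistd.h>      // close
#include <xf86drm.h>     // drmGetVersion, drmSetClientCap (pulls in drm.h)
#include <xf86drmMode.h>

int main(void) {
    // Open the DRM device
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open /dev/dri/card0");
        return 1;
    }
    // Query the DRM driver name
    drmVersionPtr version = drmGetVersion(fd);
    if (version) {
        printf("DRM Version: %s\n", version->name);
        drmFreeVersion(version);
    }
    // Request optional client capabilities (may fail on old drivers)
    drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);
    drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1);
    // ... additional setup for Xorg and DRI
    close(fd);
    return 0;
}
```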
### 2. **Kernel Space: DRM Subsystem and AGP Driver**
The kernel space involves the Direct Rendering Manager (DRM) subsystem, the AGP driver, and chipset-specific drivers.
In the kernel space, the interaction with hardware is handled by chipset-specific drivers and the core AGP driver. The GART is set up and managed to allow the GPU to access system memory.
This example code provides a simplified view of the various stages and levels involved in the AGP architecture in a Linux system, from user space initialization to kernel space management and hardware interaction.
For engineers, providing additional context, clear explanations, and practical examples can help bridge the gap between high-level concepts and low-level implementation details. Let's refine the explanation to ensure it is comprehensible and actionable for engineers.
### Overview of AGP in Linux
#### 1. What is AGP?
- AGP (Accelerated Graphics Port) is an interface specification designed for the fast transfer of 3D graphics data between the video card and the main memory.
#### 2. Key Components:
- **AGP Bridge**: Hardware that connects the AGP slot to the system’s memory and CPU.
- **GART (Graphics Address Remapping Table)**: Allows the GPU to access system memory.
- **AGP Driver**: Software that manages AGP bridge and GART.
### Steps to Implement AGP Support
#### 1. **User Space: Xorg Server and Direct Rendering Infrastructure**
**Explanation**:
- This code opens the DRM (Direct Rendering Manager) device and initializes it. DRM is responsible for managing graphics resources and enabling direct rendering.
#### 2. **Kernel Space: DRM Subsystem and AGP Driver**
**Explanation**:
- This module defines a simple DRM driver with basic open and release operations. DRM handles GPU resources and enables direct rendering.
**Explanation**:
- This function enables AGP mode by reading the status register, configuring the command register, and writing the appropriate value to enable AGP.
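A minimal sketch of that negotiation, assuming the AGP 2.0 register layout (rate bits 2:0 and sideband addressing bit 9 in the status register, AGP enable bit 8 in the command register). The function name is hypothetical, the request queue depth field is left out, and reading or writing the actual registers is not shown:

```c
#include <stdint.h>

#define AGP_RATE_MASK  0x00000007u /* bits 2:0: 1x, 2x, 4x supported */
#define AGP_ENABLE     0x00000100u /* bit 8 of the command register */
#define AGP_SBA        0x00000200u /* bit 9: sideband addressing */

/* Build a command register value from the bridge and card status values:
 * pick the highest transfer rate both sides support, enable sideband
 * addressing only if both support it, and set the AGP enable bit.
 * Returns 0 if there is no common transfer rate. */
uint32_t agp_pick_command(uint32_t bridge_status, uint32_t card_status)
{
    uint32_t common = bridge_status & card_status;
    uint32_t rates = common & AGP_RATE_MASK;
    uint32_t rate;

    if (rates & 4u)
        rate = 4u;
    else if (rates & 2u)
        rate = 2u;
    else if (rates & 1u)
        rate = 1u;
    else
        return 0;

    uint32_t cmd = rate | AGP_ENABLE;
    if (common & AGP_SBA)
        cmd |= AGP_SBA;
    return cmd;
}
```

The same command value would then be written to both the bridge and the card, since both ends have to agree on the rate.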
**Explanation**:
- This code sets up an interrupt handler for the AGP device, enabling interrupts and handling them by clearing the interrupt flag.
### Conclusion
These detailed explanations and code examples provide a comprehensive overview of AGP architecture in Linux, from user space initialization to kernel space management and hardware register-level interactions. This should be sufficient for engineers to understand and implement AGP support in a Linux system.
To comprehensively compile and implement AGP support with GART for the Radeon 9250 on OS4, you need to consider several aspects beyond just the code itself. Here’s a detailed list of what you need:
### Development Environment Setup
1. **Toolchain**
- Ensure you have a compatible C/C++ compiler (e.g., GCC or Clang) installed for OS4.
- Set up necessary build tools (e.g., make, cmake).
2. **Development Libraries**
- Install or have access to the required development libraries, such as those for handling PCI configuration space and memory allocation (similar to `libpci` and `libdrm` in Linux).
3. **Kernel Source and Headers**
- Obtain the kernel source code for OS4 to ensure you can compile kernel modules.
- Ensure you have the necessary headers for kernel development.
### Detailed Implementation Steps
#### 1. **Configure AGP Bridge and GART**
**AGP Bridge Configuration:**
- Implement functions to configure the AGP bridge’s aperture size and base address.
```c
#include <os4_pci.h> // Hypothetical header for PCI functions in OS4

// NOTE: every os4_* function, the AGP_INTERRUPT_* register offsets and the
// Linux-style types (irqreturn_t, struct pci_dev, u32) are hypothetical
// placeholders; none of them is a real AmigaOS 4 API.
irqreturn_t os4_agp_interrupt_handler(int irq, void *dev_id) {
    struct pci_dev *pdev = dev_id; // device passed to os4_request_irq()
    u32 status = 0;

    // Read the interrupt status register
    os4_pci_read_config_dword(pdev, AGP_INTERRUPT_STATUS_REGISTER, &status);
    if (status & 0x1) {
        // Handle the interrupt (e.g., clear the interrupt flag)
        os4_pci_write_config_dword(pdev, AGP_INTERRUPT_STATUS_REGISTER, status);
        return IRQ_HANDLED;
    }
    return IRQ_NONE;
}

int os4_enable_agp_interrupts(struct pci_dev *pdev) {
    int irq = pdev->irq;
    int ret;

    // Request an IRQ and register the interrupt handler
    ret = os4_request_irq(irq, os4_agp_interrupt_handler, IRQF_SHARED, "agp_irq", pdev);
    if (ret)
        return ret;
    // Enable interrupts in the AGP device
    os4_pci_write_config_dword(pdev, AGP_INTERRUPT_ENABLE_REGISTER, 0x1);
    return 0;
}
```
### Testing and Validation
1. **Compile and Load Modules:**
- Compile the AGP and Radeon drivers as kernel modules for OS4.
- Load the modules and check for successful initialization.
2. **Check Logs:**
- Use OS4's debug output (for example `DumpDebugBuffer` or a serial log, as `dmesg` is a Linux tool) to verify the AGP and Radeon initialization logs.
3. **Run Diagnostic Tools:**
- Create or use existing diagnostic tools to verify GART setup, memory coherence, and overall AGP functionality.
4. **Performance Testing:**
- Test with graphics-intensive applications to ensure proper functioning and performance gains.
5. **QEMU Testing:**
- If applicable, test the implementation in QEMU to ensure compatibility and broader usage.
### Documentation and Support
- **Document the Implementation:**
- Ensure all steps, functions, and configurations are well-documented to help other developers understand and maintain the code.
- **Community Support:**
- Engage with the OS4 and broader developer communities to get feedback, report bugs, and receive support for the implementation.
By following these detailed steps, you can implement and compile comprehensive AGP support with GART for the Radeon 9250 on OS4, ensuring compatibility and performance improvements similar to those on Debian for Pegasos II.
In the Windows version there is no copyTo/FromVRAM as that's not possible on this OS, however Hans may still have some stats on how the same Radeon HD/RX cards on Windows compare to AmigaOS.
I can't run it on the Radeon R5 M330, as it seems GfxBench picks the first card to test with.
As can be seen, 16x16, 32x32 and 64x64 are not that bad: yes, they are not ultra fast, but they are 2-3 times faster than the old Radeon9250 ones. With our drivers, we get some pretty bad results for those :( And I don't mean Pegasos2 here, but in general, on any platform (especially on X5000).
Also, WritePixelArray() and ReadPixelArray() (I don't know which functions are used for them on Windows; does Windows have those?) give the same results, while ReadPixelArray() should mainly mean reading from VRAM to RAM, so shouldn't it be slower too?
I don't know how those functions work in the Windows versions of GfxBench2D, but on AmigaOS it's:
- Read/WritePixelArray() on supported platforms, like Sam4x0, X1000, X5000 and A1222, use larger (= faster) DMA transfers.
- On platforms without any DMA support (A1 SE/XE/µA1 and Peg2) those functions are basically the same as copyTo/FromVRAM and the CPU does the copies. On CPUs with AltiVec support (A1/Peg2 with G4 CPU, X1000) it should be at least twice as fast as on CPUs without AltiVec (classic Amigas, A1/Peg2 with a G3 CPU, Sam4x0, X5000).
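To make the "CPU does the copies" case concrete, here is a rough sketch of timing a plain CPU copy and converting it to MiB/s. This is not GfxBench2D's code: the function names are made up, it uses standard C `clock()` (CPU time, coarse resolution), and on real hardware one of the two pointers would have to be a mapped VRAM address rather than ordinary RAM.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Convert a byte count and an elapsed time into MiB/s. */
double mib_per_second(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Copy 'bytes' from src to dst 'passes' times with the CPU and return the
 * measured throughput in MiB/s (0.0 if the run was too short to measure). */
double timed_cpu_copy(void *dst, const void *src, size_t bytes, unsigned passes)
{
    clock_t start = clock();
    for (unsigned i = 0; i < passes; i++)
        memcpy(dst, src, bytes);
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return mib_per_second(bytes * passes, seconds);
}
```

Swapping which pointer is the VRAM one gives the copyToVRAM and copyFromVRAM directions; reads over a bridge are typically the slow side because each read has to wait for the data to come back.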
Test failed. Your graphics card drivers may be faulty.
WritePixelArray: 9499.648 MiB/s (took 0.322000 seconds).
Test failed. Your graphics card drivers may be faulty.
ReadPixelArray: 9500.283 MiB/s (took 0.228000 seconds).
Vesa driver with shadowfb disabled:
read ( 20.4 MiB/sec): ShmGetImage 500x500 square
write( 306.1 MiB/sec): ShmPutImage 500x500 square
Of course not as slow as your Radeon RX with bridge results on Pegasos2, but slower than some X1000 and X5000 with Radeon HD or RX gfx card, and the read (copyFromVRAM) is even slower than your AGP Radeon9250.
@all As interesting as the GfxBench2D results are, does anyone know if GART is enabled in the Pegasos 2 Linux kernel? The Marvell chipset documentation does mention memory coherency, but it would be great to know if it actually works.
If it works, then the next question would be how to set it up properly...
After some tests, we found that on Pegasos 2 the bus number that the RadeonHD/RX card is plugged into doesn't match the bus number that's programmed into the bridge. That is fixed now in the Peg2 kernel, so the code in RadeonHD/RX for enabling prefetching (the Radeon drivers have some for the 8111 and 8112 PLX chips) can detect the chips on Pegasos2 too.
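For anyone who wants to check this kind of mismatch by hand: a PCI-to-PCI bridge keeps its bus numbers in the 32-bit register at config offset 0x18 (primary, secondary, subordinate, one byte each). A small sketch of decoding that value and testing whether a device's bus lies behind the bridge; the struct and function names are just for illustration:

```c
#include <stdint.h>

/* Bus number fields of the 32-bit register at config offset 0x18 of a
 * PCI-to-PCI bridge (type 1 header). */
typedef struct {
    uint8_t primary;      /* bus the bridge itself sits on */
    uint8_t secondary;    /* bus directly behind the bridge */
    uint8_t subordinate;  /* highest bus number behind the bridge */
} BridgeBuses;

BridgeBuses decode_bridge_buses(uint32_t reg18)
{
    BridgeBuses b;
    b.primary     = (uint8_t)(reg18 & 0xFFu);
    b.secondary   = (uint8_t)((reg18 >> 8) & 0xFFu);
    b.subordinate = (uint8_t)((reg18 >> 16) & 0xFFu);
    return b;
}

/* Non-zero if a device on bus 'device_bus' is reachable through the bridge. */
int bus_is_behind_bridge(BridgeBuses b, uint8_t device_bus)
{
    return device_bus >= b.secondary && device_bus <= b.subordinate;
}
```

If the OS enumerates the card on a bus number that fails this test for the bridge it physically sits behind, a driver searching for "the bridge above my card" will not find it.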
Now we have found out that the code for enabling prefetching seems never to have worked properly: the memory regions needed for prefetching are wrongly programmed. At least I tested the PLX bridge on the X5000 as well, and this is what the DumpBridge tool says:
It's the same on Pegasos2 now. I can't test on the X1000, as it didn't boot when I tried to use the PLX-based bridge in it, and on the Sam460 I have a single PCI slot which is occupied by a SATA card, so I can't test whether the memory regions for prefetch are wrongly programmed on all platforms. But on X5K and Peg2 they are for sure.
So the next step is to fix the memory ranges to see whether correct prefetching improves the situation on the PLX-based bridge, and then to try to improve things for the Pericom bridge as well, as this one looks much better in its default state compared with the PLX one.
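As a reference for checking those ranges, here is a small sketch of how the standard prefetchable memory base/limit registers of a PCI-to-PCI bridge (16-bit registers at config offsets 0x24 and 0x26) decode into an address window. It assumes a 32-bit window (the upper 32-bit registers at 0x28/0x2C are ignored), and the names are mine, not the driver's:

```c
#include <stdint.h>

typedef struct {
    uint32_t base;    /* first address forwarded as prefetchable */
    uint32_t limit;   /* last address forwarded as prefetchable */
    int      enabled; /* zero when base > limit, i.e. window closed */
} PrefetchWindow;

/* Bits 15:4 of each register hold address bits 31:20; the window is
 * 1 MiB aligned, so the limit's low 20 bits are all ones. */
PrefetchWindow decode_prefetch_window(uint16_t base_reg, uint16_t limit_reg)
{
    PrefetchWindow w;
    w.base    = (uint32_t)(base_reg & 0xFFF0u) << 16;
    w.limit   = ((uint32_t)(limit_reg & 0xFFF0u) << 16) | 0x000FFFFFu;
    w.enabled = (w.base <= w.limit);
    return w;
}

/* Non-zero if [start, start + size) lies completely inside the window. */
int window_covers(PrefetchWindow w, uint32_t start, uint32_t size)
{
    if (!w.enabled || size == 0)
        return 0;
    uint64_t end = (uint64_t)start + size - 1;
    return start >= w.base && end <= w.limit;
}
```

The check that matters for prefetching is whether the card's prefetchable VRAM BAR falls entirely inside this window; if it doesn't, accesses go through the non-prefetchable path no matter what else the driver enables.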
And the question still remains: is there any proven-to-work code for Peg2 hardware, no matter for which OS, that sets up full memory coherence, so we can see how it's done? Any help appreciated!