Jorik

Very brief

Jorik is about using graphics processing units for general-purpose computations (GPGPU): parallel sorting, dynamic PDEs and image processing. Jorik is written in C++/DirectX/HLSL and supports both ATI and nVidia cards. It is suitable for non-gaming GPU benchmarking.

Description

Jorik is a collection of small programs that use the enormous computational power of modern programmable GPUs (graphics processing units) from nVidia or ATI for non-graphical, general-purpose (GP) computations through the DirectX interface and HLSL (High Level Shading Language). The programs rely on Shader Model 2.0, so they can run on most existing graphics cards, starting from the ATI Radeon 9xxx and nVidia GeForce FX series.

The project has three main objectives.

  1. Jorik provides compact, easy-to-follow examples of GPU programming for novice programmers already familiar with C++. In this sense Jorik is much simpler than libsh and does not require an additional precompiler, as BrookGPU does.
  2. Some Jorik programs can be used to benchmark GPUs on reasonably realistic rather than purely synthetic tasks (unlike GPUBench, which repeatedly executes a single assembler instruction). More importantly, Jorik benchmarks can also solve the same tasks by traditional means on the CPU alone, and thus show the acceleration achievable on the GPU relative to the CPU.
  3. Jorik contains valuable implementations of hardware-optimized CLUT color correction and of the bitonic sort algorithm. On modern GPUs, bitonic sort is even faster than the ubiquitous quicksort running on the most expensive CPUs. Unlike GPUSort, the Jorik bisort implementation supports both ATI and nVidia hardware and arrays of up to 16 million 32-bit floats, and theoretically even more once video cards capable of 4096x4096 textures with more than 512 MB of memory go on sale.

Benchmarks

copy – measures the time required to copy arrays from system memory to video memory and back. This latency is critical for parallel GPGPU applications. However, copying (especially copying back) is not very important in 3D games, which is why some vendors appear not to optimize their drivers for this operation.

bisort2 – sorts a long array of 32-bit real numbers using the parallel bitonic algorithm on the GPU and the traditional quicksort algorithm on the CPU, then compares the results. The GPU version supports various texture formats with 1, 2 or 4 float components per pixel.
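
The bitonic network behind bisort2 can be emulated on the CPU to see why it suits a GPU: each pass computes every output element independently from the previous buffer, just as a pixel shader writes one texel per invocation into a new render target. The sketch below is a plain C++ illustration, not Jorik's HLSL code. It works on a 1D array whose length is assumed to be a power of two, and the names bitonicPass and bitonicSort are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One bitonic pass: every output element is computed independently from the
// input buffer, the way a pixel shader writes one texel per invocation.
std::vector<float> bitonicPass(const std::vector<float>& in, std::size_t k, std::size_t j) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::size_t partner = i ^ j;    // element compared with i in this pass
        bool ascending = (i & k) == 0;  // direction of the current bitonic block
        bool isLower = i < partner;     // i holds the lower index of the pair
        float a = in[i], b = in[partner];
        // The lower index keeps the min in ascending blocks, the max in descending ones.
        out[i] = (isLower == ascending) ? std::min(a, b) : std::max(a, b);
    }
    return out;
}

// Sorts ascending; the size must be a power of two (pad with +infinity otherwise).
std::vector<float> bitonicSort(std::vector<float> data) {
    std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "size must be a power of two");
    for (std::size_t k = 2; k <= n; k <<= 1)          // size of the blocks being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1)  // compare distance inside a block
            data = bitonicPass(data, k, j);           // "ping-pong" into a fresh buffer
    return data;
}
```

An array of 2^m elements needs m(m+1)/2 passes, so 16 million (2^24) floats take 300 passes. bisort2 additionally packs up to 4 floats per pixel, which this sketch ignores.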

wave – solves the simple dynamic PDE ∂²u/∂t² − c²(∂²u/∂x² + ∂²u/∂y²) = 0, which describes the propagation of acoustic waves, on both the GPU and the CPU. An explicit finite-difference scheme is used. This test shows GPUs in the best light.
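
The page does not spell out which explicit scheme is used, so the following assumes the standard second-order central (leapfrog) discretization on a square grid with spacing h and fixed zero values on the border: u_next = 2·u − u_prev + (c·dt/h)²·(u(x+1,y) + u(x−1,y) + u(x,y+1) + u(x,y−1) − 4·u(x,y)). Each interior node depends only on its neighbours from the two previous time levels, which is what lets a pixel shader update all nodes of a texture in parallel. Grid and waveStep are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major grid; the outer border is never written and stays zero (fixed edges, an assumption).
struct Grid {
    std::size_t nx, ny;
    std::vector<float> u;
    Grid(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), u(nx_ * ny_, 0.0f) {}
    float& at(std::size_t x, std::size_t y) { return u[y * nx + x]; }
    float at(std::size_t x, std::size_t y) const { return u[y * nx + x]; }
};

// One explicit time step: next = 2*cur - prev + r2 * laplacian(cur),
// where r2 = (c*dt/h)^2 and h is the same grid spacing in x and y.
void waveStep(const Grid& prev, const Grid& cur, Grid& next, float r2) {
    assert(r2 <= 0.5f && "CFL condition c*dt/h <= 1/sqrt(2) violated");
    for (std::size_t y = 1; y + 1 < cur.ny; ++y)
        for (std::size_t x = 1; x + 1 < cur.nx; ++x) {
            float lap = cur.at(x + 1, y) + cur.at(x - 1, y) + cur.at(x, y + 1) +
                        cur.at(x, y - 1) - 4.0f * cur.at(x, y);
            next.at(x, y) = 2.0f * cur.at(x, y) - prev.at(x, y) + r2 * lap;
        }
}
```

For this scheme the 2D stability (CFL) condition is c·dt/h ≤ 1/√2, i.e. r2 ≤ 0.5, which the assert checks. After each step the three buffers are rotated (prev ← cur, cur ← next), mirroring ping-pong rendering between textures.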

Known benchmark results.

Other educational programs

color_correct – corrects an image using a color look-up table (CLUT). The program demonstrates the use of volume textures and linear filtering. The problem statement is given here.
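
Looking a color up in a CLUT stored as a volume texture with linear filtering amounts to trilinear interpolation between the 8 table entries surrounding the input color. The CPU sketch below spells that out under assumed conventions: a hypothetical Clut struct holds n×n×n RGB entries, entry (i, j, k) stores the output for input color (i, j, k)/(n − 1), and inputs are clamped to [0, 1]. It needs C++17 for std::clamp.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using RGB = std::array<float, 3>;

// Hypothetical CLUT layout: entry (r, g, b) is stored at lut[(b * n + g) * n + r].
struct Clut {
    std::size_t n;
    std::vector<RGB> lut;
    const RGB& at(std::size_t r, std::size_t g, std::size_t b) const { return lut[(b * n + g) * n + r]; }
};

// Trilinear interpolation, i.e. what linear filtering of a volume texture computes.
RGB applyClut(const Clut& c, RGB in) {
    assert(c.n >= 2 && c.lut.size() == c.n * c.n * c.n);
    std::size_t i0[3], i1[3];
    float f[3];
    for (int a = 0; a < 3; ++a) {
        float x = std::clamp(in[a], 0.0f, 1.0f) * float(c.n - 1);
        i0[a] = std::min(static_cast<std::size_t>(x), c.n - 2);  // lower grid index
        i1[a] = i0[a] + 1;                                       // upper grid index
        f[a] = x - float(i0[a]);                                 // fractional weight
    }
    RGB out{0.0f, 0.0f, 0.0f};
    for (int corner = 0; corner < 8; ++corner) {  // visit the 8 surrounding entries
        std::size_t ir = (corner & 1) ? i1[0] : i0[0];
        std::size_t ig = (corner & 2) ? i1[1] : i0[1];
        std::size_t ib = (corner & 4) ? i1[2] : i0[2];
        float w = ((corner & 1) ? f[0] : 1.0f - f[0]) *
                  ((corner & 2) ? f[1] : 1.0f - f[1]) *
                  ((corner & 4) ? f[2] : 1.0f - f[2]);
        const RGB& e = c.at(ir, ig, ib);
        for (int a = 0; a < 3; ++a) out[a] += w * e[a];
    }
    return out;
}
```

On the GPU, mapping an input c ∈ [0, 1] onto the same grid points requires the texture coordinate (c·(n − 1) + 0.5)/n, because texel centers lie at (i + 0.5)/n. The hardware also filters at reduced precision, so its results may differ slightly from this float version.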

filter – performs simple image processing: a) blurring with the kernel (1 2 1), b) median filtering with radius 1 or 2. The latter is hard for some video cards and may cause them to hang.
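
Both filters can be written as plain C++ loops over a grayscale image, which is also how a CPU reference for the shaders might look. The sketch assumes that the (1 2 1) kernel is applied separably, first along x and then along y, with weights normalized to sum to one, and that pixels outside the image are clamped to the nearest edge. The Image struct and the function names are hypothetical. It needs C++17 for std::clamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grayscale image, row-major; out-of-range reads are clamped to the edge (an assumption).
struct Image {
    int w, h;
    std::vector<float> px;
    float at(int x, int y) const {
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// Blur with the (1 2 1)/4 kernel along x, then along y (a separable 3x3 blur).
Image blur121(const Image& in) {
    assert(in.px.size() == static_cast<std::size_t>(in.w) * in.h);
    Image tmp = in, out = in;
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x)
            tmp.px[y * in.w + x] = (in.at(x - 1, y) + 2 * in.at(x, y) + in.at(x + 1, y)) / 4;
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x)
            out.px[y * in.w + x] = (tmp.at(x, y - 1) + 2 * tmp.at(x, y) + tmp.at(x, y + 1)) / 4;
    return out;
}

// Median over the (2r+1)x(2r+1) window around each pixel.
Image median(const Image& in, int r) {
    Image out = in;
    std::vector<float> win;
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            win.clear();
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) win.push_back(in.at(x + dx, y + dy));
            auto mid = win.begin() + win.size() / 2;
            std::nth_element(win.begin(), mid, win.end());  // partial sort up to the median
            out.px[y * in.w + x] = *mid;
        }
    return out;
}
```

A radius-1 median reads 9 samples per pixel and a radius-2 median reads 25, which gives a sense of how much heavier the latter is.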

sum – sums the colors of all pixels of an image, demonstrating how a reduction is organized on the GPU.
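
A GPU reduction typically renders the image repeatedly into a target half as large in each dimension, each output pixel summing a 2x2 block of the previous one, until a single pixel remains. Jorik's exact pass layout is not described here, so the C++ sketch below assumes a single channel, a square image whose side is a power of two, and 2x2 blocks. reducePass and reduceSum are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One reduction pass: an n x n image becomes (n/2) x (n/2), each output pixel
// summing a 2x2 block, as a pixel shader rendering into a quarter-size target would.
std::vector<float> reducePass(const std::vector<float>& in, std::size_t n) {
    std::size_t m = n / 2;
    std::vector<float> out(m * m);
    for (std::size_t y = 0; y < m; ++y)
        for (std::size_t x = 0; x < m; ++x)
            out[y * m + x] = in[(2 * y) * n + 2 * x] + in[(2 * y) * n + 2 * x + 1] +
                             in[(2 * y + 1) * n + 2 * x] + in[(2 * y + 1) * n + 2 * x + 1];
    return out;
}

// Repeats passes until one pixel remains; n must be a power of two.
float reduceSum(std::vector<float> img, std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0 && img.size() == n * n);
    while (n > 1) {
        img = reducePass(img, n);
        n /= 2;
    }
    return img[0];
}
```

An n×n image needs log2(n) passes, e.g. 10 passes for 1024x1024. Because partial sums stay of similar magnitude, this tree-shaped summation also tends to accumulate less rounding error than a naive sequential loop.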

bisort – a simplified version of bisort2 that works only with 1D textures, so the maximum array size is limited to 2048*4 or 4096*4 elements, depending on the maximum texture width supported by the card.

sort2d – an alternative sorter for data stored in 2D textures. It combines bitonic sort for individual rows with shear sort for columns. sort2d is much slower than bisort2.
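
Shear sort arranges the values of a 2D grid in snake order (even rows ascending, odd rows descending, every row no greater than the next) by alternating row phases and column phases. ceil(log2(rows)) + 1 phases suffice. The sketch below shows that structure in C++, with std::sort standing in for the bitonic row sort. Grid2D and shearSort are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Grid2D = std::vector<std::vector<float>>;  // grid[row][col]

// Shear sort: leaves the grid sorted in snake order.
void shearSort(Grid2D& g) {
    const std::size_t rows = g.size();
    if (rows == 0) return;
    const std::size_t cols = g[0].size();
    for (const auto& row : g) assert(row.size() == cols && "rows must have equal length");
    int phases = 1;  // ceil(log2(rows)) + 1 phases suffice
    for (std::size_t p = 1; p < rows; p <<= 1) ++phases;
    std::vector<float> column(rows);
    for (int ph = 0; ph < phases; ++ph) {
        for (std::size_t r = 0; r < rows; ++r)  // row phase, alternating direction
            if (r % 2 == 0) std::sort(g[r].begin(), g[r].end());
            else std::sort(g[r].begin(), g[r].end(), std::greater<float>());
        for (std::size_t c = 0; c < cols; ++c) {  // column phase, always ascending
            for (std::size_t r = 0; r < rows; ++r) column[r] = g[r][c];
            std::sort(column.begin(), column.end());
            for (std::size_t r = 0; r < rows; ++r) g[r][c] = column[r];
        }
    }
}
```

To read the result in ordinary row-major order, reverse the odd rows afterwards.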

texcoord/texcoord2 – the simplest examples, demonstrating how texture coordinates are interpolated as seen from pixel shaders.
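
Texture coordinates written at a quad's vertices reach the pixel shader linearly interpolated across the quad, evaluated at each pixel's sampling position. The sketch below computes those values in 1D so the offsets can be inspected. interpolatedTexcoords is a hypothetical helper, and its offset parameter is an assumption standing for where the rasterizer samples within a pixel.

```cpp
#include <cassert>
#include <vector>

// Interpolates a texture coordinate linearly between the quad edges x0 (value u0)
// and x1 (value u1), evaluated at the sampling positions x = i + offset.
std::vector<float> interpolatedTexcoords(int width, float x0, float x1,
                                         float u0, float u1, float offset) {
    assert(width > 0 && x1 != x0);
    std::vector<float> u(width);
    for (int i = 0; i < width; ++i) {
        float x = i + offset;                          // where pixel i is sampled
        u[i] = u0 + (u1 - u0) * (x - x0) / (x1 - x0);  // linear interpolation
    }
    return u;
}
```

With samples at pixel centers (offset 0.5) and a quad spanning [0, W], pixel i receives (i + 0.5)/W, the center of texel i. Direct3D 9 places pixel centers at integer screen coordinates (offset 0), so a full-screen quad is commonly shifted by half a pixel to obtain the same texel-aligned coordinates: interpolatedTexcoords(4, -0.5f, 3.5f, 0.0f, 1.0f, 0.0f) yields the same values as interpolatedTexcoords(4, 0.0f, 4.0f, 0.0f, 1.0f, 0.5f), namely 0.125, 0.375, 0.625 and 0.875.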

Software requirements

Windows 2000/XP + DirectX 9 to run the programs.
Visual Studio 2003 + DirectX SDK to rebuild them.

Sources and binaries download.

© Fedor Chelnokov, 2006
mailto: fchs (at) patchmaker.net