[3/3] amd: Apply elf relocations and allow code with relocations

Submitted by Jan Vesely on June 4, 2019, 2:39 a.m.

Details

Message ID 20190604023917.9194-3-jan.vesely@rutgers.edu
State New
Series "Series without cover letter" ( rev: 1 ) in Mesa

Commit Message

Jan Vesely June 4, 2019, 2:39 a.m.
Fixes piglits:
	call.cl
	calls-large-struct.cl
	calls-struct.cl
	calls-workitem-id.cl
	realign-stack.cl
	tail-calls.cl

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---
The piglit tests now pass using llvm-7, llvm-8, and llvm git.
ImageMagick works on my raven, but some tests still fail on
carrizo/iceland.
Other workloads (like shoc) that use function calls also work OK.
ocltoys work after removing the static keyword from the .cl files.
 src/amd/common/ac_binary.c                | 30 +++++++++++++++++++++++
 src/gallium/drivers/radeonsi/si_compute.c |  6 -----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index 18dc72c61f0..4d152fcf1be 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -178,6 +178,36 @@  bool ac_elf_read(const char *elf_data, unsigned elf_size,
 
 	parse_relocs(elf, relocs, symbols, symbol_sh_link, binary);
 
+	// Apply relocations
+	for (int i = 0; i < binary->reloc_count; ++i) {
+		struct ac_shader_reloc *r = &binary->relocs[i];
+		uint32_t *loc = (uint32_t*)(binary->code + r->offset);
+		/* Section target relocations store symbol offsets as
+		 * values in reloc location. We're expected to adjust it for
+		 * start of the section. However, R_AMDGPU_REL32 are
+		 * PC relative relocations, so we need to recompute the
+		 * delta between reloc location and the target address.
+		 */
+		if (r->target_type == 0x3) { // section relocation
+			uint32_t target_offset = *loc; // already adjusted
+			int64_t diff = target_offset - r->offset;
+			if (r->type == 0xa) { // R_AMDGPU_REL32_LO
+				// address of the 'lo' instruction is 4B below
+				// the relocation point, but the target has
+				// already been adjusted.
+				*loc = (diff & 0xffffffff);
+			} else if (r->type == 0xb) { // R_AMDGPU_REL32_HI
+				// 'hi' relocation is 8B above 'lo' relocation
+				*loc = ((diff - 8) >> 32);
+			} else {
+				success = false;
+				fprintf(stderr, "Unsupported section relocation: type: %d, offset: %lx, value: %x\n",
+			                        r->type, r->offset, *loc);
+			}
+		} else
+			success = false;
+	}
+
 	if (elf){
 		elf_end(elf);
 	}
diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index b9cea00eeeb..88631369a62 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -246,12 +246,6 @@  static void *si_create_compute_state(
 			const amd_kernel_code_t *code_object =
 				si_compute_get_code_object(program, 0);
 			code_object_to_config(code_object, &program->shader.config);
-			if (program->shader.binary.reloc_count != 0) {
-				fprintf(stderr, "Error: %d unsupported relocations\n",
-					program->shader.binary.reloc_count);
-				FREE(program);
-				return NULL;
-			}
 		} else {
 			si_shader_binary_read_config(&program->shader.binary,
 				     &program->shader.config, 0);
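
For reference, the PC-relative arithmetic behind R_AMDGPU_REL32_LO/HI can be
sketched on its own. The AMDGPU ELF documentation defines both as S + A - P
(symbol value, plus addend, minus the address of the patched word), with LO
keeping the low 32 bits and HI the high 32 bits. The sketch below is not the
code from this patch: it assumes a single section based at offset 0, an
implicit addend stored in the patched word, and hypothetical names
(apply_rel32, enum rel32_kind, read_le32, write_le32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not part of ac_binary.c. */
enum rel32_kind { REL32_LO, REL32_HI };

/* AMDGPU code objects are little-endian, so decode bytes explicitly. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Patch the 32-bit word at 'place' with the low or high half of S + A - P.
 * Assumption: 'sym_value' (S) and 'place' (P) are offsets from the same base,
 * and the addend (A) is the signed value already stored at 'place'.
 * Returns false if the word would fall outside the buffer.
 */
static bool apply_rel32(uint8_t *code, size_t size, uint64_t place,
                        uint64_t sym_value, enum rel32_kind kind)
{
	assert(code != NULL);
	if (size < 4 || place > size - 4)
		return false;

	int64_t addend = (int32_t)read_le32(code + place);
	/* Unsigned arithmetic wraps, which gives the two's-complement result
	 * for backward (negative) deltas without signed overflow. */
	uint64_t value = sym_value + (uint64_t)addend - place;
	uint32_t word = (kind == REL32_LO) ? (uint32_t)(value & 0xffffffffu)
	                                   : (uint32_t)(value >> 32);
	write_le32(code + place, word);
	return true;
}
```

With a target at offset 0x100, a LO word at offset 8 holding addend 4 becomes
0xfc, and a HI word at offset 16 holding addend 12 becomes 0. For a backward
target at offset 0, the same two words become 0xfffffffc and 0xffffffff.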

Comments



On 13.06.2019 07:10, Marek Olšák wrote:
> FYI, I just pushed the new linker.
> 
> Marek

Thank you very much, Marek and _Nicolai_, for this GREAT stuff.
It brings back some speed after the recent 1/8 drop in glmark2.
Maybe my amd-staging-drm-next tree (5.2-rc1) didn't honor the kernel
mitigation parameter correctly.

@Jan
Go ahead with your nice relocation and image work.
Send me what you have in the works.

With the latest Mesa git (including Nicolai's new linker), all three
LuxMark versions run.
Only 'Hotel lobby' (with v3.0 and v3.1) shows some corruption, but it no
longer crashes. Numbers for 'Neumann TLM-102 SE' (medium) show ~43000K
(!!!).

https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1106085-linux-kernel-set-to-expose-hidden-nvidia-hda-controllers-helping-laptop-users?p=1106199#post1106199

Blender crashes as expected ;-)

/home/dieter> trying to save userpref at 
/home/dieter/.config/blender/2.79/config/userpref.blend ok
Read blend: /data/Blender/barbershop_interior_gpu.blend
scripts disabled for "/data/Blender/barbershop_interior_gpu.blend", 
skipping 'generate_customprops.py'
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
skipping driver 'var', automatic scripts are disabled
Device init success
Compiling OpenCL program split
Kernel compilation of split finished in 8.41s.

Compiling OpenCL program base
Kernel compilation of base finished in 4.55s.

Compiling OpenCL program denoising
Kernel compilation of denoising finished in 2.08s.

blender: ../src/gallium/drivers/radeonsi/si_compute.c:319: 
si_set_global_binding: Assertion `first + n <= MAX_GLOBAL_BUFFERS' 
failed.

[1]    Abbruch                       blender (core dumped)

Greetings,
Dieter

On 14.06.2019 08:13, Jan Vesely wrote:
> On Thu, 2019-06-13 at 21:20 +0200, Dieter Nützel wrote:
>> On 13.06.2019 07:10, Marek Olšák wrote:
>> > FYI, I just pushed the new linker.
>> >
>> > Marek
>> 
>> Thank you very much Marek and _Nicolai_ for this GREAT stuff.
>> It brings back some speed after 1/8 drop with glmark2, lately.
>> Maybe my amd-staging-drm-next tree (5.2-rc1) didn't honor the kernel
>> mitigation parameter right.
>> 
>> @Jan
>> Go ahead with your nice relocation and image work.
>> Send me what you have in the works.
> 
> The relocation work is no longer needed as the new linker handles
> things.
> The corruption is caused either by (still faulty) conversion builtins,
> or incorrect buffer coherence handling. Both need fixing, but I'm not
> sure which one is to blame in this case.
> 
>> 
>> Latest Mesa git (with Nicolai's new linker) let all 3 luxmark versions
>> run.
>> Only 'Hotel lobby' (with v3.0 and v3.1) show some corruption but do NOT
>> crash any longer. Numbers for 'Neumann TLM-102 SE' (medium) show ~43000K
>> (!!!).
>> 
>> https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1106085-linux-kernel-set-to-expose-hidden-nvidia-hda-controllers-helping-laptop-users?p=1106199#post1106199
>> 
>> Blender crash as expected ;-)
>> 
>> blender: ../src/gallium/drivers/radeonsi/si_compute.c:319:
>> si_set_global_binding: Assertion `first + n <= MAX_GLOBAL_BUFFERS'
>> failed.
>> 
>> [1]    Abbruch                       blender (core dumped)
> 
> The maximum number of global buffers was bumped in 06bf56725d to fix a
> similar crash in luxmark. I guess it needs another bump.

Hello Jan,

I'm so blind...
...bumping it to 48 or 64 (first try) works; 33 does not ;-)
We shouldn't waste too much memory.
Now, let's start with the libclc work.
Luxmark 'Hotel' is very blocky and Blender 'barbershop_interior_gpu' is
mostly black. I have some images.

Shouldn't we open a new ticket? Any hints for a good name?
Or do we have one already? I can put my pictures there.
Simpler scenes work, but come out mostly gray (without colors/textures).

Dieter
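
The assertion discussed above fires when a caller binds more global buffers
than the fixed-size table can hold. A minimal sketch of the same bounds check
as a recoverable test, written so that first + n cannot wrap around, might
look like this. The helper name global_binding_fits and the capacity values
used with it are illustrative assumptions and are not taken from
si_compute.c.

```c
#include <assert.h> /* for callers that still prefer to abort on failure */
#include <stdbool.h>

/* Hypothetical helper: true if slots [first, first + n) fit into a table of
 * 'capacity' entries. Comparing n against the remaining space avoids the
 * unsigned overflow that a plain 'first + n <= capacity' could hit.
 */
static bool global_binding_fits(unsigned first, unsigned n, unsigned capacity)
{
	return first <= capacity && n <= capacity - first;
}
```

With an illustrative capacity of 32, a request for 33 buffers starting at
slot 0 is rejected, while the same request fits once the capacity is 48.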