[Mesa-dev] gallivm: force sse instructions for llvm 3.5+

Submitted by Maarten Lankhorst on Oct. 1, 2014, 2:56 p.m.

Details

Message ID 542C1624.6090000@mblankhorst.nl
State New, archived
Headers show

Not browsing as part of any series.

Commit Message

Maarten Lankhorst Oct. 1, 2014, 2:56 p.m.
This fixes a crash when llvmpipe tries to use sse instructions,
but llvm detects a cpu that doesn't support them.

Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
"LLVM ERROR: Do not know how to split the result of this operator!"

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---

Patch hide | download patch | download mbox

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 55aa8b9..f2f8906 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -479,10 +479,38 @@  lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
       if (util_cpu_caps.has_f16c) {
          MAttrs.push_back("+f16c");
       }
-      builder.setMAttrs(MAttrs);
    }
 
 #if HAVE_LLVM >= 0x0305
+   /*
+    * llvm 3.5 no longer supports cpuid based autodetect.
+    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
+    * sys::getHostCPUName(), but does support sse2.
+    *
+    * For this reason force the use of sse extensions when available, so our
+    * understanding of the cpu is in sync with llvm's.
+    */
+
+   else if (util_cpu_caps.has_sse4_2)
+      MAttrs.push_back("+sse42");
+   else if (util_cpu_caps.has_sse4_1)
+      MAttrs.push_back("+sse41");
+   else if (util_cpu_caps.has_ssse3)
+      MAttrs.push_back("+ssse3");
+   else if (util_cpu_caps.has_sse3)
+      MAttrs.push_back("+sse3");
+   else if (util_cpu_caps.has_sse2)
+      MAttrs.push_back("+sse2");
+   else if (util_cpu_caps.has_sse)
+      MAttrs.push_back("+sse");
+   else if (util_cpu_caps.has_mmx)
+      MAttrs.push_back("+sse");
+
+   if (util_cpu_caps.has_3dnow_ext)
+      MAttrs.push_back("+3dnowa");
+   else if (util_cpu_caps.has_3dnow)
+      MAttrs.push_back("+3dnow");
+
    StringRef MCPU = llvm::sys::getHostCPUName();
    /*
     * The cpu bits are no longer set automatically, so need to set mcpu manually.
@@ -498,6 +526,7 @@  lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
     */
    builder.setMCPU(MCPU);
 #endif
+   builder.setMAttrs(MAttrs);
 
    ShaderMemoryManager *MM = new ShaderMemoryManager();
    *OutCode = MM->getGeneratedCode();

Comments

On Wed, Oct 1, 2014 at 10:56 AM, Maarten Lankhorst
<maarten@mblankhorst.nl> wrote:
> This fixes a crash when llvmpipe tries to use sse instructions,
> but llvm detects a cpu that doesn't support them.
>
> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
> "LLVM ERROR: Do not know how to split the result of this operator!"
>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 55aa8b9..f2f8906 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>        if (util_cpu_caps.has_f16c) {
>           MAttrs.push_back("+f16c");
>        }
> -      builder.setMAttrs(MAttrs);
>     }
>
>  #if HAVE_LLVM >= 0x0305
> +   /*
> +    * llvm 3.5 no longer supports cpuid based autodetect.
> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
> +    * sys::getHostCPUName(), but does support sse2.
> +    *
> +    * For this reason force the use of sse extensions when available, so our
> +    * understanding of the cpu is in sync with llvm's.
> +    */
> +
> +   else if (util_cpu_caps.has_sse4_2)
> +      MAttrs.push_back("+sse42");
> +   else if (util_cpu_caps.has_sse4_1)
> +      MAttrs.push_back("+sse41");
> +   else if (util_cpu_caps.has_ssse3)
> +      MAttrs.push_back("+ssse3");
> +   else if (util_cpu_caps.has_sse3)
> +      MAttrs.push_back("+sse3");
> +   else if (util_cpu_caps.has_sse2)
> +      MAttrs.push_back("+sse2");
> +   else if (util_cpu_caps.has_sse)
> +      MAttrs.push_back("+sse");
> +   else if (util_cpu_caps.has_mmx)
> +      MAttrs.push_back("+sse");

I'm not familiar with LLVM, but if this isn't a typo (i.e. "+mmx"),
this probably deserves a comment. Also does +sse42 imply all the other
"lower" SSE instructions?

> +
> +   if (util_cpu_caps.has_3dnow_ext)
> +      MAttrs.push_back("+3dnowa");
> +   else if (util_cpu_caps.has_3dnow)
> +      MAttrs.push_back("+3dnow");
> +
>     StringRef MCPU = llvm::sys::getHostCPUName();
>     /*
>      * The cpu bits are no longer set automatically, so need to set mcpu manually.
> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>      */
>     builder.setMCPU(MCPU);
>  #endif
> +   builder.setMAttrs(MAttrs);
>
>     ShaderMemoryManager *MM = new ShaderMemoryManager();
>     *OutCode = MM->getGeneratedCode();
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
> This fixes a crash when llvmpipe tries to use sse instructions,
> but llvm detects a cpu that doesn't support them.
> 
> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
> "LLVM ERROR: Do not know how to split the result of this operator!"
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 55aa8b9..f2f8906 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>        if (util_cpu_caps.has_f16c) {
>           MAttrs.push_back("+f16c");
>        }
> -      builder.setMAttrs(MAttrs);
>     }
>  
>  #if HAVE_LLVM >= 0x0305
> +   /*
> +    * llvm 3.5 no longer supports cpuid based autodetect.
> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
> +    * sys::getHostCPUName(), but does support sse2.
> +    *
> +    * For this reason force the use of sse extensions when available, so our
> +    * understanding of the cpu is in sync with llvm's.
> +    */
> +
> +   else if (util_cpu_caps.has_sse4_2)
> +      MAttrs.push_back("+sse42");
> +   else if (util_cpu_caps.has_sse4_1)
> +      MAttrs.push_back("+sse41");
> +   else if (util_cpu_caps.has_ssse3)
> +      MAttrs.push_back("+ssse3");
> +   else if (util_cpu_caps.has_sse3)
> +      MAttrs.push_back("+sse3");
> +   else if (util_cpu_caps.has_sse2)
> +      MAttrs.push_back("+sse2");
> +   else if (util_cpu_caps.has_sse)
> +      MAttrs.push_back("+sse");
> +   else if (util_cpu_caps.has_mmx)
> +      MAttrs.push_back("+sse");
> +
> +   if (util_cpu_caps.has_3dnow_ext)
> +      MAttrs.push_back("+3dnowa");
> +   else if (util_cpu_caps.has_3dnow)
> +      MAttrs.push_back("+3dnow");
> +
>     StringRef MCPU = llvm::sys::getHostCPUName();
>     /*
>      * The cpu bits are no longer set automatically, so need to set mcpu manually.
> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>      */
>     builder.setMCPU(MCPU);
>  #endif
> +   builder.setMAttrs(MAttrs);
>  
>     ShaderMemoryManager *MM = new ShaderMemoryManager();
>     *OutCode = MM->getGeneratedCode();
> 

I'm not sure if that's really the right idea, it also misses half the
potential features (avx, f16c, ...).
Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
instead of setting the host cpu name would work?

Roland
Hey,

Op 02-10-14 om 04:22 schreef Roland Scheidegger:
> Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
>> This fixes a crash when llvmpipe tries to use sse instructions,
>> but llvm detects a cpu that doesn't support them.
>>
>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>> ---
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> index 55aa8b9..f2f8906 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>        if (util_cpu_caps.has_f16c) {
>>           MAttrs.push_back("+f16c");
>>        }
>> -      builder.setMAttrs(MAttrs);
>>     }
>>  
>>  #if HAVE_LLVM >= 0x0305
>> +   /*
>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
>> +    * sys::getHostCPUName(), but does support sse2.
>> +    *
>> +    * For this reason force the use of sse extensions when available, so our
>> +    * understanding of the cpu is in sync with llvm's.
>> +    */
>> +
>> +   else if (util_cpu_caps.has_sse4_2)
>> +      MAttrs.push_back("+sse42");
>> +   else if (util_cpu_caps.has_sse4_1)
>> +      MAttrs.push_back("+sse41");
>> +   else if (util_cpu_caps.has_ssse3)
>> +      MAttrs.push_back("+ssse3");
>> +   else if (util_cpu_caps.has_sse3)
>> +      MAttrs.push_back("+sse3");
>> +   else if (util_cpu_caps.has_sse2)
>> +      MAttrs.push_back("+sse2");
>> +   else if (util_cpu_caps.has_sse)
>> +      MAttrs.push_back("+sse");
>> +   else if (util_cpu_caps.has_mmx)
>> +      MAttrs.push_back("+sse");
>> +
>> +   if (util_cpu_caps.has_3dnow_ext)
>> +      MAttrs.push_back("+3dnowa");
>> +   else if (util_cpu_caps.has_3dnow)
>> +      MAttrs.push_back("+3dnow");
>> +
>>     StringRef MCPU = llvm::sys::getHostCPUName();
>>     /*
>>      * The cpu bits are no longer set automatically, so need to set mcpu manually.
>> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>      */
>>     builder.setMCPU(MCPU);
>>  #endif
>> +   builder.setMAttrs(MAttrs);
>>  
>>     ShaderMemoryManager *MM = new ShaderMemoryManager();
>>     *OutCode = MM->getGeneratedCode();
>>
> I'm not sure if that's really the right idea, it also misses half the
> potential features (avx, f16c, ...).
> Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
> instead of setting the host cpu name would work?
>
I was unaware that getHostCPUFeatures existed, but I think passing both getHostCPUName and getHostCPUFeatures would work.
Unfortunately it seems getHostCPUFeatures only exists on arm right now, but that should be fixable by re-introducing the cpuid stuff in llvm..

~Maarten
Op 02-10-14 om 09:10 schreef Maarten Lankhorst:
> Hey,
>
> Op 02-10-14 om 04:22 schreef Roland Scheidegger:
>> Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>> but llvm detects a cpu that doesn't support them.
>>>
>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>> ---
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> index 55aa8b9..f2f8906 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>        if (util_cpu_caps.has_f16c) {
>>>           MAttrs.push_back("+f16c");
>>>        }
>>> -      builder.setMAttrs(MAttrs);
>>>     }
>>>  
>>>  #if HAVE_LLVM >= 0x0305
>>> +   /*
>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
>>> +    * sys::getHostCPUName(), but does support sse2.
>>> +    *
>>> +    * For this reason force the use of sse extensions when available, so our
>>> +    * understanding of the cpu is in sync with llvm's.
>>> +    */
>>> +
>>> +   else if (util_cpu_caps.has_sse4_2)
>>> +      MAttrs.push_back("+sse42");
>>> +   else if (util_cpu_caps.has_sse4_1)
>>> +      MAttrs.push_back("+sse41");
>>> +   else if (util_cpu_caps.has_ssse3)
>>> +      MAttrs.push_back("+ssse3");
>>> +   else if (util_cpu_caps.has_sse3)
>>> +      MAttrs.push_back("+sse3");
>>> +   else if (util_cpu_caps.has_sse2)
>>> +      MAttrs.push_back("+sse2");
>>> +   else if (util_cpu_caps.has_sse)
>>> +      MAttrs.push_back("+sse");
>>> +   else if (util_cpu_caps.has_mmx)
>>> +      MAttrs.push_back("+sse");
>>> +
>>> +   if (util_cpu_caps.has_3dnow_ext)
>>> +      MAttrs.push_back("+3dnowa");
>>> +   else if (util_cpu_caps.has_3dnow)
>>> +      MAttrs.push_back("+3dnow");
>>> +
>>>     StringRef MCPU = llvm::sys::getHostCPUName();
>>>     /*
>>>      * The cpu bits are no longer set automatically, so need to set mcpu manually.
>>> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>      */
>>>     builder.setMCPU(MCPU);
>>>  #endif
>>> +   builder.setMAttrs(MAttrs);
>>>  
>>>     ShaderMemoryManager *MM = new ShaderMemoryManager();
>>>     *OutCode = MM->getGeneratedCode();
>>>
>> I'm not sure if that's really the right idea, it also misses half the
>> potential features (avx, f16c, ...).
>> Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
>> instead of setting the host cpu name would work?
>>
> I was unaware that getHostCPUFeatures existed, but I think passing both getHostCPUName and getHostCPUFeatures would work.
> Unfortunately it seems getHostCPUFeatures only exists on arm right now, but that should be fixable by re-introducing the cpuid stuff in llvm..
Hm no, it would seem it takes a stringmap, probably mapping whether certain features are available or not, and it doesn't create a nice string for passing along.

Regarding imirkin's comment, yes the mmx thing is a copy paste error, and newer versions of SSE/AVX imply the previous ones are supported.
I also don't enable all possible cpu extensions, just the ones used by llvmpipe and gallivm. This is to make sure there is no mismatch between
what llvmpipe expects and what llvm can make of it from the cpu string. I don't really care about the optimal code generation, I would just prefer there are no crashes. :-)

~Maarten
Am 02.10.2014 um 09:31 schrieb Maarten Lankhorst:
> Op 02-10-14 om 09:10 schreef Maarten Lankhorst:
>> Hey,
>>
>> Op 02-10-14 om 04:22 schreef Roland Scheidegger:
>>> Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
>>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>>> but llvm detects a cpu that doesn't support them.
>>>>
>>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
>>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>> ---
>>>>
>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> index 55aa8b9..f2f8906 100644
>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>>        if (util_cpu_caps.has_f16c) {
>>>>           MAttrs.push_back("+f16c");
>>>>        }
>>>> -      builder.setMAttrs(MAttrs);
>>>>     }
>>>>  
>>>>  #if HAVE_LLVM >= 0x0305
>>>> +   /*
>>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
>>>> +    * sys::getHostCPUName(), but does support sse2.
>>>> +    *
>>>> +    * For this reason force the use of sse extensions when available, so our
>>>> +    * understanding of the cpu is in sync with llvm's.
>>>> +    */
>>>> +
>>>> +   else if (util_cpu_caps.has_sse4_2)
>>>> +      MAttrs.push_back("+sse42");
>>>> +   else if (util_cpu_caps.has_sse4_1)
>>>> +      MAttrs.push_back("+sse41");
>>>> +   else if (util_cpu_caps.has_ssse3)
>>>> +      MAttrs.push_back("+ssse3");
>>>> +   else if (util_cpu_caps.has_sse3)
>>>> +      MAttrs.push_back("+sse3");
>>>> +   else if (util_cpu_caps.has_sse2)
>>>> +      MAttrs.push_back("+sse2");
>>>> +   else if (util_cpu_caps.has_sse)
>>>> +      MAttrs.push_back("+sse");
>>>> +   else if (util_cpu_caps.has_mmx)
>>>> +      MAttrs.push_back("+sse");
>>>> +
>>>> +   if (util_cpu_caps.has_3dnow_ext)
>>>> +      MAttrs.push_back("+3dnowa");
>>>> +   else if (util_cpu_caps.has_3dnow)
>>>> +      MAttrs.push_back("+3dnow");
>>>> +
>>>>     StringRef MCPU = llvm::sys::getHostCPUName();
>>>>     /*
>>>>      * The cpu bits are no longer set automatically, so need to set mcpu manually.
>>>> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>>      */
>>>>     builder.setMCPU(MCPU);
>>>>  #endif
>>>> +   builder.setMAttrs(MAttrs);
>>>>  
>>>>     ShaderMemoryManager *MM = new ShaderMemoryManager();
>>>>     *OutCode = MM->getGeneratedCode();
>>>>
>>> I'm not sure if that's really the right idea, it also misses half the
>>> potential features (avx, f16c, ...).
>>> Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
>>> instead of setting the host cpu name would work?
>>>
>> I was unaware that getHostCPUFeatures existed, but I think passing both getHostCPUName and getHostCPUFeatures would work.
>> Unfortunately it seems getHostCPUFeatures only exists on arm right now, but that should be fixable by re-introducing the cpuid stuff in llvm..
So llvm does no longer have any ability to determine the features on its
own at all? All it can is deduce that from the cpu string (which is
quite fishy, IIRC it actually at some point used a "reduced" cpu string
in case of avx not being available despite the cpu supporting it)? What
a PITA. When this came up last time I was told one should use
llvm::sys::getHostCPUName() and llvm::sys::getHostCPUFeatures() to
determine what llvm thinks it can do, and that was only just back in
April (http://llvm.org/bugs/show_bug.cgi?id=19429).
From a quick look though it seems you are right and
llvm::sys::getHostCPUFeatures() only ever does anything on arm (that's
what I call inconsistent design...).

> Hm no, it would seem it takes a stringmap, probably mapping whether certain features are available or not, and it doesn't create a nice string for passing along.
> 
> Regarding imirkin's comment, yes the mmx thing is a copy paste error, and newer versions of SSE/AVX imply the previous ones are supported.
> I also don't enable all possible cpu extensions, just the ones used by llvmpipe and gallivm.
llvmpipe/gallivm will require both avx and f16c - it will definitely
crash if it thinks they are available but llvm thinks they are not.
Plus, specifying just half the features is a sure recipe for disaster
later when someone decides to emit some more intrinsics...


> This is to make sure there is no mismatch between
> what llvmpipe expects and what llvm can make of it from the cpu string. I don't really care about the optimal code generation, I would just prefer there are no crashes. :-)
I guess specifying all the MAttrs manually is fine then. Like I said
though, I really think we should specify all.

Roland


> 
> ~Maarten
>
On 02/10/14 03:09, Ilia Mirkin wrote:
> On Wed, Oct 1, 2014 at 10:56 AM, Maarten Lankhorst
> <maarten@mblankhorst.nl> wrote:
>> This fixes a crash when llvmpipe tries to use sse instructions,
>> but llvm detects a cpu that doesn't support them.
>>
>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>> ---
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> index 55aa8b9..f2f8906 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>         if (util_cpu_caps.has_f16c) {
>>            MAttrs.push_back("+f16c");
>>         }
>> -      builder.setMAttrs(MAttrs);
>>      }
>>
>>   #if HAVE_LLVM >= 0x0305
>> +   /*
>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
>> +    * sys::getHostCPUName(), but does support sse2.
>> +    *
>> +    * For this reason force the use of sse extensions when available, so our
>> +    * understanding of the cpu is in sync with llvm's.
>> +    */
>> +
>> +   else if (util_cpu_caps.has_sse4_2)
>> +      MAttrs.push_back("+sse42");
>> +   else if (util_cpu_caps.has_sse4_1)
>> +      MAttrs.push_back("+sse41");
>> +   else if (util_cpu_caps.has_ssse3)
>> +      MAttrs.push_back("+ssse3");
>> +   else if (util_cpu_caps.has_sse3)
>> +      MAttrs.push_back("+sse3");
>> +   else if (util_cpu_caps.has_sse2)
>> +      MAttrs.push_back("+sse2");
>> +   else if (util_cpu_caps.has_sse)
>> +      MAttrs.push_back("+sse");
>> +   else if (util_cpu_caps.has_mmx)
>> +      MAttrs.push_back("+sse");
>
> I'm not familiar with LLVM, but if this isn't a typo (i.e. "+mmx"),
> this probably deserves a comment. Also does +sse42 imply all the other
> "lower" SSE instructions?

Actually, I think it's better not to enable MMX, 3DNow, or anything that 
operates on XMM registers, as mixing code that operates on SSE and MMX 
registers is complicated and inefficient.

Jose
On 02/10/14 15:04, Roland Scheidegger wrote:
> Am 02.10.2014 um 09:31 schrieb Maarten Lankhorst:
>> Op 02-10-14 om 09:10 schreef Maarten Lankhorst:
>>> Hey,
>>>
>>> Op 02-10-14 om 04:22 schreef Roland Scheidegger:
>>>> Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
>>>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>>>> but llvm detects a cpu that doesn't support them.
>>>>>
>>>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>>>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error with:
>>>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>>>
>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>> ---
>>>>>
>>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>> index 55aa8b9..f2f8906 100644
>>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>> @@ -479,10 +479,38 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>>>         if (util_cpu_caps.has_f16c) {
>>>>>            MAttrs.push_back("+f16c");
>>>>>         }
>>>>> -      builder.setMAttrs(MAttrs);
>>>>>      }
>>>>>
>>>>>   #if HAVE_LLVM >= 0x0305
>>>>> +   /*
>>>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>>>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by llvm's
>>>>> +    * sys::getHostCPUName(), but does support sse2.
>>>>> +    *
>>>>> +    * For this reason force the use of sse extensions when available, so our
>>>>> +    * understanding of the cpu is in sync with llvm's.
>>>>> +    */
>>>>> +
>>>>> +   else if (util_cpu_caps.has_sse4_2)
>>>>> +      MAttrs.push_back("+sse42");
>>>>> +   else if (util_cpu_caps.has_sse4_1)
>>>>> +      MAttrs.push_back("+sse41");
>>>>> +   else if (util_cpu_caps.has_ssse3)
>>>>> +      MAttrs.push_back("+ssse3");
>>>>> +   else if (util_cpu_caps.has_sse3)
>>>>> +      MAttrs.push_back("+sse3");
>>>>> +   else if (util_cpu_caps.has_sse2)
>>>>> +      MAttrs.push_back("+sse2");
>>>>> +   else if (util_cpu_caps.has_sse)
>>>>> +      MAttrs.push_back("+sse");
>>>>> +   else if (util_cpu_caps.has_mmx)
>>>>> +      MAttrs.push_back("+sse");
>>>>> +
>>>>> +   if (util_cpu_caps.has_3dnow_ext)
>>>>> +      MAttrs.push_back("+3dnowa");
>>>>> +   else if (util_cpu_caps.has_3dnow)
>>>>> +      MAttrs.push_back("+3dnow");
>>>>> +
>>>>>      StringRef MCPU = llvm::sys::getHostCPUName();
>>>>>      /*
>>>>>       * The cpu bits are no longer set automatically, so need to set mcpu manually.
>>>>> @@ -498,6 +526,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>>>       */
>>>>>      builder.setMCPU(MCPU);
>>>>>   #endif
>>>>> +   builder.setMAttrs(MAttrs);
>>>>>
>>>>>      ShaderMemoryManager *MM = new ShaderMemoryManager();
>>>>>      *OutCode = MM->getGeneratedCode();
>>>>>
>>>> I'm not sure if that's really the right idea, it also misses half the
>>>> potential features (avx, f16c, ...).
>>>> Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
>>>> instead of setting the host cpu name would work?
>>>>
>>> I was unaware that getHostCPUFeatures existed, but I think passing both getHostCPUName and getHostCPUFeatures would work.
>>> Unfortunately it seems getHostCPUFeatures only exists on arm right now, but that should be fixable by re-introducing the cpuid stuff in llvm..
> So llvm does no longer have any ability to determine the features on its
> own at all? All it can is deduce that from the cpu string (which is
> quite fishy, IIRC it actually at some point used a "reduced" cpu string
> in case of avx not being available despite the cpu supporting it)? What
> a PITA. When this came up last time I was told one should use
> llvm::sys::getHostCPUName() and llvm::sys::getHostCPUFeatures() to
> determine what llvm thinks it can do, and that was only just back in
> April (https://urldefense.proofpoint.com/v1/url?u=http://llvm.org/bugs/show_bug.cgi?id%3D19429&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=R4H96AFXJHWiMF4WgsKxMokaANWv60welwfnJM53KjQ%3D%0A&s=54e942b2ecaf682213b54be9706a4c7e30d5986f4aaf51008822f47e7df72e6c).
>  From a quick look though it seems you are right and
> llvm::sys::getHostCPUFeatures() only ever does anything on arm (that's
> what I call inconsistent design...).

I think that we should avoid relying on cpu name for anything, because 
one can easily run into wierd or unexpected behavior when running old 
LLVM code on new intel CPU architectures.

Jose
On Thu, Oct 2, 2014 at 11:34 AM, Jose Fonseca <jfonseca@vmware.com> wrote:
> On 02/10/14 03:09, Ilia Mirkin wrote:
>>
>> On Wed, Oct 1, 2014 at 10:56 AM, Maarten Lankhorst
>> <maarten@mblankhorst.nl> wrote:
>>>
>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>> but llvm detects a cpu that doesn't support them.
>>>
>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error
>>> with:
>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>> ---
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> index 55aa8b9..f2f8906 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>> @@ -479,10 +479,38 @@
>>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>         if (util_cpu_caps.has_f16c) {
>>>            MAttrs.push_back("+f16c");
>>>         }
>>> -      builder.setMAttrs(MAttrs);
>>>      }
>>>
>>>   #if HAVE_LLVM >= 0x0305
>>> +   /*
>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by
>>> llvm's
>>> +    * sys::getHostCPUName(), but does support sse2.
>>> +    *
>>> +    * For this reason force the use of sse extensions when available, so
>>> our
>>> +    * understanding of the cpu is in sync with llvm's.
>>> +    */
>>> +
>>> +   else if (util_cpu_caps.has_sse4_2)
>>> +      MAttrs.push_back("+sse42");
>>> +   else if (util_cpu_caps.has_sse4_1)
>>> +      MAttrs.push_back("+sse41");
>>> +   else if (util_cpu_caps.has_ssse3)
>>> +      MAttrs.push_back("+ssse3");
>>> +   else if (util_cpu_caps.has_sse3)
>>> +      MAttrs.push_back("+sse3");
>>> +   else if (util_cpu_caps.has_sse2)
>>> +      MAttrs.push_back("+sse2");
>>> +   else if (util_cpu_caps.has_sse)
>>> +      MAttrs.push_back("+sse");
>>> +   else if (util_cpu_caps.has_mmx)
>>> +      MAttrs.push_back("+sse");
>>
>>
>> I'm not familiar with LLVM, but if this isn't a typo (i.e. "+mmx"),
>> this probably deserves a comment. Also does +sse42 imply all the other
>> "lower" SSE instructions?
>
>
> Actually, I think it's better not to enable MMX, 3DNow, or anything that
> operates on XMM registers, as mixing code that operates on SSE and MMX
> registers is complicated and inefficient.

If you made it this far into the else chain, you don't have XMM registers...
Am 02.10.2014 um 17:36 schrieb Jose Fonseca:
> On 02/10/14 15:04, Roland Scheidegger wrote:
>> Am 02.10.2014 um 09:31 schrieb Maarten Lankhorst:
>>> Op 02-10-14 om 09:10 schreef Maarten Lankhorst:
>>>> Hey,
>>>>
>>>> Op 02-10-14 om 04:22 schreef Roland Scheidegger:
>>>>> Am 01.10.2014 16:56, schrieb Maarten Lankhorst:
>>>>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>>>>> but llvm detects a cpu that doesn't support them.
>>>>>>
>>>>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo
>>>>>> -auto
>>>>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise
>>>>>> error with:
>>>>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>> ---
>>>>>>
>>>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>>> index 55aa8b9..f2f8906 100644
>>>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>>>> @@ -479,10 +479,38 @@
>>>>>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef
>>>>>> *OutJIT,
>>>>>>         if (util_cpu_caps.has_f16c) {
>>>>>>            MAttrs.push_back("+f16c");
>>>>>>         }
>>>>>> -      builder.setMAttrs(MAttrs);
>>>>>>      }
>>>>>>
>>>>>>   #if HAVE_LLVM >= 0x0305
>>>>>> +   /*
>>>>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>>>>> +    * This breaks on "qemu -cpu qemu32" which is detected as
>>>>>> pentium2 by llvm's
>>>>>> +    * sys::getHostCPUName(), but does support sse2.
>>>>>> +    *
>>>>>> +    * For this reason force the use of sse extensions when
>>>>>> available, so our
>>>>>> +    * understanding of the cpu is in sync with llvm's.
>>>>>> +    */
>>>>>> +
>>>>>> +   else if (util_cpu_caps.has_sse4_2)
>>>>>> +      MAttrs.push_back("+sse42");
>>>>>> +   else if (util_cpu_caps.has_sse4_1)
>>>>>> +      MAttrs.push_back("+sse41");
>>>>>> +   else if (util_cpu_caps.has_ssse3)
>>>>>> +      MAttrs.push_back("+ssse3");
>>>>>> +   else if (util_cpu_caps.has_sse3)
>>>>>> +      MAttrs.push_back("+sse3");
>>>>>> +   else if (util_cpu_caps.has_sse2)
>>>>>> +      MAttrs.push_back("+sse2");
>>>>>> +   else if (util_cpu_caps.has_sse)
>>>>>> +      MAttrs.push_back("+sse");
>>>>>> +   else if (util_cpu_caps.has_mmx)
>>>>>> +      MAttrs.push_back("+sse");
>>>>>> +
>>>>>> +   if (util_cpu_caps.has_3dnow_ext)
>>>>>> +      MAttrs.push_back("+3dnowa");
>>>>>> +   else if (util_cpu_caps.has_3dnow)
>>>>>> +      MAttrs.push_back("+3dnow");
>>>>>> +
>>>>>>      StringRef MCPU = llvm::sys::getHostCPUName();
>>>>>>      /*
>>>>>>       * The cpu bits are no longer set automatically, so need to
>>>>>> set mcpu manually.
>>>>>> @@ -498,6 +526,7 @@
>>>>>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef
>>>>>> *OutJIT,
>>>>>>       */
>>>>>>      builder.setMCPU(MCPU);
>>>>>>   #endif
>>>>>> +   builder.setMAttrs(MAttrs);
>>>>>>
>>>>>>      ShaderMemoryManager *MM = new ShaderMemoryManager();
>>>>>>      *OutCode = MM->getGeneratedCode();
>>>>>>
>>>>> I'm not sure if that's really the right idea, it also misses half the
>>>>> potential features (avx, f16c, ...).
>>>>> Maybe using llvm::sys::getHostCPUFeatures() and passing that to mattrs
>>>>> instead of setting the host cpu name would work?
>>>>>
>>>> I was unaware that getHostCPUFeatures existed, but I think passing
>>>> both getHostCPUName and getHostCPUFeatures would work.
>>>> Unfortunately it seems getHostCPUFeatures only exists on arm right
>>>> now, but that should be fixable by re-introducing the cpuid stuff in
>>>> llvm..
>> So llvm does no longer have any ability to determine the features on its
>> own at all? All it can is deduce that from the cpu string (which is
>> quite fishy, IIRC it actually at some point used a "reduced" cpu string
>> in case of avx not being available despite the cpu supporting it)? What
>> a PITA. When this came up last time I was told one should use
>> llvm::sys::getHostCPUName() and llvm::sys::getHostCPUFeatures() to
>> determine what llvm thinks it can do, and that was only just back in
>> April
>> (https://urldefense.proofpoint.com/v1/url?u=http://llvm.org/bugs/show_bug.cgi?id%3D19429&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=NMr9uy2iTjWVixC0wOcYCWEIYhfo80qKwRgdodpoDzA%3D%0A&m=R4H96AFXJHWiMF4WgsKxMokaANWv60welwfnJM53KjQ%3D%0A&s=54e942b2ecaf682213b54be9706a4c7e30d5986f4aaf51008822f47e7df72e6c).
>>
>>  From a quick look though it seems you are right and
>> llvm::sys::getHostCPUFeatures() only ever does anything on arm (that's
>> what I call inconsistent design...).
> 
> I think that we should avoid relying on cpu name for anything, because
> one can easily run into wierd or unexpected behavior when running old
> LLVM code on new intel CPU architectures.

Well I believe getHostCPUName() actually returned a string always
understood by llvm, based on the actual features. At some point at
least... But it seems this is indeed not really all that useful any
more. That said, cpu name isn't just for features, llvm will do some
other choices based on it, so we probably shouldn't avoid it. Though I
don't know if it's possible to get some string via that mechanism
nowadays which it won't understand then at all...

I was actually missing that at least avx and f16c are still set with
this patch. So this is ok, though I'd still say we should emit the
remaining features too (popcnt, avx2, xop at least).

As for mmx I'm not sure it really makes a difference. Because these
extensions were enabled (at least with some llvm versions) in any case,
even if we didn't explicitly enable them. I think though you are right
there's no point of explicitly enabling them since we don't really want
llvm to use them in any case.

Roland
Am 02.10.2014 um 18:11 schrieb Ilia Mirkin:
> On Thu, Oct 2, 2014 at 11:34 AM, Jose Fonseca <jfonseca@vmware.com> wrote:
>> On 02/10/14 03:09, Ilia Mirkin wrote:
>>>
>>> On Wed, Oct 1, 2014 at 10:56 AM, Maarten Lankhorst
>>> <maarten@mblankhorst.nl> wrote:
>>>>
>>>> This fixes a crash when llvmpipe tries to use sse instructions,
>>>> but llvm detects a cpu that doesn't support them.
>>>>
>>>> Fixes for example piglit/bin/amd_seamless_cubemap_per_texture -fbo -auto
>>>> on i386 when run inside "qemu -cpu qemu32", which would otherwise error
>>>> with:
>>>> "LLVM ERROR: Do not know how to split the result of this operator!"
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>> ---
>>>>
>>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> index 55aa8b9..f2f8906 100644
>>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>>>> @@ -479,10 +479,38 @@
>>>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>>>         if (util_cpu_caps.has_f16c) {
>>>>            MAttrs.push_back("+f16c");
>>>>         }
>>>> -      builder.setMAttrs(MAttrs);
>>>>      }
>>>>
>>>>   #if HAVE_LLVM >= 0x0305
>>>> +   /*
>>>> +    * llvm 3.5 no longer supports cpuid based autodetect.
>>>> +    * This breaks on "qemu -cpu qemu32" which is detected as pentium2 by
>>>> llvm's
>>>> +    * sys::getHostCPUName(), but does support sse2.
>>>> +    *
>>>> +    * For this reason force the use of sse extensions when available, so
>>>> our
>>>> +    * understanding of the cpu is in sync with llvm's.
>>>> +    */
>>>> +
>>>> +   else if (util_cpu_caps.has_sse4_2)
>>>> +      MAttrs.push_back("+sse42");
>>>> +   else if (util_cpu_caps.has_sse4_1)
>>>> +      MAttrs.push_back("+sse41");
>>>> +   else if (util_cpu_caps.has_ssse3)
>>>> +      MAttrs.push_back("+ssse3");
>>>> +   else if (util_cpu_caps.has_sse3)
>>>> +      MAttrs.push_back("+sse3");
>>>> +   else if (util_cpu_caps.has_sse2)
>>>> +      MAttrs.push_back("+sse2");
>>>> +   else if (util_cpu_caps.has_sse)
>>>> +      MAttrs.push_back("+sse");
>>>> +   else if (util_cpu_caps.has_mmx)
>>>> +      MAttrs.push_back("+sse");
>>>
>>>
>>> I'm not familiar with LLVM, but if this isn't a typo (i.e. "+mmx"),
>>> this probably deserves a comment. Also does +sse42 imply all the other
>>> "lower" SSE instructions?
>>
>>
>> Actually, I think it's better not to enable MMX, 3DNow, or anything that
>> operates on XMM registers, as mixing code that operates on SSE and MMX
>> registers is complicated and inefficient.
> 
> If you made it this far into the else chain, you don't have XMM registers...
That's true enough (though only for mmx, not the 3dnow stuff)... I don't
know though how useful mmx actually is for our purposes. In theory at
least shouldn't hurt I guess in this case (though it probably should be
"+mmx" and not "+sse"...). I'm not even sure if llvm could actually
split up 4x32bit int vectors into 2x32bit ones for mmx. And certainly we
never use mmx intrinsics no matter what.

Roland