[Mesa-dev,1/5] clover/memory: Copy data when creating buffers with CL_MEM_USE_HOST_PTR

Submitted by Aaron Watry on July 22, 2017, 4:19 a.m.

Details

Message ID 20170722041951.14304-2-awatry@gmail.com
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Aaron Watry July 22, 2017, 4:19 a.m.
Fixes: OpenCL CTS test/conformance/buffers/buffer_copy

Signed-off-by: Aaron Watry <awatry@gmail.com>
CC: Francisco Jerez <currojerez@riseup.net>
---
 src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
index b852e6896f..912d74830a 100644
--- a/src/gallium/state_trackers/clover/core/memory.cpp
+++ b/src/gallium/state_trackers/clover/core/memory.cpp
@@ -30,7 +30,7 @@  memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
                        size_t size, void *host_ptr) :
    context(ctx), _flags(flags),
    _size(size), _host_ptr(host_ptr) {
-   if (flags & CL_MEM_COPY_HOST_PTR)
+   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
       data.append((char *)host_ptr, size);
 }
 

Comments

On Fri, 2017-07-21 at 23:19 -0500, Aaron Watry wrote:
> Fixes: OpenCL CTS test/conformance/buffers/buffer_copy

Similar patch was pushed in 2013:
56647c5d8f8e60269f0a3277e3caa7ee57d1fe6a
"clover: Append buffers that use CL_MEM_USE_HOST_PTR."

Grigory(added to cc) reverted the change and implemented user_ptr
mechanism in:
f972b223c4cb4ec58a9451cbac5d120ac9deb336
"clover: try userptr for CL_MEM_USE_HOST_PTR"

The buffer-flags piglit still passes, but it maps even CL_USE_HOST_PTR
buffers. I'm not sure what the CTS does, we might need a
synchronization point after kernels finish execution. I couldn't find
the relevant part of specs that would define accessing
CL_MEM_USE_HOST_PTR buffers without 
clEnqueueMapBuffer or clEnqueueReadBuffer


Jan


> 
> Signed-off-by: Aaron Watry <awatry@gmail.com>
> CC: Francisco Jerez <currojerez@riseup.net>
> ---
>  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
> index b852e6896f..912d74830a 100644
> --- a/src/gallium/state_trackers/clover/core/memory.cpp
> +++ b/src/gallium/state_trackers/clover/core/memory.cpp
> @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
>                         size_t size, void *host_ptr) :
>     context(ctx), _flags(flags),
>     _size(size), _host_ptr(host_ptr) {
> -   if (flags & CL_MEM_COPY_HOST_PTR)
> +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
>        data.append((char *)host_ptr, size);
>  }
>
/

On Sat, Jul 22, 2017 at 2:27 PM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
> On Fri, 2017-07-21 at 23:19 -0500, Aaron Watry wrote:
>> Fixes: OpenCL CTS test/conformance/buffers/buffer_copy
>
> Similar patch was pushed in 2013:
> 56647c5d8f8e60269f0a3277e3caa7ee57d1fe6a
> "clover: Append buffers that use CL_MEM_USE_HOST_PTR."
>
> Grigory(added to cc) reverted the change and implemented user_ptr
> mechanism in:
> f972b223c4cb4ec58a9451cbac5d120ac9deb336
> "clover: try userptr for CL_MEM_USE_HOST_PTR"
>
> The buffer-flags piglit still passes, but it maps even CL_USE_HOST_PTR
> buffers. I'm not sure what the CTS does, we might need a
> synchronization point after kernels finish execution. I couldn't find
> the relevant part of specs that would define accessing
> CL_MEM_USE_HOST_PTR buffers without
> clEnqueueMapBuffer or clEnqueueReadBuffer
>

I got rid of this patch in my tree and then started digging into the
user_ptr stuff.

It looks like the issue stems from alignment restrictions.

In clover/core/resource.cpp, there's a call to:
  dev.pipe->resource_from_user_memory(dev.pipe, &info, obj.host_ptr())

That is failing for me.

The CTS allocates memory that is aligned to the value returned by
clGetDeviceInfo (device,CL_DEVICE_MEM_BASE_ADDR_ALIGN, ...) converted
to bytes.

Clover's api/device.cpp  has that hard-coded to 1024 bits/128 bytes at
the moment.

If I change the device info query for the mem_base_addr_align to
forward a call to:
  pipe->get_param(pipe, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT)

And also change si_pipe.c:si_get_param's switch statement value to return:
  case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
    return sscreen->b.info.gart_page_size;

Then I can successfully create buffers from user pointers on my SI card.

I'm a bit fuzzy on what alignment restrictions exist for SI/GCN cards,
but the winsys seems to indicate we should align things to gart page
size, which makes sense on the surface at least.

If the alignment restrictions have changed between R600 and GCN, that
might explain why what's broken for me is working for you/Grigori (on
r600).

--Aaron

>
> Jan
>
>
>>
>> Signed-off-by: Aaron Watry <awatry@gmail.com>
>> CC: Francisco Jerez <currojerez@riseup.net>
>> ---
>>  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
>> index b852e6896f..912d74830a 100644
>> --- a/src/gallium/state_trackers/clover/core/memory.cpp
>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>> @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
>>                         size_t size, void *host_ptr) :
>>     context(ctx), _flags(flags),
>>     _size(size), _host_ptr(host_ptr) {
>> -   if (flags & CL_MEM_COPY_HOST_PTR)
>> +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
>>        data.append((char *)host_ptr, size);
>>  }
>>
>
> --
> Jan Vesely <jan.vesely@rutgers.edu>
On Thu, 2017-08-03 at 00:28 -0500, Aaron Watry wrote:
> /
> 
> On Sat, Jul 22, 2017 at 2:27 PM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
> > On Fri, 2017-07-21 at 23:19 -0500, Aaron Watry wrote:
> > > Fixes: OpenCL CTS test/conformance/buffers/buffer_copy
> > 
> > Similar patch was pushed in 2013:
> > 56647c5d8f8e60269f0a3277e3caa7ee57d1fe6a
> > "clover: Append buffers that use CL_MEM_USE_HOST_PTR."
> > 
> > Grigory(added to cc) reverted the change and implemented user_ptr
> > mechanism in:
> > f972b223c4cb4ec58a9451cbac5d120ac9deb336
> > "clover: try userptr for CL_MEM_USE_HOST_PTR"
> > 
> > The buffer-flags piglit still passes, but it maps even CL_USE_HOST_PTR
> > buffers. I'm not sure what the CTS does, we might need a
> > synchronization point after kernels finish execution. I couldn't find
> > the relevant part of specs that would define accessing
> > CL_MEM_USE_HOST_PTR buffers without
> > clEnqueueMapBuffer or clEnqueueReadBuffer
> > 
> 
> I got rid of this patch in my tree and then started digging into the
> user_ptr stuff.
> 
> It looks like the issue stems from alignment restrictions.
> 
> In clover/core/resource.cpp, there's a call to:
>   dev.pipe->resource_from_user_memory(dev.pipe, &info, obj.host_ptr())
> 
> That is failing for me.
> 
> The CTS allocates memory that is aligned to the value returned by
> clGetDeviceInfo (device,CL_DEVICE_MEM_BASE_ADDR_ALIGN, ...) converted
> to bytes.
> 
> Clover's api/device.cpp  has that hard-coded to 1024 bits/128 bytes at
> the moment.
> 
> If I change the device info query for the mem_base_addr_align to
> forward a call to:
>   pipe->get_param(pipe, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT)

I'd say we should be doing this anyway instead of hardcoding the value.

> 
> And also change si_pipe.c:si_get_param's switch statement value to return:
>   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
>     return sscreen->b.info.gart_page_size;

I'm not sure what the correct value is here. AFAIK, EG uses 256B cache
lines so I'd expect the value of to be at least that

Both NI and GCN should be able to use 4K pages (which is what
gart_page_size is set to), but we might want higher alignment for
better performance[0]

[0]https://lists.freedesktop.org/archives/dri-devel/2014-May/058858.htm
l
> 
> Then I can successfully create buffers from user pointers on my SI card.
> 
> I'm a bit fuzzy on what alignment restrictions exist for SI/GCN cards,
> but the winsys seems to indicate we should align things to gart page
> size, which makes sense on the surface at least.
> 
> If the alignment restrictions have changed between R600 and GCN, that
> might explain why what's broken for me is working for you/Grigori (on
> r600).

I remember there was a buffer alignment patch form AMD recently for
SI/CI vs. VI+, but I can't find it.
It looks like a separate issue however. if incorrect alignment makes
user_ptr fail, and the test still fails, it looks like the no-user_ptr
fallback is broken.

Jan

> 
> --Aaron
> 
> > 
> > Jan
> > 
> > 
> > > 
> > > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > > CC: Francisco Jerez <currojerez@riseup.net>
> > > ---
> > >  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
> > > index b852e6896f..912d74830a 100644
> > > --- a/src/gallium/state_trackers/clover/core/memory.cpp
> > > +++ b/src/gallium/state_trackers/clover/core/memory.cpp
> > > @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
> > >                         size_t size, void *host_ptr) :
> > >     context(ctx), _flags(flags),
> > >     _size(size), _host_ptr(host_ptr) {
> > > -   if (flags & CL_MEM_COPY_HOST_PTR)
> > > +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
> > >        data.append((char *)host_ptr, size);
> > >  }
> > > 
> > 
> > --
> > Jan Vesely <jan.vesely@rutgers.edu>
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, Aug 3, 2017 at 11:53 AM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
> On Thu, 2017-08-03 at 00:28 -0500, Aaron Watry wrote:
>> /
>>
>> On Sat, Jul 22, 2017 at 2:27 PM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
>> > On Fri, 2017-07-21 at 23:19 -0500, Aaron Watry wrote:
>> > > Fixes: OpenCL CTS test/conformance/buffers/buffer_copy
>> >
>> > Similar patch was pushed in 2013:
>> > 56647c5d8f8e60269f0a3277e3caa7ee57d1fe6a
>> > "clover: Append buffers that use CL_MEM_USE_HOST_PTR."
>> >
>> > Grigory(added to cc) reverted the change and implemented user_ptr
>> > mechanism in:
>> > f972b223c4cb4ec58a9451cbac5d120ac9deb336
>> > "clover: try userptr for CL_MEM_USE_HOST_PTR"
>> >
>> > The buffer-flags piglit still passes, but it maps even CL_USE_HOST_PTR
>> > buffers. I'm not sure what the CTS does, we might need a
>> > synchronization point after kernels finish execution. I couldn't find
>> > the relevant part of specs that would define accessing
>> > CL_MEM_USE_HOST_PTR buffers without
>> > clEnqueueMapBuffer or clEnqueueReadBuffer
>> >
>>
>> I got rid of this patch in my tree and then started digging into the
>> user_ptr stuff.
>>
>> It looks like the issue stems from alignment restrictions.
>>
>> In clover/core/resource.cpp, there's a call to:
>>   dev.pipe->resource_from_user_memory(dev.pipe, &info, obj.host_ptr())
>>
>> That is failing for me.
>>
>> The CTS allocates memory that is aligned to the value returned by
>> clGetDeviceInfo (device,CL_DEVICE_MEM_BASE_ADDR_ALIGN, ...) converted
>> to bytes.
>>
>> Clover's api/device.cpp  has that hard-coded to 1024 bits/128 bytes at
>> the moment.
>>
>> If I change the device info query for the mem_base_addr_align to
>> forward a call to:
>>   pipe->get_param(pipe, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT)
>
> I'd say we should be doing this anyway instead of hardcoding the value.

Completely agree.  Hard-coding a constant minimum alignment at the API
level is wrong. Once we figure out exactly how do derive that value, I
plan on writing up a patch to pull that value out of the pipe driver.

Cc'ing Christian since he added both the user_ptr support and the
large PTE support.

As far as I've looked (which hasn't necessarily been exhaustive), I
haven't found any other
pipe caps that look like they'd apply for minimum user_ptr buffer
alignment restrictions, but
at the same time the description of that cap doesn't seem perfectly applicable.

Do we need to introduce a new cap for minimum user_ptr buffer
alignment, or is the value of PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT
sub-optimal/wrong? I'm thinking it's correct as-is, and we might want
to consider a new cap which reports the proper user pointer alignment
when PIPE_CAP_RESOURCE_FROM_USER_MEMORY is supported.

Right now it's hard-coded to R600_MAP_BUFFER_ALIGNMENT in si_pipe.c
and r600_pipe.c which has a value of 64 (bytes, I believe).

>
>>
>> And also change si_pipe.c:si_get_param's switch statement value to return:
>>   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
>>     return sscreen->b.info.gart_page_size;
>
> I'm not sure what the correct value is here. AFAIK, EG uses 256B cache
> lines so I'd expect the value of to be at least that

Depending on how the weather works out tonight, I might be able to at
least find out what NI reports for gart page sizes and compare that to
my SI.  I haven't tried to test user pointer support on r600g yet, so
either it's working alright with the existing 64-byte alignment, or
it's broken when we allocate pointers using the actual alignments
reported by clGetDeviceInfo. If it's broken, I'll try 256B, then keep
bumping it up until it either starts working or I hit GART page size.

--Aaron

>
> Both NI and GCN should be able to use 4K pages (which is what
> gart_page_size is set to), but we might want higher alignment for
> better performance[0]
>
> [0]https://lists.freedesktop.org/archives/dri-devel/2014-May/058858.htm
> l
>>
>> Then I can successfully create buffers from user pointers on my SI card.
>>
>> I'm a bit fuzzy on what alignment restrictions exist for SI/GCN cards,
>> but the winsys seems to indicate we should align things to gart page
>> size, which makes sense on the surface at least.
>>
>> If the alignment restrictions have changed between R600 and GCN, that
>> might explain why what's broken for me is working for you/Grigori (on
>> r600).
>
> I remember there was a buffer alignment patch form AMD recently for
> SI/CI vs. VI+, but I can't find it.
> It looks like a separate issue however. if incorrect alignment makes
> user_ptr fail, and the test still fails, it looks like the no-user_ptr
> fallback is broken.
>
> Jan
>
>>
>> --Aaron
>>
>> >
>> > Jan
>> >
>> >
>> > >
>> > > Signed-off-by: Aaron Watry <awatry@gmail.com>
>> > > CC: Francisco Jerez <currojerez@riseup.net>
>> > > ---
>> > >  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > >
>> > > diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
>> > > index b852e6896f..912d74830a 100644
>> > > --- a/src/gallium/state_trackers/clover/core/memory.cpp
>> > > +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>> > > @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
>> > >                         size_t size, void *host_ptr) :
>> > >     context(ctx), _flags(flags),
>> > >     _size(size), _host_ptr(host_ptr) {
>> > > -   if (flags & CL_MEM_COPY_HOST_PTR)
>> > > +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
>> > >        data.append((char *)host_ptr, size);
>> > >  }
>> > >
>> >
>> > --
>> > Jan Vesely <jan.vesely@rutgers.edu>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, Aug 3, 2017 at 3:33 PM, Aaron Watry <awatry@gmail.com> wrote:
> On Thu, Aug 3, 2017 at 11:53 AM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
>> On Thu, 2017-08-03 at 00:28 -0500, Aaron Watry wrote:
>>> /
>>>
>>> On Sat, Jul 22, 2017 at 2:27 PM, Jan Vesely <jan.vesely@rutgers.edu> wrote:
>>> > On Fri, 2017-07-21 at 23:19 -0500, Aaron Watry wrote:
>>> > > Fixes: OpenCL CTS test/conformance/buffers/buffer_copy
>>> >
>>> > Similar patch was pushed in 2013:
>>> > 56647c5d8f8e60269f0a3277e3caa7ee57d1fe6a
>>> > "clover: Append buffers that use CL_MEM_USE_HOST_PTR."
>>> >
>>> > Grigory(added to cc) reverted the change and implemented user_ptr
>>> > mechanism in:
>>> > f972b223c4cb4ec58a9451cbac5d120ac9deb336
>>> > "clover: try userptr for CL_MEM_USE_HOST_PTR"
>>> >
>>> > The buffer-flags piglit still passes, but it maps even CL_USE_HOST_PTR
>>> > buffers. I'm not sure what the CTS does, we might need a
>>> > synchronization point after kernels finish execution. I couldn't find
>>> > the relevant part of specs that would define accessing
>>> > CL_MEM_USE_HOST_PTR buffers without
>>> > clEnqueueMapBuffer or clEnqueueReadBuffer
>>> >
>>>
>>> I got rid of this patch in my tree and then started digging into the
>>> user_ptr stuff.
>>>
>>> It looks like the issue stems from alignment restrictions.
>>>
>>> In clover/core/resource.cpp, there's a call to:
>>>   dev.pipe->resource_from_user_memory(dev.pipe, &info, obj.host_ptr())
>>>
>>> That is failing for me.
>>>
>>> The CTS allocates memory that is aligned to the value returned by
>>> clGetDeviceInfo (device,CL_DEVICE_MEM_BASE_ADDR_ALIGN, ...) converted
>>> to bytes.
>>>
>>> Clover's api/device.cpp  has that hard-coded to 1024 bits/128 bytes at
>>> the moment.
>>>
>>> If I change the device info query for the mem_base_addr_align to
>>> forward a call to:
>>>   pipe->get_param(pipe, PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT)
>>
>> I'd say we should be doing this anyway instead of hardcoding the value.
>
> Completely agree.  Hard-coding a constant minimum alignment at the API
> level is wrong. Once we figure out exactly how do derive that value, I
> plan on writing up a patch to pull that value out of the pipe driver.
>
> Cc'ing Christian since he added both the user_ptr support and the
> large PTE support.
>
> As far as I've looked (which hasn't necessarily been exhaustive), I
> haven't found any other
> pipe caps that look like they'd apply for minimum user_ptr buffer
> alignment restrictions, but
> at the same time the description of that cap doesn't seem perfectly applicable.
>
> Do we need to introduce a new cap for minimum user_ptr buffer
> alignment, or is the value of PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT
> sub-optimal/wrong? I'm thinking it's correct as-is, and we might want
> to consider a new cap which reports the proper user pointer alignment
> when PIPE_CAP_RESOURCE_FROM_USER_MEMORY is supported.

IIRC, user_ptrs require page alignment.

Alex

>
> Right now it's hard-coded to R600_MAP_BUFFER_ALIGNMENT in si_pipe.c
> and r600_pipe.c which has a value of 64 (bytes, I believe).
>
>>
>>>
>>> And also change si_pipe.c:si_get_param's switch statement value to return:
>>>   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
>>>     return sscreen->b.info.gart_page_size;
>>
>> I'm not sure what the correct value is here. AFAIK, EG uses 256B cache
>> lines so I'd expect the value of to be at least that
>
> Depending on how the weather works out tonight, I might be able to at
> least find out what NI reports for gart page sizes and compare that to
> my SI.  I haven't tried to test user pointer support on r600g yet, so
> either it's working alright with the existing 64-byte alignment, or
> it's broken when we allocate pointers using the actual alignments
> reported by clGetDeviceInfo. If it's broken, I'll try 256B, then keep
> bumping it up until it either starts working or I hit GART page size.
>
> --Aaron
>
>>
>> Both NI and GCN should be able to use 4K pages (which is what
>> gart_page_size is set to), but we might want higher alignment for
>> better performance[0]
>>
>> [0]https://lists.freedesktop.org/archives/dri-devel/2014-May/058858.htm
>> l
>>>
>>> Then I can successfully create buffers from user pointers on my SI card.
>>>
>>> I'm a bit fuzzy on what alignment restrictions exist for SI/GCN cards,
>>> but the winsys seems to indicate we should align things to gart page
>>> size, which makes sense on the surface at least.
>>>
>>> If the alignment restrictions have changed between R600 and GCN, that
>>> might explain why what's broken for me is working for you/Grigori (on
>>> r600).
>>
>> I remember there was a buffer alignment patch form AMD recently for
>> SI/CI vs. VI+, but I can't find it.
>> It looks like a separate issue however. if incorrect alignment makes
>> user_ptr fail, and the test still fails, it looks like the no-user_ptr
>> fallback is broken.
>>
>> Jan
>>
>>>
>>> --Aaron
>>>
>>> >
>>> > Jan
>>> >
>>> >
>>> > >
>>> > > Signed-off-by: Aaron Watry <awatry@gmail.com>
>>> > > CC: Francisco Jerez <currojerez@riseup.net>
>>> > > ---
>>> > >  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
>>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> > >
>>> > > diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
>>> > > index b852e6896f..912d74830a 100644
>>> > > --- a/src/gallium/state_trackers/clover/core/memory.cpp
>>> > > +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>>> > > @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
>>> > >                         size_t size, void *host_ptr) :
>>> > >     context(ctx), _flags(flags),
>>> > >     _size(size), _host_ptr(host_ptr) {
>>> > > -   if (flags & CL_MEM_COPY_HOST_PTR)
>>> > > +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
>>> > >        data.append((char *)host_ptr, size);
>>> > >  }
>>> > >
>>> >
>>> > --
>>> > Jan Vesely <jan.vesely@rutgers.edu>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On 2017-08-03 22:26, Alex Deucher wrote:
> 
> IIRC, user_ptrs require page alignment.
> 
> Alex
> 

I didn't follow the whole discussion (sorry if I'm saying something 
redundant), but AMD's older OpenCL Optimization Guide [1] has some notes 
regarding the implementation of the USE_HOST_PTR flag.
It initially recommends 4KB (aka page) alignment but also supports 
arbitrary alignment (with additional overhead, I suppose it pins an 
extra page for bad alignments). It also does some optimizations to 
minimize mapping/unmapping operations, called "pre-pinning". Not sure if 
that is applicable to Mesa/Clover, aren't (GTT) buffers usually mapped 
forever?

Grigori

[1] 
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide.pdf

>> 
>> Right now it's hard-coded to R600_MAP_BUFFER_ALIGNMENT in si_pipe.c
>> and r600_pipe.c which has a value of 64 (bytes, I believe).
>> 
>>> 
>>>> 
>>>> And also change si_pipe.c:si_get_param's switch statement value to 
>>>> return:
>>>>   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
>>>>     return sscreen->b.info.gart_page_size;
>>> 
>>> I'm not sure what the correct value is here. AFAIK, EG uses 256B 
>>> cache
>>> lines so I'd expect the value of to be at least that
>> 
>> Depending on how the weather works out tonight, I might be able to at
>> least find out what NI reports for gart page sizes and compare that to
>> my SI.  I haven't tried to test user pointer support on r600g yet, so
>> either it's working alright with the existing 64-byte alignment, or
>> it's broken when we allocate pointers using the actual alignments
>> reported by clGetDeviceInfo. If it's broken, I'll try 256B, then keep
>> bumping it up until it either starts working or I hit GART page size.
>> 
>> --Aaron
>> 
>>> 
>>> Both NI and GCN should be able to use 4K pages (which is what
>>> gart_page_size is set to), but we might want higher alignment for
>>> better performance[0]
>>> 
>>> [0]https://lists.freedesktop.org/archives/dri-devel/2014-May/058858.htm
>>> l
>>>> 
>>>> Then I can successfully create buffers from user pointers on my SI 
>>>> card.
>>>> 
>>>> I'm a bit fuzzy on what alignment restrictions exist for SI/GCN 
>>>> cards,
>>>> but the winsys seems to indicate we should align things to gart page
>>>> size, which makes sense on the surface at least.
>>>> 
>>>> If the alignment restrictions have changed between R600 and GCN, 
>>>> that
>>>> might explain why what's broken for me is working for you/Grigori 
>>>> (on
>>>> r600).
>>> 
>>> I remember there was a buffer alignment patch form AMD recently for
>>> SI/CI vs. VI+, but I can't find it.
>>> It looks like a separate issue however. if incorrect alignment makes
>>> user_ptr fail, and the test still fails, it looks like the 
>>> no-user_ptr
>>> fallback is broken.
>>> 
>>> Jan
>>> 
>>>> 
>>>> --Aaron
>>>> 
>>>> >
>>>> > Jan
>>>> >
>>>> >
>>>> > >
>>>> > > Signed-off-by: Aaron Watry <awatry@gmail.com>
>>>> > > CC: Francisco Jerez <currojerez@riseup.net>
>>>> > > ---
>>>> > >  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
>>>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> > >
>>>> > > diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
>>>> > > index b852e6896f..912d74830a 100644
>>>> > > --- a/src/gallium/state_trackers/clover/core/memory.cpp
>>>> > > +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>>>> > > @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
>>>> > >                         size_t size, void *host_ptr) :
>>>> > >     context(ctx), _flags(flags),
>>>> > >     _size(size), _host_ptr(host_ptr) {
>>>> > > -   if (flags & CL_MEM_COPY_HOST_PTR)
>>>> > > +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
>>>> > >        data.append((char *)host_ptr, size);
>>>> > >  }
>>>> > >
>>>> >
>>>> > --
>>>> > Jan Vesely <jan.vesely@rutgers.edu>
>>>> 
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev