amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

Submitted by Sibren Vasse on Jan. 10, 2019, 5:52 p.m.

Details

Message ID CAF=iVcsEMfWyJck7RJ8-EGvKipXU=5eq3iv61mo49_7Wd4pHuw@mail.gmail.com
State New
Series "amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code"

Commit Message

Sibren Vasse Jan. 10, 2019, 5:52 p.m.
On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> >>  From the trace it looks like we hit the case where swiotlb tries
> >> to copy back data from a bounce buffer, but hits a dangling or NULL
> >> pointer.  So a couple of questions for the submitter:
> >>
> >>   - does the system have more than 4GB memory and thus use swiotlb?
> >>     (check /proc/meminfo, and whether anything about SWIOTLB appears
> >>     in dmesg)
> >>   - does the device this happens on have a DMA mask smaller than
> >>     the available memory, that is, should swiotlb be used here to
> >>     start with?
> >
> > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > 1TB.
>
> So we probably somehow got a false positive.
>
> For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
I'm not sure what you want me to confirm. Could you elaborate?

>
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.


Result on boot:
[   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

Regards,

Sibren
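
As background to the DMA-mask question above: dma-direct only routes a
mapping through swiotlb when the page's physical address is not reachable
under the device's DMA mask, which is what dma_capable() checks. A minimal
sketch of that decision, loosely following the v5.0-era kernel/dma/direct.c
(bounce_through_swiotlb() is a hypothetical stand-in for the real bounce
path, not a kernel function):

/* Sketch: a bounce buffer is used only when the device cannot
 * address the page directly under its DMA mask. */
dma_addr_t sketch_dma_direct_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	/* Reachable under the mask: swiotlb never sees this mapping. */
	if (likely(dma_capable(dev, dma_addr, size)))
		return dma_addr;

	/* Only here does a swiotlb bounce buffer come into play. */
	return bounce_through_swiotlb(dev, phys, size, dir, attrs);
}

With a DMA mask covering up to 1TB and far less installed memory,
dma_capable() should always succeed, so the unmap path ought never to see
a bounce buffer; that is why a hit in the swiotlb copy-back code looks
like a false positive.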

Patch

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..042246dbae00 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -69,6 +69,7 @@  extern phys_addr_t io_tlb_start, io_tlb_end;

 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
+    printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu\n", io_tlb_start, io_tlb_end);
     return paddr >= io_tlb_start && paddr < io_tlb_end;
 }
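
As an aside on the format string: phys_addr_t is conventionally printed
with the %pa specifier, which takes a pointer to the variable and picks
the correct width on both 32-bit and 64-bit builds (the output is
hexadecimal rather than decimal). A sketch of the added line using it:

	printk_once(KERN_INFO "io_tlb_start: %pa, io_tlb_end: %pa\n",
		    &io_tlb_start, &io_tlb_end);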

Comments

hch@lst.de Jan. 14, 2019, 6:10 p.m.
On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > >>  From the trace it looks like we hit the case where swiotlb tries
> > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > >> pointer.  So a couple of questions for the submitter:
> > >>
> > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > >>     (check /proc/meminfo, and whether anything about SWIOTLB appears
> > >>     in dmesg)
> > >>   - does the device this happens on have a DMA mask smaller than
> > >>     the available memory, that is, should swiotlb be used here to
> > >>     start with?
> > >
> > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > 1TB.
> >
> > So we probably somehow got a false positive.
> >
> > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > backtrace really is in the swiotlb code (I can't think of anything else,
> > but I'd rather be sure).
> I'm not sure what you want me to confirm. Could you elaborate?

Please open the vmlinux file for which this happened in gdb,
then send the output from this command

	l *(dma_direct_unmap_page+0x92)

to this thread.
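
For a non-interactive lookup, the same command can be scripted, assuming
the vmlinux was built with debug info:

	gdb -batch -ex 'list *(dma_direct_unmap_page+0x92)' vmlinux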

> > Second it would be great to print what the contents of io_tlb_start
> > and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> > maybe that gives a clue why we are hitting the swiotlb code here.
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 7c007ed7505f..042246dbae00 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;
> 
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
> +    printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu\n", io_tlb_start, io_tlb_end);
>      return paddr >= io_tlb_start && paddr < io_tlb_end;
>  }
> 
> Result on boot:
> [   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

So this is a normal swiotlb location, and it definitely exists.
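
For reference, 3850092544 - 3782983680 = 67108864 bytes, i.e. exactly the
default 64 MiB swiotlb aperture (IO_TLB_DEFAULT_SIZE), consistent with a
normally initialized bounce buffer.
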
Sibren Vasse Jan. 14, 2019, 7 p.m.
On Mon, 14 Jan 2019 at 19:10, Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > > >>  From the trace it looks like we hit the case where swiotlb tries
> > > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > > >> pointer.  So a couple of questions for the submitter:
> > > >>
> > > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > > >>     (check /proc/meminfo, and whether anything about SWIOTLB appears
> > > >>     in dmesg)
> > > >>   - does the device this happens on have a DMA mask smaller than
> > > >>     the available memory, that is, should swiotlb be used here to
> > > >>     start with?
> > > >
> > > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > > 1TB.
> > >
> > > So we probably somehow got a false positive.
> > >
> > > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > > backtrace really is in the swiotlb code (I can't think of anything else,
> > > but I'd rather be sure).
> > I'm not sure what you want me to confirm. Could you elaborate?
>
> Please open the vmlinux file for which this happened in gdb,
> then send the output from this command
>
>         l *(dma_direct_unmap_page+0x92)
>
> to this thread.
My call trace contained:
Jan 10 16:34:51 <hostname> kernel:  dma_direct_unmap_page+0x7a/0x80

(gdb) list *(dma_direct_unmap_page+0x7a)
0xffffffff810fa28a is in dma_direct_unmap_page (kernel/dma/direct.c:291).
286                     size_t size, enum dma_data_direction dir, unsigned long attrs)
287     {
288             phys_addr_t phys = dma_to_phys(dev, addr);
289
290             if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
291                     dma_direct_sync_single_for_cpu(dev, addr, size, dir);
292
293             if (unlikely(is_swiotlb_buffer(phys)))
294                     swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
295     }
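
For context on the listing: direct.c line 291 is the call to
dma_direct_sync_single_for_cpu(), which is where a bounce buffer, if one
was used, gets copied back to the original pages. A simplified sketch of
that helper as it looked around v5.0 (details vary by release):

static inline void dma_direct_sync_single_for_cpu(struct device *dev,
		dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
	phys_addr_t paddr = dma_to_phys(dev, addr);

	if (!dev_is_dma_coherent(dev))
		arch_sync_dma_for_cpu(dev, paddr, size, dir);

	/* Copy data back out of the bounce buffer, if one was used. */
	if (unlikely(is_swiotlb_buffer(paddr)))
		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
}

A false positive from is_swiotlb_buffer() here would hand a never-bounced
address to swiotlb_tbl_sync_single(), which matches the dangling-pointer
symptom described at the top of the thread.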