BISECTED- amd-staging-drm-next, xorg-server segfault A6-6310 APU - R4 Mullins.

Submitted by Michel Dänzer on Jan. 11, 2019, 3:27 p.m.

Details

Message ID 8033aaf7-2e86-9414-c68a-7b17e331f212@daenzer.net
State New
Series "BISECTED- amd-staging-drm-next, xorg-server segfault A6-6310 APU - R4 Mullins."

Commit Message

Michel Dänzer Jan. 11, 2019, 3:27 p.m.
On 2019-01-10 6:56 p.m., Przemek Socha wrote:
>
> [  147.846148] [drm:amdgpu_display_user_framebuffer_create [amdgpu]] Invalid pitch: expecting 10752 but got 10624
> [  147.846155] [drm:drm_internal_framebuffer_create] could not create framebuffer

Thanks, this confirms that the check is too strict. I've sent a patch
reverting this as well.
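The two numbers in the log differ only in how far the pitch is rounded up. Here is a minimal sketch. It assumes a 4-byte-per-pixel format; the log does not name the format, and the patch below adds it to the message. It also assumes the rejected check effectively required a 64-pixel boundary for cpp=4. `align_pitch_bytes` is a hypothetical helper, only loosely modeled on what `amdgpu_align_pitch` is used for here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round width_px up to a multiple of align_px pixels
 * (align_px must be a power of two) and return the pitch in bytes.
 * The alignment values passed in are assumptions, not documented limits. */
static uint32_t align_pitch_bytes(uint32_t width_px, uint32_t cpp,
                                  uint32_t align_px)
{
	assert(align_px != 0 && (align_px & (align_px - 1)) == 0);
	uint32_t mask = align_px - 1;

	return ((width_px + mask) & ~mask) * cpp;
}
```

Under those assumptions, 10624 bytes is 2656 pixels, which is already a multiple of 32 (83 × 32) but not of 64. Rounding up to 64 pixels gives 2688 pixels, which is exactly the 10752 bytes the check expected. A pitch that worked before the change was therefore rejected only because the check demanded an exact match with a coarser rounding.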


Yu, I like the idea behind your changes, but unfortunately it's more
complicated than that. If you want to work on similar checks which
accurately reflect the hardware constraints, people on the amd-gfx list
should be able to help with that.


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 70a816dd8b4d..99b646c16311 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -537,8 +537,11 @@  amdgpu_display_user_framebuffer_create(struct drm_device *dev,
 
 	pitch = amdgpu_align_pitch(adev, pitch, cpp, false);
 	if (mode_cmd->pitches[0] != pitch) {
-		DRM_DEBUG_KMS("Invalid pitch: expecting %d but got %d\n",
-			      pitch, mode_cmd->pitches[0]);
+		struct drm_format_name_buf format_name;
+
+		DRM_ERROR("Invalid pitch: expecting %d but got %d, format %s => cpp=%d\n",
+			  pitch, mode_cmd->pitches[0],
+			  drm_get_format_name(mode_cmd->pixel_format, &format_name), cpp);
 		return ERR_PTR(-EINVAL);
 	}
 

Comments

Yu Zhao Jan. 11, 2019, 9:37 p.m.
On Fri, Jan 11, 2019 at 04:27:44PM +0100, Michel Dänzer wrote:
> On 2019-01-10 6:56 p.m., Przemek Socha wrote:
> >
> > [  147.846148] [drm:amdgpu_display_user_framebuffer_create [amdgpu]] Invalid 
> > pitch: expecting 10752 but got 10624
> > [  147.846155] [drm:drm_internal_framebuffer_create] could not create 
> > framebuffer"
> 
> Thanks, this confirms that the check is too strict. I've sent a patch
> reverting this as well.
> 
> 
> Yu, I like the idea behind your changes, but unfortunately it's more
> complicated than that. If you want to work on similar checks which
> accurately reflect the hardware constraints, people on the amd-gfx list
> should be able to help with that.

Hi Michel, sorry for the troubles.

Background: after we turned on iommu with amd_iommu=force_isolation,
we saw io page faults from amd gpu (stoney ridge). We tracked it
down to userspace using 32-pixel pitch alignment, which seems smaller
than the minimum alignment supported by the hw. Instead of rejecting
the alignment, we suspect, it uses 64-pixel alignment to do dma. The
larger alignment sometimes causes out of bound memory accesses, thus
the io page faults.
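Yu's hypothesis can be put into numbers. This is a sketch of the suspected failure mode only, and Michel questions it below. The helper `overrun_bytes` is hypothetical, the pitches are borrowed from the log above purely as sample values, and the 768-row height is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the suspected failure: the buffer was allocated as
 * alloc_pitch * height bytes, but scanout is assumed to step through it
 * with a larger hw_pitch. Returns how many bytes past the end of the
 * allocation the last row would end (0 if everything stays in bounds). */
static uint64_t overrun_bytes(uint64_t alloc_pitch, uint64_t hw_pitch,
                              uint64_t height, uint64_t row_bytes)
{
	assert(height > 0 && row_bytes <= alloc_pitch);
	uint64_t alloc_size = alloc_pitch * height;
	/* The last row starts at (height - 1) * hw_pitch and spans row_bytes. */
	uint64_t end = (height - 1) * hw_pitch + row_bytes;

	return end > alloc_size ? end - alloc_size : 0;
}
```

If a 10624-byte allocation pitch is read back at 10752 bytes per row, each row drifts by 128 bytes. The last of 768 rows would then end 767 × 128 = 98176 bytes past the allocation. Under this hypothesis, that is enough to touch pages the IOMMU never mapped for the device.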

I created the following patch and hoped it could fix the problem:
https://lore.kernel.org/patchwork/patch/1029656/
Well, it does fix the problem on our Stoney Ridge based Chromebook, but it
also breaks other platforms.

So my questions to the amd gpu experts here:
1) How do we properly validate the pitch alignment passed to kernel space?
2) If that's not easy, what is the worst an invalid alignment could cause?

Thank you.

Michel Dänzer Jan. 14, 2019, 5:44 p.m.
On 2019-01-11 10:37 p.m., Yu Zhao wrote:
> On Fri, Jan 11, 2019 at 04:27:44PM +0100, Michel Dänzer wrote:
>> On 2019-01-10 6:56 p.m., Przemek Socha wrote:
>>>
>>> [  147.846148] [drm:amdgpu_display_user_framebuffer_create [amdgpu]] Invalid 
>>> pitch: expecting 10752 but got 10624
>>> [  147.846155] [drm:drm_internal_framebuffer_create] could not create 
>>> framebuffer"
>>
>> Thanks, this confirms that the check is too strict. I've sent a patch
>> reverting this as well.
>>
>>
>> Yu, I like the idea behind your changes, but unfortunately it's more
>> complicated than that. If you want to work on similar checks which
>> accurately reflect the hardware constraints, people on the amd-gfx list
>> should be able to help with that.
> 
> Hi Michel, sorry for the troubles.

No worries, I missed these issues as well in my review.


> Background: after we turned on iommu with amd_iommu=force_isolation,
> we saw io page faults from amd gpu (stoney ridge). We tracked it
> down to userspace using 32-pixel pitch alignment, which seems smaller
> than the minimum alignment supported by the hw.

If that was the case, the corresponding surface would be displayed badly
distorted, because the hardware would use a different pitch.


> Instead of rejecting the alignment, we suspect, it uses 64-pixel
> alignment to do dma.

Actually, it's more likely that it would use the next smaller
well-aligned pitch, since it would probably just ignore the low-order
bits below the minimum alignment.
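Michel's alternative can be illustrated the same way. This sketch assumes, purely for illustration, a 256-byte minimum pitch alignment (64 pixels at 4 bytes per pixel) that the display engine enforces by dropping the low bits. The alignment value and both helper names are hypothetical and do not describe documented hardware behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the pitch the hardware would use if it simply ignored the
 * bits below its minimum alignment (align_bytes, a power of two). This
 * rounds DOWN rather than up. */
static uint32_t pitch_if_low_bits_ignored(uint32_t pitch, uint32_t align_bytes)
{
	assert(align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
	return pitch & ~(align_bytes - 1);
}

/* Offset in bytes between where row y was drawn (y * pitch) and where
 * scanout would fetch it (y * hw_pitch). An error that grows with y shows
 * up on screen as a sheared, distorted image. */
static int64_t row_start_error(uint32_t pitch, uint32_t hw_pitch, uint32_t y)
{
	return (int64_t)y * ((int64_t)hw_pitch - (int64_t)pitch);
}
```

Rounded down, 10624 becomes 10496, so every fetch stays inside the buffer. However, row y starts 128 × y bytes before where it was drawn, which is -12800 bytes at row 100. That growing offset is the distortion described above. It is also why a too-small pitch alone is unlikely to explain out-of-bounds accesses.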


> The larger alignment sometimes causes out of bound memory accesses, thus
> the io page faults.

Per the above, this is more likely due to insufficient alignment of the
vertical size of the surface, resulting in the allocated memory being
too small.
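The vertical-size explanation can be sketched too. Here the hardware is assumed, hypothetically, to access whole groups of `height_align` rows. The 16-row group and the 1080-row height are invented sample values, and `align_up` and `height_shortfall` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a (a > 0, not necessarily a power of two). */
static uint64_t align_up(uint64_t v, uint64_t a)
{
	assert(a > 0);
	return (v + a - 1) / a * a;
}

/* Bytes missing from an allocation of pitch * height when the hardware is
 * assumed (hypothetically) to access whole groups of height_align rows. */
static uint64_t height_shortfall(uint64_t pitch, uint64_t height,
                                 uint64_t height_align)
{
	uint64_t needed = pitch * align_up(height, height_align);
	uint64_t allocated = pitch * height;

	return needed - allocated;
}
```

With those numbers, 1080 rows round up to 1088. A buffer allocated for exactly 1080 rows is therefore 8 × 10624 = 84992 bytes short, and accesses to the missing rows would fall outside the allocation even though the pitch itself is fine.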


> 1) How do we properly validate the pitch alignment passed to kernel space?

It's pretty complicated, I'm afraid. A case-insensitive search for
"display" in
https://gitlab.freedesktop.org/mesa/mesa/tree/master/src/amd/addrlib
might give an idea of the complexity.