[v2] drm/scheduler: Fix job cleanup without timeout handler

Submitted by Erico Nunes on May 20, 2019, 11:16 p.m.

Details

Message ID 20190520231649.24595-1-nunes.erico@gmail.com
State New
Headers show
Series "drm/scheduler: Fix job cleanup without timeout handler" ( rev: 2 ) in DRI devel

Not browsing as part of any series.

Commit Message

Erico Nunes May 20, 2019, 11:16 p.m.
After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
only deleted when the timeout handler is able to be cancelled
successfully.

In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
job cleanup would be skipped which may result in memory leaks.

Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
drm_sched_cleanup_jobs.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f8f0e1c19002..10d1d37e644a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -630,7 +630,8 @@  static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 	unsigned long flags;
 
 	/* Don't destroy jobs while the timeout worker is running */
-	if (!cancel_delayed_work(&sched->work_tdr))
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !cancel_delayed_work(&sched->work_tdr))
 		return;
 
 

Comments

Am 21.05.19 um 01:16 schrieb Erico Nunes:
> [CAUTION: External Email]

>

> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are

> only deleted when the timeout handler is able to be cancelled

> successfully.

>

> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),

> job cleanup would be skipped which may result in memory leaks.

>

> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in

> drm_sched_cleanup_jobs.

>

> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

> Cc: Christian König <christian.koenig@amd.com>


Reviewed-by: Christian König <christian.koenig@amd.com>


Going to pick that up later today into our internal branch.

Thanks for the help,
Christian.

> ---

>   drivers/gpu/drm/scheduler/sched_main.c | 3 ++-

>   1 file changed, 2 insertions(+), 1 deletion(-)

>

> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

> index f8f0e1c19002..10d1d37e644a 100644

> --- a/drivers/gpu/drm/scheduler/sched_main.c

> +++ b/drivers/gpu/drm/scheduler/sched_main.c

> @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)

>          unsigned long flags;

>

>          /* Don't destroy jobs while the timeout worker is running */

> -       if (!cancel_delayed_work(&sched->work_tdr))

> +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&

> +           !cancel_delayed_work(&sched->work_tdr))

>                  return;

>

>

> --

> 2.20.1

>
On Tue, May 21, 2019 at 8:47 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > [CAUTION: External Email]
> >
> > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > only deleted when the timeout handler is able to be cancelled
> > successfully.
> >
> > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > job cleanup would be skipped which may result in memory leaks.
> >
> > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > drm_sched_cleanup_jobs.
> >
> > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > Cc: Christian König <christian.koenig@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to pick that up later today into our internal branch.

Thanks. I also posted one to set lima to use a real default timeout.

Is it possible for us to still get at least one of these into a tree
that goes to a linux 5.2-rc?
This issue may make lima unusable as the system runs out of memory
quickly if many opengl programs are launched.

Erico
Am 21.05.19 um 14:16 schrieb Erico Nunes:
> [CAUTION: External Email]

>

> On Tue, May 21, 2019 at 8:47 AM Koenig, Christian

> <Christian.Koenig@amd.com> wrote:

>> Am 21.05.19 um 01:16 schrieb Erico Nunes:

>>> [CAUTION: External Email]

>>>

>>> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are

>>> only deleted when the timeout handler is able to be cancelled

>>> successfully.

>>>

>>> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),

>>> job cleanup would be skipped which may result in memory leaks.

>>>

>>> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in

>>> drm_sched_cleanup_jobs.

>>>

>>> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

>>> Cc: Christian König <christian.koenig@amd.com>

>> Reviewed-by: Christian König <christian.koenig@amd.com>

>>

>> Going to pick that up later today into our internal branch.

> Thanks. I also posted one to set lima to use a real default timeout.

>

> Is it possible for us to still get at least one of these into a tree

> that goes to a linux 5.2-rc?

> This issue may make lima unusable as the system runs out of memory

> quickly if many opengl programs are launched.


Alex should pick it up for his next -fixes pull this week.

Christian.

>

> Erico
On Tue, May 21, 2019 at 8:17 PM Erico Nunes <nunes.erico@gmail.com> wrote:
>
> On Tue, May 21, 2019 at 8:47 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
> >
> > Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > > [CAUTION: External Email]
> > >
> > > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > > only deleted when the timeout handler is able to be cancelled
> > > successfully.
> > >
> > > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > > job cleanup would be skipped which may result in memory leaks.
> > >
> > > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > > drm_sched_cleanup_jobs.
> > >
> > > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> >
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> >
> > Going to pick that up later today into our internal branch.
>
> Thanks. I also posted one to set lima to use a real default timeout.
>
> Is it possible for us to still get at least one of these into a tree
> that goes to a linux 5.2-rc?
> This issue may make lima unusable as the system runs out of memory
> quickly if many opengl programs are launched.
>
The "drm/scheduler: rework job destruction" commit is not in 5.2-rc. So no
need to pick any fix to that place.

It would be better that run some tests on 5.2-rc for other possible
fixes indeed.

Regards,
Qiang
On Tue, May 21, 2019 at 2:48 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > [CAUTION: External Email]
> >
> > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > only deleted when the timeout handler is able to be cancelled
> > successfully.
> >
> > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > job cleanup would be skipped which may result in memory leaks.
> >
> > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > drm_sched_cleanup_jobs.
> >
> > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > Cc: Christian König <christian.koenig@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to pick that up later today into our internal branch.

Please apply it to drm-misc-next.  that is where the other gpu
scheduler changes are.  They are not in 5.2.

Alex

>
> Thanks for the help,
> Christian.
>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index f8f0e1c19002..10d1d37e644a 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> >          unsigned long flags;
> >
> >          /* Don't destroy jobs while the timeout worker is running */
> > -       if (!cancel_delayed_work(&sched->work_tdr))
> > +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > +           !cancel_delayed_work(&sched->work_tdr))
> >                  return;
> >
> >
> > --
> > 2.20.1
> >
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Am 21.05.19 um 16:13 schrieb Alex Deucher:
> [CAUTION: External Email]

>

> On Tue, May 21, 2019 at 2:48 AM Koenig, Christian

> <Christian.Koenig@amd.com> wrote:

>> Am 21.05.19 um 01:16 schrieb Erico Nunes:

>>> [CAUTION: External Email]

>>>

>>> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are

>>> only deleted when the timeout handler is able to be cancelled

>>> successfully.

>>>

>>> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),

>>> job cleanup would be skipped which may result in memory leaks.

>>>

>>> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in

>>> drm_sched_cleanup_jobs.

>>>

>>> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>

>>> Cc: Christian König <christian.koenig@amd.com>

>> Reviewed-by: Christian König <christian.koenig@amd.com>

>>

>> Going to pick that up later today into our internal branch.

> Please apply it to drm-misc-next.  that is where the other gpu

> scheduler changes are.  They are not in 5.2.


Ah! Now that makes sense again, thanks for the reminder.

Christian.

>

> Alex

>

>> Thanks for the help,

>> Christian.

>>

>>> ---

>>>    drivers/gpu/drm/scheduler/sched_main.c | 3 ++-

>>>    1 file changed, 2 insertions(+), 1 deletion(-)

>>>

>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

>>> index f8f0e1c19002..10d1d37e644a 100644

>>> --- a/drivers/gpu/drm/scheduler/sched_main.c

>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c

>>> @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)

>>>           unsigned long flags;

>>>

>>>           /* Don't destroy jobs while the timeout worker is running */

>>> -       if (!cancel_delayed_work(&sched->work_tdr))

>>> +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&

>>> +           !cancel_delayed_work(&sched->work_tdr))

>>>                   return;

>>>

>>>

>>> --

>>> 2.20.1

>>>

>> _______________________________________________

>> dri-devel mailing list

>> dri-devel@lists.freedesktop.org

>> https://lists.freedesktop.org/mailman/listinfo/dri-devel