Add Profiling support in beignet.

Submitted by junyan.he@inbox.com on Nov. 16, 2015, 11:40 p.m.

Details

Reviewer None
Submitted Nov. 16, 2015, 11:40 p.m.
Last Updated Nov. 16, 2015, 11:42 p.m.
Revision 1

Cover Letter(s)

Revision 1
      From: Junyan He <junyan.he@linux.intel.com>

This patch set enables profiling support.
The profiling output looks like this:
-------------------------- Log 0 --------------------------
| fix functions id:   7     simd:   16   kernel id:    0  |
| thread id:          0     EU id:   1   half slice id: 0 |
| dispatch Mask:   1 prolog:       197  epilog:      6699 |
| globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
|  ts0 :        64  | ts1 :         0  | ts2 :       930  |
|  ts3 :         0  | ts4 :      1046  | ts5 :      1170  |
|  ts6 :         0  | ts7 :         0  | ts8 :         0  |
|  ts9 :      1624  | ts10:      1838  | ts11:         0  |
|  ts12:      2032  | ts13:         0  | ts14:      2312  |
|  ts15:      2560  | ts16:         0  | ts17:         0  |
|  ts18:         0  | ts19:      2972  |                  |

Each HW thread creates one such log item.
Prolog is the timestamp taken when the thread enters the kernel, and
epilog is the timestamp taken when it finishes and leaves.
ts0~ts19 record the time offsets from the prolog, with 0 as
the base.
For now we only record timestamps for the first 20 blocks. Once
SourceToBinary is fully supported, profiling points can be set
at any location.
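
To make the fields above concrete, here is a minimal sketch of how a host-side tool might post-process one log item. The `LogRecord` struct and both helpers are hypothetical names used only for illustration; they are not the layout or API added by this patch set. Treating a zero `ts` slot as "profiling point not reached" is an assumption based on the sample log.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side view of one log item (not the patch set's layout).
struct LogRecord {
    uint64_t prolog;              // timestamp on kernel entry
    uint64_t epilog;              // timestamp on kernel exit
    std::array<uint64_t, 20> ts;  // ts0..ts19, offsets from the prolog
};

// Time the HW thread spent between prolog and epilog.
inline uint64_t kernelDuration(const LogRecord &r) {
    assert(r.epilog >= r.prolog);  // a sane record never ends before it starts
    return r.epilog - r.prolog;
}

// Returns (index, offset) for every profiling point with a recorded value.
// Assumption: a zero slot means the point was never reached.
inline std::vector<std::pair<std::size_t, uint64_t>>
reachedPoints(const LogRecord &r) {
    std::vector<std::pair<std::size_t, uint64_t>> out;
    for (std::size_t i = 0; i < r.ts.size(); ++i) {
        if (r.ts[i] != 0)
            out.emplace_back(i, r.ts[i]);
    }
    return out;
}
```

Fed with the values from Log 0, `kernelDuration` gives 6699 - 197 = 6502 and `reachedPoints` returns ten entries, running from ts0 (64) to ts19 (2972).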

V2:
1. Fix the wrong GLOBAL XYZ values.
Some curbe registers such as lid0 and lid1 may have already expired
by the time we reach the bottom block, which causes wrong global values.
2. Fix the wrong device id in the profiling info.
3. Fix the pointer size problem on BDW.
Pointers are 8-byte values and dri_bo_emit_reloc writes 8 bytes.
The buffer pointers for printf and profiling were declared as 4 bytes,
so the value next to the pointer in the curbe was overwritten, which
caused wrong results.
4. Move the prolog and epilog logic to the head and tail blocks.
The old version placed the prolog at the beginning of the first block
and the epilog in the second-to-last block, just before the return
block. This put the prolog and epilog under predication, but they
should be executed unconditionally.
5. Improve the sub and add functions for timestamp calculation.
Starting with BDW the native long type is supported, so use it to make
the calculation more efficient.
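
The pointer size problem in item 3 can be reproduced in isolation. The sketch below is a host-side illustration only: the two structs and `emitReloc8` are hypothetical stand-ins, not the real curbe layout or the real dri_bo_emit_reloc. It only assumes what the item states, namely that the relocation writes 8 bytes into a slot that was declared as 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical curbe layouts, for illustration only.
struct CurbeNarrow {      // buggy: pointer slot declared as 4 bytes
    uint32_t bufferPtr;
    uint32_t nextValue;   // whatever happens to follow the pointer
};
struct CurbeWide {        // fixed: pointer slot is 8 bytes
    uint64_t bufferPtr;
    uint32_t nextValue;
};

// Stand-in for a relocation that always writes 8 bytes at `offset`.
inline void emitReloc8(unsigned char *curbe, std::size_t size,
                       std::size_t offset, uint64_t gpuAddr) {
    assert(offset + sizeof(gpuAddr) <= size);  // stay inside the demo buffer
    std::memcpy(curbe + offset, &gpuAddr, sizeof(gpuAddr));
}

// Stores `next` after the pointer slot, applies the 8-byte relocation,
// and returns what is left in nextValue afterwards.
template <typename Curbe>
uint32_t nextValueAfterReloc(uint64_t gpuAddr, uint32_t next) {
    Curbe c{};
    c.nextValue = next;
    unsigned char raw[sizeof(Curbe)];
    std::memcpy(raw, &c, sizeof(c));
    emitReloc8(raw, sizeof(raw), offsetof(Curbe, bufferPtr), gpuAddr);
    std::memcpy(&c, raw, sizeof(c));
    return c.nextValue;
}
```

With the narrow layout, `nextValueAfterReloc<CurbeNarrow>(0xAAAAAAAABBBBBBBB, 0x1234)` no longer returns 0x1234, because half of the address lands on the neighbouring value. With the wide layout the neighbour survives.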

V3:
1. Fix the wrong MOD -1 calculation.
2. Add tm0 register helper function.
3. The way curbe registers are allocated has changed, so the live intervals
   of all curbe registers must be set correctly before they can be allocated.

Known issues:
On BDW, some logs look like this:
------------------------ Log 5      -----------------------
| fix functions id:   7     simd:   16   kernel id:    0  |
| thread id:    0  EU id:   8  sub slice id: 1 slice id 0 |
| dispatch Mask:   1 prolog:     28578  epilog:     15445 |
| globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
|  ts0 :       186  | ts1 :         0  | ts2 :      1504  |
|  ts3 :         0  | ts4 :4294946425  | ts5 :4294946637  |
|  ts6 :         0  | ts7 :         0  | ts8 :         0  |
|  ts9 :4294947235  | ts10:4294947491  | ts11:         0  |
|  ts12:4294947645  | ts13:         0  | ts14:4294947819  |
|  ts15:4294947999  | ts16:         0  | ts17:         0  |
|  ts18:         0  | ts19:4294948561  |                  |

The huge timestamps are strange and clearly invalid.
They only show up when many cases are run together; when
a single case is run on its own, they cannot be reproduced.
It may be related to the HW and does not cause any
regressions, so I chose to fix it later.
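
One observation on the numbers in Log 5, offered as a reading aid rather than a diagnosis: the epilog (15445) is smaller than the prolog (28578), and the huge values are what small negative offsets look like when stored in an unsigned 32-bit field. The sketch below shows that reinterpretation. The helper names are hypothetical, and the 32-bit field width is an assumption inferred from the magnitude of the logged values.

```cpp
#include <cstdint>

// Reinterpret a logged 32-bit unsigned value as a signed offset.
// Assumption: the log field is 32 bits wide.
inline int64_t asSignedOffset(uint32_t logged) {
    const int64_t two32 = 4294967296LL;  // 2^32
    return logged >= 0x80000000u ? static_cast<int64_t>(logged) - two32
                                 : static_cast<int64_t>(logged);
}

// Unsigned 32-bit subtraction wraps modulo 2^32, so a "negative"
// difference shows up as a value just below 2^32.
inline uint32_t wrappedDiff(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>(a - b);
}
```

For example, `asSignedOffset(4294946425u)` (ts4 in Log 5) is -20871, and `wrappedDiff(15445u, 28578u)` (epilog minus prolog) is 4294954163, the same kind of value seen in the log.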


Signed-off-by: Junyan He <junyan.he@linux.intel.com>
---
backend/src/CMakeLists.txt                         |    3 +
backend/src/backend/gen8_context.cpp               |   21 +
backend/src/backend/gen8_context.hpp               |    2 +
backend/src/backend/gen_context.cpp                |  451 ++++++++++++++++++++
backend/src/backend/gen_context.hpp                |    9 +
.../src/backend/gen_insn_gen7_schedule_info.hxx    |    2 +
backend/src/backend/gen_insn_scheduling.cpp        |    4 +-
backend/src/backend/gen_insn_selection.cpp         |  140 ++++++
backend/src/backend/gen_insn_selection.hpp         |    8 +
backend/src/backend/gen_insn_selection.hxx         |    2 +
backend/src/backend/gen_program.cpp                |    9 +-
backend/src/backend/gen_program.hpp                |    2 +-
backend/src/backend/gen_reg_allocation.cpp         |   47 ++
backend/src/backend/gen_register.hpp               |   19 +
backend/src/backend/program.cpp                    |   35 +-
backend/src/backend/program.h                      |   17 +
backend/src/backend/program.hpp                    |   25 +-
backend/src/gbe_bin_interpreter.cpp                |    4 +
backend/src/ir/instruction.cpp                     |   96 ++++-
backend/src/ir/instruction.hpp                     |   27 +-
backend/src/ir/instruction.hxx                     |    2 +
backend/src/ir/lowering.cpp                        |    7 +
backend/src/ir/profile.cpp                         |   16 +-
backend/src/ir/profile.hpp                         |    8 +-
backend/src/ir/profiling.cpp                       |   74 ++++
backend/src/ir/profiling.hpp                       |  132 ++++++
backend/src/ir/unit.cpp                            |    6 +-
backend/src/ir/unit.hpp                            |   10 +
backend/src/llvm/llvm_gen_backend.cpp              |   48 ++-
backend/src/llvm/llvm_gen_backend.hpp              |    3 +
backend/src/llvm/llvm_gen_ocl_function.hxx         |    5 +
backend/src/llvm/llvm_profiling.cpp                |  211 +++++++++
backend/src/llvm/llvm_to_gen.cpp                   |    6 +-
backend/src/llvm/llvm_to_gen.hpp                   |    3 +-
src/CMakeLists.txt                                 |    1 +
src/cl_command_queue.c                             |    8 +
src/cl_command_queue_gen7.c                        |   37 ++
src/cl_driver.h                                    |   16 +
src/cl_driver_defs.c                               |    5 +
src/cl_gbe_loader.cpp                              |   15 +
src/cl_gbe_loader.h                                |    3 +
src/intel/intel_gpgpu.c                            |   58 +++
src/intel/intel_gpgpu.h                            |    3 +-
43 files changed, 1579 insertions(+), 21 deletions(-)
    

Patches

# Name Submitter State A F R T
[01/21,V3] Backend: Add ProfilingInfo class to ir. junyan.he@inbox.com New
[02/21,V3] Backend: Add StoreProfiling and CalcTimestamp instructions junyan.he@inbox.com New
[03/21,V3] Backend: Add ProfilingInserter and a new function pass. junyan.he@inbox.com New
[04/21,V3] Backend: Add profiling registers to curbe. junyan.he@inbox.com New
[05/21,V3] Backend: Add ProfilingInfo to Unit. junyan.he@inbox.com New
[06/21,V3] Backend: Insert store_profiling before lowed return. junyan.he@inbox.com New
[07/21,V3] Backend: Add CalcTimestamp and StoreProfiling. junyan.he@inbox.com New
[08/21,V3] Backend: Add IVAR OCL_PROFILING_LOG to control profiling log. junyan.he@inbox.com New
[09/21,V3] Backend: Add CalcTimestamp and StoreProfiling to insn selection. junyan.he@inbox.com New
[10/21,V3] Backend: Add a auxiliary function to convert GenReg to uniform. junyan.he@inbox.com New
[11/21,V3] Backend: Add tm0 function for arf timestamp register. junyan.he@inbox.com New
[12/21,V3] Backend: Add profilingProlog function for GenContext. junyan.he@inbox.com New
[13/21,V3] Add profiling info APIs to runtime. junyan.he@inbox.com New
[14/21,V3] Runtime: Bind the profiling buffer when profiling enabled. junyan.he@inbox.com New
[15/21,V3] Backend: Fix two bugs about curbe related pointer. junyan.he@inbox.com New
[16/21,V3] Backend: Avoid CALC_TIMESTAMP and STORE_PROFILING being scheduled. junyan.he@inbox.com New
[17/21,V3] Backend: Add ADD_ and SUB_ timestamps help functions. junyan.he@inbox.com New
[18/21,V3] Backend: Implement emitCalcTimestampInstruction in GenContext. junyan.he@inbox.com New
[19/21,V3] Backend: Implement StoreProfilingInstruction in GenContext. junyan.he@inbox.com New
[20/21,V3] Backend: Append the reg interval for registers need for profiling. junyan.he@inbox.com New
[21/21,V3] CMake: Add -lrt to the link command of libcl.so junyan.he@inbox.com New