drawoverhead: new microbenchmark

Submitted by Marek Olšák on April 30, 2017, 11:58 p.m.

Details

Message ID 1493596702-22023-1-git-send-email-maraeo@gmail.com
State New
Series "drawoverhead: new microbenchmark" ( rev: 1 ) in Piglit

Commit Message

Marek Olšák April 30, 2017, 11:58 p.m.
From: Marek Olšák <marek.olsak@amd.com>

Based on a benchmark from mesa/demos, but rewritten and extended.
It's a benchmark expected to be run separately, not a piglit test.
So why piglit? Because it's a good framework for writing apps like this.

mesa_glthread won't show an improvement here, because there is no app
overhead.
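One rough way to picture this, as a toy model rather than anything taken from Mesa: assume each draw costs the application `app_ns` of its own CPU time and the driver `drv_ns`. On a single thread the two add up; with the driver work moved to a second thread they overlap and the slower side sets the pace. The function names and the model itself are assumptions made for illustration, and the cost of handing calls over to the other thread is ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model (an assumption, not Mesa code): nanoseconds per draw when
 * the application and the driver share one thread. */
static double
single_thread_ns(double app_ns, double drv_ns)
{
	return app_ns + drv_ns;
}

/* Same workload with the driver work on a second thread: both sides
 * run in parallel, so the slower one limits throughput. */
static double
offloaded_ns(double app_ns, double drv_ns)
{
	return app_ns > drv_ns ? app_ns : drv_ns;
}

/* Print both estimates for one workload. */
static void
print_model(const char *name, double app_ns, double drv_ns)
{
	printf("%s: %.0f ns single-threaded, %.0f ns offloaded\n", name,
	       single_thread_ns(app_ns, drv_ns),
	       offloaded_ns(app_ns, drv_ns));
}
```

With `app_ns` set to 0, as in this benchmark, both functions return the same value, so the model predicts no gain. A workload with non-zero `app_ns` is where the two estimates diverge.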

This is what the output looks like. The percentage is relative to
the first test of the given draw call.
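As a minimal sketch of how that percentage and its colour are derived, the helper below follows the arithmetic in `perf_run()` from the patch (ratio of the measured rate to the base rate, with thresholds at 70% and 40%). The function and enum names are made up for this example. Recomputing from the rounded figures in the listing may differ in the last digit, since the benchmark works with the unrounded rates.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names; only the arithmetic mirrors perf_run(). */
enum bucket { BUCKET_NONE, BUCKET_GREEN, BUCKET_YELLOW, BUCKET_RED };

/* Percentage of 'rate' relative to 'base_rate'.  A base of zero marks
 * the reference run itself, which is reported as 100%. */
static double
relative_percent(double rate, double base_rate)
{
	double ratio = base_rate ? rate / base_rate : 1.0;

	return 100.0 * ratio;
}

/* Colour bucket for the percentage: no colour for the reference run,
 * green above 70%, yellow above 40%, red otherwise. */
static enum bucket
percent_bucket(double rate, double base_rate)
{
	double ratio;

	if (base_rate == 0)
		return BUCKET_NONE;
	ratio = rate / base_rate;
	return ratio > 0.7 ? BUCKET_GREEN :
	       ratio > 0.4 ? BUCKET_YELLOW : BUCKET_RED;
}

/* Print one row the way a reader would recompute it by hand. */
static void
print_row(const char *name, double rate, double base_rate)
{
	printf("%s: %.1f%%\n", name, relative_percent(rate, base_rate));
}
```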

The most obvious result is that enabled vertex attribs decrease
Mesa performance even if there are no state changes.
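To put a number on that, the rates in the listing can be turned into time per draw call. The sketch below is only back-of-the-envelope arithmetic on the published figures; the helper names are invented here, and spreading the difference evenly across the extra attribs assumes the cost grows linearly, which the data does not prove.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a rate in draw calls per second to nanoseconds per call. */
static double
ns_per_call(double calls_per_second)
{
	return 1e9 / calls_per_second;
}

/* Average extra time per additional attrib between two measurements
 * (assumes a linear cost, which is a simplification). */
static double
extra_ns_per_attrib(double rate_few, unsigned few,
		    double rate_many, unsigned many)
{
	assert(many > few);
	return (ns_per_call(rate_many) - ns_per_call(rate_few)) /
	       (many - few);
}

/* DrawElements, no state change: 1 VBO vs. 16 VBOs, using the rounded
 * rates from the listing. */
static void
print_attrib_cost(void)
{
	printf("%.1f ns -> %.1f ns per call, about %.1f ns per extra attrib\n",
	       ns_per_call(5.71e6), ns_per_call(3.65e6),
	       extra_ns_per_attrib(5.71e6, 1, 3.65e6, 16));
}
```

For the DrawElements rows this works out to roughly 175 ns per call with 1 VBO and roughly 274 ns with 16 VBOs, i.e. somewhere between 6 and 7 ns per additional attrib on the machine that produced these numbers.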

Using Core profile.
Draw calls per second:
   DrawElements ( 1 VBOs, 0 UBOs,  0 Tex) w/ no state change:          5.71 million (100.0%)
   DrawElements ( 4 VBOs, 0 UBOs,  0 Tex) w/ no state change:          5.18 million (90.8%)
   DrawElements (16 VBOs, 0 UBOs,  0 Tex) w/ no state change:          3.65 million (63.9%)
   DrawElements ( 1 VBOs, 0 UBOs, 16 Tex) w/ no state change:          5.71 million (100.0%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ no state change:          5.78 million (101.2%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    220.11 thousand (3.9%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     1.06 million (18.5%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         483.27 thousand (8.5%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        291.20 thousand (5.1%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.84 million (32.3%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.12 million (19.7%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.27 million (39.8%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 966.00 thousand (16.9%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.37 million (24.0%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.86 million (32.6%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.66 million (29.0%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.09 million (19.1%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.94 million (34.0%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.56 million (27.3%)
   DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    200.81 thousand (3.5%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    186.92 thousand (3.3%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     638.49 thousand (11.2%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         452.39 thousand (7.9%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        278.79 thousand (4.9%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.47 million (25.7%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            974.30 thousand (17.1%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  1.79 million (31.3%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 853.07 thousand (14.9%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.16 million (20.3%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.49 million (26.2%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.35 million (23.7%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    946.45 thousand (16.6%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.62 million (28.3%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.29 million (22.5%)
   DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    126.44 thousand (2.2%)
   DrawArrays ( 1 VBOs, 0 UBOs,  0 Tex) w/ no state change:          8.02 million (100.0%)
   DrawArrays ( 4 VBOs, 0 UBOs,  0 Tex) w/ no state change:          7.14 million (89.0%)
   DrawArrays (16 VBOs, 0 UBOs,  0 Tex) w/ no state change:          4.26 million (53.0%)
   DrawArrays ( 1 VBOs, 0 UBOs, 16 Tex) w/ no state change:          7.89 million (98.4%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ no state change:          8.01 million (99.9%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    221.09 thousand (2.8%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     1.13 million (14.1%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         500.25 thousand (6.2%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        294.30 thousand (3.7%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             2.02 million (25.2%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.18 million (14.7%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.28 million (28.4%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 617.79 thousand (7.7%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.59 million (19.8%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      2.09 million (26.0%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    2.02 million (25.2%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.18 million (14.7%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       2.27 million (28.3%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.77 million (22.1%)
   DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    204.60 thousand (2.6%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    191.50 thousand (2.4%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     679.98 thousand (8.5%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         472.00 thousand (5.9%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        286.70 thousand (3.6%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.69 million (21.0%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.04 million (13.0%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.04 million (25.5%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 620.41 thousand (7.7%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.30 million (16.2%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.69 million (21.0%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.55 million (19.3%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.04 million (13.0%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.82 million (22.7%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.47 million (18.3%)
   DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    129.25 thousand (1.6%)
---
 tests/CMakeLists.txt         |   1 +
 tests/perf/CMakeLists.gl.txt |  14 ++
 tests/perf/CMakeLists.txt    |   1 +
 tests/perf/common.c          | 117 ++++++++++
 tests/perf/common.h          |  34 +++
 tests/perf/drawoverhead.c    | 495 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 662 insertions(+)
 create mode 100644 tests/perf/CMakeLists.gl.txt
 create mode 100644 tests/perf/CMakeLists.txt
 create mode 100644 tests/perf/common.c
 create mode 100644 tests/perf/common.h
 create mode 100644 tests/perf/drawoverhead.c

Patch

diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index af842fc..7aab33f 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -9,20 +9,21 @@  add_subdirectory (fbo)
 add_subdirectory (general)
 add_subdirectory (glx)
 add_subdirectory (glslparsertest)
 add_subdirectory (hiz)
 add_subdirectory (asmparsertest)
 add_subdirectory (security)
 add_subdirectory (shaders)
 add_subdirectory (texturing)
 add_subdirectory (spec)
 add_subdirectory (fast_color_clear)
+add_subdirectory (perf)
 
 if (NOT APPLE)
 	# glean relies on AGL which is deprecated/broken on recent Mac OS X
 	# versions
 	add_subdirectory (glean)
 endif ()
 
 IF(EGL_FOUND)
 	add_subdirectory (egl)
 ENDIF(EGL_FOUND)
diff --git a/tests/perf/CMakeLists.gl.txt b/tests/perf/CMakeLists.gl.txt
new file mode 100644
index 0000000..f9d3115
--- /dev/null
+++ b/tests/perf/CMakeLists.gl.txt
@@ -0,0 +1,14 @@ 
+
+include_directories(
+	${GLEXT_INCLUDE_DIR}
+	${OPENGL_INCLUDE_PATH}
+)
+
+link_libraries (
+	piglitutil_${piglit_target_api}
+	${OPENGL_gl_LIBRARY}
+)
+
+piglit_add_executable (drawoverhead drawoverhead.c common.c)
+
+# vim: ft=cmake:
diff --git a/tests/perf/CMakeLists.txt b/tests/perf/CMakeLists.txt
new file mode 100644
index 0000000..144a306
--- /dev/null
+++ b/tests/perf/CMakeLists.txt
@@ -0,0 +1 @@ 
+piglit_include_target_api()
diff --git a/tests/perf/common.c b/tests/perf/common.c
new file mode 100644
index 0000000..e157d82
--- /dev/null
+++ b/tests/perf/common.c
@@ -0,0 +1,117 @@ 
+/*
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * Common perf code.  This should be re-usable with other tests.
+ */
+
+#include "piglit-util-gl.h"
+#include "common.h"
+
+/** Return time in seconds */
+static double
+perf_get_time(void)
+{
+	return piglit_time_get_nano() * 0.000000001;
+}
+
+/**
+ * Run function 'f' for enough iterations to reach a steady state.
+ * Return the rate (iterations/second).
+ */
+double
+perf_measure_rate(perf_rate_func f)
+{
+	const double minDuration = 0.5;
+	double rate = 0.0, prevRate = 0.0;
+	unsigned subiters;
+
+	/* Compute initial number of iterations to try.
+	 * If the test function is pretty slow this helps to avoid
+	 * extraordinarily long run times.
+	 */
+	subiters = 2;
+	{
+		const double t0 = perf_get_time();
+		double t1;
+		do {
+			f(subiters); /* call the rendering function */
+			glFinish();
+			t1 = perf_get_time();
+			subiters *= 2;
+		} while (t1 - t0 < 0.1 * minDuration);
+	}
+	/*perf_printf("initial subIters = %u\n", subiters);*/
+
+	while (1) {
+		const double t0 = perf_get_time();
+		unsigned iters = 0;
+		double t1;
+
+		do {
+			f(subiters); /* call the rendering function */
+			glFinish();
+			t1 = perf_get_time();
+			iters += subiters;
+		} while (t1 - t0 < minDuration);
+
+		rate = iters / (t1 - t0);
+
+		if (0)
+			printf("prevRate %f  rate  %f  ratio %f  iters %u\n",
+			       prevRate, rate, rate/prevRate, iters);
+
+		/* Try and speed the search up by skipping a few steps: */
+		if (rate > prevRate * 1.6)
+			subiters *= 8;
+		else if (rate > prevRate * 1.2)
+			subiters *= 4;
+		else if (rate > prevRate * 1.05)
+			subiters *= 2;
+		else
+			break;
+
+		prevRate = rate;
+	}
+
+	if (0)
+		printf("%s returning iters %u  rate %f\n", __FUNCTION__, subiters, rate);
+	return rate;
+}
+
+/* Note static buffer, can only use once per printf.
+ */
+const char *
+perf_human_float( double d )
+{
+	static char buf[80];
+
+	if (d > 1000000000.0)
+		snprintf(buf, sizeof(buf), "%.2f billion", d / 1000000000.0);
+	else if (d > 1000000.0)
+		snprintf(buf, sizeof(buf), "%.2f million", d / 1000000.0);
+	else if (d > 1000.0)
+		snprintf(buf, sizeof(buf), "%.2f thousand", d / 1000.0);
+	else
+		snprintf(buf, sizeof(buf), "%.2f", d);
+
+	return buf;
+}
diff --git a/tests/perf/common.h b/tests/perf/common.h
new file mode 100644
index 0000000..daa0d42
--- /dev/null
+++ b/tests/perf/common.h
@@ -0,0 +1,34 @@ 
+/*
+ * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef COMMON_H
+#define COMMON_H
+
+typedef void (*perf_rate_func)(unsigned count);
+
+double
+perf_measure_rate(perf_rate_func f);
+
+const char *
+perf_human_float( double d );
+
+#endif /* COMMON_H */
+
diff --git a/tests/perf/drawoverhead.c b/tests/perf/drawoverhead.c
new file mode 100644
index 0000000..9dd110f
--- /dev/null
+++ b/tests/perf/drawoverhead.c
@@ -0,0 +1,495 @@ 
+/*
+ * Copyright (C) 2009 VMware, Inc.
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "common.h"
+#include <stdbool.h>
+#include "piglit-util-gl.h"
+
+PIGLIT_GL_TEST_CONFIG_BEGIN
+
+	config.supports_gl_compat_version = 0;
+	config.supports_gl_core_version = 32;
+	for (int i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "-compat")) {
+			config.supports_gl_compat_version = 10;
+			config.supports_gl_core_version = 0;
+			break;
+		}
+	}
+	puts(config.supports_gl_core_version ? "Using Core profile." :
+					       "Using Compatibility profile.");
+	puts("Draw calls per second:");
+
+	config.window_visual = PIGLIT_GL_VISUAL_RGBA | PIGLIT_GL_VISUAL_DOUBLE |
+                               PIGLIT_GL_VISUAL_DEPTH | PIGLIT_GL_VISUAL_STENCIL;
+
+PIGLIT_GL_TEST_CONFIG_END
+
+static GLuint prog[2], uniform_loc, tex[8], ubo[4];
+static bool indexed;
+static GLenum enable_enum;
+
+void
+piglit_init(int argc, char **argv)
+{
+	static const unsigned indices[4] = {0, 1, 2, 3};
+	GLuint vao, ebo;
+
+	piglit_require_gl_version(30);
+
+	glGenVertexArrays(1, &vao);
+	glBindVertexArray(vao);
+
+	glGenBuffers(1, &ebo);
+	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
+	glBufferData(GL_ELEMENT_ARRAY_BUFFER,
+		     sizeof(indices), indices, GL_STATIC_DRAW);
+}
+
+static void
+get_vs_text(char *s, unsigned num_vbos, bool is_second)
+{
+	unsigned i;
+
+	strcpy(s, "#version 130\n"
+		  "#extension GL_ARB_explicit_attrib_location : require\n");
+	for (i = 0; i < num_vbos; i++) {
+		sprintf(s + strlen(s),
+			"layout (location = %u) in vec4 v%u;\n", i, i);
+	}
+	strcat(s, "void main() {\n"
+		  "	gl_Position = vec4(0.0)");
+	for (i = 0; i < num_vbos; i++)
+		sprintf(s + strlen(s), " + v%u", i);
+	if (is_second)
+		strcat(s, " + vec4(0.5)");
+	strcat(s, ";\n}\n");
+}
+
+static void
+get_fs_text(char *s, unsigned num_ubos, unsigned num_textures, bool is_second)
+{
+	unsigned i;
+
+	strcpy(s, "#version 130\n"
+		  "#extension GL_ARB_uniform_buffer_object : require\n"
+		  "uniform int index = 0;");
+	sprintf(s + strlen(s), "uniform vec4 u[%u];\n", is_second ? 240 : 1);
+
+	for (i = 0; i < num_textures; i++)
+		sprintf(s + strlen(s), "uniform sampler2D s%u;\n", i);
+	for (i = 0; i < num_ubos; i++)
+		sprintf(s + strlen(s), "uniform ub%u { vec4 ubu%u[10]; };\n", i, i);
+
+	strcat(s, "void main() {\n");
+	strcat(s, "	gl_FragData[0] = u[index]");
+	for (i = 0; i < num_textures; i++)
+		sprintf(s + strlen(s), " + texture(s%u, u[0].xy)", i);
+	for (i = 0; i < num_ubos; i++)
+		sprintf(s + strlen(s), " + ubu%u[index]", i);
+	if (is_second)
+		strcat(s, " + vec4(0.5)");
+	strcat(s, ";\n}\n");
+}
+
+static void
+setup_shaders_and_resources(unsigned num_vbos,
+			    unsigned num_ubos,
+			    unsigned num_textures)
+{
+	const unsigned max = 16;
+	char vs[4096], fs[4096];
+	unsigned p, i;
+
+	assert(num_vbos <= max);
+	assert(num_ubos <= max);
+	assert(num_textures <= max);
+
+	for (i = 0; i < max; i++)
+		glDisableVertexAttribArray(i);
+
+	/* Create two programs in case we want to test program changes. */
+	for (p = 0; p < 2; p++) {
+		get_vs_text(vs, num_vbos, p);
+		get_fs_text(fs, num_ubos, num_textures, p);
+		prog[p] = piglit_build_simple_program(vs, fs);
+
+		/* Assign texture units to samplers. */
+		glUseProgram(prog[p]);
+		for (i = 0; i < num_textures; i++) {
+			char sampler[20];
+			int loc;
+
+			snprintf(sampler, sizeof(sampler), "s%u", i);
+			loc = glGetUniformLocation(prog[p], sampler);
+			assert(loc >= 0);
+			glUniform1i(loc, i);
+		}
+		/* Assign UBO slots to uniform blocks. */
+		for (i = 0; i < num_ubos; i++) {
+			char block[20];
+			int index;
+
+			snprintf(block, sizeof(block), "ub%u", i);
+			index = glGetUniformBlockIndex(prog[p], block);
+			assert(index != GL_INVALID_INDEX);
+			glUniformBlockBinding(prog[p], index, i);
+		}
+	}
+	glUseProgram(prog[0]);
+
+	for (i = 0; i < num_ubos; i++) {
+		static const float data[10*4];
+		GLuint ub;
+
+		glGenBuffers(1, &ub);
+		glBindBuffer(GL_UNIFORM_BUFFER, ub);
+		glBufferData(GL_UNIFORM_BUFFER, sizeof(data), data,
+			     GL_STATIC_DRAW);
+
+		glBindBufferBase(GL_UNIFORM_BUFFER, i, ub);
+		/* Save the last UBOs for testing UBO changes. */
+		ubo[i % 4] = ub;
+	}
+	/* setup VBO w/ vertex data, we need a different buffer in each attrib */
+	for (i = 0; i < num_vbos; i++) {
+		/* Vertex positions are all zeroed - we want all primitives
+		 * to be culled.
+		 */
+		static const float vertices[4][3];
+		GLuint vbo;
+
+		glGenBuffers(1, &vbo);
+		glBindBuffer(GL_ARRAY_BUFFER, vbo);
+		glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices,
+			     GL_STATIC_DRAW);
+
+		glVertexAttribPointer(i, 2, GL_FLOAT, GL_FALSE,
+				      3 * sizeof(float), NULL);
+		glEnableVertexAttribArray(i);
+	}
+	for (i = 0; i < num_textures; i++) {
+		glActiveTexture(GL_TEXTURE0 + i);
+		/* Save the last texture IDs for testing texture changes. */
+		tex[i % 8] = piglit_rgbw_texture(GL_RGBA8, 4, 4, false, true,
+						 GL_UNSIGNED_BYTE);
+	}
+	glActiveTexture(GL_TEXTURE0);
+}
+
+static void
+draw(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++)
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+	} else {
+		for (i = 0; i < count; i++)
+			glDrawArrays(GL_POINTS, 0, 4);
+	}
+}
+
+static void
+draw_shader_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			glUseProgram(prog[i & 1]);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			glUseProgram(prog[i & 1]);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+	glUseProgram(prog[0]);
+}
+
+static void
+draw_uniform_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			glUniform4f(uniform_loc, i & 1, 0, 0, 0);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			glUniform4f(uniform_loc, i & 1, 0, 0, 0);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+static void
+draw_one_texture_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			glBindTexture(GL_TEXTURE_2D, tex[i & 1]);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			glBindTexture(GL_TEXTURE_2D, tex[i & 1]);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+static void
+draw_many_texture_change(unsigned count)
+{
+	unsigned i,j;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			for (j = 0; j < 8; j++) {
+				glActiveTexture(GL_TEXTURE0 + j);
+				glBindTexture(GL_TEXTURE_2D, tex[(i + j) % 8]);
+			}
+			glActiveTexture(GL_TEXTURE0);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			for (j = 0; j < 8; j++) {
+				glActiveTexture(GL_TEXTURE0 + j);
+				glBindTexture(GL_TEXTURE_2D, tex[(i + j) % 8]);
+			}
+			glActiveTexture(GL_TEXTURE0);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+static void
+draw_one_ubo_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[i & 1]);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[i & 1]);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+static void
+draw_many_ubo_change(unsigned count)
+{
+	unsigned i,j;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			for (j = 0; j < 4; j++)
+				glBindBufferBase(GL_UNIFORM_BUFFER, j, ubo[(i + j) % 4]);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			for (j = 0; j < 4; j++)
+				glBindBufferBase(GL_UNIFORM_BUFFER, j, ubo[(i + j) % 4]);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+static void
+draw_state_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			if (i & 1)
+				glEnable(enable_enum);
+			else
+				glDisable(enable_enum);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			if (i & 1)
+				glEnable(enable_enum);
+			else
+				glDisable(enable_enum);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+	glDisable(enable_enum);
+}
+
+static void
+draw_vertex_attrib_change(unsigned count)
+{
+	unsigned i;
+	if (indexed) {
+		for (i = 0; i < count; i++) {
+			if (i & 1)
+				glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
+						      3 * sizeof(float), NULL);
+			else
+				glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE,
+						      3 * sizeof(float), NULL);
+			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
+		}
+	} else {
+		for (i = 0; i < count; i++) {
+			if (i & 1)
+				glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
+						      3 * sizeof(float), NULL);
+			else
+				glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE,
+						      3 * sizeof(float), NULL);
+			glDrawArrays(GL_POINTS, 0, 4);
+		}
+	}
+}
+
+#define COLOR_RESET	"\033[0m"
+#define COLOR_RED	"\033[31m"
+#define COLOR_GREEN	"\033[1;32m"
+#define COLOR_YELLOW	"\033[1;33m"
+#define COLOR_CYAN	"\033[1;36m"
+
+static double
+perf_run(const char *call, unsigned num_vbos, unsigned num_ubos,
+	 unsigned num_textures, const char *change, perf_rate_func f,
+	 double base_rate)
+{
+	double rate = perf_measure_rate(f);
+	double ratio = base_rate ? rate / base_rate : 1;
+
+	printf("   %s (%2u VBOs, %u UBOs, %2u Tex) w/ %s change:%*s"
+	       COLOR_CYAN "%s" COLOR_RESET " %s(%.1f%%)" COLOR_RESET "\n",
+	       call, num_vbos, num_ubos, num_textures, change,
+	       MAX2(18 - (int)strlen(change), 0), "",
+	       perf_human_float(rate),
+	       base_rate == 0 ? COLOR_RESET :
+				ratio > 0.7 ? COLOR_GREEN :
+				ratio > 0.4 ? COLOR_YELLOW : COLOR_RED,
+	       100 * ratio);
+	return rate;
+}
+
+struct enable_state_t {
+	GLenum enable;
+	const char *name;
+};
+
+static struct enable_state_t enable_states[] = {
+	{GL_BLEND,	"blend enable"},
+	{GL_DEPTH_TEST, "depth enable"},
+	{GL_STENCIL_TEST, "stencil enable"},
+	{GL_SCISSOR_TEST, "scissor enable"},
+	{GL_MULTISAMPLE, "MSAA enable"},
+	{GL_CULL_FACE,	"cull face enable"},
+	{GL_FRAMEBUFFER_SRGB, "FB sRGB enable"},
+};
+
+static void
+perf_draw_variant(const char *call, bool is_indexed)
+{
+	double base_rate = 0;
+	unsigned num_vbos, num_ubos, num_textures;
+
+	indexed = is_indexed;
+
+	/* Test different shader resource usage without state changes. */
+	num_ubos = 0;
+	num_textures = 0;
+	for (num_vbos = 1; num_vbos <= 16; num_vbos *= 4) {
+		setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
+
+		double rate = perf_run(call, num_vbos, num_ubos, num_textures, "no state",
+				       draw, base_rate);
+		if (num_vbos == 1)
+			base_rate = rate;
+	}
+
+	num_vbos = 1;
+	num_ubos = 0;
+	num_textures = 16;
+	setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
+	perf_run(call, num_vbos, num_ubos, num_textures, "no state",
+		 draw, base_rate);
+
+	/* Test state changes. */
+	num_ubos = 4;
+	num_textures = 8;
+	for (num_vbos = 1; num_vbos <= 16; num_vbos *= 16) {
+		setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
+
+		perf_run(call, num_vbos, num_ubos, num_textures, "no state",
+			 draw, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "shader program",
+			 draw_shader_change, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "vertex attrib",
+			 draw_vertex_attrib_change, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "1 texture",
+			 draw_one_texture_change, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "8 textures",
+			 draw_many_texture_change, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "1 UBO",
+			 draw_one_ubo_change, base_rate);
+		perf_run(call, num_vbos, num_ubos, num_textures, "4 UBOs",
+			 draw_many_ubo_change, base_rate);
+
+		glUseProgram(prog[0]);
+		uniform_loc = glGetUniformLocation(prog[0], "u");
+		perf_run(call, num_vbos, num_ubos, num_textures, "few uniforms / 1",
+			 draw_uniform_change, base_rate);
+
+		glUseProgram(prog[1]);
+		uniform_loc = glGetUniformLocation(prog[1], "u");
+		perf_run(call, num_vbos, num_ubos, num_textures, "many uniforms / 1",
+			 draw_uniform_change, base_rate);
+		glUseProgram(prog[0]);
+
+		for (int state = 0; state < ARRAY_SIZE(enable_states); state++) {
+			enable_enum = enable_states[state].enable;
+			perf_run(call, num_vbos, num_ubos, num_textures,
+				 enable_states[state].name,
+				 draw_state_change, base_rate);
+		}
+	}
+}
+
+/** Called from test harness/main */
+enum piglit_result
+piglit_display(void)
+{
+	perf_draw_variant("DrawElements", true);
+	perf_draw_variant("DrawArrays", false);
+
+	exit(0);
+	return PIGLIT_SKIP;
+}
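
The adaptive loop in `perf_measure_rate()` can be tried without a GL context. The sketch below keeps the same structure (a warm-up that doubles the batch size, then timed rounds that grow the batch while the rate keeps improving), but it is not the patch's code: `fake_draw` is a hypothetical stand-in for a draw loop, the timer uses C11 `timespec_get` instead of `piglit_time_get_nano()`, there is no `glFinish()`, the minimum duration is a parameter rather than the fixed 0.5 seconds, and the round limit and overflow guard are additions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

typedef void (*rate_func)(unsigned count);

static volatile unsigned long sink;

/* Hypothetical stand-in for a draw loop: 'count' cheap operations. */
static void
fake_draw(unsigned count)
{
	for (unsigned i = 0; i < count; i++)
		sink += i;
}

/* Wall-clock time in seconds (C11 timespec_get; not monotonic). */
static double
now_seconds(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (double)ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Return iterations per second of 'f', measured over windows of at
 * least 'min_duration' seconds. */
static double
measure_rate(rate_func f, double min_duration)
{
	double rate = 0.0, prev_rate = 0.0;
	unsigned subiters = 2;
	double t0, t1;

	assert(min_duration > 0.0);

	/* Warm-up: double the batch until 10% of the window is used. */
	t0 = now_seconds();
	do {
		f(subiters);
		t1 = now_seconds();
		subiters *= 2;
	} while (t1 - t0 < 0.1 * min_duration);

	/* Timed rounds, limited to 4 here (the patch loops until the
	 * rate stops improving). */
	for (int round = 0; round < 4; round++) {
		unsigned long iters = 0;

		t0 = now_seconds();
		do {
			f(subiters);
			t1 = now_seconds();
			iters += subiters;
		} while (t1 - t0 < min_duration);

		rate = iters / (t1 - t0);

		/* Guard against overflow before growing the batch. */
		if (subiters > UINT_MAX / 8)
			break;
		/* Grow faster while the rate is still improving a lot. */
		if (rate > prev_rate * 1.6)
			subiters *= 8;
		else if (rate > prev_rate * 1.2)
			subiters *= 4;
		else if (rate > prev_rate * 1.05)
			subiters *= 2;
		else
			break;
		prev_rate = rate;
	}
	return rate;
}
```

Calling `measure_rate(fake_draw, 0.02)` returns a rate for the dummy loop after a few short rounds. Larger batches matter in the real benchmark because each batch is followed by `glFinish()`, whose cost is amortised over more draw calls as the batch grows.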

Comments

Marek Olšák <maraeo@gmail.com> writes:

> From: Marek Olšák <marek.olsak@amd.com>
>
> Based on a benchmark from mesa/demos, but rewritten and extended.
> It's a benchmark expected to be run separately, not a piglit test.
> So why piglit? Because it's a good framework for writing apps like this.
>
> mesa_glthread won't show an improvement here, because there is no app
> overhead.
>
> This is what the output looks like. The percentage is relative to
> the first test of the given draw call.
>
> The obvious thing there is that enabled vertex attribs decrease
> Mesa performance even if there are no state changes.

Since nobody else has replied,

I think piglit is the wrong place for this.  I agree that it's sorta
convenient, but mesa-demos or glmark2 are the right place.

On 12/05/17 05:49, Eric Anholt wrote:
> Marek Olšák <maraeo@gmail.com> writes:
> 
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> Based on a benchmark from mesa/demos, but rewritten and extended.
>> It's a benchmark expected to be run separately, not a piglit test.
>> So why piglit? Because it's a good framework for writing apps like this.
>>
>> mesa_glthread won't show an improvement here, because there is no app
>> overhead.
>>
>> This is what the output looks like. The percentage is relative to
>> the first test of the given draw call.
>>
>> The obvious thing there is that enabled vertex attribs decrease
>> Mesa performance even if there are no state changes.
> 
> Since nobody else has replied,
> 
> I think piglit is the wrong place for this.  I agree that it's sorta
> convenient, but mesa-demos or glmark2 are the right place.

I'm not totally against this. It would be nice to leverage more of the 
piglit framework/tools to be able to run a group of perf tests and 
generate some kind of comparison automatically. That way way we could 
have an easy tools for comparing releases etc. I know Intel has an 
internal setup for this type of thing, but having something in the test 
suite all mesa devs use would be handy.

Anyway I've given this benchmark a run with my KHR_no_error changes and 
I'm seeing a nice jump in a bunch of the tests.

The biggest jump is:

DrawArrays(16 VBOs, 0 UBOs,  0 Tex) w/ no state change: 5.41 million (49.3%)

DrawArrays(16 VBOs, 0 UBOs,  0 Tex) w/ no state change: 16.33 million 
(109.1%)
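
A comparison of this kind could be generated by a small helper along the following lines. This is only a sketch: the struct and function names are hypothetical, and it assumes that the first figure quoted above is the run without the KHR_no_error changes and the second is the run with them, which the two lines do not state explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark row measured on two builds (hypothetical layout). */
struct perf_sample {
	const char *name;
	double before;	/* draw calls per second */
	double after;	/* draw calls per second */
};

/* Ratio of the newer rate to the older one. */
static double
speedup(const struct perf_sample *s)
{
	assert(s->before > 0.0);
	return s->after / s->before;
}

/* Print one line per row with both rates and the ratio. */
static void
print_comparison(const struct perf_sample *samples, unsigned count)
{
	for (unsigned i = 0; i < count; i++)
		printf("%-56s %.2f -> %.2f million (%.2fx)\n",
		       samples[i].name,
		       samples[i].before / 1e6, samples[i].after / 1e6,
		       speedup(&samples[i]));
}

/* The single data point quoted in this reply. */
static const struct perf_sample quoted_sample = {
	"DrawArrays (16 VBOs, 0 UBOs, 0 Tex) w/ no state change",
	5.41e6, 16.33e6
};
```

For the quoted row the ratio comes out at about 3x.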

But there is a general increase across the majority of tests which is 
encouraging to see as I still haven't been able to measure much change 
in the games I've been trying. I suspect I'll need to have coverage 
across most of the api a game uses before I see much change.

On Thu, May 11, 2017 at 9:49 PM, Eric Anholt <eric@anholt.net> wrote:
> Marek Olšák <maraeo@gmail.com> writes:
>
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> Based on a benchmark from mesa/demos, but rewritten and extended.
>> It's a benchmark expected to be run separately, not a piglit test.
>> So why piglit? Because it's a good framework for writing apps like this.
>>
>> mesa_glthread won't show an improvement here, because there is no app
>> overhead.
>>
>> This is what the output looks like. The percentage is relative to
>> the first test of the given draw call.
>>
>> The obvious thing there is that enabled vertex attribs decrease
>> Mesa performance even if there are no state changes.
>
> Since nobody else has replied,
>
> I think piglit is the wrong place for this.  I agree that it's sorta
> convenient, but mesa-demos or glmark2 are the right place.

It can be moved to mesa/demos or glmark2 if/when they get support for
piglit's dispatch and multiple window systems including gbm/drm, so
that the code size of the test remains the same.

Marek

On 05/15/2017 08:05 AM, Marek Olšák wrote:
> On Thu, May 11, 2017 at 9:49 PM, Eric Anholt <eric@anholt.net> wrote:
>> Marek Olšák <maraeo@gmail.com> writes:
>>
>>> From: Marek Olšák <marek.olsak@amd.com>
>>>
>>> Based on a benchmark from mesa/demos, but rewritten and extended.
>>> It's a benchmark expected to be run separately, not a piglit test.
>>> So why piglit? Because it's a good framework for writing apps like this.
>>>
>>> mesa_glthread won't show an improvement here, because there is no app
>>> overhead.
>>>
>>> This is what the output looks like. The percentage is relative to
>>> the first test of the given draw call.
>>>
>>> The obvious thing there is that enabled vertex attribs decrease
>>> Mesa performance even if there are no state changes.
>>
>> Since nobody else has replied,
>>
>> I think piglit is the wrong place for this.  I agree that it's sorta
>> convenient, but mesa-demos or glmark2 are the right place.
>
> It can be moved to mesa/demos or glmark2 if/when they get support for
> piglit's dispatch and multiple window systems including gbm/drm, so
> that the code size of the test remains the same.

IMHO, it's fine to add performance tests to Piglit (I proposed this a 
few years ago).  As Marek said, the infrastructure is there and it would 
be wasted effort to duplicate it elsewhere.

-Brian
LGTM.
Reviewed-by: Brian Paul <brianp@vmware.com>

On 04/30/2017 05:58 PM, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak@amd.com>
>
> Based on a benchmark from mesa/demos, but rewritten and extended.
> It's a benchmark expected to be run separately, not a piglit test.
> So why piglit? Because it's a good framework for writing apps like this.
>
> mesa_glthread won't show an improvement here, because there is no app
> overhead.
>
> This is what the output looks like. The percentage is relative to
> the first test of the given draw call.
>
> The obvious thing there is that enabled vertex attribs decrease
> Mesa performance even if there are no state changes.
>
> Using Core profile.
> Draw calls per second:
>     DrawElements ( 1 VBOs, 0 UBOs,  0 Tex) w/ no state change:          5.71 million (100.0%)
>     DrawElements ( 4 VBOs, 0 UBOs,  0 Tex) w/ no state change:          5.18 million (90.8%)
>     DrawElements (16 VBOs, 0 UBOs,  0 Tex) w/ no state change:          3.65 million (63.9%)
>     DrawElements ( 1 VBOs, 0 UBOs, 16 Tex) w/ no state change:          5.71 million (100.0%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ no state change:          5.78 million (101.2%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    220.11 thousand (3.9%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     1.06 million (18.5%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         483.27 thousand (8.5%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        291.20 thousand (5.1%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.84 million (32.3%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.12 million (19.7%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.27 million (39.8%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 966.00 thousand (16.9%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.37 million (24.0%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.86 million (32.6%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.66 million (29.0%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.09 million (19.1%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.94 million (34.0%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.56 million (27.3%)
>     DrawElements ( 1 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    200.81 thousand (3.5%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    186.92 thousand (3.3%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     638.49 thousand (11.2%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         452.39 thousand (7.9%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        278.79 thousand (4.9%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.47 million (25.7%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            974.30 thousand (17.1%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  1.79 million (31.3%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 853.07 thousand (14.9%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.16 million (20.3%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.49 million (26.2%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.35 million (23.7%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    946.45 thousand (16.6%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.62 million (28.3%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.29 million (22.5%)
>     DrawElements (16 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    126.44 thousand (2.2%)
>     DrawArrays ( 1 VBOs, 0 UBOs,  0 Tex) w/ no state change:          8.02 million (100.0%)
>     DrawArrays ( 4 VBOs, 0 UBOs,  0 Tex) w/ no state change:          7.14 million (89.0%)
>     DrawArrays (16 VBOs, 0 UBOs,  0 Tex) w/ no state change:          4.26 million (53.0%)
>     DrawArrays ( 1 VBOs, 0 UBOs, 16 Tex) w/ no state change:          7.89 million (98.4%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ no state change:          8.01 million (99.9%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    221.09 thousand (2.8%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     1.13 million (14.1%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         500.25 thousand (6.2%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        294.30 thousand (3.7%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             2.02 million (25.2%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.18 million (14.7%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.28 million (28.4%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 617.79 thousand (7.7%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.59 million (19.8%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      2.09 million (26.0%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    2.02 million (25.2%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.18 million (14.7%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       2.27 million (28.3%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.77 million (22.1%)
>     DrawArrays ( 1 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    204.60 thousand (2.6%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ shader program change:    191.50 thousand (2.4%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ vertex attrib change:     679.98 thousand (8.5%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 1 texture change:         472.00 thousand (5.9%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 8 textures change:        286.70 thousand (3.6%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 1 UBO change:             1.69 million (21.0%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ 4 UBOs change:            1.04 million (13.0%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ few uniforms / 1 change:  2.04 million (25.5%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ many uniforms / 1 change: 620.41 thousand (7.7%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ blend enable change:      1.30 million (16.2%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ depth enable change:      1.69 million (21.0%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ stencil enable change:    1.55 million (19.3%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ scissor enable change:    1.04 million (13.0%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ MSAA enable change:       1.82 million (22.7%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ cull face enable change:  1.47 million (18.3%)
>     DrawArrays (16 VBOs, 4 UBOs,  8 Tex) w/ FB sRGB enable change:    129.25 thousand (1.6%)
> ---
>   tests/CMakeLists.txt         |   1 +
>   tests/perf/CMakeLists.gl.txt |  14 ++
>   tests/perf/CMakeLists.txt    |   1 +
>   tests/perf/common.c          | 117 ++++++++++
>   tests/perf/common.h          |  34 +++
>   tests/perf/drawoverhead.c    | 495 +++++++++++++++++++++++++++++++++++++++++++
>   6 files changed, 662 insertions(+)
>   create mode 100644 tests/perf/CMakeLists.gl.txt
>   create mode 100644 tests/perf/CMakeLists.txt
>   create mode 100644 tests/perf/common.c
>   create mode 100644 tests/perf/common.h
>   create mode 100644 tests/perf/drawoverhead.c
>
> diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
> index af842fc..7aab33f 100644
> --- a/tests/CMakeLists.txt
> +++ b/tests/CMakeLists.txt
> @@ -9,20 +9,21 @@ add_subdirectory (fbo)
>   add_subdirectory (general)
>   add_subdirectory (glx)
>   add_subdirectory (glslparsertest)
>   add_subdirectory (hiz)
>   add_subdirectory (asmparsertest)
>   add_subdirectory (security)
>   add_subdirectory (shaders)
>   add_subdirectory (texturing)
>   add_subdirectory (spec)
>   add_subdirectory (fast_color_clear)
> +add_subdirectory (perf)
>
>   if (NOT APPLE)
>   	# glean relies on AGL which is deprecated/broken on recent Mac OS X
>   	# versions
>   	add_subdirectory (glean)
>   endif ()
>
>   IF(EGL_FOUND)
>   	add_subdirectory (egl)
>   ENDIF(EGL_FOUND)
> diff --git a/tests/perf/CMakeLists.gl.txt b/tests/perf/CMakeLists.gl.txt
> new file mode 100644
> index 0000000..f9d3115
> --- /dev/null
> +++ b/tests/perf/CMakeLists.gl.txt
> @@ -0,0 +1,14 @@
> +
> +include_directories(
> +	${GLEXT_INCLUDE_DIR}
> +	${OPENGL_INCLUDE_PATH}
> +)
> +
> +link_libraries (
> +	piglitutil_${piglit_target_api}
> +	${OPENGL_gl_LIBRARY}
> +)
> +
> +piglit_add_executable (drawoverhead drawoverhead.c common.c)
> +
> +# vim: ft=cmake:
> diff --git a/tests/perf/CMakeLists.txt b/tests/perf/CMakeLists.txt
> new file mode 100644
> index 0000000..144a306
> --- /dev/null
> +++ b/tests/perf/CMakeLists.txt
> @@ -0,0 +1 @@
> +piglit_include_target_api()
> diff --git a/tests/perf/common.c b/tests/perf/common.c
> new file mode 100644
> index 0000000..e157d82
> --- /dev/null
> +++ b/tests/perf/common.c
> @@ -0,0 +1,117 @@
> +/*
> + * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included
> + * in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * Common perf code.  This should be re-usable with other tests.
> + */
> +
> +#include "piglit-util-gl.h"
> +#include "common.h"
> +
> +/** Return time in seconds */
> +static double
> +perf_get_time(void)
> +{
> +	return piglit_time_get_nano() * 0.000000001;
> +}
> +
> +/**
> + * Run function 'f' for enough iterations to reach a steady state.
> + * Return the rate (iterations/second).
> + */
> +double
> +perf_measure_rate(perf_rate_func f)
> +{
> +	const double minDuration = 0.5;
> +	double rate = 0.0, prevRate = 0.0;
> +	unsigned subiters;
> +
> +	/* Compute initial number of iterations to try.
> +	 * If the test function is pretty slow this helps to avoid
> +	 * extraordinarily long run times.
> +	 */
> +	subiters = 2;
> +	{
> +		const double t0 = perf_get_time();
> +		double t1;
> +		do {
> +			f(subiters); /* call the rendering function */
> +			glFinish();
> +			t1 = perf_get_time();
> +			subiters *= 2;
> +		} while (t1 - t0 < 0.1 * minDuration);
> +	}
> +	/*perf_printf("initial subIters = %u\n", subiters);*/
> +
> +	while (1) {
> +		const double t0 = perf_get_time();
> +		unsigned iters = 0;
> +		double t1;
> +
> +		do {
> +			f(subiters); /* call the rendering function */
> +			glFinish();
> +			t1 = perf_get_time();
> +			iters += subiters;
> +		} while (t1 - t0 < minDuration);
> +
> +		rate = iters / (t1 - t0);
> +
> +		if (0)
> +			printf("prevRate %f  rate  %f  ratio %f  iters %u\n",
> +			       prevRate, rate, rate/prevRate, iters);
> +
> +		/* Try and speed the search up by skipping a few steps: */
> +		if (rate > prevRate * 1.6)
> +			subiters *= 8;
> +		else if (rate > prevRate * 1.2)
> +			subiters *= 4;
> +		else if (rate > prevRate * 1.05)
> +			subiters *= 2;
> +		else
> +			break;
> +
> +		prevRate = rate;
> +	}
> +
> +	if (0)
> +		printf("%s returning iters %u  rate %f\n", __FUNCTION__, subiters, rate);
> +	return rate;
> +}
> +
> +/* Note static buffer, can only use once per printf.
> + */
> +const char *
> +perf_human_float( double d )
> +{
> +	static char buf[80];
> +
> +	if (d > 1000000000.0)
> +		snprintf(buf, sizeof(buf), "%.2f billion", d / 1000000000.0);
> +	else if (d > 1000000.0)
> +		snprintf(buf, sizeof(buf), "%.2f million", d / 1000000.0);
> +	else if (d > 1000.0)
> +		snprintf(buf, sizeof(buf), "%.2f thousand", d / 1000.0);
> +	else
> +		snprintf(buf, sizeof(buf), "%.2f", d);
> +
> +	return buf;
> +}
> diff --git a/tests/perf/common.h b/tests/perf/common.h
> new file mode 100644
> index 0000000..daa0d42
> --- /dev/null
> +++ b/tests/perf/common.h
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright (C) 2009  VMware, Inc.  All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included
> + * in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef COMMON_H
> +#define COMMON_H
> +
> +typedef void (*perf_rate_func)(unsigned count);
> +
> +double
> +perf_measure_rate(perf_rate_func f);
> +
> +const char *
> +perf_human_float( double d );
> +
> +#endif /* COMMON_H */
> +
> diff --git a/tests/perf/drawoverhead.c b/tests/perf/drawoverhead.c
> new file mode 100644
> index 0000000..9dd110f
> --- /dev/null
> +++ b/tests/perf/drawoverhead.c
> @@ -0,0 +1,495 @@
> +/*
> + * Copyright (C) 2009 VMware, Inc.
> + * Copyright (C) 2017 Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included
> + * in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * VMWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "common.h"
> +#include <stdbool.h>
> +#include "piglit-util-gl.h"
> +
> +PIGLIT_GL_TEST_CONFIG_BEGIN
> +
> +	config.supports_gl_compat_version = 0;
> +	config.supports_gl_core_version = 32;
> +	for (int i = 1; i < argc; i++) {
> +		if (!strcmp(argv[i], "-compat")) {
> +			config.supports_gl_compat_version = 10;
> +			config.supports_gl_core_version = 0;
> +			break;
> +		}
> +	}
> +	puts(config.supports_gl_core_version ? "Using Core profile." :
> +					       "Using Compatibility profile.");
> +	puts("Draw calls per second:");
> +
> +	config.window_visual = PIGLIT_GL_VISUAL_RGBA | PIGLIT_GL_VISUAL_DOUBLE |
> +                               PIGLIT_GL_VISUAL_DEPTH | PIGLIT_GL_VISUAL_STENCIL;
> +
> +PIGLIT_GL_TEST_CONFIG_END
> +
> +static GLuint prog[2], uniform_loc, tex[8], ubo[4];
> +static bool indexed;
> +static GLenum enable_enum;
> +
> +void
> +piglit_init(int argc, char **argv)
> +{
> +	static const unsigned indices[4] = {0, 1, 2, 3};
> +	GLuint vao, ebo;
> +
> +	piglit_require_gl_version(30);
> +
> +	glGenVertexArrays(1, &vao);
> +	glBindVertexArray(vao);
> +
> +	glGenBuffers(1, &ebo);
> +	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
> +	glBufferData(GL_ELEMENT_ARRAY_BUFFER,
> +		     sizeof(indices), indices, GL_STATIC_DRAW);
> +}
> +
> +static void
> +get_vs_text(char *s, unsigned num_vbos, bool is_second)
> +{
> +	unsigned i;
> +
> +	strcpy(s, "#version 130\n"
> +		  "#extension GL_ARB_explicit_attrib_location : require\n");
> +	for (i = 0; i < num_vbos; i++) {
> +		sprintf(s + strlen(s),
> +			"layout (location = %u) in vec4 v%u;\n", i, i);
> +	}
> +	strcat(s, "void main() {\n"
> +		  "	gl_Position = vec4(0.0)");
> +	for (i = 0; i < num_vbos; i++)
> +		sprintf(s + strlen(s), " + v%u", i);
> +	if (is_second)
> +		strcat(s, " + vec4(0.5)");
> +	strcat(s, ";\n}\n");
> +}
> +
> +static void
> +get_fs_text(char *s, unsigned num_ubos, unsigned num_textures, bool is_second)
> +{
> +	unsigned i;
> +
> +	strcpy(s, "#version 130\n"
> +		  "#extension GL_ARB_uniform_buffer_object : require\n"
> +		  "uniform int index = 0;\n");
> +	sprintf(s + strlen(s), "uniform vec4 u[%u];\n", is_second ? 240 : 1);
> +
> +	for (i = 0; i < num_textures; i++)
> +		sprintf(s + strlen(s), "uniform sampler2D s%u;\n", i);
> +	for (i = 0; i < num_ubos; i++)
> +		sprintf(s + strlen(s), "uniform ub%u { vec4 ubu%u[10]; };\n", i, i);
> +
> +	strcat(s, "void main() {\n");
> +	strcat(s, "	gl_FragData[0] = u[index]");
> +	for (i = 0; i < num_textures; i++)
> +		sprintf(s + strlen(s), " + texture(s%u, u[0].xy)", i);
> +	for (i = 0; i < num_ubos; i++)
> +		sprintf(s + strlen(s), " + ubu%u[index]", i);
> +	if (is_second)
> +		strcat(s, " + vec4(0.5)");
> +	strcat(s, ";\n}\n");
> +}
> +
> +static void
> +setup_shaders_and_resources(unsigned num_vbos,
> +			    unsigned num_ubos,
> +			    unsigned num_textures)
> +{
> +	const unsigned max = 16;
> +	char vs[4096], fs[4096];
> +	unsigned p, i;
> +
> +	assert(num_vbos <= max);
> +	assert(num_ubos <= max);
> +	assert(num_textures <= max);
> +
> +	for (i = 0; i < max; i++)
> +		glDisableVertexAttribArray(i);
> +
> +	/* Create two programs in case we want to test program changes. */
> +	for (p = 0; p < 2; p++) {
> +		get_vs_text(vs, num_vbos, p);
> +		get_fs_text(fs, num_ubos, num_textures, p);
> +		prog[p] = piglit_build_simple_program(vs, fs);
> +
> +		/* Assign texture units to samplers. */
> +		glUseProgram(prog[p]);
> +		for (i = 0; i < num_textures; i++) {
> +			char sampler[20];
> +			int loc;
> +
> +			snprintf(sampler, sizeof(sampler), "s%u", i);
> +			loc = glGetUniformLocation(prog[p], sampler);
> +			assert(loc >= 0);
> +			glUniform1i(loc, i);
> +		}
> +		/* Assign UBO slots to uniform blocks. */
> +		for (i = 0; i < num_ubos; i++) {
> +			char block[20];
> +			GLuint index;
> +
> +			snprintf(block, sizeof(block), "ub%u", i);
> +			index = glGetUniformBlockIndex(prog[p], block);
> +			assert(index != GL_INVALID_INDEX);
> +			glUniformBlockBinding(prog[p], index, i);
> +		}
> +	}
> +	glUseProgram(prog[0]);
> +
> +	for (i = 0; i < num_ubos; i++) {
> +		static const float data[10*4];
> +		GLuint ub;
> +
> +		glGenBuffers(1, &ub);
> +		glBindBuffer(GL_UNIFORM_BUFFER, ub);
> +		glBufferData(GL_UNIFORM_BUFFER, sizeof(data), data,
> +			     GL_STATIC_DRAW);
> +
> +		glBindBufferBase(GL_UNIFORM_BUFFER, i, ub);
> +		/* Save the last UBOs for testing UBO changes. */
> +		ubo[i % 4] = ub;
> +	}
> +	/* setup VBO w/ vertex data, we need a different buffer in each attrib */
> +	for (i = 0; i < num_vbos; i++) {
> +		/* Vertex positions are all zeroed - we want all primitives
> +		 * to be culled.
> +		 */
> +		static const float vertices[4][3];
> +		GLuint vbo;
> +
> +		glGenBuffers(1, &vbo);
> +		glBindBuffer(GL_ARRAY_BUFFER, vbo);
> +		glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices,
> +			     GL_STATIC_DRAW);
> +
> +		glVertexAttribPointer(i, 2, GL_FLOAT, GL_FALSE,
> +				      3 * sizeof(float), NULL);
> +		glEnableVertexAttribArray(i);
> +	}
> +	for (i = 0; i < num_textures; i++) {
> +		glActiveTexture(GL_TEXTURE0 + i);
> +		/* Save the last texture IDs for testing texture changes. */
> +		tex[i % 8] = piglit_rgbw_texture(GL_RGBA8, 4, 4, false, true,
> +						 GL_UNSIGNED_BYTE);
> +	}
> +	glActiveTexture(GL_TEXTURE0);
> +}
> +
> +static void
> +draw(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++)
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +	} else {
> +		for (i = 0; i < count; i++)
> +			glDrawArrays(GL_POINTS, 0, 4);
> +	}
> +}
> +
> +static void
> +draw_shader_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			glUseProgram(prog[i & 1]);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			glUseProgram(prog[i & 1]);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +	glUseProgram(prog[0]);
> +}
> +
> +static void
> +draw_uniform_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			glUniform4f(uniform_loc, i & 1, 0, 0, 0);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			glUniform4f(uniform_loc, i & 1, 0, 0, 0);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +static void
> +draw_one_texture_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			glBindTexture(GL_TEXTURE_2D, tex[i & 1]);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			glBindTexture(GL_TEXTURE_2D, tex[i & 1]);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +static void
> +draw_many_texture_change(unsigned count)
> +{
> +	unsigned i,j;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			for (j = 0; j < 8; j++) {
> +				glActiveTexture(GL_TEXTURE0 + j);
> +				glBindTexture(GL_TEXTURE_2D, tex[(i + j) % 8]);
> +			}
> +			glActiveTexture(GL_TEXTURE0);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			for (j = 0; j < 8; j++) {
> +				glActiveTexture(GL_TEXTURE0 + j);
> +				glBindTexture(GL_TEXTURE_2D, tex[(i + j) % 8]);
> +			}
> +			glActiveTexture(GL_TEXTURE0);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +static void
> +draw_one_ubo_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[i & 1]);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[i & 1]);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +static void
> +draw_many_ubo_change(unsigned count)
> +{
> +	unsigned i,j;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			for (j = 0; j < 4; j++)
> +				glBindBufferBase(GL_UNIFORM_BUFFER, j, ubo[(i + j) % 4]);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			for (j = 0; j < 4; j++)
> +				glBindBufferBase(GL_UNIFORM_BUFFER, j, ubo[(i + j) % 4]);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +static void
> +draw_state_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			if (i & 1)
> +				glEnable(enable_enum);
> +			else
> +				glDisable(enable_enum);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			if (i & 1)
> +				glEnable(enable_enum);
> +			else
> +				glDisable(enable_enum);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +	glDisable(enable_enum);
> +}
> +
> +static void
> +draw_vertex_attrib_change(unsigned count)
> +{
> +	unsigned i;
> +	if (indexed) {
> +		for (i = 0; i < count; i++) {
> +			if (i & 1)
> +				glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
> +						      3 * sizeof(float), NULL);
> +			else
> +				glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE,
> +						      3 * sizeof(float), NULL);
> +			glDrawElements(GL_POINTS, 4, GL_UNSIGNED_INT, NULL);
> +		}
> +	} else {
> +		for (i = 0; i < count; i++) {
> +			if (i & 1)
> +				glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
> +						      3 * sizeof(float), NULL);
> +			else
> +				glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE,
> +						      3 * sizeof(float), NULL);
> +			glDrawArrays(GL_POINTS, 0, 4);
> +		}
> +	}
> +}
> +
> +#define COLOR_RESET	"\033[0m"
> +#define COLOR_RED	"\033[31m"
> +#define COLOR_GREEN	"\033[1;32m"
> +#define COLOR_YELLOW	"\033[1;33m"
> +#define COLOR_CYAN	"\033[1;36m"
> +
> +static double
> +perf_run(const char *call, unsigned num_vbos, unsigned num_ubos,
> +	 unsigned num_textures, const char *change, perf_rate_func f,
> +	 double base_rate)
> +{
> +	double rate = perf_measure_rate(f);
> +	double ratio = base_rate ? rate / base_rate : 1;
> +
> +	printf("   %s (%2u VBOs, %u UBOs, %2u Tex) w/ %s change:%*s"
> +	       COLOR_CYAN "%s" COLOR_RESET " %s(%.1f%%)" COLOR_RESET "\n",
> +	       call, num_vbos, num_ubos, num_textures, change,
> +	       MAX2(18 - (int)strlen(change), 0), "",
> +	       perf_human_float(rate),
> +	       base_rate == 0 ? COLOR_RESET :
> +				ratio > 0.7 ? COLOR_GREEN :
> +				ratio > 0.4 ? COLOR_YELLOW : COLOR_RED,
> +	       100 * ratio);
> +	return rate;
> +}
> +
> +struct enable_state_t {
> +	GLenum enable;
> +	const char *name;
> +};
> +
> +static struct enable_state_t enable_states[] = {
> +	{GL_BLEND,	"blend enable"},
> +	{GL_DEPTH_TEST, "depth enable"},
> +	{GL_STENCIL_TEST, "stencil enable"},
> +	{GL_SCISSOR_TEST, "scissor enable"},
> +	{GL_MULTISAMPLE, "MSAA enable"},
> +	{GL_CULL_FACE,	"cull face enable"},
> +	{GL_FRAMEBUFFER_SRGB, "FB sRGB enable"},
> +};
> +
> +static void
> +perf_draw_variant(const char *call, bool is_indexed)
> +{
> +	double base_rate = 0;
> +	unsigned num_vbos, num_ubos, num_textures;
> +
> +	indexed = is_indexed;
> +
> +	/* Test different shader resource usage without state changes. */
> +	num_ubos = 0;
> +	num_textures = 0;
> +	for (num_vbos = 1; num_vbos <= 16; num_vbos *= 4) {
> +		setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
> +
> +		double rate = perf_run(call, num_vbos, num_ubos, num_textures, "no state",
> +				       draw, base_rate);
> +		if (num_vbos == 1)
> +			base_rate = rate;
> +	}
> +
> +	num_vbos = 1;
> +	num_ubos = 0;
> +	num_textures = 16;
> +	setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
> +	perf_run(call, num_vbos, num_ubos, num_textures, "no state",
> +		 draw, base_rate);
> +
> +	/* Test state changes. */
> +	num_ubos = 4;
> +	num_textures = 8;
> +	for (num_vbos = 1; num_vbos <= 16; num_vbos *= 16) {
> +		setup_shaders_and_resources(num_vbos, num_ubos, num_textures);
> +
> +		perf_run(call, num_vbos, num_ubos, num_textures, "no state",
> +			 draw, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "shader program",
> +			 draw_shader_change, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "vertex attrib",
> +			 draw_vertex_attrib_change, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "1 texture",
> +			 draw_one_texture_change, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "8 textures",
> +			 draw_many_texture_change, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "1 UBO",
> +			 draw_one_ubo_change, base_rate);
> +		perf_run(call, num_vbos, num_ubos, num_textures, "4 UBOs",
> +			 draw_many_ubo_change, base_rate);
> +
> +		glUseProgram(prog[0]);
> +		uniform_loc = glGetUniformLocation(prog[0], "u");
> +		perf_run(call, num_vbos, num_ubos, num_textures, "few uniforms / 1",
> +			 draw_uniform_change, base_rate);
> +
> +		glUseProgram(prog[1]);
> +		uniform_loc = glGetUniformLocation(prog[1], "u");
> +		perf_run(call, num_vbos, num_ubos, num_textures, "many uniforms / 1",
> +			 draw_uniform_change, base_rate);
> +		glUseProgram(prog[0]);
> +
> +		for (int state = 0; state < ARRAY_SIZE(enable_states); state++) {
> +			enable_enum = enable_states[state].enable;
> +			perf_run(call, num_vbos, num_ubos, num_textures,
> +				 enable_states[state].name,
> +				 draw_state_change, base_rate);
> +		}
> +	}
> +}
> +
> +/** Called from test harness/main */
> +enum piglit_result
> +piglit_display(void)
> +{
> +	perf_draw_variant("DrawElements", true);
> +	perf_draw_variant("DrawArrays", false);
> +
> +	exit(0);
> +	return PIGLIT_SKIP;
> +}
>