[6/8] message-params: Allow parameter strings to contain escaped curly braces

Submitted by Georg Chini on April 9, 2018, 5:35 p.m.

Details

Message ID 20180409173525.18219-7-georg@chini.tk
State Superseded
Headers show
Series "core: Add message sending/receiving" ( rev: 1 ) in PulseAudio

Not browsing as part of any series.

Commit Message

Georg Chini April 9, 2018, 5:35 p.m.
The patch adds the possibility to escape curly braces within parameter strings
and introduces several new functions that can be used for writing parameters.

For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
has been created. Following new write functions are available:

pa_message_param_new() - creates a new pa_message_param structure
pa_message_param_to_string() - converts a pa_message_param to string and frees
the structure
pa_message_param_begin_list() - starts a list
pa_message_param_end_list() - ends a list
pa_message_param_write_string() - writes a string to a pa_message_param structure

For string parameters that contain curly braces, those braces must be escaped
by adding a "\" before them. This however means that a trailing backslash would
falsely escape the closing bracket. To avoid this, single quotes must be added
at start and end of the string. The function pa_message_param_write_string()
has a parameter do_escape. If true, the necessary escaping is added. Escaping
is only needed if a string might fulfill one of the following conditions:

- It contains curly braces
- It contains a trailing "\"
- It is enclosed in single quotes

Other strings can be passed without modification.

For reading, pa_message_param_read_string() reverts the changes that
pa_message_param_write_string() might have introduced.

The patch also adds more restrictions on the object path name. Now only
alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
The path name may not end with a /. If the user specifies a trailing / when
sending a message, it will be removed.
---
 doc/messaging_api.txt           |  34 +++++++-
 src/Makefile.am                 |   3 +-
 src/map-file                    |   5 ++
 src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
 src/pulse/message-params.h      |  23 ++++-
 src/pulsecore/core.c            |  20 ++---
 src/pulsecore/message-handler.c |  53 +++++++-----
 src/utils/pactl.c               |   4 +-
 8 files changed, 279 insertions(+), 44 deletions(-)

Patch hide | download patch | download mbox

diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
index 431a5df2..0e6be53f 100644
--- a/doc/messaging_api.txt
+++ b/doc/messaging_api.txt
@@ -14,10 +14,42 @@  look like that:
 {{Integer} {{1st float} {2nd float} ...}}{...}
 Any characters that are not enclosed in curly braces are ignored (all characters
 between { and {, between } and } and between } and {). The same syntax is used
-to specify message parameters. The following reference lists available messages,
+to specify message parameters. The reference further down lists available messages,
 their parameters and return values. If a return value is enclosed in {}, this
 means that multiple elements of the same type may be returned.
 
+There are several functions that simplify reading and writing message parameter
+strings. For writing, the structure pa_message_param can be used. Following
+functions are available:
+pa_message_param_new() - creates a new pa_message_param structure
+pa_message_param_to_string() - converts a pa_message_param to string and frees
+the structure
+pa_message_param_begin_list() - starts a list
+pa_message_param_end_list() - ends a list
+pa_message_param_write_string() - writes a string to a pa_message_param structure
+
+For string parameters that contain curly braces, those braces must be escaped
+by adding a "\" before them. This however means that a trailing backslash would
+falsely escape the closing bracket. To avoid this, single quotes must be added
+at start and end of the string. The function pa_message_param_write_string()
+has a parameter do_escape. If true, the necessary escaping is added. Escaping
+is only needed if a string might fulfill one of the following conditions:
+
+- It contains curly braces
+- It contains a trailing "\"
+- It is enclosed in single quotes
+
+Other strings can be passed without modification.
+
+For reading, the following functions are available:
+pa_message_param_split_list() - parse message parameter string
+pa_message_param_read_string() - read a string from a parameter list
+
+pa_message_param_read_string() also reverts the changes that
+pa_message_param_write_string() might have introduced.
+
+Reference:
+
 Object path: /core
 Message: list-handlers
 Parameters: None
diff --git a/src/Makefile.am b/src/Makefile.am
index ccdad8ff..72d8cf22 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -675,6 +675,7 @@  libpulsecommon_@PA_MAJORMINOR@_la_SOURCES = \
 		pulse/timeval.c pulse/timeval.h \
 		pulse/rtclock.c pulse/rtclock.h \
 		pulse/volume.c pulse/volume.h \
+		pulse/message-params.c pulse/message-params.h \
 		pulsecore/atomic.h \
 		pulsecore/authkey.c pulsecore/authkey.h \
 		pulsecore/conf-parser.c pulsecore/conf-parser.h \
@@ -884,6 +885,7 @@  libpulse_la_SOURCES = \
 		pulse/mainloop-api.c pulse/mainloop-api.h \
 		pulse/mainloop-signal.c pulse/mainloop-signal.h \
 		pulse/mainloop.c pulse/mainloop.h \
+		pulse/message-params.c pulse/message-params.h \
 		pulse/operation.c pulse/operation.h \
 		pulse/proplist.c pulse/proplist.h \
 		pulse/pulseaudio.h \
@@ -896,7 +898,6 @@  libpulse_la_SOURCES = \
 		pulse/timeval.c pulse/timeval.h \
 		pulse/utf8.c pulse/utf8.h \
 		pulse/util.c pulse/util.h \
-		pulse/message-params.c pulse/message-params.h \
 		pulse/volume.c pulse/volume.h \
 		pulse/xmalloc.c pulse/xmalloc.h
 
diff --git a/src/map-file b/src/map-file
index 385731dc..372d190d 100644
--- a/src/map-file
+++ b/src/map-file
@@ -225,8 +225,13 @@  pa_mainloop_quit;
 pa_mainloop_run;
 pa_mainloop_set_poll_func;
 pa_mainloop_wakeup;
+pa_message_param_begin_list;
+pa_message_param_end_list;
+pa_message_param_new;
 pa_message_param_read_string;
 pa_message_param_split_list;
+pa_message_param_to_string;
+pa_message_param_write_string;
 pa_msleep;
 pa_operation_cancel;
 pa_operation_get_state;
diff --git a/src/pulse/message-params.c b/src/pulse/message-params.c
index 964e29d0..d68ec59d 100644
--- a/src/pulse/message-params.c
+++ b/src/pulse/message-params.c
@@ -28,20 +28,55 @@ 
 #include <pulse/xmalloc.h>
 
 #include <pulsecore/macro.h>
+#include <pulsecore/strbuf.h>
 
 #include "message-params.h"
 
+/* Message parameter structure, a wrapper for pa_strbuf */
+struct pa_message_param {
+    pa_strbuf *buffer;
+};
+
+/* Remove escaping from a string */
+static char *unescape(char *value) {
+    char *tmp;
+
+    if (!value)
+        return NULL;
+
+    /* If the string is enclosed in single quotes, remove them */
+    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
+
+        memmove(value, value + 1, strlen(value));
+        value[strlen(value) - 1] = 0;
+    }
+
+    /* Remove escape character from curly braces if present. */
+    while ((tmp = strstr(value, "\\{")))
+        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
+    while ((tmp = strstr(value, "\\}")))
+        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
+
+    /* Return the pointer that was passed in. */
+    return value;
+}
+
+/* Read functions */
+
 /* Split the specified string into elements. An element is defined as
  * a sub-string between curly braces. The function is needed to parse
  * the parameters of messages. Each time it is called returns the position
  * of the current element in result and the state pointer is advanced to
- * the next list element.
+ * the next list element. On return, the parameter *is_unpacked indicates
+ * if the string is plain text or contains a sub-list. is_unpacked may
+ * be NULL.
  * The variable state points to, should be initialized to NULL before
  * the first call. The function returns 1 on success, 0 if end of string
  * is encountered and -1 on parse error. */
-int pa_message_param_split_list(char *c, char **result, void **state) {
+int pa_message_param_split_list(char *c, char **result, bool *is_unpacked, void **state) {
     char *current = *state ? *state : c;
     uint32_t open_braces;
+    bool found_backslash = false;
 
     pa_assert(result);
 
@@ -54,29 +89,52 @@  int pa_message_param_split_list(char *c, char **result, void **state) {
     /* Find opening brace */
     while (*current != 0) {
 
-        if (*current == '{')
+        /* Skip escaped curly braces. */
+        if (*current == '\\') {
+            found_backslash = true;
+            current++;
+            continue;
+        }
+
+        if (*current == '{' && !found_backslash)
             break;
 
         /* unexpected closing brace, parse error */
-        if (*current == '}')
+        if (*current == '}' && !found_backslash)
             return -1;
 
+        found_backslash = false;
         current++;
     }
 
     /* No opening brace found, end of string */
     if (*current == 0)
-         return 0;
+        return 0;
 
+    if (is_unpacked)
+        *is_unpacked = true;
     *result = current + 1;
+    found_backslash = false;
     open_braces = 1;
 
     while (open_braces != 0 && *current != 0) {
         current++;
-        if (*current == '{')
+
+        /* Skip escaped curly braces. */
+        if (*current == '\\') {
+            found_backslash = true;
+            continue;
+        }
+
+        if (*current == '{' && !found_backslash) {
             open_braces++;
-        if (*current == '}')
+            if (is_unpacked)
+                *is_unpacked = false;
+        }
+        if (*current == '}' && !found_backslash)
             open_braces--;
+
+        found_backslash = false;
     }
 
     /* Parse error, closing brace missing */
@@ -96,21 +154,126 @@  int pa_message_param_split_list(char *c, char **result, void **state) {
 /* Read a string from the parameter list. The state pointer is
  * advanced to the next element of the list. If allocate is
  * true, a newly allocated string will be returned, else a
- * pointer to a sub-string within c. */
+ * pointer to a sub-string within c. Escaping will be removed
+ * if the string does not contain another list. */
 int pa_message_param_read_string(char *c, char **result, bool allocate, void **state) {
     char *start_pos;
     int err;
+    bool is_unpacked;
 
     pa_assert(result);
 
     *result = NULL;
 
-    if ((err = pa_message_param_split_list(c, &start_pos, state)) != 1)
+    if ((err = pa_message_param_split_list(c, &start_pos, &is_unpacked, state)) != 1)
         return err;
 
     *result = start_pos;
+
+    if (is_unpacked)
+        *result = unescape(*result);
+
     if (allocate)
         *result = pa_xstrdup(*result);
 
     return err;
 }
+
+/* Write functions. The functions are wrapper functions around pa_strbuf,
+ * so that the client does not need to use pa_strbuf directly. */
+
+/* Creates a new pa_message_param structure */
+pa_message_param *pa_message_param_new(void) {
+    pa_message_param *param;
+
+    param = pa_xnew(pa_message_param, 1);
+    param->buffer = pa_strbuf_new();
+
+    return param;
+}
+
+/* Converts a pa_message_param structure to string and frees the structure */
+char *pa_message_param_to_string(pa_message_param *param) {
+    char *result;
+
+    pa_assert(param);
+
+    result = pa_strbuf_to_string_free(param->buffer);
+
+    pa_xfree(param);
+    return result;
+}
+
+/* Writes an opening curly brace */
+void pa_message_param_begin_list(pa_message_param *param) {
+
+    pa_assert(param);
+
+    pa_strbuf_putc(param->buffer, '{');
+}
+
+/* Writes a closing curly brace */
+void pa_message_param_end_list(pa_message_param *param) {
+
+    pa_assert(param);
+
+    pa_strbuf_putc(param->buffer, '}');
+}
+
+/* Writes a string to a message_param structure, adding curly braces
+ * around the string. If do_escape is true, leading and trailing single
+ * quotes and also a "\" before curly braces are added to the input string.
+ * Escaping only needs to be used if the original string might fulfill any
+ * of the following conditions:
+ * - It contains curly braces
+ * - It is enclosed in single quotes
+ * - It ends with a "\" */
+void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {
+    const char *s;
+    char *output, *out_char;
+    size_t brace_count;
+
+    pa_assert(param);
+
+    /* Null value is written as empty element */
+    if (!value)
+        value = "";
+
+    if (!do_escape) {
+        pa_strbuf_printf(param->buffer, "{%s}", value);
+        return;
+    }
+
+    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
+     * the input string would cause the allocation of a linked list element
+     * for each character of the input string. Therefore the output string
+     * is constructed locally before writing it to the buffer */
+
+    /* Count number of characters to escape */
+    brace_count = 0;
+    for (s = value; *s; ++s) {
+        if (*s == '{' || *s == '}')
+            brace_count++;
+    }
+
+    /* allocate output string */
+    output = pa_xmalloc(strlen(value) + brace_count + 5);
+    out_char = output;
+
+    *out_char++ = '{';
+    *out_char++ = '\'';
+
+    for (s = value; *s; ++s) {
+        if (*s == '{' || *s == '}')
+            *out_char++ = '\\';
+
+        *out_char++ = *s;
+    }
+
+    *out_char++ = '\'';
+    *out_char++ = '}';
+    *out_char = 0;
+
+    pa_strbuf_puts(param->buffer, output);
+    pa_xfree(output);
+}
diff --git a/src/pulse/message-params.h b/src/pulse/message-params.h
index 02e08363..c0675c6e 100644
--- a/src/pulse/message-params.h
+++ b/src/pulse/message-params.h
@@ -30,12 +30,33 @@ 
 
 PA_C_DECL_BEGIN
 
+typedef struct pa_message_param pa_message_param;
+
+/** Read functions */
+
 /** Split message parameter string into list elements */
-int pa_message_param_split_list(char *c, char **result, void **state);
+int pa_message_param_split_list(char *c, char **result, bool * is_unpacked, void **state);
 
 /** Read a string from the parameter list. */
 int pa_message_param_read_string(char *c, char **result, bool allocate, void **state);
 
+/** Write functions */
+
+/** Create a new pa_message_param structure */
+pa_message_param *pa_message_param_new(void);
+
+/** Convert pa_message_param to string, free pa_message_param structure */
+char *pa_message_param_to_string(pa_message_param *param);
+
+/** Write an opening brace */
+void pa_message_param_begin_list(pa_message_param *param);
+
+/** Write a closing brace */
+void pa_message_param_end_list(pa_message_param *param);
+
+/** Append string to parameter list */
+void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape);
+
 PA_C_DECL_END
 
 #endif
diff --git a/src/pulsecore/core.c b/src/pulsecore/core.c
index e95657f0..acd47cbb 100644
--- a/src/pulsecore/core.c
+++ b/src/pulsecore/core.c
@@ -29,6 +29,7 @@ 
 #include <pulse/rtclock.h>
 #include <pulse/timeval.h>
 #include <pulse/xmalloc.h>
+#include <pulse/message-params.h>
 
 #include <pulsecore/module.h>
 #include <pulsecore/core-rtclock.h>
@@ -65,25 +66,24 @@  static void core_free(pa_object *o);
 
 /* Returns a list of handlers. */
 static char *message_handler_list(pa_core *c) {
-    pa_strbuf *buf;
+    pa_message_param *param;
     void *state = NULL;
     struct pa_message_handler *handler;
 
-    buf = pa_strbuf_new();
+    param = pa_message_param_new();
 
-    pa_strbuf_putc(buf, '{');
+    pa_message_param_begin_list(param);
     PA_HASHMAP_FOREACH(handler, c->message_handlers, state) {
-        pa_strbuf_putc(buf, '{');
+        pa_message_param_begin_list(param);
 
-        pa_strbuf_printf(buf, "{%s} {", handler->object_path);
-        if (handler->description)
-            pa_strbuf_puts(buf, handler->description);
+        pa_message_param_write_string(param, handler->object_path, false);
+        pa_message_param_write_string(param, handler->description, true);
 
-        pa_strbuf_puts(buf, "}}");
+        pa_message_param_end_list(param);
     }
-    pa_strbuf_putc(buf, '}');
+    pa_message_param_end_list(param);
 
-    return pa_strbuf_to_string_free(buf);
+    return pa_message_param_to_string(param);
 }
 
 static int core_message_handler(const char *object_path, const char *message, char *message_parameters, char **response, void *userdata) {
diff --git a/src/pulsecore/message-handler.c b/src/pulsecore/message-handler.c
index 427186db..860f3b8e 100644
--- a/src/pulsecore/message-handler.c
+++ b/src/pulsecore/message-handler.c
@@ -31,15 +31,29 @@ 
 
 #include "message-handler.h"
 
-/* Check if a string does not contain control characters. Currently these are
- * only "{" and "}". */
-static bool string_is_valid(const char *test_string) {
+/* Check if a path string starts with a / and only contains valid characters */
+static bool object_path_is_valid(const char *test_string) {
     uint32_t i;
 
+    if (!test_string)
+        return false;
+
+    /* Make sure the string starts with / and does not end with a / */
+    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
+        return false;
+
     for (i = 0; test_string[i]; i++) {
-        if (test_string[i] == '{' ||
-            test_string[i] == '}')
-            return false;
+
+        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
+            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
+            (test_string[i] >= '0' && test_string[i] <= '9') ||
+            test_string[i] == '.' ||
+            test_string[i] == '_' ||
+            test_string[i] == '-' ||
+            test_string[i] == '/')
+            continue;
+
+        return false;
     }
 
     return true;
@@ -56,13 +70,8 @@  void pa_message_handler_register(pa_core *c, const char *object_path, const char
     pa_assert(cb);
     pa_assert(userdata);
 
-    /* Ensure that the object path is not empty and starts with "/". */
-    pa_assert(object_path[0] == '/');
-
-    /* Ensure that object path and description are valid strings */
-    pa_assert(string_is_valid(object_path));
-    if (description)
-        pa_assert(string_is_valid(description));
+    /* Ensure that object path is valid */
+    pa_assert(object_path_is_valid(object_path));
 
     handler = pa_xnew0(struct pa_message_handler, 1);
     handler->userdata = userdata;
@@ -91,7 +100,7 @@  void pa_message_handler_unregister(pa_core *c, const char *object_path) {
 int pa_message_handler_send_message(pa_core *c, const char *object_path, const char *message, const char *message_parameters, char **response) {
     struct pa_message_handler *handler;
     int ret;
-    char *parameter_copy;
+    char *parameter_copy, *path_copy;
 
     pa_assert(c);
     pa_assert(object_path);
@@ -100,8 +109,16 @@  int pa_message_handler_send_message(pa_core *c, const char *object_path, const c
 
     *response = NULL;
 
-    if (!(handler = pa_hashmap_get(c->message_handlers, object_path)))
+    path_copy = pa_xstrdup(object_path);
+
+    /* Remove trailing / from path name if present */
+    if (path_copy[strlen(path_copy) - 1] == '/')
+        path_copy[strlen(path_copy) - 1] = 0;
+
+    if (!(handler = pa_hashmap_get(c->message_handlers, path_copy))) {
+        pa_xfree(path_copy);
         return -PA_ERR_NOENTITY;
+    }
 
     parameter_copy = pa_xstrdup(message_parameters);
 
@@ -110,6 +127,7 @@  int pa_message_handler_send_message(pa_core *c, const char *object_path, const c
     ret = handler->callback(handler->object_path, message, parameter_copy, response, handler->userdata);
 
     pa_xfree(parameter_copy);
+    pa_xfree(path_copy);
     return ret;
 }
 
@@ -123,11 +141,6 @@  int pa_message_handler_set_description(pa_core *c, const char *object_path, cons
     if (!(handler = pa_hashmap_get(c->message_handlers, object_path)))
         return -PA_ERR_NOENTITY;
 
-    if (description) {
-        if (!string_is_valid(description))
-            return -PA_ERR_INVALID;
-    }
-
     pa_xfree(handler->description);
     handler->description = pa_xstrdup(description);
 
diff --git a/src/utils/pactl.c b/src/utils/pactl.c
index fdafe57b..1f13e60c 100644
--- a/src/utils/pactl.c
+++ b/src/utils/pactl.c
@@ -864,14 +864,14 @@  static void list_handlers_callback(pa_context *c, int success, char *response, v
         return;
     }
 
-    if (pa_message_param_split_list(response, &param_list, &state) <= 0) {
+    if (pa_message_param_split_list(response, &param_list, NULL, &state) <= 0) {
         pa_log(_("list-handlers message response could not be parsed correctly"));
         quit(1);
         return;
     }
 
     state = NULL;
-    while ((err = pa_message_param_split_list(param_list, &start_pos, &state)) > 0) {
+    while ((err = pa_message_param_split_list(param_list, &start_pos, NULL, &state)) > 0) {
         void *state2 = NULL;
         char *path;
         char *description;

Comments

On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> The patch adds the possibility to escape curly braces within parameter strings
> and introduces several new functions that can be used for writing parameters.
> 
> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
> has been created. Following new write functions are available:
> 
> pa_message_param_new() - creates a new pa_message_param structure
> pa_message_param_to_string() - converts a pa_message_param to string and frees
> the structure

I'd like to have the _free suffix in the function name, because for the
uninitiated the current name looks like just another "to_string"
function with no surprising side effects.

I think we need pa_message_params_free() as well in case an application
runs into an error while constructing the parameters and it wants to
just get rid of the pa_message_params struct without converting it to a
string first.

> pa_message_param_begin_list() - starts a list
> pa_message_param_end_list() - ends a list
> pa_message_param_write_string() - writes a string to a pa_message_param structure
> 
> For string parameters that contain curly braces, those braces must be escaped
> by adding a "\" before them. This however means that a trailing backslash would
> falsely escape the closing bracket. To avoid this, single quotes must be added
> at start and end of the string.

Why not solve this by the usual method: require escaping of "\" in
input with "\\" in output?

> The function pa_message_param_write_string()
> has a parameter do_escape.

Why not do escaping always?

> If true, the necessary escaping is added. Escaping
> is only needed if a string might fulfill one of the following conditions:
> 
> - It contains curly braces
> - It contains a trailing "\"
> - It is enclosed in single quotes
> 
> Other strings can be passed without modification.
> 
> For reading, pa_message_param_read_string() reverts the changes that
> pa_message_param_write_string() might have introduced.
> 
> The patch also adds more restrictions on the object path name. Now only
> alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
> The path name may not end with a /. If the user specifies a trailing / when
> sending a message, it will be removed.
> ---
>  doc/messaging_api.txt           |  34 +++++++-
>  src/Makefile.am                 |   3 +-
>  src/map-file                    |   5 ++
>  src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
>  src/pulse/message-params.h      |  23 ++++-
>  src/pulsecore/core.c            |  20 ++---
>  src/pulsecore/message-handler.c |  53 +++++++-----
>  src/utils/pactl.c               |   4 +-
>  8 files changed, 279 insertions(+), 44 deletions(-)
> 
> diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
> index 431a5df2..0e6be53f 100644
> --- a/doc/messaging_api.txt
> +++ b/doc/messaging_api.txt
> @@ -14,10 +14,42 @@ look like that:
>  {{Integer} {{1st float} {2nd float} ...}}{...}
>  Any characters that are not enclosed in curly braces are ignored (all characters
>  between { and {, between } and } and between } and {). The same syntax is used
> -to specify message parameters. The following reference lists available messages,
> +to specify message parameters. The reference further down lists available messages,
>  their parameters and return values. If a return value is enclosed in {}, this
>  means that multiple elements of the same type may be returned.
>  
> +There are several functions that simplify reading and writing message parameter
> +strings. For writing, the structure pa_message_param can be used. Following
> +functions are available:
> +pa_message_param_new() - creates a new pa_message_param structure
> +pa_message_param_to_string() - converts a pa_message_param to string and frees
> +the structure
> +pa_message_param_begin_list() - starts a list
> +pa_message_param_end_list() - ends a list
> +pa_message_param_write_string() - writes a string to a pa_message_param structure
> +
> +For string parameters that contain curly braces, those braces must be escaped
> +by adding a "\" before them. This however means that a trailing backslash would
> +falsely escape the closing bracket. To avoid this, single quotes must be added
> +at start and end of the string. The function pa_message_param_write_string()
> +has a parameter do_escape. If true, the necessary escaping is added. Escaping
> +is only needed if a string might fulfill one of the following conditions:
> +
> +- It contains curly braces
> +- It contains a trailing "\"
> +- It is enclosed in single quotes
> +
> +Other strings can be passed without modification.
> +
> +For reading, the following functions are available:
> +pa_message_param_split_list() - parse message parameter string
> +pa_message_param_read_string() - read a string from a parameter list
> +
> +pa_message_param_read_string() also reverts the changes that
> +pa_message_param_write_string() might have introduced.

I think the doxygen documentation is better suited for the details of
the C API. messaging_api.txt should contain information that is common
to both libpulse and pactl users.

> +
> +Reference:
> +
>  Object path: /core
>  Message: list-handlers
>  Parameters: None
> diff --git a/src/Makefile.am b/src/Makefile.am
> index ccdad8ff..72d8cf22 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -675,6 +675,7 @@ libpulsecommon_@PA_MAJORMINOR@_la_SOURCES = \
>  		pulse/timeval.c pulse/timeval.h \
>  		pulse/rtclock.c pulse/rtclock.h \
>  		pulse/volume.c pulse/volume.h \
> +		pulse/message-params.c pulse/message-params.h \
>  		pulsecore/atomic.h \
>  		pulsecore/authkey.c pulsecore/authkey.h \
>  		pulsecore/conf-parser.c pulsecore/conf-parser.h \
> @@ -884,6 +885,7 @@ libpulse_la_SOURCES = \
>  		pulse/mainloop-api.c pulse/mainloop-api.h \
>  		pulse/mainloop-signal.c pulse/mainloop-signal.h \
>  		pulse/mainloop.c pulse/mainloop.h \
> +		pulse/message-params.c pulse/message-params.h \
>  		pulse/operation.c pulse/operation.h \
>  		pulse/proplist.c pulse/proplist.h \
>  		pulse/pulseaudio.h \
> @@ -896,7 +898,6 @@ libpulse_la_SOURCES = \
>  		pulse/timeval.c pulse/timeval.h \
>  		pulse/utf8.c pulse/utf8.h \
>  		pulse/util.c pulse/util.h \
> -		pulse/message-params.c pulse/message-params.h \
>  		pulse/volume.c pulse/volume.h \
>  		pulse/xmalloc.c pulse/xmalloc.h
>  
> diff --git a/src/map-file b/src/map-file
> index 385731dc..372d190d 100644
> --- a/src/map-file
> +++ b/src/map-file
> @@ -225,8 +225,13 @@ pa_mainloop_quit;
>  pa_mainloop_run;
>  pa_mainloop_set_poll_func;
>  pa_mainloop_wakeup;
> +pa_message_param_begin_list;
> +pa_message_param_end_list;
> +pa_message_param_new;
>  pa_message_param_read_string;
>  pa_message_param_split_list;
> +pa_message_param_to_string;
> +pa_message_param_write_string;
>  pa_msleep;
>  pa_operation_cancel;
>  pa_operation_get_state;
> diff --git a/src/pulse/message-params.c b/src/pulse/message-params.c
> index 964e29d0..d68ec59d 100644
> --- a/src/pulse/message-params.c
> +++ b/src/pulse/message-params.c
> @@ -28,20 +28,55 @@
>  #include <pulse/xmalloc.h>
>  
>  #include <pulsecore/macro.h>
> +#include <pulsecore/strbuf.h>
>  
>  #include "message-params.h"
>  
> +/* Message parameter structure, a wrapper for pa_strbuf */
> +struct pa_message_param {
> +    pa_strbuf *buffer;
> +};
> +
> +/* Remove escaping from a string */
> +static char *unescape(char *value) {
> +    char *tmp;
> +
> +    if (!value)
> +        return NULL;
> +
> +    /* If the string is enclosed in single quotes, remove them */
> +    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
> +
> +        memmove(value, value + 1, strlen(value));
> +        value[strlen(value) - 1] = 0;
> +    }
> +
> +    /* Remove escape character from curly braces if present. */
> +    while ((tmp = strstr(value, "\\{")))
> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> +    while ((tmp = strstr(value, "\\}")))
> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> +
> +    /* Return the pointer that was passed in. */
> +    return value;
> +}

core-util already contains pa_unescape() that does the same thing more
efficiently (if you drop the single quote thing).

> +
> +/* Read functions */
> +
>  /* Split the specified string into elements. An element is defined as
>   * a sub-string between curly braces. The function is needed to parse
>   * the parameters of messages. Each time it is called returns the position
>   * of the current element in result and the state pointer is advanced to
> - * the next list element.
> + * the next list element. On return, the parameter *is_unpacked indicates
> + * if the string is plain text or contains a sub-list. is_unpacked may
> + * be NULL.

is_unpacked looks like unnecessary complexity.
pa_message_params_read_string() should always unescape the value.

>   * The variable state points to, should be initialized to NULL before
>   * the first call. The function returns 1 on success, 0 if end of string
>   * is encountered and -1 on parse error. */
> -int pa_message_param_split_list(char *c, char **result, void **state) {
> +int pa_message_param_split_list(char *c, char **result, bool *is_unpacked, void **state) {
>      char *current = *state ? *state : c;
>      uint32_t open_braces;
> +    bool found_backslash = false;
>  
>      pa_assert(result);
>  
> @@ -54,29 +89,52 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
>      /* Find opening brace */
>      while (*current != 0) {
>  
> -        if (*current == '{')
> +        /* Skip escaped curly braces. */
> +        if (*current == '\\') {

With my proposed escaping rules, this should be

    if (*current == '\\' && !found_backslash)

because if we have two backslashes in a row, the next character is not
escaped.

> +            found_backslash = true;
> +            current++;
> +            continue;
> +        }
> +
> +        if (*current == '{' && !found_backslash)
>              break;
>  
>          /* unexpected closing brace, parse error */
> -        if (*current == '}')
> +        if (*current == '}' && !found_backslash)
>              return -1;
>  
> +        found_backslash = false;
>          current++;
>      }
>  
>      /* No opening brace found, end of string */
>      if (*current == 0)
> -         return 0;
> +        return 0;
>  
> +    if (is_unpacked)
> +        *is_unpacked = true;
>      *result = current + 1;
> +    found_backslash = false;
>      open_braces = 1;
>  
>      while (open_braces != 0 && *current != 0) {
>          current++;
> -        if (*current == '{')
> +
> +        /* Skip escaped curly braces. */
> +        if (*current == '\\') {

Same as above, with my proposed unescaping rules this should be

    if (*current == '\\' && !found_backslash)

> +            found_backslash = true;
> +            continue;
> +        }
> +
> +        if (*current == '{' && !found_backslash) {
>              open_braces++;
> -        if (*current == '}')
> +            if (is_unpacked)
> +                *is_unpacked = false;
> +        }
> +        if (*current == '}' && !found_backslash)
>              open_braces--;
> +
> +        found_backslash = false;
>      }
>  
>      /* Parse error, closing brace missing */
> @@ -96,21 +154,126 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
>  /* Read a string from the parameter list. The state pointer is
>   * advanced to the next element of the list. If allocate is
>   * true, a newly allocated string will be returned, else a
> - * pointer to a sub-string within c. */
> + * pointer to a sub-string within c. Escaping will be removed
> + * if the string does not contain another list. */

A string is a string, it's not a list. I think this function should
always do unescaping. If a string value is read that is missing
escaping for curly braces, that could be reported as an error (I'm not
saying "should", but I think it would be good to do the validation in
this function).

>  int pa_message_param_read_string(char *c, char **result, bool allocate, void **state) {
>      char *start_pos;
>      int err;
> +    bool is_unpacked;
>  
>      pa_assert(result);
>  
>      *result = NULL;
>  
> -    if ((err = pa_message_param_split_list(c, &start_pos, state)) != 1)
> +    if ((err = pa_message_param_split_list(c, &start_pos, &is_unpacked, state)) != 1)
>          return err;
>  
>      *result = start_pos;
> +
> +    if (is_unpacked)
> +        *result = unescape(*result);
> +
>      if (allocate)
>          *result = pa_xstrdup(*result);
>  
>      return err;
>  }
> +
> +/* Write functions. The functions are wrapper functions around pa_strbuf,
> + * so that the client does not need to use pa_strbuf directly. */
> +
> +/* Creates a new pa_message_param structure */
> +pa_message_param *pa_message_param_new(void) {
> +    pa_message_param *param;
> +
> +    param = pa_xnew(pa_message_param, 1);
> +    param->buffer = pa_strbuf_new();
> +
> +    return param;
> +}
> +
> +/* Converts a pa_message_param structure to string and frees the structure */
> +char *pa_message_param_to_string(pa_message_param *param) {
> +    char *result;
> +
> +    pa_assert(param);
> +
> +    result = pa_strbuf_to_string_free(param->buffer);
> +
> +    pa_xfree(param);
> +    return result;
> +}
> +
> +/* Writes an opening curly brace */
> +void pa_message_param_begin_list(pa_message_param *param) {
> +
> +    pa_assert(param);
> +
> +    pa_strbuf_putc(param->buffer, '{');
> +}
> +
> +/* Writes a closing curly brace */
> +void pa_message_param_end_list(pa_message_param *param) {
> +
> +    pa_assert(param);
> +
> +    pa_strbuf_putc(param->buffer, '}');
> +}
> +
> +/* Writes a string to a message_param structure, adding curly braces
> + * around the string. If do_escape is true, leading and trailing single
> + * quotes and also a "\" before curly braces are added to the input string.
> + * Escaping only needs to be used if the original string might fulfill any
> + * of the following conditions:
> + * - It contains curly braces
> + * - It is enclosed in single quotes
> + * - It ends with a "\" */
> +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {

I would drop the do_escape flag, and if there's need for appending
unescaped raw data to pa_message_params, there could be a separate
function for that.

> +    const char *s;
> +    char *output, *out_char;
> +    size_t brace_count;
> +
> +    pa_assert(param);
> +
> +    /* Null value is written as empty element */
> +    if (!value)
> +        value = "";
> +
> +    if (!do_escape) {
> +        pa_strbuf_printf(param->buffer, "{%s}", value);
> +        return;
> +    }
> +
> +    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
> +     * the input string would cause the allocation of a linked list element
> +     * for each character of the input string. Therefore the output string
> +     * is constructed locally before writing it to the buffer */

This is an interesting point. I think you should improve pa_escape(),
which currently uses the naive pa_strbuf_putc() method. With my
proposed escaping rules, you could then replace this code with a call
to pa_escape().

> +
> +    /* Count number of characters to escape */
> +    brace_count = 0;
> +    for (s = value; *s; ++s) {
> +        if (*s == '{' || *s == '}')
> +            brace_count++;
> +    }

You could at the same time count all characters to save the strlen()
call.

> +
> +    /* allocate output string */
> +    output = pa_xmalloc(strlen(value) + brace_count + 5);

It would be good to have a comment for that magic value 5.

> +    out_char = output;
> +
> +    *out_char++ = '{';
> +    *out_char++ = '\'';
> +
> +    for (s = value; *s; ++s) {
> +        if (*s == '{' || *s == '}')
> +            *out_char++ = '\\';
> +
> +        *out_char++ = *s;
> +    }
> +
> +    *out_char++ = '\'';
> +    *out_char++ = '}';
> +    *out_char = 0;
> +
> +    pa_strbuf_puts(param->buffer, output);
> +    pa_xfree(output);
> +}
> diff --git a/src/pulse/message-params.h b/src/pulse/message-params.h
> index 02e08363..c0675c6e 100644
> --- a/src/pulse/message-params.h
> +++ b/src/pulse/message-params.h
> @@ -30,12 +30,33 @@
>  
>  PA_C_DECL_BEGIN
>  
> +typedef struct pa_message_param pa_message_param;
> +
> +/** Read functions */

While wondering how this kind of lone comments show up in doxygen, I
noticed that message-params.h is not listed in the INPUT list in
doxygen/doxygen.conf.in, so no documentation is generated.

> +
>  /** Split message parameter string into list elements */
> -int pa_message_param_split_list(char *c, char **result, void **state);
> +int pa_message_param_split_list(char *c, char **result, bool * is_unpacked, void **state);
>  
>  /** Read a string from the parameter list. */
>  int pa_message_param_read_string(char *c, char **result, bool allocate, void **state);

It's worth noting in the documentation that the function unescapes the
string.

>  
> +/** Write functions */
> +
> +/** Create a new pa_message_param structure */
> +pa_message_param *pa_message_param_new(void);
> +
> +/** Convert pa_message_param to string, free pa_message_param structure */
> +char *pa_message_param_to_string(pa_message_param *param);

It should be documented that the returned string should be freed with
pa_xfree(). (Also remember the \since tags.)

> +
> +/** Write an opening brace */
> +void pa_message_param_begin_list(pa_message_param *param);
> +
> +/** Write a closing brace */
> +void pa_message_param_end_list(pa_message_param *param);
> +
> +/** Append string to parameter list */
> +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape);

The documentation should mention that the function does escaping,
escpecially if you drop the do_escape flag as I hope, but even if the
flag stays, it should be described.

> +
>  PA_C_DECL_END
>  
>  #endif
> diff --git a/src/pulsecore/core.c b/src/pulsecore/core.c
> index e95657f0..acd47cbb 100644
> --- a/src/pulsecore/core.c
> +++ b/src/pulsecore/core.c
> @@ -29,6 +29,7 @@
>  #include <pulse/rtclock.h>
>  #include <pulse/timeval.h>
>  #include <pulse/xmalloc.h>
> +#include <pulse/message-params.h>
>  
>  #include <pulsecore/module.h>
>  #include <pulsecore/core-rtclock.h>
> @@ -65,25 +66,24 @@ static void core_free(pa_object *o);
>  
>  /* Returns a list of handlers. */
>  static char *message_handler_list(pa_core *c) {
> -    pa_strbuf *buf;
> +    pa_message_param *param;
>      void *state = NULL;
>      struct pa_message_handler *handler;
>  
> -    buf = pa_strbuf_new();
> +    param = pa_message_param_new();
>  
> -    pa_strbuf_putc(buf, '{');
> +    pa_message_param_begin_list(param);
>      PA_HASHMAP_FOREACH(handler, c->message_handlers, state) {
> -        pa_strbuf_putc(buf, '{');
> +        pa_message_param_begin_list(param);
>  
> -        pa_strbuf_printf(buf, "{%s} {", handler->object_path);
> -        if (handler->description)
> -            pa_strbuf_puts(buf, handler->description);
> +        pa_message_param_write_string(param, handler->object_path, false);
> +        pa_message_param_write_string(param, handler->description, true);
>  
> -        pa_strbuf_puts(buf, "}}");
> +        pa_message_param_end_list(param);
>      }
> -    pa_strbuf_putc(buf, '}');
> +    pa_message_param_end_list(param);
>  
> -    return pa_strbuf_to_string_free(buf);
> +    return pa_message_param_to_string(param);
>  }
>  
>  static int core_message_handler(const char *object_path, const char *message, char *message_parameters, char **response, void *userdata) {
> diff --git a/src/pulsecore/message-handler.c b/src/pulsecore/message-handler.c
> index 427186db..860f3b8e 100644
> --- a/src/pulsecore/message-handler.c
> +++ b/src/pulsecore/message-handler.c
> @@ -31,15 +31,29 @@
>  
>  #include "message-handler.h"
>  
> -/* Check if a string does not contain control characters. Currently these are
> - * only "{" and "}". */
> -static bool string_is_valid(const char *test_string) {
> +/* Check if a path string starts with a / and only contains valid characters */
> +static bool object_path_is_valid(const char *test_string) {
>      uint32_t i;
>  
> +    if (!test_string)
> +        return false;
> +
> +    /* Make sure the string starts with / and does not end with a / */
> +    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
> +        return false;
> +
>      for (i = 0; test_string[i]; i++) {
> -        if (test_string[i] == '{' ||
> -            test_string[i] == '}')
> -            return false;
> +
> +        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
> +            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
> +            (test_string[i] >= '0' && test_string[i] <= '9') ||
> +            test_string[i] == '.' ||
> +            test_string[i] == '_' ||
> +            test_string[i] == '-' ||
> +            test_string[i] == '/')
> +            continue;
> +
> +        return false;
>      }

You could save the strlen() call if you checked the trailing slash
after this for loop when you're at the end of the string.
	
By the way, do you perhaps want to disallow two subsequent slashes in
object paths? (I don't care myself, because I doubt that invalid paths
will ever be encountered anyway when registering message handlers.)

>  
>      return true;
> @@ -56,13 +70,8 @@ void pa_message_handler_register(pa_core *c, const char *object_path, const char
>      pa_assert(cb);
>      pa_assert(userdata);
>  
> -    /* Ensure that the object path is not empty and starts with "/". */
> -    pa_assert(object_path[0] == '/');
> -
> -    /* Ensure that object path and description are valid strings */
> -    pa_assert(string_is_valid(object_path));
> -    if (description)
> -        pa_assert(string_is_valid(description));
> +    /* Ensure that object path is valid */
> +    pa_assert(object_path_is_valid(object_path));
>  
>      handler = pa_xnew0(struct pa_message_handler, 1);
>      handler->userdata = userdata;
> @@ -91,7 +100,7 @@ void pa_message_handler_unregister(pa_core *c, const char *object_path) {
>  int pa_message_handler_send_message(pa_core *c, const char *object_path, const char *message, const char *message_parameters, char **response) {
>      struct pa_message_handler *handler;
>      int ret;
> -    char *parameter_copy;
> +    char *parameter_copy, *path_copy;
>  
>      pa_assert(c);
>      pa_assert(object_path);
> @@ -100,8 +109,16 @@ int pa_message_handler_send_message(pa_core *c, const char *object_path, const c
>  
>      *response = NULL;
>  
> -    if (!(handler = pa_hashmap_get(c->message_handlers, object_path)))
> +    path_copy = pa_xstrdup(object_path);
> +
> +    /* Remove trailing / from path name if present */
> +    if (path_copy[strlen(path_copy) - 1] == '/')
> +        path_copy[strlen(path_copy) - 1] = 0;

If you want to allow some slack for applications, would it be better to
do this extra work in libpulse rather than in the daemon?

I would personally not bother with this at all, but I don't mind it
that much (even if it's done in the daemon).
On 21.07.2018 20:17, Tanu Kaskinen wrote:
> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>> The patch adds the possibility to escape curly braces within parameter strings
>> and introduces several new functions that can be used for writing parameters.
>>
>> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
>> has been created. Following new write functions are available:
>>
>> pa_message_param_new() - creates a new pa_message_param structure
>> pa_message_param_to_string() - converts a pa_message_param to string and frees
>> the structure
> I'd like to have the _free suffix in the function name, because for the
> uninitiated the current name looks like just another "to_string"
> function with no surprising side effects.
>
> I think we need pa_message_params_free() as well in case an application
> runs into an error while constructing the parameters and it wants to
> just get rid of the pa_message_params struct without converting it to a
> string first.
>
>> pa_message_param_begin_list() - starts a list
>> pa_message_param_end_list() - ends a list
>> pa_message_param_write_string() - writes a string to a pa_message_param structure
>>
>> For string parameters that contain curly braces, those braces must be escaped
>> by adding a "\" before them. This however means that a trailing backslash would
>> falsely escape the closing bracket. To avoid this, single quotes must be added
>> at start and end of the string.
> Why not solve this by the usual method: require escaping of "\" in
> input with "\\" in output?

You are right, if I modify pa_unescape() as mentioned further below and then
use pa_escape()/pa_unescape() the quotes won't be necessary.

>
>> The function pa_message_param_write_string()
>> has a parameter do_escape.
> Why not do escaping always?

Because what you are writing as a string may be a list that you have 
prepared
previously. Then you will not want escaping. You may for example create 
a list
from an array and then insert this list as one string into the final 
parameter list
or have a function that converts a certain structure to a parameter 
string and
then write the result of this function as one element of the final list.

>
>> If true, the necessary escaping is added. Escaping
>> is only needed if a string might fulfill one of the following conditions:
>>
>> - It contains curly braces
>> - It contains a trailing "\"
>> - It is enclosed in single quotes
>>
>> Other strings can be passed without modification.
>>
>> For reading, pa_message_param_read_string() reverts the changes that
>> pa_message_param_write_string() might have introduced.
>>
>> The patch also adds more restrictions on the object path name. Now only
>> alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
>> The path name may not end with a /. If the user specifies a trailing / when
>> sending a message, it will be removed.
>> ---
>>   doc/messaging_api.txt           |  34 +++++++-
>>   src/Makefile.am                 |   3 +-
>>   src/map-file                    |   5 ++
>>   src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
>>   src/pulse/message-params.h      |  23 ++++-
>>   src/pulsecore/core.c            |  20 ++---
>>   src/pulsecore/message-handler.c |  53 +++++++-----
>>   src/utils/pactl.c               |   4 +-
>>   8 files changed, 279 insertions(+), 44 deletions(-)
>>
>> diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
>> index 431a5df2..0e6be53f 100644
>> --- a/doc/messaging_api.txt
>> +++ b/doc/messaging_api.txt
>> @@ -14,10 +14,42 @@ look like that:
>>   {{Integer} {{1st float} {2nd float} ...}}{...}
>>   Any characters that are not enclosed in curly braces are ignored (all characters
>>   between { and {, between } and } and between } and {). The same syntax is used
>> -to specify message parameters. The following reference lists available messages,
>> +to specify message parameters. The reference further down lists available messages,
>>   their parameters and return values. If a return value is enclosed in {}, this
>>   means that multiple elements of the same type may be returned.
>>   
>> +There are several functions that simplify reading and writing message parameter
>> +strings. For writing, the structure pa_message_param can be used. Following
>> +functions are available:
>> +pa_message_param_new() - creates a new pa_message_param structure
>> +pa_message_param_to_string() - converts a pa_message_param to string and frees
>> +the structure
>> +pa_message_param_begin_list() - starts a list
>> +pa_message_param_end_list() - ends a list
>> +pa_message_param_write_string() - writes a string to a pa_message_param structure
>> +
>> +For string parameters that contain curly braces, those braces must be escaped
>> +by adding a "\" before them. This however means that a trailing backslash would
>> +falsely escape the closing bracket. To avoid this, single quotes must be added
>> +at start and end of the string. The function pa_message_param_write_string()
>> +has a parameter do_escape. If true, the necessary escaping is added. Escaping
>> +is only needed if a string might fulfill one of the following conditions:
>> +
>> +- It contains curly braces
>> +- It contains a trailing "\"
>> +- It is enclosed in single quotes
>> +
>> +Other strings can be passed without modification.
>> +
>> +For reading, the following functions are available:
>> +pa_message_param_split_list() - parse message parameter string
>> +pa_message_param_read_string() - read a string from a parameter list
>> +
>> +pa_message_param_read_string() also reverts the changes that
>> +pa_message_param_write_string() might have introduced.
> I think the doxygen documentation is better suited for the details of
> the C API. messaging_api.txt should contain information that is common
> to both libpulse and pactl users.

Somehow it is not really clear to me, which documentation you want
to have in messaging_api.txt.

>
>> +/* Remove escaping from a string */
>> +static char *unescape(char *value) {
>> +    char *tmp;
>> +
>> +    if (!value)
>> +        return NULL;
>> +
>> +    /* If the string is enclosed in single quotes, remove them */
>> +    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
>> +
>> +        memmove(value, value + 1, strlen(value));
>> +        value[strlen(value) - 1] = 0;
>> +    }
>> +
>> +    /* Remove escape character from curly braces if present. */
>> +    while ((tmp = strstr(value, "\\{")))
>> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
>> +    while ((tmp = strstr(value, "\\}")))
>> +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
>> +
>> +    /* Return the pointer that was passed in. */
>> +    return value;
>> +}
> core-util already contains pa_unescape() that does the same thing more
> efficiently (if you drop the single quote thing).

pa_unescape() currently does not do the same thing. It removes all
escape characters, while I only want to remove the characters
I actually introduced (those before { or }).
I can however modify pa_unescape() to take the same arguments
as pa_escape().

>
>> +
>> +/* Read functions */
>> +
>>   /* Split the specified string into elements. An element is defined as
>>    * a sub-string between curly braces. The function is needed to parse
>>    * the parameters of messages. Each time it is called returns the position
>>    * of the current element in result and the state pointer is advanced to
>> - * the next list element.
>> + * the next list element. On return, the parameter *is_unpacked indicates
>> + * if the string is plain text or contains a sub-list. is_unpacked may
>> + * be NULL.
> is_unpacked looks like unnecessary complexity.
> pa_message_params_read_string() should always unescape the value.

It may be possible, that the string you read is a list. Consider the 
following
parameter list: {string1}{some nested structure}{string2}. You can now
read this list as three strings and then continue to read the elements of
the nested structure from the second string. You might even create a 
function
that takes a string and outputs a structure. So you are not forced to go
to the full depth of nesting on the first pass. This makes it much easier
to handle deeply nested parameter lists. For me this behavior is an 
important
feature and I do not want to drop it. See also my comment on why I do
not always want escaping.

>
>>    
>>   
>>       /* Parse error, closing brace missing */
>> @@ -96,21 +154,126 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
>>   /* Read a string from the parameter list. The state pointer is
>>    * advanced to the next element of the list. If allocate is
>>    * true, a newly allocated string will be returned, else a
>> - * pointer to a sub-string within c. */
>> + * pointer to a sub-string within c. Escaping will be removed
>> + * if the string does not contain another list. */
> A string is a string, it's not a list. I think this function should
> always do unescaping. If a string value is read that is missing
> escaping for curly braces, that could be reported as an error (I'm not
> saying "should", but I think it would be good to do the validation in
> this function).

See my comments above why a string may be a list.

>
>> +
>> +/* Writes a string to a message_param structure, adding curly braces
>> + * around the string. If do_escape is true, leading and trailing single
>> + * quotes and also a "\" before curly braces are added to the input string.
>> + * Escaping only needs to be used if the original string might fulfill any
>> + * of the following conditions:
>> + * - It contains curly braces
>> + * - It is enclosed in single quotes
>> + * - It ends with a "\" */
>> +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {
> I would drop the do_escape flag, and if there's need for appending
> unescaped raw data to pa_message_params, there could be a separate
> function for that.

See above. A function for writing raw data should nevertheless be provided
for the case where a string can only be written as several parts.

>
>> +    const char *s;
>> +    char *output, *out_char;
>> +    size_t brace_count;
>> +
>> +    pa_assert(param);
>> +
>> +    /* Null value is written as empty element */
>> +    if (!value)
>> +        value = "";
>> +
>> +    if (!do_escape) {
>> +        pa_strbuf_printf(param->buffer, "{%s}", value);
>> +        return;
>> +    }
>> +
>> +    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
>> +     * the input string would cause the allocation of a linked list element
>> +     * for each character of the input string. Therefore the output string
>> +     * is constructed locally before writing it to the buffer */
> This is an interesting point. I think you should improve pa_escape(),
> which currently uses the naive pa_strbuf_putc() method. With my
> proposed escaping rules, you could then replace this code with a call
> to pa_escape().

OK, I'll do that and send a patch for tweaking pa_escape()/pa_unescape()
to my needs separately.

>
>> +
>> +    /* Count number of characters to escape */
>> +    brace_count = 0;
>> +    for (s = value; *s; ++s) {
>> +        if (*s == '{' || *s == '}')
>> +            brace_count++;
>> +    }
> You could at the same time count all characters to save the strlen()
> call.

Mh, do I really save something? If the string is x characters
long, I would increment a variable this often. Is that less work
than a strlen() call? But probably strlen() does something similar ...

>
>> +/* Check if a path string starts with a / and only contains valid characters */
>> +static bool object_path_is_valid(const char *test_string) {
>>       uint32_t i;
>>   
>> +    if (!test_string)
>> +        return false;
>> +
>> +    /* Make sure the string starts with / and does not end with a / */
>> +    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
>> +        return false;
>> +
>>       for (i = 0; test_string[i]; i++) {
>> -        if (test_string[i] == '{' ||
>> -            test_string[i] == '}')
>> -            return false;
>> +
>> +        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
>> +            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
>> +            (test_string[i] >= '0' && test_string[i] <= '9') ||
>> +            test_string[i] == '.' ||
>> +            test_string[i] == '_' ||
>> +            test_string[i] == '-' ||
>> +            test_string[i] == '/')
>> +            continue;
>> +
>> +        return false;
>>       }
> You could save the strlen() call if you checked the trailing slash
> after this for loop when you're at the end of the string.
> 	
> By the way, do you perhaps want to disallow two subsequent slashes in
> object paths? (I don't care myself, because I doubt that invalid paths
> will ever be encountered anyway when registering message handlers.)

If I don't disallow it, then a double slash in the path will be valid.
At least up to now we do not care about the individual components
that the path is constructed from.
So should I disallow it or not?

>
>>   
>>       return true;
>> @@ -56,13 +70,8 @@ void pa_message_handler_register(pa_core *c, const char *object_path, const char
>>       pa_assert(cb);
>>       pa_assert(userdata);
>>   
>> -    /* Ensure that the object path is not empty and starts with "/". */
>> -    pa_assert(object_path[0] == '/');
>> -
>> -    /* Ensure that object path and description are valid strings */
>> -    pa_assert(string_is_valid(object_path));
>> -    if (description)
>> -        pa_assert(string_is_valid(description));
>> +    /* Ensure that object path is valid */
>> +    pa_assert(object_path_is_valid(object_path));
>>   
>>       handler = pa_xnew0(struct pa_message_handler, 1);
>>       handler->userdata = userdata;
>> @@ -91,7 +100,7 @@ void pa_message_handler_unregister(pa_core *c, const char *object_path) {
>>   int pa_message_handler_send_message(pa_core *c, const char *object_path, const char *message, const char *message_parameters, char **response) {
>>       struct pa_message_handler *handler;
>>       int ret;
>> -    char *parameter_copy;
>> +    char *parameter_copy, *path_copy;
>>   
>>       pa_assert(c);
>>       pa_assert(object_path);
>> @@ -100,8 +109,16 @@ int pa_message_handler_send_message(pa_core *c, const char *object_path, const c
>>   
>>       *response = NULL;
>>   
>> -    if (!(handler = pa_hashmap_get(c->message_handlers, object_path)))
>> +    path_copy = pa_xstrdup(object_path);
>> +
>> +    /* Remove trailing / from path name if present */
>> +    if (path_copy[strlen(path_copy) - 1] == '/')
>> +        path_copy[strlen(path_copy) - 1] = 0;
> If you want to allow some slack for applications, would it be better to
> do this extra work in libpulse rather than in the daemon?
>
> I would personally not bother with this at all, but I don't mind it
> that much (even if it's done in the daemon).
>
If you don't mind, I would leave it as it is.


Unfortunately I am still heavily loaded, so it may take a while before
I can send new patches.
On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > The patch adds the possibility to escape curly braces within parameter strings
> > > and introduces several new functions that can be used for writing parameters.
> > > 
> > > For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
> > > has been created. Following new write functions are available:
> > > 
> > > pa_message_param_new() - creates a new pa_message_param structure
> > > pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > the structure
> > 
> > I'd like to have the _free suffix in the function name, because for the
> > uninitiated the current name looks like just another "to_string"
> > function with no surprising side effects.
> > 
> > I think we need pa_message_params_free() as well in case an application
> > runs into an error while constructing the parameters and it wants to
> > just get rid of the pa_message_params struct without converting it to a
> > string first.
> > 
> > > pa_message_param_begin_list() - starts a list
> > > pa_message_param_end_list() - ends a list
> > > pa_message_param_write_string() - writes a string to a pa_message_param structure
> > > 
> > > For string parameters that contain curly braces, those braces must be escaped
> > > by adding a "\" before them. This however means that a trailing backslash would
> > > falsely escape the closing bracket. To avoid this, single quotes must be added
> > > at start and end of the string.
> > 
> > Why not solve this by the usual method: require escaping of "\" in
> > input with "\\" in output?
> 
> You are right, if I modify pa_unescape() as mentioned further below and then
> use pa_escape()/pa_unescape() the quotes won't be necessary.
> 
> > 
> > > The function pa_message_param_write_string()
> > > has a parameter do_escape.
> > 
> > Why not do escaping always?
> 
> Because what you are writing as a string may be a list that you have 
> prepared
> previously. Then you will not want escaping. You may for example create 
> a list
> from an array and then insert this list as one string into the final 
> parameter list
> or have a function that converts a certain structure to a parameter 
> string and
> then write the result of this function as one element of the final list.

My mental model is that parameters have types, list type is different
than string type, and write_string() is only meant for writing values
of the string type.

Can you add a write_raw() function?

> > 
> > > If true, the necessary escaping is added. Escaping
> > > is only needed if a string might fulfill one of the following conditions:
> > > 
> > > - It contains curly braces
> > > - It contains a trailing "\"
> > > - It is enclosed in single quotes
> > > 
> > > Other strings can be passed without modification.
> > > 
> > > For reading, pa_message_param_read_string() reverts the changes that
> > > pa_message_param_write_string() might have introduced.
> > > 
> > > The patch also adds more restrictions on the object path name. Now only
> > > alphanumeric characters and one of "_", ".", "-" and "/" are allowed.
> > > The path name may not end with a /. If the user specifies a trailing / when
> > > sending a message, it will be removed.
> > > ---
> > >   doc/messaging_api.txt           |  34 +++++++-
> > >   src/Makefile.am                 |   3 +-
> > >   src/map-file                    |   5 ++
> > >   src/pulse/message-params.c      | 181 ++++++++++++++++++++++++++++++++++++++--
> > >   src/pulse/message-params.h      |  23 ++++-
> > >   src/pulsecore/core.c            |  20 ++---
> > >   src/pulsecore/message-handler.c |  53 +++++++-----
> > >   src/utils/pactl.c               |   4 +-
> > >   8 files changed, 279 insertions(+), 44 deletions(-)
> > > 
> > > diff --git a/doc/messaging_api.txt b/doc/messaging_api.txt
> > > index 431a5df2..0e6be53f 100644
> > > --- a/doc/messaging_api.txt
> > > +++ b/doc/messaging_api.txt
> > > @@ -14,10 +14,42 @@ look like that:
> > >   {{Integer} {{1st float} {2nd float} ...}}{...}
> > >   Any characters that are not enclosed in curly braces are ignored (all characters
> > >   between { and {, between } and } and between } and {). The same syntax is used
> > > -to specify message parameters. The following reference lists available messages,
> > > +to specify message parameters. The reference further down lists available messages,
> > >   their parameters and return values. If a return value is enclosed in {}, this
> > >   means that multiple elements of the same type may be returned.
> > >   
> > > +There are several functions that simplify reading and writing message parameter
> > > +strings. For writing, the structure pa_message_param can be used. Following
> > > +functions are available:
> > > +pa_message_param_new() - creates a new pa_message_param structure
> > > +pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > +the structure
> > > +pa_message_param_begin_list() - starts a list
> > > +pa_message_param_end_list() - ends a list
> > > +pa_message_param_write_string() - writes a string to a pa_message_param structure
> > > +
> > > +For string parameters that contain curly braces, those braces must be escaped
> > > +by adding a "\" before them. This however means that a trailing backslash would
> > > +falsely escape the closing bracket. To avoid this, single quotes must be added
> > > +at start and end of the string. The function pa_message_param_write_string()
> > > +has a parameter do_escape. If true, the necessary escaping is added. Escaping
> > > +is only needed if a string might fulfill one of the following conditions:
> > > +
> > > +- It contains curly braces
> > > +- It contains a trailing "\"
> > > +- It is enclosed in single quotes
> > > +
> > > +Other strings can be passed without modification.
> > > +
> > > +For reading, the following functions are available:
> > > +pa_message_param_split_list() - parse message parameter string
> > > +pa_message_param_read_string() - read a string from a parameter list
> > > +
> > > +pa_message_param_read_string() also reverts the changes that
> > > +pa_message_param_write_string() might have introduced.
> > 
> > I think the doxygen documentation is better suited for the details of
> > the C API. messaging_api.txt should contain information that is common
> > to both libpulse and pactl users.
> 
> Somehow it is not really clear to me, which documentation you want
> to have in messaging_api.txt.

I want documentation for two things: the serialization format
specification and the documentation for the supported messages (plus a
short introduction explaining why the messaging API exists). When
writing C programs, the doxygen documentation is the reference for the
C API details.

> > 
> > > +/* Remove escaping from a string */
> > > +static char *unescape(char *value) {
> > > +    char *tmp;
> > > +
> > > +    if (!value)
> > > +        return NULL;
> > > +
> > > +    /* If the string is enclosed in single quotes, remove them */
> > > +    if (*value == '\'' && value[strlen(value) - 1] == '\'') {
> > > +
> > > +        memmove(value, value + 1, strlen(value));
> > > +        value[strlen(value) - 1] = 0;
> > > +    }
> > > +
> > > +    /* Remove escape character from curly braces if present. */
> > > +    while ((tmp = strstr(value, "\\{")))
> > > +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> > > +    while ((tmp = strstr(value, "\\}")))
> > > +        memmove(tmp, tmp + 1, strlen(value) - (size_t)(tmp  - value));
> > > +
> > > +    /* Return the pointer that was passed in. */
> > > +    return value;
> > > +}
> > 
> > core-util already contains pa_unescape() that does the same thing more
> > efficiently (if you drop the single quote thing).
> 
> pa_unescape() currently does not do the same thing. It removes all
> escape characters, while I only want to remove the characters
> I actually introduced (those before { or }).
> I can however modify pa_unescape() to take the same arguments
> as pa_escape().

I don't see the need for being selective when unescaping. Nothing
breaks if all (except escaped) backslashes are stripped.

> > 
> > > +
> > > +/* Read functions */
> > > +
> > >   /* Split the specified string into elements. An element is defined as
> > >    * a sub-string between curly braces. The function is needed to parse
> > >    * the parameters of messages. Each time it is called returns the position
> > >    * of the current element in result and the state pointer is advanced to
> > > - * the next list element.
> > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > + * be NULL.
> > 
> > is_unpacked looks like unnecessary complexity.
> > pa_message_params_read_string() should always unescape the value.
> 
> It may be possible, that the string you read is a list. Consider the 
> following
> parameter list: {string1}{some nested structure}{string2}. You can now
> read this list as three strings and then continue to read the elements of
> the nested structure from the second string. You might even create a 
> function
> that takes a string and outputs a structure. So you are not forced to go
> to the full depth of nesting on the first pass. This makes it much easier
> to handle deeply nested parameter lists. For me this behavior is an 
> important
> feature and I do not want to drop it. See also my comment on why I do
> not always want escaping.

Doesn't split_list() already allow this, why do you want to use
read_string() to do the same thing as split_list()?

> > 
> > >    
> > >   
> > >       /* Parse error, closing brace missing */
> > > @@ -96,21 +154,126 @@ int pa_message_param_split_list(char *c, char **result, void **state) {
> > >   /* Read a string from the parameter list. The state pointer is
> > >    * advanced to the next element of the list. If allocate is
> > >    * true, a newly allocated string will be returned, else a
> > > - * pointer to a sub-string within c. */
> > > + * pointer to a sub-string within c. Escaping will be removed
> > > + * if the string does not contain another list. */
> > 
> > A string is a string, it's not a list. I think this function should
> > always do unescaping. If a string value is read that is missing
> > escaping for curly braces, that could be reported as an error (I'm not
> > saying "should", but I think it would be good to do the validation in
> > this function).
> 
> See my comments above why a string may be a list.
> 
> > 
> > > +
> > > +/* Writes a string to a message_param structure, adding curly braces
> > > + * around the string. If do_escape is true, leading and trailing single
> > > + * quotes and also a "\" before curly braces are added to the input string.
> > > + * Escaping only needs to be used if the original string might fulfill any
> > > + * of the following conditions:
> > > + * - It contains curly braces
> > > + * - It is enclosed in single quotes
> > > + * - It ends with a "\" */
> > > +void pa_message_param_write_string(pa_message_param *param, const char *value, bool do_escape) {
> > 
> > I would drop the do_escape flag, and if there's need for appending
> > unescaped raw data to pa_message_params, there could be a separate
> > function for that.
> 
> See above. A function for writing raw data should nevertheless be provided
> for the case where a string can only be written as several parts.
> 
> > 
> > > +    const char *s;
> > > +    char *output, *out_char;
> > > +    size_t brace_count;
> > > +
> > > +    pa_assert(param);
> > > +
> > > +    /* Null value is written as empty element */
> > > +    if (!value)
> > > +        value = "";
> > > +
> > > +    if (!do_escape) {
> > > +        pa_strbuf_printf(param->buffer, "{%s}", value);
> > > +        return;
> > > +    }
> > > +
> > > +    /* Using pa_strbuf_putc() to write to the strbuf while iterating over
> > > +     * the input string would cause the allocation of a linked list element
> > > +     * for each character of the input string. Therefore the output string
> > > +     * is constructed locally before writing it to the buffer */
> > 
> > This is an interesting point. I think you should improve pa_escape(),
> > which currently uses the naive pa_strbuf_putc() method. With my
> > proposed escaping rules, you could then replace this code with a call
> > to pa_escape().
> 
> OK, I'll do that and send a patch for tweaking pa_escape()/pa_unescape()
> to my needs separately.
> 
> > 
> > > +
> > > +    /* Count number of characters to escape */
> > > +    brace_count = 0;
> > > +    for (s = value; *s; ++s) {
> > > +        if (*s == '{' || *s == '}')
> > > +            brace_count++;
> > > +    }
> > 
> > You could at the same time count all characters to save the strlen()
> > call.
> 
> Mh, do I really save something? If the string is x characters
> long, I would increment a variable this often. Is that less work
> than a strlen() call? But probably strlen() does something similar ...

strlen() loops over all characters. One loop is better than two. Not
that this is likely to cause any noticeable slowdown, unnecessary
strlen() calls just tend to catch my attention...

> > 
> > > +/* Check if a path string starts with a / and only contains valid characters */
> > > +static bool object_path_is_valid(const char *test_string) {
> > >       uint32_t i;
> > >   
> > > +    if (!test_string)
> > > +        return false;
> > > +
> > > +    /* Make sure the string starts with / and does not end with a / */
> > > +    if (test_string[0] != '/' || test_string[strlen(test_string) - 1] == '/')
> > > +        return false;
> > > +
> > >       for (i = 0; test_string[i]; i++) {
> > > -        if (test_string[i] == '{' ||
> > > -            test_string[i] == '}')
> > > -            return false;
> > > +
> > > +        if ((test_string[i] >= 'a' && test_string[i] <= 'z') ||
> > > +            (test_string[i] >= 'A' && test_string[i] <= 'Z') ||
> > > +            (test_string[i] >= '0' && test_string[i] <= '9') ||
> > > +            test_string[i] == '.' ||
> > > +            test_string[i] == '_' ||
> > > +            test_string[i] == '-' ||
> > > +            test_string[i] == '/')
> > > +            continue;
> > > +
> > > +        return false;
> > >       }
> > 
> > You could save the strlen() call if you checked the trailing slash
> > after this for loop when you're at the end of the string.
> > 	
> > By the way, do you perhaps want to disallow two subsequent slashes in
> > object paths? (I don't care myself, because I doubt that invalid paths
> > will ever be encountered anyway when registering message handlers.)
> 
> If I don't disallow it, then a double slash in the path will be valid.
> At least up to now we do not care about the individual components
> that the path is constructed from.
> So should I disallow it or not?

As I said, I don't care. Clients don't register message handlers, so
we're dealing only with server code, and I find it unlikely that code
that generates weird paths would get past review.

If I have to have an opinion, then this is it: if we're anyway
validating the paths, disallowing double slashes would be logical,
becacuse I can't imagine such paths ever getting created on purpose.
On 22.07.2018 17:48, Tanu Kaskinen wrote:
> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>> The patch adds the possibility to escape curly braces within parameter strings
>>>> and introduces several new functions that can be used for writing parameters.
>>>>
>>>> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
>>>> has been created. Following new write functions are available:
>>>>
>>>> pa_message_param_new() - creates a new pa_message_param structure
>>>> pa_message_param_to_string() - converts a pa_message_param to string and frees
>>>> the structure
>>>
>>>> The function pa_message_param_write_string()
>>>> has a parameter do_escape.
>>> Why not do escaping always?
>> Because what you are writing as a string may be a list that you have
>> prepared
>> previously. Then you will not want escaping. You may for example create
>> a list
>> from an array and then insert this list as one string into the final
>> parameter list
>> or have a function that converts a certain structure to a parameter
>> string and
>> then write the result of this function as one element of the final list.
> My mental model is that parameters have types, list type is different
> than string type, and write_string() is only meant for writing values
> of the string type.
>
> Can you add a write_raw() function?

Yes, this is done in patch 7. But the raw write function differs from what
write_string() is doing. write_string() writes one element of a list, 
that is
it encloses the string in braces. The raw write function is intended for
situations where you can't write a complete element with one write, so
it does not add any braces. I am still of the opinion, that a structure
or array converted to a parameter string is a string, so writing something
like this should be done with write_string().
Also writing unescaped strings in situations where escaping is not necessary
saves the overhead of looping over all the characters.

>>> core-util already contains pa_unescape() that does the same thing more
>>> efficiently (if you drop the single quote thing).
>> pa_unescape() currently does not do the same thing. It removes all
>> escape characters, while I only want to remove the characters
>> I actually introduced (those before { or }).
>> I can however modify pa_unescape() to take the same arguments
>> as pa_escape().
> I don't see the need for being selective when unescaping. Nothing
> breaks if all (except escaped) backslashes are stripped.

You are right, if previously all backslashes in the original string
have been escaped, nothing will break. I was still thinking of the
old solution where I did not escape backslashes.


>
>>>> +
>>>> +/* Read functions */
>>>> +
>>>>    /* Split the specified string into elements. An element is defined as
>>>>     * a sub-string between curly braces. The function is needed to parse
>>>>     * the parameters of messages. Each time it is called returns the position
>>>>     * of the current element in result and the state pointer is advanced to
>>>> - * the next list element.
>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>> + * be NULL.
>>> is_unpacked looks like unnecessary complexity.
>>> pa_message_params_read_string() should always unescape the value.
>> It may be possible, that the string you read is a list. Consider the
>> following
>> parameter list: {string1}{some nested structure}{string2}. You can now
>> read this list as three strings and then continue to read the elements of
>> the nested structure from the second string. You might even create a
>> function
>> that takes a string and outputs a structure. So you are not forced to go
>> to the full depth of nesting on the first pass. This makes it much easier
>> to handle deeply nested parameter lists. For me this behavior is an
>> important
>> feature and I do not want to drop it. See also my comment on why I do
>> not always want escaping.
> Doesn't split_list() already allow this, why do you want to use
> read_string() to do the same thing as split_list()?

read_string() and split_list() are very similar and we could live
without read_string(). It is provided as a counterpart to write_string()
and for convenience additionally does the unescaping if necessary
like write_string does the escaping.
I don't see why this is a problem. It depends on the context which
is the better function to use.
On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
> On 22.07.2018 17:48, Tanu Kaskinen wrote:
> > On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> > > On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > > > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > > > The patch adds the possibility to escape curly braces within parameter strings
> > > > > and introduces several new functions that can be used for writing parameters.
> > > > > 
> > > > > For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
> > > > > has been created. Following new write functions are available:
> > > > > 
> > > > > pa_message_param_new() - creates a new pa_message_param structure
> > > > > pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > > > the structure
> > > > > The function pa_message_param_write_string()
> > > > > has a parameter do_escape.
> > > > 
> > > > Why not do escaping always?
> > > 
> > > Because what you are writing as a string may be a list that you have
> > > prepared
> > > previously. Then you will not want escaping. You may for example create
> > > a list
> > > from an array and then insert this list as one string into the final
> > > parameter list
> > > or have a function that converts a certain structure to a parameter
> > > string and
> > > then write the result of this function as one element of the final list.
> > 
> > My mental model is that parameters have types, list type is different
> > than string type, and write_string() is only meant for writing values
> > of the string type.
> > 
> > Can you add a write_raw() function?
> 
> Yes, this is done in patch 7. But the raw write function differs from what
> write_string() is doing. write_string() writes one element of a list, 
> that is
> it encloses the string in braces. The raw write function is intended for
> situations where you can't write a complete element with one write, so
> it does not add any braces. I am still of the opinion, that a structure
> or array converted to a parameter string is a string, so writing something
> like this should be done with write_string().

They are different kinds of strings, different abstraction levels. When
you're writing an array "as a string", in that context it's just a C
string. write_string() with escaping deals with strings in the "message
params type system". I don't know if this makes any sense to you.
Probably not... In any case, the do_escape flag seems unnecessary
complexity to me.

> Also writing unescaped strings in situations where escaping is not necessary
> saves the overhead of looping over all the characters.
> 
> > > > core-util already contains pa_unescape() that does the same thing more
> > > > efficiently (if you drop the single quote thing).
> > > 
> > > pa_unescape() currently does not do the same thing. It removes all
> > > escape characters, while I only want to remove the characters
> > > I actually introduced (those before { or }).
> > > I can however modify pa_unescape() to take the same arguments
> > > as pa_escape().
> > 
> > I don't see the need for being selective when unescaping. Nothing
> > breaks if all (except escaped) backslashes are stripped.
> 
> You are right, if previously all backslashes in the original string
> have been escaped, nothing will break. I was still thinking of the
> old solution where I did not escape backslashes.
> 
> 
> > 
> > > > > +
> > > > > +/* Read functions */
> > > > > +
> > > > >    /* Split the specified string into elements. An element is defined as
> > > > >     * a sub-string between curly braces. The function is needed to parse
> > > > >     * the parameters of messages. Each time it is called returns the position
> > > > >     * of the current element in result and the state pointer is advanced to
> > > > > - * the next list element.
> > > > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > > > + * be NULL.
> > > > 
> > > > is_unpacked looks like unnecessary complexity.
> > > > pa_message_params_read_string() should always unescape the value.
> > > 
> > > It may be possible, that the string you read is a list. Consider the
> > > following
> > > parameter list: {string1}{some nested structure}{string2}. You can now
> > > read this list as three strings and then continue to read the elements of
> > > the nested structure from the second string. You might even create a
> > > function
> > > that takes a string and outputs a structure. So you are not forced to go
> > > to the full depth of nesting on the first pass. This makes it much easier
> > > to handle deeply nested parameter lists. For me this behavior is an
> > > important
> > > feature and I do not want to drop it. See also my comment on why I do
> > > not always want escaping.
> > 
> > Doesn't split_list() already allow this, why do you want to use
> > read_string() to do the same thing as split_list()?
> 
> read_string() and split_list() are very similar and we could live
> without read_string(). It is provided as a counterpart to write_string()
> and for convenience additionally does the unescaping if necessary
> like write_string does the escaping.
> I don't see why this is a problem. It depends on the context which
> is the better function to use.

Again, in my mind a structure is not a string, they are different
types, and I think read_string() should only deal with the string type.
is_unpacked makes the API more complicated, so I'd like to get rid of
it.
On 26.07.2018 12:37, Tanu Kaskinen wrote:
> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>> The patch adds the possibility to escape curly braces within parameter strings
>>>>>> and introduces several new functions that can be used for writing parameters.
>>>>>>
>>>>>> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
>>>>>> has been created. Following new write functions are available:
>>>>>>
>>>>>> pa_message_param_new() - creates a new pa_message_param structure
>>>>>> pa_message_param_to_string() - converts a pa_message_param to string and frees
>>>>>> the structure
>>>>>> The function pa_message_param_write_string()
>>>>>> has a parameter do_escape.
>>>>> Why not do escaping always?
>>>> Because what you are writing as a string may be a list that you have
>>>> prepared
>>>> previously. Then you will not want escaping. You may for example create
>>>> a list
>>>> from an array and then insert this list as one string into the final
>>>> parameter list
>>>> or have a function that converts a certain structure to a parameter
>>>> string and
>>>> then write the result of this function as one element of the final list.
>>> My mental model is that parameters have types, list type is different
>>> than string type, and write_string() is only meant for writing values
>>> of the string type.
>>>
>>> Can you add a write_raw() function?
>> Yes, this is done in patch 7. But the raw write function differs from what
>> write_string() is doing. write_string() writes one element of a list,
>> that is
>> it encloses the string in braces. The raw write function is intended for
>> situations where you can't write a complete element with one write, so
>> it does not add any braces. I am still of the opinion, that a structure
>> or array converted to a parameter string is a string, so writing something
>> like this should be done with write_string().
> They are different kinds of strings, different abstraction levels. When
> you're writing an array "as a string", in that context it's just a C
> string. write_string() with escaping deals with strings in the "message
> params type system". I don't know if this makes any sense to you.
> Probably not... In any case, the do_escape flag seems unnecessary
> complexity to me.

The alternative would be a function to write an unescaped string in
addition to the write_raw() function. If you don't like the flag, would
you be OK with a write_unescaped_string() function? I think it is just
more comfortable than using write_raw().

>
>> Also writing unescaped strings in situations where escaping is not necessary
>> saves the overhead of looping over all the characters.
>>
>>>>> core-util already contains pa_unescape() that does the same thing more
>>>>> efficiently (if you drop the single quote thing).
>>>> pa_unescape() currently does not do the same thing. It removes all
>>>> escape characters, while I only want to remove the characters
>>>> I actually introduced (those before { or }).
>>>> I can however modify pa_unescape() to take the same arguments
>>>> as pa_escape().
>>> I don't see the need for being selective when unescaping. Nothing
>>> breaks if all (except escaped) backslashes are stripped.
>> You are right, if previously all backslashes in the original string
>> have been escaped, nothing will break. I was still thinking of the
>> old solution where I did not escape backslashes.
>>
>>
>>>>>> +
>>>>>> +/* Read functions */
>>>>>> +
>>>>>>     /* Split the specified string into elements. An element is defined as
>>>>>>      * a sub-string between curly braces. The function is needed to parse
>>>>>>      * the parameters of messages. Each time it is called returns the position
>>>>>>      * of the current element in result and the state pointer is advanced to
>>>>>> - * the next list element.
>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>> + * be NULL.
>>>>> is_unpacked looks like unnecessary complexity.
>>>>> pa_message_params_read_string() should always unescape the value.
>>>> It may be possible, that the string you read is a list. Consider the
>>>> following
>>>> parameter list: {string1}{some nested structure}{string2}. You can now
>>>> read this list as three strings and then continue to read the elements of
>>>> the nested structure from the second string. You might even create a
>>>> function
>>>> that takes a string and outputs a structure. So you are not forced to go
>>>> to the full depth of nesting on the first pass. This makes it much easier
>>>> to handle deeply nested parameter lists. For me this behavior is an
>>>> important
>>>> feature and I do not want to drop it. See also my comment on why I do
>>>> not always want escaping.
>>> Doesn't split_list() already allow this, why do you want to use
>>> read_string() to do the same thing as split_list()?
>> read_string() and split_list() are very similar and we could live
>> without read_string(). It is provided as a counterpart to write_string()
>> and for convenience additionally does the unescaping if necessary
>> like write_string does the escaping.
>> I don't see why this is a problem. It depends on the context which
>> is the better function to use.
> Again, in my mind a structure is not a string, they are different
> types, and I think read_string() should only deal with the string type.
> is_unpacked makes the API more complicated, so I'd like to get rid of
> it.
>
I don't see your point. is_unpacked is not part of the read_string() 
arguments
or return value. In split_list() it is definitively needed to indicate 
if the returned
string (in the C sense) contains another list. I can imagine many 
situations where
you might either pass an array or just a single value or even NULL. 
is_unpacked
allows to differentiate between the situations. And I don't think that a 
boolean
return value is adding significantly to the complexity of a function, 
especially
since passing NULL is allowed if you do not care for the value.
On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
> On 26.07.2018 12:37, Tanu Kaskinen wrote:
> > On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
> > > On 22.07.2018 17:48, Tanu Kaskinen wrote:
> > > > On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> > > > > On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > > > > > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > > > > > The patch adds the possibility to escape curly braces within parameter strings
> > > > > > > and introduces several new functions that can be used for writing parameters.
> > > > > > > 
> > > > > > > For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
> > > > > > > has been created. Following new write functions are available:
> > > > > > > 
> > > > > > > pa_message_param_new() - creates a new pa_message_param structure
> > > > > > > pa_message_param_to_string() - converts a pa_message_param to string and frees
> > > > > > > the structure
> > > > > > > The function pa_message_param_write_string()
> > > > > > > has a parameter do_escape.
> > > > > > 
> > > > > > Why not do escaping always?
> > > > > 
> > > > > Because what you are writing as a string may be a list that you have
> > > > > prepared
> > > > > previously. Then you will not want escaping. You may for example create
> > > > > a list
> > > > > from an array and then insert this list as one string into the final
> > > > > parameter list
> > > > > or have a function that converts a certain structure to a parameter
> > > > > string and
> > > > > then write the result of this function as one element of the final list.
> > > > 
> > > > My mental model is that parameters have types, list type is different
> > > > than string type, and write_string() is only meant for writing values
> > > > of the string type.
> > > > 
> > > > Can you add a write_raw() function?
> > > 
> > > Yes, this is done in patch 7. But the raw write function differs from what
> > > write_string() is doing. write_string() writes one element of a list,
> > > that is
> > > it encloses the string in braces. The raw write function is intended for
> > > situations where you can't write a complete element with one write, so
> > > it does not add any braces. I am still of the opinion, that a structure
> > > or array converted to a parameter string is a string, so writing something
> > > like this should be done with write_string().
> > 
> > They are different kinds of strings, different abstraction levels. When
> > you're writing an array "as a string", in that context it's just a C
> > string. write_string() with escaping deals with strings in the "message
> > params type system". I don't know if this makes any sense to you.
> > Probably not... In any case, the do_escape flag seems unnecessary
> > complexity to me.
> 
> The alternative would be a function to write an unescaped string in
> addition to the write_raw() function. If you don't like the flag, would
> you be OK with a write_unescaped_string() function? I think it is just
> more comfortable than using write_raw().

Thanks for the concession, I was afraid we'll get stuck on this point.

By comfortable, do you refer to that write_raw() doesn't add braces
around the value? How about adding an add_braces flag to write_raw()?

> > > > > > > +
> > > > > > > +/* Read functions */
> > > > > > > +
> > > > > > >     /* Split the specified string into elements. An element is defined as
> > > > > > >      * a sub-string between curly braces. The function is needed to parse
> > > > > > >      * the parameters of messages. Each time it is called returns the position
> > > > > > >      * of the current element in result and the state pointer is advanced to
> > > > > > > - * the next list element.
> > > > > > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > > > > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > > > > > + * be NULL.
> > > > > > 
> > > > > > is_unpacked looks like unnecessary complexity.
> > > > > > pa_message_params_read_string() should always unescape the value.
> > > > > 
> > > > > It may be possible, that the string you read is a list. Consider the
> > > > > following
> > > > > parameter list: {string1}{some nested structure}{string2}. You can now
> > > > > read this list as three strings and then continue to read the elements of
> > > > > the nested structure from the second string. You might even create a
> > > > > function
> > > > > that takes a string and outputs a structure. So you are not forced to go
> > > > > to the full depth of nesting on the first pass. This makes it much easier
> > > > > to handle deeply nested parameter lists. For me this behavior is an
> > > > > important
> > > > > feature and I do not want to drop it. See also my comment on why I do
> > > > > not always want escaping.
> > > > 
> > > > Doesn't split_list() already allow this, why do you want to use
> > > > read_string() to do the same thing as split_list()?
> > > 
> > > read_string() and split_list() are very similar and we could live
> > > without read_string(). It is provided as a counterpart to write_string()
> > > and for convenience additionally does the unescaping if necessary
> > > like write_string does the escaping.
> > > I don't see why this is a problem. It depends on the context which
> > > is the better function to use.
> > 
> > Again, in my mind a structure is not a string, they are different
> > types, and I think read_string() should only deal with the string type.
> > is_unpacked makes the API more complicated, so I'd like to get rid of
> > it.
> > 
> 
> I don't see your point. is_unpacked is not part of the read_string() 
> arguments
> or return value. In split_list() it is definitively needed to indicate 
> if the returned
> string (in the C sense) contains another list. I can imagine many 
> situations where
> you might either pass an array or just a single value or even NULL. 
> is_unpacked
> allows to differentiate between the situations.

Can you give an example? You say is_unpacked is definitely needed, but
so far the only use case has been read_string(), which you wanted to be
used for reading not only string values but everything else too. If
read_string() is changed to only read string values, then it doesn't
need is_unpacked from split_list().

The parameter types are known beforehand, so i don't see the need for
looking at the data to figure out the type. If introspection support is
desired, then that's a separate project (the is_unpacked flag isn't
sufficient for proper introspection).
On 27.07.2018 10:08, Tanu Kaskinen wrote:
> On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
>> On 26.07.2018 12:37, Tanu Kaskinen wrote:
>>> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>>>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>>>> The patch adds the possibility to escape curly braces within parameter strings
>>>>>>>> and introduces several new functions that can be used for writing parameters.
>>>>>>>>
>>>>>>>> For writing, the structure pa_message_param, which is a wrapper for pa_strbuf
>>>>>>>> has been created. Following new write functions are available:
>>>>>>>>
>>>>>>>> pa_message_param_new() - creates a new pa_message_param structure
>>>>>>>> pa_message_param_to_string() - converts a pa_message_param to string and frees
>>>>>>>> the structure
>>>>>>>> The function pa_message_param_write_string()
>>>>>>>> has a parameter do_escape.
>>>>>>> Why not do escaping always?
>>>>>> Because what you are writing as a string may be a list that you have
>>>>>> prepared
>>>>>> previously. Then you will not want escaping. You may for example create
>>>>>> a list
>>>>>> from an array and then insert this list as one string into the final
>>>>>> parameter list
>>>>>> or have a function that converts a certain structure to a parameter
>>>>>> string and
>>>>>> then write the result of this function as one element of the final list.
>>>>> My mental model is that parameters have types, list type is different
>>>>> than string type, and write_string() is only meant for writing values
>>>>> of the string type.
>>>>>
>>>>> Can you add a write_raw() function?
>>>> Yes, this is done in patch 7. But the raw write function differs from what
>>>> write_string() is doing. write_string() writes one element of a list,
>>>> that is
>>>> it encloses the string in braces. The raw write function is intended for
>>>> situations where you can't write a complete element with one write, so
>>>> it does not add any braces. I am still of the opinion, that a structure
>>>> or array converted to a parameter string is a string, so writing something
>>>> like this should be done with write_string().
>>> They are different kinds of strings, different abstraction levels. When
>>> you're writing an array "as a string", in that context it's just a C
>>> string. write_string() with escaping deals with strings in the "message
>>> params type system". I don't know if this makes any sense to you.
>>> Probably not... In any case, the do_escape flag seems unnecessary
>>> complexity to me.
>> The alternative would be a function to write an unescaped string in
>> addition to the write_raw() function. If you don't like the flag, would
>> you be OK with a write_unescaped_string() function? I think it is just
>> more comfortable than using write_raw().
> Thanks for the concession, I was afraid we'll get stuck on this point.
>
> By comfortable, do you refer to that write_raw() doesn't add braces
> around the value? How about adding an add_braces flag to write_raw()?

Yes, that's a good idea. I'll do that and remove the
do_escape flag from write_string().

>
>>>>>>>> +
>>>>>>>> +/* Read functions */
>>>>>>>> +
>>>>>>>>      /* Split the specified string into elements. An element is defined as
>>>>>>>>       * a sub-string between curly braces. The function is needed to parse
>>>>>>>>       * the parameters of messages. Each time it is called returns the position
>>>>>>>>       * of the current element in result and the state pointer is advanced to
>>>>>>>> - * the next list element.
>>>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>>>> + * be NULL.
>>>>>>> is_unpacked looks like unnecessary complexity.
>>>>>>> pa_message_params_read_string() should always unescape the value.
>>>>>> It may be possible, that the string you read is a list. Consider the
>>>>>> following
>>>>>> parameter list: {string1}{some nested structure}{string2}. You can now
>>>>>> read this list as three strings and then continue to read the elements of
>>>>>> the nested structure from the second string. You might even create a
>>>>>> function
>>>>>> that takes a string and outputs a structure. So you are not forced to go
>>>>>> to the full depth of nesting on the first pass. This makes it much easier
>>>>>> to handle deeply nested parameter lists. For me this behavior is an
>>>>>> important
>>>>>> feature and I do not want to drop it. See also my comment on why I do
>>>>>> not always want escaping.
>>>>> Doesn't split_list() already allow this, why do you want to use
>>>>> read_string() to do the same thing as split_list()?
>>>> read_string() and split_list() are very similar and we could live
>>>> without read_string(). It is provided as a counterpart to write_string()
>>>> and for convenience additionally does the unescaping if necessary
>>>> like write_string does the escaping.
>>>> I don't see why this is a problem. It depends on the context which
>>>> is the better function to use.
>>> Again, in my mind a structure is not a string, they are different
>>> types, and I think read_string() should only deal with the string type.
>>> is_unpacked makes the API more complicated, so I'd like to get rid of
>>> it.
>>>
>> I don't see your point. is_unpacked is not part of the read_string()
>> arguments
>> or return value. In split_list() it is definitively needed to indicate
>> if the returned
>> string (in the C sense) contains another list. I can imagine many
>> situations where
>> you might either pass an array or just a single value or even NULL.
>> is_unpacked
>> allows to differentiate between the situations.
> Can you give an example? You say is_unpacked is definitely needed, but
> so far the only use case has been read_string(), which you wanted to be
> used for reading not only string values but everything else too. If
> read_string() is changed to only read string values, then it doesn't
> need is_unpacked from split_list().
>
> The parameter types are known beforehand, so i don't see the need for
> looking at the data to figure out the type. If introspection support is
> desired, then that's a separate project (the is_unpacked flag isn't
> sufficient for proper introspection).
>
This is not about parameter type, it is just to distinguish between
a list and a simple value. One example comes to my mind immediately:
Consider a parameter list that consists of strings and a couple of
arrays. Now you can read this list as an array of strings (patch 8
provides a function for that) and then examine those strings that
contain arrays separately. Having the flag (and using it in read_string())
provides a more flexible and convenient way to parse a parameter list.

The flag may not be strictly necessary at the moment, but I would still
like to keep it.
On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
> On 27.07.2018 10:08, Tanu Kaskinen wrote:
> > On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
> > > On 26.07.2018 12:37, Tanu Kaskinen wrote:
> > > > On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
> > > > > On 22.07.2018 17:48, Tanu Kaskinen wrote:
> > > > > > On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> > > > > > > On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > > > > > > > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > > > > > > > +
> > > > > > > > > +/* Read functions */
> > > > > > > > > +
> > > > > > > > >      /* Split the specified string into elements. An element is defined as
> > > > > > > > >       * a sub-string between curly braces. The function is needed to parse
> > > > > > > > >       * the parameters of messages. Each time it is called returns the position
> > > > > > > > >       * of the current element in result and the state pointer is advanced to
> > > > > > > > > - * the next list element.
> > > > > > > > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > > > > > > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > > > > > > > + * be NULL.
> > > > > > > > 
> > > > > > > > is_unpacked looks like unnecessary complexity.
> > > > > > > > pa_message_params_read_string() should always unescape the value.
> > > > > > > 
> > > > > > > It may be possible, that the string you read is a list. Consider the
> > > > > > > following
> > > > > > > parameter list: {string1}{some nested structure}{string2}. You can now
> > > > > > > read this list as three strings and then continue to read the elements of
> > > > > > > the nested structure from the second string. You might even create a
> > > > > > > function
> > > > > > > that takes a string and outputs a structure. So you are not forced to go
> > > > > > > to the full depth of nesting on the first pass. This makes it much easier
> > > > > > > to handle deeply nested parameter lists. For me this behavior is an
> > > > > > > important
> > > > > > > feature and I do not want to drop it. See also my comment on why I do
> > > > > > > not always want escaping.
> > > > > > 
> > > > > > Doesn't split_list() already allow this, why do you want to use
> > > > > > read_string() to do the same thing as split_list()?
> > > > > 
> > > > > read_string() and split_list() are very similar and we could live
> > > > > without read_string(). It is provided as a counterpart to write_string()
> > > > > and for convenience additionally does the unescaping if necessary
> > > > > like write_string does the escaping.
> > > > > I don't see why this is a problem. It depends on the context which
> > > > > is the better function to use.
> > > > 
> > > > Again, in my mind a structure is not a string, they are different
> > > > types, and I think read_string() should only deal with the string type.
> > > > is_unpacked makes the API more complicated, so I'd like to get rid of
> > > > it.
> > > > 
> > > 
> > > I don't see your point. is_unpacked is not part of the read_string()
> > > arguments
> > > or return value. In split_list() it is definitively needed to indicate
> > > if the returned
> > > string (in the C sense) contains another list. I can imagine many
> > > situations where
> > > you might either pass an array or just a single value or even NULL.
> > > is_unpacked
> > > allows to differentiate between the situations.
> > 
> > Can you give an example? You say is_unpacked is definitely needed, but
> > so far the only use case has been read_string(), which you wanted to be
> > used for reading not only string values but everything else too. If
> > read_string() is changed to only read string values, then it doesn't
> > need is_unpacked from split_list().
> > 
> > The parameter types are known beforehand, so i don't see the need for
> > looking at the data to figure out the type. If introspection support is
> > desired, then that's a separate project (the is_unpacked flag isn't
> > sufficient for proper introspection).
> > 
> 
> This is not about parameter type, it is just to distinguish between
> a list and a simple value. One example comes to my mind immediately:
> Consider a parameter list that consists of strings and a couple of
> arrays. Now you can read this list as an array of strings (patch 8
> provides a function for that) and then examine those strings that
> contain arrays separately. Having the flag (and using it in read_string())
> provides a more flexible and convenient way to parse a parameter list.
> 
> The flag may not be strictly necessary at the moment, but I would still
> like to keep it.

To continue a familiar theme of my review: if there's a
read_string_array() function, I want that to be used only for string
arrays, not a mishmash of random types. There could be a separate
function split_list_into_array() that does something similar to what you wanted to do with read_string_array(). split_list_into_array() wouldn't do special string handling, though, so unescaping would be left to the application. I find that only logical, since other basic types don't get special handling either (i.e. floats aren't converted to C floats).

Your use case could be served with a vararg function that instead of
producing a string array would read into separate variables, like this:

pa_message_params_read(c, state,
                       PA_TYPE_STRING, &param1,
                       PA_TYPE_FLOAT, &param2,
                       PA_TYPE_RAW, &param3,
                       0);

PA_TYPE_RAW would be useful for reading a list (or anything else) into
a C string without unescaping or other processing. There could be also
PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
arrays of basic types.

(Now it occurred to me that the 'c' argument name that is used in the
parsing functions is a bit weird and unhelpful. Maybe "param_list"
would be good?)
On 29.07.2018 21:47, Tanu Kaskinen wrote:
> On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
>> On 27.07.2018 10:08, Tanu Kaskinen wrote:
>>> On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
>>>> On 26.07.2018 12:37, Tanu Kaskinen wrote:
>>>>> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>>>>>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>>>>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>>>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>>>>>> +
>>>>>>>>>> +/* Read functions */
>>>>>>>>>> +
>>>>>>>>>>       /* Split the specified string into elements. An element is defined as
>>>>>>>>>>        * a sub-string between curly braces. The function is needed to parse
>>>>>>>>>>        * the parameters of messages. Each time it is called returns the position
>>>>>>>>>>        * of the current element in result and the state pointer is advanced to
>>>>>>>>>> - * the next list element.
>>>>>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>>>>>> + * be NULL.
>>>>>>>>> is_unpacked looks like unnecessary complexity.
>>>>>>>>> pa_message_params_read_string() should always unescape the value.
>>>>>>>> It may be possible, that the string you read is a list. Consider the
>>>>>>>> following
>>>>>>>> parameter list: {string1}{some nested structure}{string2}. You can now
>>>>>>>> read this list as three strings and then continue to read the elements of
>>>>>>>> the nested structure from the second string. You might even create a
>>>>>>>> function
>>>>>>>> that takes a string and outputs a structure. So you are not forced to go
>>>>>>>> to the full depth of nesting on the first pass. This makes it much easier
>>>>>>>> to handle deeply nested parameter lists. For me this behavior is an
>>>>>>>> important
>>>>>>>> feature and I do not want to drop it. See also my comment on why I do
>>>>>>>> not always want escaping.
>>>>>>> Doesn't split_list() already allow this, why do you want to use
>>>>>>> read_string() to do the same thing as split_list()?
>>>>>> read_string() and split_list() are very similar and we could live
>>>>>> without read_string(). It is provided as a counterpart to write_string()
>>>>>> and for convenience additionally does the unescaping if necessary
>>>>>> like write_string does the escaping.
>>>>>> I don't see why this is a problem. It depends on the context which
>>>>>> is the better function to use.
>>>>> Again, in my mind a structure is not a string, they are different
>>>>> types, and I think read_string() should only deal with the string type.
>>>>> is_unpacked makes the API more complicated, so I'd like to get rid of
>>>>> it.
>>>>>
>>>> I don't see your point. is_unpacked is not part of the read_string()
>>>> arguments
>>>> or return value. In split_list() it is definitively needed to indicate
>>>> if the returned
>>>> string (in the C sense) contains another list. I can imagine many
>>>> situations where
>>>> you might either pass an array or just a single value or even NULL.
>>>> is_unpacked
>>>> allows to differentiate between the situations.
>>> Can you give an example? You say is_unpacked is definitely needed, but
>>> so far the only use case has been read_string(), which you wanted to be
>>> used for reading not only string values but everything else too. If
>>> read_string() is changed to only read string values, then it doesn't
>>> need is_unpacked from split_list().
>>>
>>> The parameter types are known beforehand, so i don't see the need for
>>> looking at the data to figure out the type. If introspection support is
>>> desired, then that's a separate project (the is_unpacked flag isn't
>>> sufficient for proper introspection).
>>>
>> This is not about parameter type, it is just to distinguish between
>> a list and a simple value. One example comes to my mind immediately:
>> Consider a parameter list that consists of strings and a couple of
>> arrays. Now you can read this list as an array of strings (patch 8
>> provides a function for that) and then examine those strings that
>> contain arrays separately. Having the flag (and using it in read_string())
>> provides a more flexible and convenient way to parse a parameter list.
>>
>> The flag may not be strictly necessary at the moment, but I would still
>> like to keep it.
> To continue a familiar theme of my review: if there's a
> read_string_array() function, I want that to be used only for string
> arrays, not a mishmash of random types. There could be a separate
> function split_list_into_array() that does something similar to what you wanted to do with read_string_array(). split_list_into_array() wouldn't do special string handling, though, so unescaping would be left to the application. I find that only logical, since other basic types don't get special handling either (i.e. floats aren't converted to C floats).
>
> Your use case could be served with a vararg function that instead of
> producing a string array would read into separate variables, like this:
>
> pa_message_params_read(c, state,
>                         PA_TYPE_STRING, &param1,
>                         PA_TYPE_FLOAT, &param2,
>                         PA_TYPE_RAW, &param3,
>                         0);
>
> PA_TYPE_RAW would be useful for reading a list (or anything else) into
> a C string without unescaping or other processing. There could be also
> PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
> arrays of basic types.
>
> (Now it occurred to me that the 'c' argument name that is used in the
> parsing functions is a bit weird and unhelpful. Maybe "param_list"
> would be good?)
>
I still don't see your point. Again, the use of is_unpacked is 
transparent for the
user of read_string(), so the user just reads a string without having to 
care about
unescaping. The flag does not complicate the API, it simplifies it 
because the
unescaping is done automatically. Your approach seems unnecessary
complicated to me. A string is a string, even if it contains 
sub-structures. Your
split_list_into_array() function would basically return an array of 
strings, so what
would be the advantage? It would only make parsing more cumbersome, because
the task of unescaping is given to the user instead of doing it 
automatically where
necessary.
Also there is no "mishmash of random types", they are all strings It is 
only the
difference between a "simple" string which needs no further processing and a
"complex" string which needs further parsing.

I am still not willing to drop this - for me it simplifies parsing. How 
do we proceed?
On Sun, 2018-07-29 at 22:36 +0200, Georg Chini wrote:
> On 29.07.2018 21:47, Tanu Kaskinen wrote:
> > On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
> > > On 27.07.2018 10:08, Tanu Kaskinen wrote:
> > > > On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
> > > > > On 26.07.2018 12:37, Tanu Kaskinen wrote:
> > > > > > On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
> > > > > > > On 22.07.2018 17:48, Tanu Kaskinen wrote:
> > > > > > > > On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> > > > > > > > > On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > > > > > > > > > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > > > > > > > > > +
> > > > > > > > > > > +/* Read functions */
> > > > > > > > > > > +
> > > > > > > > > > >       /* Split the specified string into elements. An element is defined as
> > > > > > > > > > >        * a sub-string between curly braces. The function is needed to parse
> > > > > > > > > > >        * the parameters of messages. Each time it is called returns the position
> > > > > > > > > > >        * of the current element in result and the state pointer is advanced to
> > > > > > > > > > > - * the next list element.
> > > > > > > > > > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > > > > > > > > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > > > > > > > > > + * be NULL.
> > > > > > > > > > 
> > > > > > > > > > is_unpacked looks like unnecessary complexity.
> > > > > > > > > > pa_message_params_read_string() should always unescape the value.
> > > > > > > > > 
> > > > > > > > > It may be possible, that the string you read is a list. Consider the
> > > > > > > > > following
> > > > > > > > > parameter list: {string1}{some nested structure}{string2}. You can now
> > > > > > > > > read this list as three strings and then continue to read the elements of
> > > > > > > > > the nested structure from the second string. You might even create a
> > > > > > > > > function
> > > > > > > > > that takes a string and outputs a structure. So you are not forced to go
> > > > > > > > > to the full depth of nesting on the first pass. This makes it much easier
> > > > > > > > > to handle deeply nested parameter lists. For me this behavior is an
> > > > > > > > > important
> > > > > > > > > feature and I do not want to drop it. See also my comment on why I do
> > > > > > > > > not always want escaping.
> > > > > > > > 
> > > > > > > > Doesn't split_list() already allow this, why do you want to use
> > > > > > > > read_string() to do the same thing as split_list()?
> > > > > > > 
> > > > > > > read_string() and split_list() are very similar and we could live
> > > > > > > without read_string(). It is provided as a counterpart to write_string()
> > > > > > > and for convenience additionally does the unescaping if necessary
> > > > > > > like write_string does the escaping.
> > > > > > > I don't see why this is a problem. It depends on the context which
> > > > > > > is the better function to use.
> > > > > > 
> > > > > > Again, in my mind a structure is not a string, they are different
> > > > > > types, and I think read_string() should only deal with the string type.
> > > > > > is_unpacked makes the API more complicated, so I'd like to get rid of
> > > > > > it.
> > > > > > 
> > > > > 
> > > > > I don't see your point. is_unpacked is not part of the read_string()
> > > > > arguments
> > > > > or return value. In split_list() it is definitively needed to indicate
> > > > > if the returned
> > > > > string (in the C sense) contains another list. I can imagine many
> > > > > situations where
> > > > > you might either pass an array or just a single value or even NULL.
> > > > > is_unpacked
> > > > > allows to differentiate between the situations.
> > > > 
> > > > Can you give an example? You say is_unpacked is definitely needed, but
> > > > so far the only use case has been read_string(), which you wanted to be
> > > > used for reading not only string values but everything else too. If
> > > > read_string() is changed to only read string values, then it doesn't
> > > > need is_unpacked from split_list().
> > > > 
> > > > The parameter types are known beforehand, so i don't see the need for
> > > > looking at the data to figure out the type. If introspection support is
> > > > desired, then that's a separate project (the is_unpacked flag isn't
> > > > sufficient for proper introspection).
> > > > 
> > > 
> > > This is not about parameter type, it is just to distinguish between
> > > a list and a simple value. One example comes to my mind immediately:
> > > Consider a parameter list that consists of strings and a couple of
> > > arrays. Now you can read this list as an array of strings (patch 8
> > > provides a function for that) and then examine those strings that
> > > contain arrays separately. Having the flag (and using it in read_string())
> > > provides a more flexible and convenient way to parse a parameter list.
> > > 
> > > The flag may not be strictly necessary at the moment, but I would still
> > > like to keep it.
> > 
> > To continue a familiar theme of my review: if there's a
> > read_string_array() function, I want that to be used only for string
> > arrays, not a mishmash of random types. There could be a separate
> > function split_list_into_array() that does something similar to
> > what you wanted to do with read_string_array().
> > split_list_into_array() wouldn't do special string handling,
> > though, so unescaping would be left to the application. I find that
> > only logical, since other basic types don't get special handling
> > either (i.e. floats aren't converted to C floats).
> > 
> > Your use case could be served with a vararg function that instead of
> > producing a string array would read into separate variables, like this:
> > 
> > pa_message_params_read(c, state,
> >                         PA_TYPE_STRING, &param1,
> >                         PA_TYPE_FLOAT, &param2,
> >                         PA_TYPE_RAW, &param3,
> >                         0);
> > 
> > PA_TYPE_RAW would be useful for reading a list (or anything else) into
> > a C string without unescaping or other processing. There could be also
> > PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
> > arrays of basic types.
> > 
> > (Now it occurred to me that the 'c' argument name that is used in the
> > parsing functions is a bit weird and unhelpful. Maybe "param_list"
> > would be good?)
> > 
> 
> I still don't see your point. Again, the use of is_unpacked is 
> transparent for the
> user of read_string(), so the user just reads a string without having to 
> care about
> unescaping. The flag does not complicate the API, it simplifies it 
> because the
> unescaping is done automatically.

The flag complicates the split_list() function parameters, but also the
read_string() semantics: you need to explain to the user that unlike
all the other read_foo() functions, read_string() can read any value
and unescaping becomes conditional because of this.

> Your approach seems unnecessary
> complicated to me. A string is a string, even if it contains 
> sub-structures. Your
> split_list_into_array() function would basically return an array of 
> strings, so what
> would be the advantage? It would only make parsing more cumbersome, because
> the task of unescaping is given to the user instead of doing it 
> automatically where
> necessary.

That's why I came up with the vararg function. I agree that
split_list_into_array() is unlikely to be very useful.

> Also there is no "mishmash of random types", they are all strings It is 
> only the
> difference between a "simple" string which needs no further processing and a
> "complex" string which needs further parsing.

The API defines types for the parameters. There are no separate
"simple" and "complex" strings in the type system. The string type is
different than the list type. What you call complex strings are in my
mind just the raw unprocessed serialized data, which is a different
abstraction level than the values that the various read_foo() functions
return. It feels just very wrong to have a function that returns values
in both domains: unparsed raw data and parsed values. Especially when
the function is a family of read_foo() functions where all other
functions only operate on their designated types.

Compare to D-Bus: if you say that "they are all strings", then in D-Bus 
the same would be "they are all byte arrays", since that's what they
are in the message buffer. I doubt you would suggest that the
dbus_message_iter_read_byte_array() function (if such thing existed)
should be usable on arbitrarily typed values rather than just byte
array values, and should return the raw serialized data regardless of
its type. Here you're suggesting that read_string() should be able to
return the raw serialized data regardless of its type.

Do you understand what I'm getting at? You may disagree that having
strict separation of concerns between raw serialized data and parsed
values is beneficial for the user-friendliness of the API, but if you
don't even understand what I mean by that separation, then discussing
this is very hard.
On 30.07.2018 11:18, Tanu Kaskinen wrote:
> On Sun, 2018-07-29 at 22:36 +0200, Georg Chini wrote:
>> On 29.07.2018 21:47, Tanu Kaskinen wrote:
>>> On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
>>>> On 27.07.2018 10:08, Tanu Kaskinen wrote:
>>>>> On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
>>>>>> On 26.07.2018 12:37, Tanu Kaskinen wrote:
>>>>>>> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>>>>>>>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>>>>>>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>>>>>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>>>>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* Read functions */
>>>>>>>>>>>> +
>>>>>>>>>>>>        /* Split the specified string into elements. An element is defined as
>>>>>>>>>>>>         * a sub-string between curly braces. The function is needed to parse
>>>>>>>>>>>>         * the parameters of messages. Each time it is called returns the position
>>>>>>>>>>>>         * of the current element in result and the state pointer is advanced to
>>>>>>>>>>>> - * the next list element.
>>>>>>>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>>>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>>>>>>>> + * be NULL.
>>>>>>>>>>> is_unpacked looks like unnecessary complexity.
>>>>>>>>>>> pa_message_params_read_string() should always unescape the value.
>>>>>>>>>>
>>>> This is not about parameter type, it is just to distinguish between
>>>> a list and a simple value. One example comes to my mind immediately:
>>>> Consider a parameter list that consists of strings and a couple of
>>>> arrays. Now you can read this list as an array of strings (patch 8
>>>> provides a function for that) and then examine those strings that
>>>> contain arrays separately. Having the flag (and using it in read_string())
>>>> provides a more flexible and convenient way to parse a parameter list.
>>>>
>>>> The flag may not be strictly necessary at the moment, but I would still
>>>> like to keep it.
>>> To continue a familiar theme of my review: if there's a
>>> read_string_array() function, I want that to be used only for string
>>> arrays, not a mishmash of random types. There could be a separate
>>> function split_list_into_array() that does something similar to
>>> what you wanted to do with read_string_array().
>>> split_list_into_array() wouldn't do special string handling,
>>> though, so unescaping would be left to the application. I find that
>>> only logical, since other basic types don't get special handling
>>> either (i.e. floats aren't converted to C floats).
>>>
>>> Your use case could be served with a vararg function that instead of
>>> producing a string array would read into separate variables, like this:
>>>
>>> pa_message_params_read(c, state,
>>>                          PA_TYPE_STRING, &param1,
>>>                          PA_TYPE_FLOAT, &param2,
>>>                          PA_TYPE_RAW, &param3,
>>>                          0);
>>>
>>> PA_TYPE_RAW would be useful for reading a list (or anything else) into
>>> a C string without unescaping or other processing. There could be also
>>> PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
>>> arrays of basic types.
>>>
>>> (Now it occurred to me that the 'c' argument name that is used in the
>>> parsing functions is a bit weird and unhelpful. Maybe "param_list"
>>> would be good?)
>>>
>> I still don't see your point. Again, the use of is_unpacked is
>> transparent for the
>> user of read_string(), so the user just reads a string without having to
>> care about
>> unescaping. The flag does not complicate the API, it simplifies it
>> because the
>> unescaping is done automatically.
> The flag complicates the split_list() function parameters, but also the
> read_string() semantics: you need to explain to the user that unlike
> all the other read_foo() functions, read_string() can read any value
> and unescaping becomes conditional because of this.

read_string() is not supposed to read any value, it is only supposed
to read strings. Actually, the user does not need to know anything
about escaping, because read_string() and write_string() do the
necessary escaping/unescaping completely transparent for the user.

The is_unpacked flag is at least useful to check if something returned
by split_list() is really a simple type and not a structure or array. I am
currently not using it in the read_foo() functions, but I think I should.

>
>> Your approach seems unnecessary
>> complicated to me. A string is a string, even if it contains
>> sub-structures. Your
>> split_list_into_array() function would basically return an array of
>> strings, so what
>> would be the advantage? It would only make parsing more cumbersome, because
>> the task of unescaping is given to the user instead of doing it
>> automatically where
>> necessary.
> That's why I came up with the vararg function. I agree that
> split_list_into_array() is unlikely to be very useful.
>
>> Also there is no "mishmash of random types", they are all strings It is
>> only the
>> difference between a "simple" string which needs no further processing and a
>> "complex" string which needs further parsing.
> The API defines types for the parameters. There are no separate
> "simple" and "complex" strings in the type system. The string type is
> different than the list type. What you call complex strings are in my
> mind just the raw unprocessed serialized data, which is a different
> abstraction level than the values that the various read_foo() functions
> return. It feels just very wrong to have a function that returns values
> in both domains: unparsed raw data and parsed values. Especially when
> the function is a family of read_foo() functions where all other
> functions only operate on their designated types.

Yes, that is some additional functionality that the read_string()
function provides on top of reading what I called "simple strings"
above. But that does not hinder the function to work exactly
like expected when reading a plain string. Maybe a better name
for the function would be read_string_element(), indicating that
it can read either a plain string or an element of the parameter
list as a string.
I could split this into two functions, read_string() which always
does unescaping and read_element() which would be a wrapper
around split_list() and would never do unescaping. In both functions,
the is_unpacked flag can be useful to check if the input matches
the type "plain string" or "serialized data".

>
> Compare to D-Bus: if you say that "they are all strings", then in D-Bus
> the same would be "they are all byte arrays", since that's what they
> are in the message buffer. I doubt you would suggest that the
> dbus_message_iter_read_byte_array() function (if such thing existed)
> should be usable on arbitrarily typed values rather than just byte
> array values, and should return the raw serialized data regardless of
> its type. Here you're suggesting that read_string() should be able to
> return the raw serialized data regardless of its type.
>
> Do you understand what I'm getting at? You may disagree that having
> strict separation of concerns between raw serialized data and parsed
> values is beneficial for the user-friendliness of the API, but if you
> don't even understand what I mean by that separation, then discussing
> this is very hard.
>
I do understand your reasoning and technically we surely could drop
the flag. I just don't want it because for me using read_string() is a
more convenient way of parsing a parameter list than using split_list()
directly. It's just a simple way to say "let's skip this array or structure
for the moment and store it somewhere for later inspection". So I can
do something like

x = read_float()
y = read_int()
my_structure = read_string()
z = read_bool()
a = read_string()

and then examine my_structure at a later time. Using split_list() instead
of read_string() is somehow breaking the flow for me. We could do the
same with the read_element() function I suggested above (and then
maybe make split_list() a purely internal function?)
On Mon, 2018-07-30 at 13:23 +0200, Georg Chini wrote:
> On 30.07.2018 11:18, Tanu Kaskinen wrote:
> > On Sun, 2018-07-29 at 22:36 +0200, Georg Chini wrote:
> > > On 29.07.2018 21:47, Tanu Kaskinen wrote:
> > > > On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
> > > > > On 27.07.2018 10:08, Tanu Kaskinen wrote:
> > > > > > On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
> > > > > > > On 26.07.2018 12:37, Tanu Kaskinen wrote:
> > > > > > > > On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
> > > > > > > > > On 22.07.2018 17:48, Tanu Kaskinen wrote:
> > > > > > > > > > On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
> > > > > > > > > > > On 21.07.2018 20:17, Tanu Kaskinen wrote:
> > > > > > > > > > > > On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Read functions */
> > > > > > > > > > > > > +
> > > > > > > > > > > > >        /* Split the specified string into elements. An element is defined as
> > > > > > > > > > > > >         * a sub-string between curly braces. The function is needed to parse
> > > > > > > > > > > > >         * the parameters of messages. Each time it is called returns the position
> > > > > > > > > > > > >         * of the current element in result and the state pointer is advanced to
> > > > > > > > > > > > > - * the next list element.
> > > > > > > > > > > > > + * the next list element. On return, the parameter *is_unpacked indicates
> > > > > > > > > > > > > + * if the string is plain text or contains a sub-list. is_unpacked may
> > > > > > > > > > > > > + * be NULL.
> > > > > > > > > > > > 
> > > > > > > > > > > > is_unpacked looks like unnecessary complexity.
> > > > > > > > > > > > pa_message_params_read_string() should always unescape the value.
> > > > > 
> > > > > This is not about parameter type, it is just to distinguish between
> > > > > a list and a simple value. One example comes to my mind immediately:
> > > > > Consider a parameter list that consists of strings and a couple of
> > > > > arrays. Now you can read this list as an array of strings (patch 8
> > > > > provides a function for that) and then examine those strings that
> > > > > contain arrays separately. Having the flag (and using it in read_string())
> > > > > provides a more flexible and convenient way to parse a parameter list.
> > > > > 
> > > > > The flag may not be strictly necessary at the moment, but I would still
> > > > > like to keep it.
> > > > 
> > > > To continue a familiar theme of my review: if there's a
> > > > read_string_array() function, I want that to be used only for string
> > > > arrays, not a mishmash of random types. There could be a separate
> > > > function split_list_into_array() that does something similar to
> > > > what you wanted to do with read_string_array().
> > > > split_list_into_array() wouldn't do special string handling,
> > > > though, so unescaping would be left to the application. I find that
> > > > only logical, since other basic types don't get special handling
> > > > either (i.e. floats aren't converted to C floats).
> > > > 
> > > > Your use case could be served with a vararg function that instead of
> > > > producing a string array would read into separate variables, like this:
> > > > 
> > > > pa_message_params_read(c, state,
> > > >                          PA_TYPE_STRING, &param1,
> > > >                          PA_TYPE_FLOAT, &param2,
> > > >                          PA_TYPE_RAW, &param3,
> > > >                          0);
> > > > 
> > > > PA_TYPE_RAW would be useful for reading a list (or anything else) into
> > > > a C string without unescaping or other processing. There could be also
> > > > PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
> > > > arrays of basic types.
> > > > 
> > > > (Now it occurred to me that the 'c' argument name that is used in the
> > > > parsing functions is a bit weird and unhelpful. Maybe "param_list"
> > > > would be good?)
> > > > 
> > > 
> > > I still don't see your point. Again, the use of is_unpacked is
> > > transparent for the
> > > user of read_string(), so the user just reads a string without having to
> > > care about
> > > unescaping. The flag does not complicate the API, it simplifies it
> > > because the
> > > unescaping is done automatically.
> > 
> > The flag complicates the split_list() function parameters, but also the
> > read_string() semantics: you need to explain to the user that unlike
> > all the other read_foo() functions, read_string() can read any value
> > and unescaping becomes conditional because of this.
> 
> read_string() is not supposed to read any value, it is only supposed
> to read strings. Actually, the user does not need to know anything
> about escaping, because read_string() and write_string() do the
> necessary escaping/unescaping completely transparent for the user.
> 
> The is_unpacked flag is at least useful to check if something returned
> by split_list() is really a simple type and not a structure or array. I am
> currently not using it in the read_foo() functions, but I think I should.

I guess you'd use is_unpacked for catching errors in the read_foo()
functions? That seems reasonable, but I'd like that to be done in an
internal function. You said that maybe split_list() could be made an
internal function, and that seems like a good idea. An alternative
would be to have an internal function for which split_list() would be a
simplified wrapper.

> > 
> > > Your approach seems unnecessary
> > > complicated to me. A string is a string, even if it contains
> > > sub-structures. Your
> > > split_list_into_array() function would basically return an array of
> > > strings, so what
> > > would be the advantage? It would only make parsing more cumbersome, because
> > > the task of unescaping is given to the user instead of doing it
> > > automatically where
> > > necessary.
> > 
> > That's why I came up with the vararg function. I agree that
> > split_list_into_array() is unlikely to be very useful.
> > 
> > > Also there is no "mishmash of random types", they are all strings It is
> > > only the
> > > difference between a "simple" string which needs no further processing and a
> > > "complex" string which needs further parsing.
> > 
> > The API defines types for the parameters. There are no separate
> > "simple" and "complex" strings in the type system. The string type is
> > different than the list type. What you call complex strings are in my
> > mind just the raw unprocessed serialized data, which is a different
> > abstraction level than the values that the various read_foo() functions
> > return. It feels just very wrong to have a function that returns values
> > in both domains: unparsed raw data and parsed values. Especially when
> > the function is a family of read_foo() functions where all other
> > functions only operate on their designated types.
> 
> Yes, that is some additional functionality that the read_string()
> function provides on top of reading what I called "simple strings"
> above. But that does not hinder the function to work exactly
> like expected when reading a plain string. Maybe a better name
> for the function would be read_string_element(), indicating that
> it can read either a plain string or an element of the parameter
> list as a string.
> I could split this into two functions, read_string() which always
> does unescaping and read_element() which would be a wrapper
> around split_list() and would never do unescaping. In both functions,
> the is_unpacked flag can be useful to check if the input matches
> the type "plain string" or "serialized data".

A separate function for reading the raw data sounds good to me. Are you
attached to the read_element() name, or could it be read_raw() (or
perhaps read_raw_value() or read_raw_element())? Somehow read_raw()
seems better to me, but I can't make strong arguments for why it should
be called that.
On 01.08.2018 13:13, Tanu Kaskinen wrote:
> On Mon, 2018-07-30 at 13:23 +0200, Georg Chini wrote:
>> On 30.07.2018 11:18, Tanu Kaskinen wrote:
>>> On Sun, 2018-07-29 at 22:36 +0200, Georg Chini wrote:
>>>> On 29.07.2018 21:47, Tanu Kaskinen wrote:
>>>>> On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
>>>>>> On 27.07.2018 10:08, Tanu Kaskinen wrote:
>>>>>>> On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
>>>>>>>> On 26.07.2018 12:37, Tanu Kaskinen wrote:
>>>>>>>>> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>>>>>>>>>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>>>>>>>>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>>>>>>>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>>>>>>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/* Read functions */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>         /* Split the specified string into elements. An element is defined as
>>>>>>>>>>>>>>          * a sub-string between curly braces. The function is needed to parse
>>>>>>>>>>>>>>          * the parameters of messages. Each time it is called returns the position
>>>>>>>>>>>>>>          * of the current element in result and the state pointer is advanced to
>>>>>>>>>>>>>> - * the next list element.
>>>>>>>>>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>>>>>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>>>>>>>>>> + * be NULL.
>>>>>>>>>>>>> is_unpacked looks like unnecessary complexity.
>>>>>>>>>>>>> pa_message_params_read_string() should always unescape the value.
>>>>>> This is not about parameter type, it is just to distinguish between
>>>>>> a list and a simple value. One example comes to my mind immediately:
>>>>>> Consider a parameter list that consists of strings and a couple of
>>>>>> arrays. Now you can read this list as an array of strings (patch 8
>>>>>> provides a function for that) and then examine those strings that
>>>>>> contain arrays separately. Having the flag (and using it in read_string())
>>>>>> provides a more flexible and convenient way to parse a parameter list.
>>>>>>
>>>>>> The flag may not be strictly necessary at the moment, but I would still
>>>>>> like to keep it.
>>>>> To continue a familiar theme of my review: if there's a
>>>>> read_string_array() function, I want that to be used only for string
>>>>> arrays, not a mishmash of random types. There could be a separate
>>>>> function split_list_into_array() that does something similar to
>>>>> what you wanted to do with read_string_array().
>>>>> split_list_into_array() wouldn't do special string handling,
>>>>> though, so unescaping would be left to the application. I find that
>>>>> only logical, since other basic types don't get special handling
>>>>> either (i.e. floats aren't converted to C floats).
>>>>>
>>>>> Your use case could be served with a vararg function that instead of
>>>>> producing a string array would read into separate variables, like this:
>>>>>
>>>>> pa_message_params_read(c, state,
>>>>>                           PA_TYPE_STRING, &param1,
>>>>>                           PA_TYPE_FLOAT, &param2,
>>>>>                           PA_TYPE_RAW, &param3,
>>>>>                           0);
>>>>>
>>>>> PA_TYPE_RAW would be useful for reading a list (or anything else) into
>>>>> a C string without unescaping or other processing. There could be also
>>>>> PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
>>>>> arrays of basic types.
>>>>>
>>>>> (Now it occurred to me that the 'c' argument name that is used in the
>>>>> parsing functions is a bit weird and unhelpful. Maybe "param_list"
>>>>> would be good?)
>>>>>
>>>> I still don't see your point. Again, the use of is_unpacked is
>>>> transparent for the
>>>> user of read_string(), so the user just reads a string without having to
>>>> care about
>>>> unescaping. The flag does not complicate the API, it simplifies it
>>>> because the
>>>> unescaping is done automatically.
>>> The flag complicates the split_list() function parameters, but also the
>>> read_string() semantics: you need to explain to the user that unlike
>>> all the other read_foo() functions, read_string() can read any value
>>> and unescaping becomes conditional because of this.
>> read_string() is not supposed to read any value, it is only supposed
>> to read strings. Actually, the user does not need to know anything
>> about escaping, because read_string() and write_string() do the
>> necessary escaping/unescaping completely transparent for the user.
>>
>> The is_unpacked flag is at least useful to check if something returned
>> by split_list() is really a simple type and not a structure or array. I am
>> currently not using it in the read_foo() functions, but I think I should.
> I guess you'd use is_unpacked for catching errors in the read_foo()
> functions? That seems reasonable, but I'd like that to be done in an
> internal function. You said that maybe split_list() could be made an
> internal function, and that seems like a good idea. An alternative
> would be to have an internal function for which split_list() would be a
> simplified wrapper.

I guess I can make split_list() internal.

>
>>>> Your approach seems unnecessary
>>>> complicated to me. A string is a string, even if it contains
>>>> sub-structures. Your
>>>> split_list_into_array() function would basically return an array of
>>>> strings, so what
>>>> would be the advantage? It would only make parsing more cumbersome, because
>>>> the task of unescaping is given to the user instead of doing it
>>>> automatically where
>>>> necessary.
>>> That's why I came up with the vararg function. I agree that
>>> split_list_into_array() is unlikely to be very useful.
>>>
>>>> Also there is no "mishmash of random types", they are all strings It is
>>>> only the
>>>> difference between a "simple" string which needs no further processing and a
>>>> "complex" string which needs further parsing.
>>> The API defines types for the parameters. There are no separate
>>> "simple" and "complex" strings in the type system. The string type is
>>> different than the list type. What you call complex strings are in my
>>> mind just the raw unprocessed serialized data, which is a different
>>> abstraction level than the values that the various read_foo() functions
>>> return. It feels just very wrong to have a function that returns values
>>> in both domains: unparsed raw data and parsed values. Especially when
>>> the function is a family of read_foo() functions where all other
>>> functions only operate on their designated types.
>> Yes, that is some additional functionality that the read_string()
>> function provides on top of reading what I called "simple strings"
>> above. But that does not hinder the function to work exactly
>> like expected when reading a plain string. Maybe a better name
>> for the function would be read_string_element(), indicating that
>> it can read either a plain string or an element of the parameter
>> list as a string.
>> I could split this into two functions, read_string() which always
>> does unescaping and read_element() which would be a wrapper
>> around split_list() and would never do unescaping. In both functions,
>> the is_unpacked flag can be useful to check if the input matches
>> the type "plain string" or "serialized data".
> A separate function for reading the raw data sounds good to me. Are you
> attached to the read_element() name, or could it be read_raw() (or
> perhaps read_raw_value() or read_raw_element())? Somehow read_raw()
> seems better to me, but I can't make strong arguments for why it should
> be called that.
>
read_raw() seems OK as a counterpart to write_raw().