From fe6367595a003cda046d9351bb4d085b53ac7927 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Wed, 22 May 2024 10:06:25 -0400 Subject: [PATCH 01/11] clarify integer overflow behaviour, order of evaluation Signed-off-by: Calum Murray --- cesql/spec.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index 921c40d7..b48938b9 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -95,7 +95,7 @@ INT(hop) < INT(ttl) AND INT(hop) < 1000 The root of the expression is the `expression` rule: ```ebnf -expression ::= value-identifier | boolean-literal | unary-operation | binary-operation | function-invocation | like-operation | exists-operation | in-operation | ( "(" expression ")" ) +expression ::= value-identifier | literal | unary-operation | binary-operation | function-invocation | like-operation | exists-operation | in-operation | ( "(" expression ")" ) ``` Nested expressions MUST be correctly parenthesized. @@ -111,17 +111,17 @@ lowercase-char ::= [a-z] value-identifier ::= ( lowercase-char | digit ) ( lowercase-char | digit )* ``` -CESQL defines 3 different literal kinds: integer numbers, `true` or `false` booleans, and `''` or `""` delimited strings. +CESQL defines 3 different literal kinds: integer numbers, `true` or `false` booleans, and `''` or `""` delimited strings. Integer literals MUST be valid 32 bit signed integer values. ```ebnf digit ::= [0-9] -number-literal ::= digit+ +integer-literal ::= ( '+' | '-' ) digit+ boolean-literal ::= "true" | "false" (* Case insensitive *) string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"') -literal ::= number-literal | boolean-literal | string-literal +literal ::= integer-literal | boolean-literal | string-literal ``` Because string literals can be either `''` or `""` delimited, in the former case, the `'` has to be escaped, while in @@ -200,6 +200,8 @@ _Boolean_. 
When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to `false`. For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false. +When addressing an attribute represented by the _Integer_ type, if the value of that attribute does not fit into a 32 bit signed integer, an error MUST be returned. + ### 3.3. Errors Because every operator and function is total, an expression evaluation flow is defined statically and cannot be modified @@ -341,12 +343,11 @@ The following tables show the built-in functions that MUST be supported by a CES | Definition | Semantics | | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `ABS(x): Integer -> Integer` | Returns the absolute value of `x` | +| `ABS(x): Integer -> Integer` | Returns the absolute value of `x`. If the value of `x` is `-2147483648` (the most negative 32 bit integer value possible), then this returns `2147483647` as well as a math error. | ### 3.6. Evaluation of the expression -Operators MUST be evaluated in order, where the parenthesized expressions have the highest priority over all the other -operators. +Operators MUST be evaluated in left to right order, where all operators within a parenthesized expression MUST be evaluated before continuing to the right of the parenthesized expression. AND and OR operations MUST be short-circuit evaluated. When the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, when the left operand of the OR operation evaluates to `true`, the right operand MUST NOT be evaluated. 
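The `ABS` overflow rule and the 32 bit range requirement added in the patch above can be sketched roughly as follows. This is only an illustration in Python and not part of the spec: the names `fits_int32`, `cesql_abs` and `MathError`, as well as the `(value, error)` return shape, are assumptions made for the example.

```python
from typing import Optional, Tuple

# Bounds of a 32 bit signed, two's-complement integer.
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


class MathError(Exception):
    """Hypothetical error type; the patch only says "a math error"."""


def fits_int32(value: int) -> bool:
    """True if `value` is representable as a 32 bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


def cesql_abs(x: int) -> Tuple[int, Optional[MathError]]:
    """Sketch of ABS(x): returns a value and, possibly, an error.

    The absolute value of INT32_MIN does not fit in 32 bits, so the
    patch specifies returning INT32_MAX together with a math error.
    """
    if x == INT32_MIN:
        return INT32_MAX, MathError("ABS(-2147483648) overflows 32 bits")
    return abs(x), None
```

Returning the value and the error side by side, rather than raising, mirrors the spec's statement that an error is returned together with the evaluated value.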
From 0f23dedc9e58051d1238f171a2a742d792f11caf Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Wed, 22 May 2024 20:53:46 -0400 Subject: [PATCH 02/11] Address feedback from @duglin * Link to ISO 9075 * Fix some typos * Clarify return types and errors * Simplify ebnf definitions for value identifiers * Clarify type casting and behaviour for operators Signed-off-by: Calum Murray --- cesql/spec.md | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index b48938b9..6f09b3a2 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -41,11 +41,11 @@ CloudEvent instances. CloudEvents SQL expressions (also known as CESQL) allow computing values and matching of CloudEvent attributes against complex expressions that lean on the syntax of Structured Query Language (SQL) `WHERE` clauses. Using SQL-derived expressions for message filtering has widespread implementation usage because the Java Message Service (JMS) message selector syntax also leans -on SQL. Note that neither the SQL standard (ISO 9075) nor the JMS standard nor any other SQL dialect is used as a +on SQL. Note that neither the [SQL standard (ISO 9075)][iso-9075] nor the JMS standard nor any other SQL dialect is used as a normative foundation or to constrain the expression syntax defined in this specification, but the syntax is informed by them. -CESQL is a _[Total pure functional programming language][total-programming-language-wiki]_ in order to guarantee the +CESQL is a _[total pure functional programming language][total-programming-language-wiki]_ in order to guarantee the termination of the evaluation of the expression. It features a type system correlated to the [CloudEvents type system][ce-type-system], and it features boolean and arithmetic operations, as well as built-in functions for string manipulation. 
@@ -56,7 +56,10 @@ producer, or in an intermediary, and it can be implemented using any technology The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and type-checked CloudEvent instance. An expression MUST NOT mutate the value of the input CloudEvent instance, nor any of the other input values. The evaluation of an expression observes the concept of [referential -transparency][referential-transparency-wiki]. The output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_, and it might include an error. +transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. +The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no +error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this +specification. The CloudEvents Expression Language doesn't support the handling of the data field of the CloudEvent instances, due to its polymorphic nature and complexity. Users that need this functionality ought to use other more appropriate tools. @@ -70,7 +73,8 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S The CESQL can be used as a [filter dialect][subscriptions-filter-dialect] to filter on the input values. -When used as a filter predicate, the expression output value is always cast to a boolean value. +When used as a filter predicate, the expression output value MUST be a _Boolean_. If the output value is not a _Boolean_ or any errors are returned, +the event MUST NOT pass the filter. @@ -103,12 +107,12 @@ Nested expressions MUST be correctly parenthesized. ### 2.2. 
Value identifiers and literals Value identifiers in CESQL MUST follow the same restrictions of the [Attribute Naming -Convention][ce-attribute-naming-convention] from the CloudEvents spec. A value identifier MUST NOT be greater than 20 +Convention][ce-attribute-naming-convention] from the CloudEvents spec. A value identifier SHOULD NOT be greater than 20 characters in length. ```ebnf lowercase-char ::= [a-z] -value-identifier ::= ( lowercase-char | digit ) ( lowercase-char | digit )* +value-identifier ::= ( lowercase-char | digit )+ ``` CESQL defines 3 different literal kinds: integer numbers, `true` or `false` booleans, and `''` or `""` delimited strings. Integer literals MUST be valid 32 bit signed integer values. @@ -124,8 +128,8 @@ string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"') literal ::= integer-literal | boolean-literal | string-literal ``` -Because string literals can be either `''` or `""` delimited, in the former case, the `'` has to be escaped, while in -the latter the `"` has to be escaped. +Because string literals can be either `''` or `""` delimited, in the former case, the `'` character has to be escaped if it is to be used in the string literal, while in +the latter the `"` has to be escaped if it is to be used in the string literal. ### 2.3. Operators @@ -200,8 +204,6 @@ _Boolean_. When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to `false`. For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false. -When addressing an attribute represented by the _Integer_ type, if the value of that attribute does not fit into a 32 bit signed integer, an error MUST be returned. - ### 3.3. Errors Because every operator and function is total, an expression evaluation flow is defined statically and cannot be modified @@ -214,7 +216,8 @@ runtime errors. ### 3.4. 
Operators -The following tables show the operators that MUST be supported by a CESQL evaluator. +The following tables show the operators that MUST be supported by a CESQL evaluator. When evaluating an operator, +a CESQL engine MUST attempt to cast the operands to the correct types. If this type casting fails, a _Cast_ error will be returned, along with the zero value for the return type of the expression. All the operators in the following tables are listed in precedence order. @@ -471,3 +474,4 @@ hop < ttl [subscriptions-filter-dialect]: ../subscriptions/spec.md#3241-filter-dialects [ebnf-xml-spec]: https://www.w3.org/TR/REC-xml/#sec-notation [modulo-operation-wiki]: https://en.wikipedia.org/wiki/Modulo_operation +[iso-9075]: https://en.wikipedia.org/wiki/ISO/IEC_9075 From 53d6890a7a8a03e9fa1876245e6b0512c1edfa8a Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Thu, 23 May 2024 11:13:15 -0400 Subject: [PATCH 03/11] further fixes from review by @duglin Signed-off-by: Calum Murray --- cesql/spec.md | 77 ++++++++++++++++++++++++++------------------------- 1 file changed, 39 insertions(+), 38 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index 6f09b3a2..f541784f 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -234,34 +234,29 @@ Corresponds to the syntactic rule `unary-operation`: Corresponds to the syntactic rule `binary-operation`: -| Definition | Semantics | -| --------------------------------------- | ---------------------------------------------------------------------------------------------------------- | -| `x = y: Boolean x Boolean -> Boolean` | Returns `true` if the values of `x` and `y` are equal | -| `x != y: Boolean x Boolean -> Boolean` | Same as `NOT (x = y)` | -| `x <> y: Boolean x Boolean -> Boolean` | Same as `NOT (x = y)` | -| `x AND y: Boolean x Boolean -> Boolean` | Returns the logical and of `x` and `y` | -| `x OR y: Boolean x Boolean -> Boolean` | Returns the logical or of `x` and `y` | -| `x XOR y: Boolean x Boolean -> Boolean` | 
Returns the logical xor of `x` and `y` | -| `x = y: Integer x Integer -> Boolean` | Returns `true` if the values of `x` and `y` are equal | -| `x != y: Integer x Integer -> Boolean` | Same as `NOT (x = y)` | -| `x <> y: Integer x Integer -> Boolean` | Same as `NOT (x = y)` | -| `x < y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly less than `y` | -| `x <= y: Integer x Integer -> Boolean` | Returns `true` if `x` is less than or equal to `y` | -| `x > y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly greater than `y` | -| `x >= y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater or equal to `y` | -| `x * y: Integer x Integer -> Integer` | Returns the product of `x` and `y` | -| `x / y: Integer x Integer -> Integer` | Returns the truncated division of `x` and `y`. Returns `0` and raises an error if `y = 0` | -| `x % y: Integer x Integer -> Integer` | Returns the remainder of the truncated division of `x` and `y`. Returns `0` and raises an error if `y = 0` | -| `x + y: Integer x Integer -> Integer` | Returns the sum of `x` and `y` | -| `x - y: Integer x Integer -> Integer` | Returns the difference of `x` and `y` | -| `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal | -| `x != y: String x String -> Boolean` | Same as `NOT (x = y)` | -| `x <> y: String x String -> Boolean` | Same as `NOT (x = y)` | - -The modulo and divisions MUST follow the [truncated divisions definition][modulo-operation-wiki], that is: - -- The remainder of the modulo MUST have the same sign as the dividend. -- The quotient MUST be rounded towards zero, _truncating_ the decimal part. 
+| Definition | Semantics | +| --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| `x = y: Boolean x Boolean -> Boolean` | Returns `true` if the values of `x` and `y` are equal | +| `x != y: Boolean x Boolean -> Boolean` | Same as `NOT (x = y)` | +| `x <> y: Boolean x Boolean -> Boolean` | Same as `NOT (x = y)` | +| `x AND y: Boolean x Boolean -> Boolean` | Returns the logical and of `x` and `y` | +| `x OR y: Boolean x Boolean -> Boolean` | Returns the logical or of `x` and `y` | +| `x XOR y: Boolean x Boolean -> Boolean` | Returns the logical xor of `x` and `y` | +| `x = y: Integer x Integer -> Boolean` | Returns `true` if the values of `x` and `y` are equal | +| `x != y: Integer x Integer -> Boolean` | Same as `NOT (x = y)` | +| `x <> y: Integer x Integer -> Boolean` | Same as `NOT (x = y)` | +| `x < y: Integer x Integer -> Boolean` | Returns `true` if `x` is less than `y` | +| `x <= y: Integer x Integer -> Boolean` | Returns `true` if `x` is less than or equal to `y` | +| `x > y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater than `y` | +| `x >= y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater than or equal to `y` | +| `x * y: Integer x Integer -> Integer` | Returns the product of `x` and `y` | +| `x / y: Integer x Integer -> Integer` | Returns the result of dividing `x` by `y`, rounded towards `0` to obtain an integer. Returns `0` and raises an error if `y = 0` | +| `x % y: Integer x Integer -> Integer` | Returns the remainder of `x` divided by `y`, where the result has the same sign as `x`. 
Returns `0` and raises an error if `y = 0` | +| `x + y: Integer x Integer -> Integer` | Returns the sum of `x` and `y` | +| `x - y: Integer x Integer -> Integer` | Returns the difference of `x` and `y` | +| `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal | +| `x != y: String x String -> Boolean` | Same as `NOT (x = y)` | +| `x <> y: String x String -> Boolean` | Same as `NOT (x = y)` | The AND and OR operators MUST be short-circuit evaluated. This means that whenever the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, whenever the left operand of the OR operation evaluates to `true`, the right operand MUST NOT be evaluated. @@ -303,7 +298,8 @@ The matching is done using the same semantics of the equal `=` operator, but usi ### 3.5. Functions -CESQL provides the concept of function, and defines some built-in that every engine MUST implement. An engine SHOULD also allow users to define their custom functions. +CESQL provides the concept of function, and defines some built-in functions that every engine MUST implement. An engine SHOULD also allow users to define their custom functions, however, the mechanism +by which this is done is out of scope of this specification. A function is identified by its name, its parameters and the return value. A function can be variadic, that is the arity is not fixed. @@ -312,18 +308,23 @@ Because of implicit casting, no functions with the same name and same arity but An overload on a variadic function is allowed only if the number of initial fixed arguments is greater than the maximum arity for that particular function name. Only one variadic overload is allowed. -For example, the following definitions are valid: +For example, the following set of definitions are valid and will all be allowed by the rules: * `ABC(x): String -> Integer`: Unary function (arity 1). * `ABC(x, y): String x String -> Integer`: Binary function (arity 2). 
* `ABC(x, y, z, ...): String x String x String x String^n -> Integer`: n-ary function (variable arity), but the initial fixed arguments are at least 3. -But the following are invalid, so the engine MUST reject them: +But the following set is invalid, so the engine MUST reject them: * `ABC(x...): String^n -> Integer`: n-ary function (variable arity), but there are no initial fixed arguments. * `ABC(x, y, z): String x String x String -> Integer`: Ternary function (arity 3). -When a function invocation cannot be dispatched, the return value is undefined. +These two are incompatible because the n-ary function `ABC(x...)` can not be distinguished in any way +from the ternary function `ABC(x, y, z)` if the n-ary function were called with three arguments. +In order for these definitions to be valid, the n-ary function would need to have at least 4 fixed +arguments. + +When a function invocation cannot be dispatched, the return value is `false`, and a `_FunctionError` is also returned. The following tables show the built-in functions that MUST be supported by a CESQL evaluator. @@ -333,10 +334,10 @@ The following tables show the built-in functions that MUST be supported by a CES | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `LENGTH(x): String -> Integer` | Returns the character length of the String `x`. | | `CONCAT(x1, x2, ...): String^n -> String` | Returns the concatenation of `x1` up to `xN`. | -| `CONCAT_WS(delimiter, x1, x2, ...): String x String^n -> String` | Returns the concatenation of `x1` up to `xN`, using the `delimiter`. | +| `CONCAT_WS(delimiter, x1, x2, ...): String x String^n -> String` | Returns the concatenation of `x1` up to `xN`, using the `delimiter` between each string, but not before `x1` or after `xN`. 
| | `LOWER(x): String -> String` | Returns `x` in lowercase. | | `UPPER(x): String -> String` | Returns `x` in uppercase. | -| `TRIM(x): String -> String` | Returns `x` with leading and trailing trimmed whitespaces. | +| `TRIM(x): String -> String` | Returns `x` with leading and trailing whitespaces trimmed. This does not remove any non-whitespace characters, such as control characters. | | `LEFT(x, y): String x Integer -> String` | Returns a new string with the first `y` characters of `x`, or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raises an error. | | `RIGHT(x, y): String x Integer -> String` | Returns a new string with the last `y` characters of `x` or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raises an error. | | `SUBSTRING(x, pos): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) up to the end of `x`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. Returns the empty string and raises an error if `pos > LENGTH(x) OR pos < -LENGTH(x)`. | @@ -359,8 +360,9 @@ left operand of the OR operation evalues to `true`, the right operand MUST NOT b When the argument types of an operator/function invocation don't match the signature of the operator/function being invoked, the CESQL engine MUST try to perform an implicit cast. -This section defines an **ambiguous** operator/function as an operator/function that is overloaded with another -operator/function definition with same symbol/name and arity but different parameter types. +This section defines an **ambiguous** operator as an operator that is overloaded with another +operator definition with same symbol/name and arity but different parameter types. Note: a function +can not be ambiguous as it is not allowed for two functions to have the same arity and name. 
A CESQL engine MUST apply the following implicit casting rules in order: @@ -374,8 +376,7 @@ A CESQL engine MUST apply the following implicit casting rules in order: 1. If such operator definition exists and is unique, cast `x` to the type of the left parameter 1. Otherwise, raise an error and the cast results are undefined 1. If the function is n-ary with `n > 1`: - 1. If it's not ambiguous, cast all the arguments to the corresponding parameter types. - 1. If it's ambiguous, raise an error and the cast results are undefined. + 1. Cast all the arguments to the corresponding parameter types. 1. If the operator is n-ary with `n > 2`: 1. If it's not ambiguous, cast all the operands to the target type. 1. If it's ambiguous, raise an error and the cast results are undefined. From dca51402a954c0042ea19c2c97a18a3a99c9d8d1 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Mon, 27 May 2024 16:05:36 -0400 Subject: [PATCH 04/11] clarify error handling and zero values Signed-off-by: Calum Murray --- cesql/spec.md | 56 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 10 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index f541784f..fe9fc2cd 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -76,8 +76,6 @@ The CESQL can be used as a [filter dialect][subscriptions-filter-dialect] to fil When used as a filter predicate, the expression output value MUST be a _Boolean_. If the output value is not a _Boolean_ or any errors are returned, the event MUST NOT pass the filter. - - ## 2. Language syntax The grammar of the language is defined using the EBNF Notation from [W3C XML specification][ebnf-xml-spec]. @@ -186,6 +184,14 @@ The type system contains 3 _primitive_ types: 32-bit, twos-complement encoding. - _Boolean_: A boolean value of "true" or "false". 
+For each of the 3 _primitive_ types there is an associated zero value, which can be thought of as the "default" value for that type: + +| Type | Zero Value | +| --------- | ---------- | +| _String_ | `""` | +| _Integer_ | `0` | +| _Boolean_ | `false` | + The types _URI_, _URI Reference_, and _Timestamp_ ([defined in the CloudEvents specification][ce-type-system]) are represented as _String_. @@ -201,8 +207,13 @@ Unless otherwise specified, every attribute and extension MUST be represented by Through implicit type casting, the user can convert the addressed value instances to _Integer_ and _Boolean_. -When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to `false`. -For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false. +When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to the zero value for the return type of the subexpression. +For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false, given that the return type for the `=` operator is _Boolean_. +However, the expression `missingattribute * 5` would evaluate to `0` because the return type for the `*` operator is _Integer_. Note that this does not mean that the _value_ of the missing attribute +is set to be the zero value for the type of the missing attribute. Rather, the subexpression with the missingAttribute returns the zero value of the return type of the subexpression. +As an example, `1 / missingAttribute` does not raise a _MathError_ due to the division by zero, instead it returns `0` as that is the zero value for the return type of the subexpression. +In cases where the return type of the subexpression cannot be determined by the CESQL engine, the CESQL engine MUST assume a return type of _Boolean_. 
In such cases, the return value would +therefore be `false`. ### 3.3. Errors @@ -214,10 +225,24 @@ returned together with the evaluated value of the CESQL expression. Whenever possible, some error checks SHOULD be done at compile time by the expression evaluator, in order to prevent runtime errors. +Every CESQL engine MUST support the following error types: + +- _ParseError_: An error that occurs during parsing +- _MathError_: An error that occurs during the evaluation of a mathematical operation +- _CastError_: An error that occurs during an implicit or explicit type cast +- _MissingFunctionError_: An error that occurs due to a call to a function that has not been registered with the CESQL engine +- _FunctionEvaluationError_: An error that occurs during the evaluation of a function + +Whenever an operator or function encounters an error, it MUST return a return value as well as an error value or values. In cases where there is not an obvious return value for the expression, +the operator or function SHOULD return the zero value for the return type of the operator or function. + ### 3.4. Operators The following tables show the operators that MUST be supported by a CESQL evaluator. When evaluating an operator, -a CESQL engine MUST attempt to cast the operands to the correct types. If this type casting fails, a _Cast_ error will be returned, along with the zero value for the return type of the expression. +a CESQL engine MUST attempt to cast the operands to the correct types. A CESQL evaluator MUST stop evaluating an operator +when it encounters an error value, and it MUST return both the zero value for the return type of the operator as well +as the error. In cases where the return type for the operator is ambiguous, the return type MUST be assumed to be +_Boolean_ and the return value MUST be `false`. All the operators in the following tables are listed in precedence order. 
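As a reading aid for the zero-value and error rules introduced in this patch, here is a minimal Python sketch. It is not taken from any CESQL engine: `ZERO_VALUES`, `divide`, `passes_filter` and the `(value, errors)` return shape are hypothetical names chosen for illustration, and integer overflow is deliberately not handled.

```python
from typing import Any, List, Tuple

# Zero value per primitive type, as listed in the patch's table.
ZERO_VALUES = {str: "", int: 0, bool: False}


class MathError(Exception):
    """Stand-in for the spec's _MathError_ error type."""


def divide(x: int, y: int) -> Tuple[int, List[Exception]]:
    """Sketch of `x / y`: returns a value plus a (possibly empty) error list."""
    if y == 0:
        # On error, return the zero value of the return type and the error.
        return ZERO_VALUES[int], [MathError("division by zero")]
    # Round towards zero (Python's // rounds towards negative infinity).
    quotient = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        quotient = -quotient
    return quotient, []


def passes_filter(value: Any, errors: List[Exception]) -> bool:
    """An event passes only if the result is Boolean `true` with no errors."""
    return isinstance(value, bool) and value and not errors
```

For example, `divide(1, 0)` yields `0` together with one `MathError`, and a filter evaluating to a non-Boolean value or carrying any error rejects the event.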
@@ -261,6 +286,8 @@ Corresponds to the syntactic rule `binary-operation`: The AND and OR operators MUST be short-circuit evaluated. This means that whenever the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, whenever the left operand of the OR operation evaluates to `true`, the right operand MUST NOT be evaluated. +String comparisons (using the `=`, `!=` and `<>` operators) MUST be case sensitive. + #### 3.4.3. Like operator | Definition | Semantics | @@ -278,6 +305,8 @@ For example, the pattern `_b*` will accept values `ab`, `abc`, `abcd1` but won't Both `%` and `_` can be escaped with `\`, in order to be matched literally. For example, the pattern `abc\%` will match `abc%` but won't match `abcd`. +The pattern matching MUST be case sensitive. + #### 3.4.4. Exists operator | Definition | Semantics | @@ -291,8 +320,8 @@ assumed valid, e.g. `EXISTS id` MUST always return `true`. | Definition | Semantics | | ------------------------------------------------------ | -------------------------------------------------------------------------- | -| `x IN (y1, y2, ...): Any x Any^n -> Boolean` | Returns `true` if `x` is equal to an element in the _Set_ of `yN` elements | -| `x NOT IN (y1, y2, ...): Any x Any^n -> Boolean` | Same as `NOT (x IN set)` | +| `x IN (y1, y2, ...): Any x Any^n -> Boolean` | Returns `true` if `x` is equal to an element in the _Set_ of `yN` elements | +| `x NOT IN (y1, y2, ...): Any x Any^n -> Boolean` | Same as `NOT (x IN set)` | The matching is done using the same semantics of the equal `=` operator, but using `x` type as the target type for the implicit type casting. @@ -324,7 +353,7 @@ from the ternary function `ABC(x, y, z)` if the n-ary function were called with In order for these definitions to be valid, the n-ary function would need to have at least 4 fixed arguments. 
-When a function invocation cannot be dispatched, the return value is `false`, and a `_FunctionError` is also returned. +When a function invocation cannot be dispatched, the return value is `false`, and a _MissingFunctionError_ is also returned. The following tables show the built-in functions that MUST be supported by a CESQL evaluator. @@ -349,6 +378,13 @@ The following tables show the built-in functions that MUST be supported by a CES | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `ABS(x): Integer -> Integer` | Returns the absolute value of `x`. If the value of `x` is `-2147483648` (the most negative 32 bit integer value possible), then this returns `2147483647` as well as a math error. | +#### 3.5.3. Function Errors + +As specified in 3.3, in the event of an error a function MUST still return a valid return value for its defined return type. +A CESQL engine MUST guarantee that all built-in functions comply with this. For user-defined functions, if they return one or more errors +and fail to provide a valid return value for their return type, the CESQL engine MUST return the zero value for the return type of the +function, along with a _FunctionEvaluationError_. + ### 3.6. Evaluation of the expression Operators MUST be evaluated in left to right order, where all operators within a parenthesized expression MUST be evaluated before continuing to the right of the parenthesized expression. @@ -374,12 +410,12 @@ A CESQL engine MUST apply the following implicit casting rules in order: 1. If it's ambiguous, use the `y` type to search, in the set of ambiguous operators, every definition of the operator using the `y` type as the right parameter type: 1. 
If such operator definition exists and is unique, cast `x` to the type of the left parameter - 1. Otherwise, raise an error and the cast results are undefined + 1. Otherwise, raise a _CastError_ and the result is `false` 1. If the function is n-ary with `n > 1`: 1. Cast all the arguments to the corresponding parameter types. 1. If the operator is n-ary with `n > 2`: 1. If it's not ambiguous, cast all the operands to the target type. - 1. If it's ambiguous, raise an error and the cast results are undefined. + 1. If it's ambiguous, raise a _CastError_ and the cast result is `false`. For the `IN` operator, a special rule is defined: the left argument MUST be used as the target type to eventually cast the set elements. From 4c387fe8f6e7eee57978032879301f2a89d4acee Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Mon, 27 May 2024 16:11:55 -0400 Subject: [PATCH 05/11] further clarify error handling and zero values Signed-off-by: Calum Murray --- cesql/spec.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index fe9fc2cd..4b1fd423 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -275,8 +275,8 @@ Corresponds to the syntactic rule `binary-operation`: | `x > y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater than `y` | | `x >= y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater than or equal to `y` | | `x * y: Integer x Integer -> Integer` | Returns the product of `x` and `y` | -| `x / y: Integer x Integer -> Integer` | Returns the result of dividing `x` by `y`, rounded towards `0` to obtain an integer. Returns `0` and raises an error if `y = 0` | -| `x % y: Integer x Integer -> Integer` | Returns the remainder of `x` divided by `y`, where the result has the same sign as `x`. Returns `0` and raises an error if `y = 0` | +| `x / y: Integer x Integer -> Integer` | Returns the result of dividing `x` by `y`, rounded towards `0` to obtain an integer. 
Returns `0` and a _MathError_ if `y = 0` | +| `x % y: Integer x Integer -> Integer` | Returns the remainder of `x` divided by `y`, where the result has the same sign as `x`. Returns `0` and a _MathError_ if `y = 0` | | `x + y: Integer x Integer -> Integer` | Returns the sum of `x` and `y` | | `x - y: Integer x Integer -> Integer` | Returns the difference of `x` and `y` | | `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal | @@ -367,16 +367,16 @@ The following tables show the built-in functions that MUST be supported by a CES | `LOWER(x): String -> String` | Returns `x` in lowercase. | | `UPPER(x): String -> String` | Returns `x` in uppercase. | | `TRIM(x): String -> String` | Returns `x` with leading and trailing whitespaces trimmed. This does not remove any non-whitespace characters, such as control characters. | -| `LEFT(x, y): String x Integer -> String` | Returns a new string with the first `y` characters of `x`, or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raises an error. | -| `RIGHT(x, y): String x Integer -> String` | Returns a new string with the last `y` characters of `x` or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raises an error. | -| `SUBSTRING(x, pos): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) up to the end of `x`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. Returns the empty string and raises an error if `pos > LENGTH(x) OR pos < -LENGTH(x)`. | -| `SUBSTRING(x, pos, len): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) of length `len`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. 
If `len` is greater than the maximum substring starting at `pos`, then return the maximum substring. Returns the empty string and raises an error if `pos > LENGTH(x) OR pos < -LENGTH(x)` or if `len` is negative. | +| `LEFT(x, y): String x Integer -> String` | Returns a new string with the first `y` characters of `x`, or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and a _FunctionEvaluationError_. | +| `RIGHT(x, y): String x Integer -> String` | Returns a new string with the last `y` characters of `x` or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and a _FunctionEvaluationError_. | +| `SUBSTRING(x, pos): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) up to the end of `x`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. Returns the empty string and a _FunctionEvaluationError_ if `pos > LENGTH(x) OR pos < -LENGTH(x)`. | +| `SUBSTRING(x, pos, len): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) of length `len`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. If `len` is greater than the maximum substring starting at `pos`, then return the maximum substring. Returns the empty string and a _FunctionEvaluationError_ if `pos > LENGTH(x) OR pos < -LENGTH(x)` or if `len` is negative. | #### 3.5.2. 
Built-in Math functions | Definition | Semantics | | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `ABS(x): Integer -> Integer` | Returns the absolute value of `x`. If the value of `x` is `-2147483648` (the most negative 32 bit integer value possible), then this returns `2147483647` as well as a math error. | +| `ABS(x): Integer -> Integer` | Returns the absolute value of `x`. If the value of `x` is `-2147483648` (the most negative 32 bit integer value possible), then this returns `2147483647` as well as a _MathError_. | #### 3.5.3 Function Errors @@ -404,7 +404,7 @@ A CESQL engine MUST apply the following implicit casting rules in order: 1. If the operator/function is unary (argument `x`): 1. If it's not ambiguous, cast `x` to the parameter type. - 1. If it's ambiguous, raise an error and the cast result is undefined. + 1. If it's ambiguous, raise a _CastError_ and the cast result is `false`. 1. If the operator is binary (left operand `x` and right operand `y`): 1. If it's not ambiguous, cast `x` and `y` to the corresponding parameter types. 1. If it's ambiguous, use the `y` type to search, in the set of ambiguous operators, every definition of the operator @@ -454,7 +454,7 @@ Because CESQL expressions are total, they always define a return value, included When evaluating an expression, the evaluator can operate in two _modes_, in relation to error handling: -* Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the error, without any result. +* Fail fast mode: When an error is triggered, the evaluation is interrupted and returns the error, with the zero value for the return type of the expression. 
* Complete evaluation mode: When an error is triggered, the evaluation is continued, and the evaluation of the expression returns both the result and the error(s). Choosing which evaluation mode to adopt and implement depends on the use case. From 315195282735c1ba532a6e52757c681e691db0d8 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Tue, 28 May 2024 14:20:33 -0400 Subject: [PATCH 06/11] parenthized -> paranthesized Signed-off-by: Calum Murray --- cesql/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cesql/spec.md b/cesql/spec.md index 4b1fd423..fb4a37bd 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -387,7 +387,7 @@ function, along with a _FunctionEvaluationError_. ### 3.6. Evaluation of the expression -Operators MUST be evaluated in left to right order, where all operators within a parenthized expression MUST be evaluated before continuing to the right of the parenthized expression. +Operators MUST be evaluated in left to right order, where all operators within a parenthesized expression MUST be evaluated before continuing to the right of the parenthesized expression. AND and OR operations MUST be short-circuit evaluated. When the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, when the left operand of the OR operation evalues to `true`, the right operand MUST NOT be evaluated. 
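
Reviewer note on the short-circuit rule in the hunk above: a minimal Python sketch of how an engine might honour "the right operand MUST NOT be evaluated". The names `eval_and`, `eval_or` and `operand` are hypothetical and not part of the spec or any SDK. Operands are modelled as zero-argument callables returning a `(value, errors)` pair, and carrying the left operand's errors forward is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# Assumption: every evaluation yields a (return value, list of errors) pair.
Result = Tuple[bool, List[str]]
Thunk = Callable[[], Result]


def eval_and(left: Thunk, right: Thunk) -> Result:
    """AND: skip the right operand when the left one is false."""
    lval, lerrs = left()
    if not lval:
        return False, lerrs  # right() is never called
    rval, rerrs = right()
    return rval, lerrs + rerrs


def eval_or(left: Thunk, right: Thunk) -> Result:
    """OR: skip the right operand when the left one is true."""
    lval, lerrs = left()
    if lval:
        return True, lerrs  # right() is never called
    rval, rerrs = right()
    return rval, lerrs + rerrs


# Records which operands were actually evaluated.
calls: List[str] = []


def operand(name: str, value: bool) -> Thunk:
    """Build an operand that logs its own evaluation (illustration only)."""
    def thunk() -> Result:
        calls.append(name)
        return value, []
    return thunk
```

Calling `eval_and(operand("left", False), operand("right", True))` leaves only `"left"` in `calls`, which is the observable effect the rule asks for.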
From 12c7147c4fe8302993a4550239d037950dddeb50 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Wed, 29 May 2024 15:28:37 -0400 Subject: [PATCH 07/11] more fixes from review by @duglin Signed-off-by: Calum Murray --- cesql/spec.md | 95 +++++++++++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 40 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index fb4a37bd..92071c40 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -56,10 +56,11 @@ producer, or in an intermediary, and it can be implemented using any technology The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and type-checked CloudEvent instance. An expression MUST NOT mutate the value of the input CloudEvent instance, nor any of the other input values. The evaluation of an expression observes the concept of [referential -transparency][referential-transparency-wiki]. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. +transparency][referential-transparency-wiki]. This means that any part of an expression can be replaced with it's output +value and the overall result of the expression will be unchanged. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no -error occurred during execution of the expression. The value used by CESQL engines to represent an empty set of errors is out of the scope of this -specification. +error occurred during execution of the expression. The values used by CESQL engines to represent a set of errors (empty or not) is out of the scope +of this specification. The CloudEvents Expression Language doesn't support the handling of the data field of the CloudEvent instances, due to its polymorphic nature and complexity. 
Users that need this functionality ought to use other more appropriate tools. @@ -73,8 +74,9 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S The CESQL can be used as a [filter dialect][subscriptions-filter-dialect] to filter on the input values. -When used as a filter predicate, the expression output value MUST be a _Boolean_. If the output value is not a _Boolean_ or any errors are returned, -the event MUST NOT pass the filter. +When used as a filter predicate, the expression output value MUST be a _Boolean_. If the output value is not a _Boolean_, or any errors are returned, +the event MUST NOT pass the filter. Due to the requirement that events MUST NOT pass the filter should any errors occur, when used in a filtering +context the CESQL engine SHOULD follow the "fail fast mode" error handling described in section 4.1. ## 2. Language syntax @@ -203,17 +205,20 @@ be used in the `IN` operator. Each CloudEvent context attribute and extension MUST be addressable from an expression using its identifier, as defined by the spec. For example, using `id` in an expression will address the CloudEvent [id attribute][ce-id-attribute]. -Unless otherwise specified, every attribute and extension MUST be represented by the _String_ type as its initial type. -Through implicit type casting, the user can convert the addressed value instances to _Integer_ and -_Boolean_. +Unless the value of the attribute or extension is one of the 3 primitive CESQL types or otherwise specified, +every attribute and extension MUST be represented by the _String_ type as its initial type. +If the attribute or extension is one of the 3 primitive CESQL types, the attribute or extension +MUST be represented by its initial type. Through implicit type casting, the user can convert the +addressed value instances to the necessary types for an operator or function. 
-When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to the zero value for the return type of the subexpression. -For example, `true AND (missingAttribute = "")` would evaluate to `false` as the subexpression `missingAttribute = ""` would be false, given that the return type for the `=` operator is _Boolean_. -However, the expression `missingattribute * 5` would evaluate to `0` because the return type for the `*` operator is _Integer_. Note that this does not mean that the _value_ of the missing attribute +When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to the zero value for the return type of the subexpression, +along with a _MissingAttributeError_. +For example, `true AND (missingAttribute = "")` would evaluate to `false, (missingAttributeError)` as the subexpression `missingAttribute = ""` would be false, given that the return type for the `=` operator is _Boolean_. +However, the expression `missingattribute * 5` would evaluate to `0, (missingAttributeError)` because the return type for the `*` operator is _Integer_. Note that this does not mean that the _value_ of the missing attribute is set to be the zero value for the type of the missing attribute. Rather, the subexpression with the missingAttribute returns the zero value of the return type of the subexpression. -As an example, `1 / missingAttribute` does not raise a _MathError_ due to the division by zero, instead it returns `0` as that is the zero value for the return type of the subexpression. +As an example, `1 / missingAttribute` does not raise a _MathError_ due to the division by zero, instead it returns `0, (missingAttributeError)` as that is the zero value for the return type of the subexpression. In cases where the return type of the subexpression cannot be determined by the CESQL engine, the CESQL engine MUST assume a return type of _Boolean_. 
In such cases, the return value would -therefore be `false`. +therefore be `false, (missingAttributeError)`. ### 3.3. Errors @@ -227,24 +232,21 @@ runtime errors. Every CESQL engine MUST support the following error types: -- _ParseError_: An error that occurs during parseing +- _ParseError_: An error that occurs during parsing - _MathError_: An error that occurs during the evaluation of a mathematical operation - _CastError_: An error that occurs during an implicit or explicit type cast +- _MissingAttributeError_: An error that occurs when addressing an attribute which is not present on the input event - _MissingFunctionError_: An error that occurs due to a call to a function that has not been registered with the CESQL engine - _FunctionEvaluationError_: An error that occurs during the evaluation of a function +- _GenericError_: Any error not specified above -Whenever an operator or function encounters an error, it MUST return a return value as well as an error value or values. In cases where there is not an obvious return value for the expression, +Whenever an operator or function encounters an error, it MUST result in a "return value" as well as one or more "error values". In cases where there is not an obvious "return value" for the expression, the operator or function SHOULD return the zero value for the return type of the operator or function. ### 3.4. Operators The following tables show the operators that MUST be supported by a CESQL evaluator. When evaluating an operator, -a CESQL engine MUST attempt to cast the operands to the correct types. A CESQL evaluator MUST stop evaluating an operator -when it encounters an error value, and it MUST return both the zero value for the return type of the operator as well -as the error. In cases where the return type for the operator is ambiguous, the return type MUST be assumed to be -_Boolean_ and the return value MUST be `false`. - -All the operators in the following tables are listed in precedence order. 
+a CESQL engine MUST attempt to cast the operands to the specified types. #### 3.4.1. Unary operators @@ -278,16 +280,14 @@ Corresponds to the syntactic rule `binary-operation`: | `x / y: Integer x Integer -> Integer` | Returns the result of dividing `x` by `y`, rounded towards `0` to obtain an integer. Returns `0` and a _MathError_ if `y = 0` | | `x % y: Integer x Integer -> Integer` | Returns the remainder of `x` divided by `y`, where the result has the same sign as `x`. Returns `0` and a _MathError_ if `y = 0` | | `x + y: Integer x Integer -> Integer` | Returns the sum of `x` and `y` | -| `x - y: Integer x Integer -> Integer` | Returns the difference of `x` and `y` | -| `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal | -| `x != y: String x String -> Boolean` | Same as `NOT (x = y)` | -| `x <> y: String x String -> Boolean` | Same as `NOT (x = y)` | +| `x - y: Integer x Integer -> Integer` | Returns the value of `y` subtracted from `x` | +| `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal (case sensitive) | +| `x != y: String x String -> Boolean` | Same as `NOT (x = y)` (case sensitive) | +| `x <> y: String x String -> Boolean` | Same as `NOT (x = y)` (case sensitive) | The AND and OR operators MUST be short-circuit evaluated. This means that whenever the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, whenever the left operand of the OR operation evaluates to `true`, the right operand MUST NOT be evaluated. -String comparisons (using the `=`, `!=` and `<>` operators) MUST be case sensitive. - #### 3.4.3. 
Like operator | Definition | Semantics | @@ -299,14 +299,13 @@ The pattern of the `LIKE` operator can contain: - `%` represents zero, one, or multiple characters - `_` represents a single character +- Any other character, representing exactly that character (case sensitive) -For example, the pattern `_b*` will accept values `ab`, `abc`, `abcd1` but won't accept values `b` or `acd`. +For example, the pattern `_b*` will accept values `ab`, `abc`, `abcd1` but won't accept values `b` or `acd` or `aBc`. Both `%` and `_` can be escaped with `\`, in order to be matched literally. For example, the pattern `abc\%` will match `abc%` but won't match `abcd`. -The pattern matching MUST be case sensitive. - #### 3.4.4. Exists operator | Definition | Semantics | @@ -318,10 +317,10 @@ assumed valid, e.g. `EXISTS id` MUST always return `true`. #### 3.4.5. In operator -| Definition | Semantics | -| ------------------------------------------------------ | -------------------------------------------------------------------------- | -| `x IN (y1, y2, ...): Any x Any^n -> Boolean` | Returns `true` if `x` is equal to an element in the _Set_ of `yN` elements | -| `x NOT IN (y1, y2, ...): Any x Any^n -> Boolean` | Same as `NOT (x IN set)` | +| Definition | Semantics | +| ------------------------------------------------------- | -------------------------------------------------------------------------- | +| `x IN (y1, y2, ...): Any x Any^n -> Boolean`, n > 0 | Returns `true` if `x` is equal to an element in the _Set_ of `yN` elements | +| `x NOT IN (y1, y2, ...): Any x Any^n -> Boolean`, n > 0 | Same as `NOT (x IN set)` | The matching is done using the same semantics of the equal `=` operator, but using `x` type as the target type for the implicit type casting. @@ -335,7 +334,8 @@ A function is identified by its name, its parameters and the return value. 
A fun CESQL allows overloading, that is the engine MUST be able to distinguish between two functions defined with the same name but different arity. Because of implicit casting, no functions with the same name and same arity but different types are allowed. -An overload on a variadic function is allowed only if the number of initial fixed arguments is greater than the maximum arity for that particular function name. Only one variadic overload is allowed. +A function name MAY have at most one variadic overload definition and only if the number of initial fixed arguments is greater than the maximum arity +of all other function definitions for that function name. For example, the following set of definitions are valid and will all be allowed by the rules: @@ -362,11 +362,11 @@ The following tables show the built-in functions that MUST be supported by a CES | Definition | Semantics | | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `LENGTH(x): String -> Integer` | Returns the character length of the String `x`. | -| `CONCAT(x1, x2, ...): String^n -> String` | Returns the concatenation of `x1` up to `xN`. | -| `CONCAT_WS(delimiter, x1, x2, ...): String x String^n -> String` | Returns the concatenation of `x1` up to `xN`, using the `delimiter` between each string, but not before `x1` or after `xN`. | +| `CONCAT(x1, x2, ...): String^n -> String`, n >= 0 | Returns the concatenation of `x1` up to `xN`. | +| `CONCAT_WS(delimiter, x1, x2, ...): String x String^n -> String`, n >= 0 | Returns the concatenation of `x1` up to `xN`, using the `delimiter` between each string, but not before `x1` or after `xN`. | | `LOWER(x): String -> String` | Returns `x` in lowercase. | | `UPPER(x): String -> String` | Returns `x` in uppercase. 
| -| `TRIM(x): String -> String` | Returns `x` with leading and trailing whitespaces trimmed. This does not remove any non-whitespace characters, such as control characters. | +| `TRIM(x): String -> String` |Returns `x` with leading and trailing whitespaces (as defined by unicode) trimmed. This does not remove any characters which are not unicode whitespace characters, such as control characters. | | `LEFT(x, y): String x Integer -> String` | Returns a new string with the first `y` characters of `x`, or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and a _FunctionEvaluationError_. | | `RIGHT(x, y): String x Integer -> String` | Returns a new string with the last `y` characters of `x` or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and a _FunctionEvaluationError_. | | `SUBSTRING(x, pos): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `pos` (included) up to the end of `x`. Characters' index starts from `1`. If `pos` is negative, the beginning of the substring is `pos` characters from the end of the string. If `pos` is 0, then returns the empty string. Returns the empty string and a _FunctionEvaluationError_ if `pos > LENGTH(x) OR pos < -LENGTH(x)`. | @@ -382,12 +382,27 @@ The following tables show the built-in functions that MUST be supported by a CES As specified in 3.3, in the event of an error a function MUST still return a valid return value for it's defined return type. A CESQL engine MUST guarantee that all built-in functions comply with this. For user defined functions, if they return one or more errors -and fail to providea valid return value for their return type the CESQL engine MUST return the zero value for the return type of the +and fail to provide a valid return value for their return type the CESQL engine MUST return the zero value for the return type of the function, along with a _FunctionEvaluationError_. ### 3.6. 
Evaluation of the expression -Operators MUST be evaluated in left to right order, where all operators within a parenthesized expression MUST be evaluated before continuing to the right of the parenthesized expression. +Operators and functions MUST be evaluated in order of precedence, and MUST be evaluated left to right when the precedence is equal. The order of precedence is as follows: + +1. Function invocations +1. Unary operators + 1. NOT unary operator + 1. `-` unary operator +1. LIKE operator +1. EXISTS operator +1. IN operator +1. Binary operators + 1. `*`, `/`, `%` binary operators + 1. `+`, `-` binary operators + 1. `=`, `!=`, `<>`, `>=`, `<=`, `>`, `<` binary operators + 1. AND, OR, XOR binary operators +1. Subexpressions +1. Attributes and literal values AND and OR operations MUST be short-circuit evaluated. When the left operand of the AND operation evaluates to `false`, the right operand MUST NOT be evaluated. Similarly, when the left operand of the OR operation evalues to `true`, the right operand MUST NOT be evaluated. From da5f5749a23f88cc04531f9ea95d40cdbe6d432d Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Wed, 29 May 2024 15:53:23 -0400 Subject: [PATCH 08/11] fix the should verification problem Signed-off-by: Calum Murray --- cesql/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cesql/spec.md b/cesql/spec.md index 92071c40..42b676c2 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -75,7 +75,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S The CESQL can be used as a [filter dialect][subscriptions-filter-dialect] to filter on the input values. When used as a filter predicate, the expression output value MUST be a _Boolean_. If the output value is not a _Boolean_, or any errors are returned, -the event MUST NOT pass the filter. Due to the requirement that events MUST NOT pass the filter should any errors occur, when used in a filtering +the event MUST NOT pass the filter. 
Due to the requirement that events MUST NOT pass the filter if any errors occur, when used in a filtering context the CESQL engine SHOULD follow the "fail fast mode" error handling described in section 4.1. ## 2. Language syntax From 864dff4d8866f5d3be1389108284b1507037dcf6 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Thu, 30 May 2024 10:46:52 -0400 Subject: [PATCH 09/11] clarify default type of attributes Signed-off-by: Calum Murray --- cesql/spec.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index 42b676c2..839add44 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -205,11 +205,8 @@ be used in the `IN` operator. Each CloudEvent context attribute and extension MUST be addressable from an expression using its identifier, as defined by the spec. For example, using `id` in an expression will address the CloudEvent [id attribute][ce-id-attribute]. -Unless the value of the attribute or extension is one of the 3 primitive CESQL types or otherwise specified, -every attribute and extension MUST be represented by the _String_ type as its initial type. -If the attribute or extension is one of the 3 primitive CESQL types, the attribute or extension -MUST be represented by its initial type. Through implicit type casting, the user can convert the -addressed value instances to the necessary types for an operator or function. +If the value of the attribute or extension is not one of the primitive CESQL types, it MUST be represented +by the _String_ type. When addressing an attribute not included in the input event, the subexpression referencing the missing attribute MUST evaluate to the zero value for the return type of the subexpression, along with a _MissingAttributeError_. 
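
Reviewer note on the missing-attribute wording above: a rough Python sketch of the intended behaviour, under stated assumptions. `eval_binary`, `passes_filter`, the `("attr", name)` / `("lit", value)` operand encoding and the `ZERO` table are all hypothetical helpers for illustration. The point shown is that the subexpression yields the zero value of the operator's return type plus a _MissingAttributeError_, so `1 / missingattribute` produces `0` rather than a _MathError_.

```python
import operator

# Zero values for the three primitive CESQL types.
ZERO = {"Boolean": False, "Integer": 0, "String": ""}


class MissingAttributeError(Exception):
    """Stand-in for the spec's _MissingAttributeError_ (hypothetical class)."""


def eval_binary(return_type, op, left, right, event):
    """Evaluate a binary subexpression against an event (a plain dict here).

    Operands are ("attr", name) or ("lit", value). If an addressed attribute
    is absent, the operator is not applied at all: the result is the zero
    value of the operator's return type plus a MissingAttributeError.
    """
    values = []
    for kind, payload in (left, right):
        if kind == "attr":
            if payload not in event:
                return ZERO[return_type], [MissingAttributeError(payload)]
            values.append(event[payload])
        else:
            values.append(payload)
    return op(*values), []


def passes_filter(value, errors):
    """Filter rule: pass only on a Boolean `true` with no errors."""
    return value is True and not errors


event = {"id": "abc", "hop": 3}

# 1 / missingattribute: zero value of Integer, no division is attempted.
quotient, quotient_errors = eval_binary(
    "Integer", operator.floordiv, ("lit", 1), ("attr", "missingattribute"), event
)

# missingattribute = "": zero value of Boolean, so the event is filtered out.
equal, equal_errors = eval_binary(
    "Boolean", operator.eq, ("attr", "missingattribute"), ("lit", ""), event
)
```

Returning before the operator is applied is what keeps the division by zero from ever happening in the first case. In a filtering context both results carry an error, so `passes_filter` rejects them regardless of the value.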
From 8b871e44bcc8d449e4de74ee7b93f7e1668d41aa Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Thu, 30 May 2024 11:49:24 -0400 Subject: [PATCH 10/11] fix indentation of operator precedence Signed-off-by: Calum Murray --- cesql/spec.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index 839add44..f1c710a4 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -388,16 +388,16 @@ Operators and functions MUST be evaluated in order of precedence, and MUST be ev 1. Function invocations 1. Unary operators - 1. NOT unary operator - 1. `-` unary operator + 1. NOT unary operator + 1. `-` unary operator 1. LIKE operator 1. EXISTS operator 1. IN operator 1. Binary operators - 1. `*`, `/`, `%` binary operators - 1. `+`, `-` binary operators - 1. `=`, `!=`, `<>`, `>=`, `<=`, `>`, `<` binary operators - 1. AND, OR, XOR binary operators + 1. `*`, `/`, `%` binary operators + 1. `+`, `-` binary operators + 1. `=`, `!=`, `<>`, `>=`, `<=`, `>`, `<` binary operators + 1. AND, OR, XOR binary operators 1. Subexpressions 1. Attributes and literal values From 24619e5d33a7836fd436b5019f7ccd5816b7c3e2 Mon Sep 17 00:00:00 2001 From: Calum Murray Date: Thu, 30 May 2024 12:14:36 -0400 Subject: [PATCH 11/11] it's -> its Signed-off-by: Calum Murray --- cesql/spec.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cesql/spec.md b/cesql/spec.md index f1c710a4..762135b6 100644 --- a/cesql/spec.md +++ b/cesql/spec.md @@ -56,7 +56,7 @@ producer, or in an intermediary, and it can be implemented using any technology The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and type-checked CloudEvent instance. An expression MUST NOT mutate the value of the input CloudEvent instance, nor any of the other input values. The evaluation of an expression observes the concept of [referential -transparency][referential-transparency-wiki]. 
This means that any part of an expression can be replaced with it's output +transparency][referential-transparency-wiki]. This means that any part of an expression can be replaced with its output value and the overall result of the expression will be unchanged. The primary output of a CESQL expression evaluation is always a _boolean_, an _integer_ or a _string_. The secondary output of a CESQL expression evaluation is a set of errors which occurred during evaluation. This set MAY be empty, indicating that no error occurred during execution of the expression. The values used by CESQL engines to represent a set of errors (empty or not) is out of the scope @@ -377,7 +377,7 @@ The following tables show the built-in functions that MUST be supported by a CES #### 3.5.3 Function Errors -As specified in 3.3, in the event of an error a function MUST still return a valid return value for it's defined return type. +As specified in 3.3, in the event of an error a function MUST still return a valid return value for its defined return type. A CESQL engine MUST guarantee that all built-in functions comply with this. For user defined functions, if they return one or more errors and fail to provide a valid return value for their return type the CESQL engine MUST return the zero value for the return type of the function, along with a _FunctionEvaluationError_.
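
Reviewer note on section 3.5.3 as it stands after this series: a small Python sketch of how an engine could wrap user defined functions so the rule always holds. `call_user_function` and `ZERO_VALUES` are hypothetical names, user functions are assumed to return a `(value, errors)` pair, and keeping the function's own errors alongside the added _FunctionEvaluationError_ is a choice of this sketch that the spec text does not mandate.

```python
# Zero values keyed by the Python type standing in for each CESQL type.
ZERO_VALUES = {bool: False, int: 0, str: ""}


class FunctionEvaluationError(Exception):
    """Stand-in for the spec's _FunctionEvaluationError_ (hypothetical class)."""


def call_user_function(fn, return_type, *args):
    """Invoke a user defined function and enforce a valid return value.

    If the function reports errors and its value is not of the declared
    return type, substitute the zero value and add a FunctionEvaluationError.
    """
    value, errors = fn(*args)
    errors = list(errors)
    # `type(...) is` rather than isinstance, so a bool is not accepted as an int.
    if errors and type(value) is not return_type:
        errors.append(
            FunctionEvaluationError("no valid return value for declared type")
        )
        return ZERO_VALUES[return_type], errors
    return value, errors


def broken(x):
    # Reports an error and gives no usable Integer value.
    return None, ["boom"]


def shout(x):
    # Well-behaved String function with no errors.
    return x.upper(), []
```

With these definitions, `call_user_function(broken, int, 1)` yields `0` plus the original error and a `FunctionEvaluationError`, while `call_user_function(shout, str, "a")` passes the function's own result through unchanged.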