% !TEX root = ./main.tex
\section{Case Study: Normalization}\label{sec:norm}
To illustrate how \gcoql can be used to reason about query transformations, we study the {\em normalization} process proposed by Hartig and Pérez (\HP)~\cite{gqlph}, which is fundamental for the complexity results they prove.
%
These results rest on two premises: {\em a)} every query can be normalized to a semantically equivalent query; {\em b)} on such queries, one can use a simplified evaluation function. For normalization, \HP provide a set of equivalence rules to establish the existence of a normal form. However, they do not provide an algorithm for query normalization, let alone prove it correct and semantics-preserving. Likewise, they do not prove that the simplified semantics is equivalent to the original one when applied to normalized queries.
In this section, we show how to define the property of being in \textit{normal form} in \gcoql, define a normalization procedure, and prove it both correct and semantics-preserving. Finally, we define the simplified semantics given by \HP and prove it equivalent to the original semantics (\S~\ref{subsec:semantics}) when applied to normalized queries.
\subsection{Defining Normal Forms}
The notion of \textit{normal form} introduced by Hartig and Pérez consists of the conjunction of two conditions: being in \textit{ground-typed normal form} and being \textit{non-redundant}.
\paragraph{Ground-typed normal form.}
Informally, a query is in ground-typed normal form, or {\em grounded}
for short, if any two selections within the same selection set are of
the same kind: either both field selections or both inline fragments.
Consider for instance the two queries below:
\smallskip
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Not grounded query
query {
  movie[id:2000] {
    title
    |$\ldots$| on Fiction {
      year
    }
  }
}
\end{minted}
\end{minipage}%
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Grounded query
query {
  movie[id:2000] {
    |$\ldots$| on Animation {
      title
    }
    |$\ldots$| on Fiction {
      title
      year
} } }
\end{minted}
\end{minipage}
\medskip
\noindent The query on the left is not grounded because the selection
set under the field \texttt{movie} contains two selections of different
kinds, a field selection and an inline fragment. The query on the right
is a grounded version thereof: it specifies, down to the concrete
object types \animation and \fiction, the information it requests.
\begin{definition}[\cite{gqlph}]
A \gql selection set $\sset$ is in \textit{ground-typed normal form}
if it can be generated by the following grammar, where \texttt{t} is an object type:
\begin{displaymath}\small
\begin{array}{rcl@{\qquad}rcl}
\sset & ::= & \chi \ldots \chi & \chi & ::= & \fld \\
& | & \psi \ldots \psi & & | & \afld\\
& & & & | & \nfld{\sset}\\
\psi & ::= & \ifrag{t}{\chi \ldots \chi} & & | & \anfld{\sset}
\end{array}
\end{displaymath}
\noindent A \gql query $\varphi$ is in \textit{ground-typed normal form} if its selection set is in ground-typed normal form.
\end{definition}
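In \gcoql, this grammar is captured by a boolean predicate \texttt{are\_in\_ground\_typed\_nf} on selection sets; the query-level property then simply projects out the query's selection set, as the following definition (lightly reformatted) from our development shows:
\begin{minted}[bgcolor=coqbg, escapeinside=||, mathescape=true]{coq}
Definition is_a_grounded_typed_nf_query
  (s : wfGraphQLSchema) (|$\query$| : query) :=
  are_in_ground_typed_nf s |$\query$|.(selection_set).
\end{minted}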
\paragraph{Non-redundancy.} Informally, the notion of non-redundancy
further constrains that of groundedness by forbidding queries that induce a repeated evaluation of the same selections. For example, consider the two queries below:
\smallskip
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Redundant query
query {
  movie[id:2000] {
    |$\ldots$| on Fiction {
      title
      title
    }
    |$\ldots$| on Fiction {
      year
} } }
\end{minted}
\end{minipage}%
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Non-redundant query
query {
  movie[id:2000] {
    |$\ldots$| on Fiction {
      title
      year
    }
  }
}
\end{minted}
\end{minipage}
\medskip
\noindent The query on the left is redundant for two reasons: it requests the field
\texttt{title} twice, and it contains two inline fragments with the
same type condition \texttt{Fiction}. The query on the right is a
semantically equivalent, non-redundant version, in which the duplicate
selections have been merged.
\begin{definition}[Adapted from~\cite{gqlph}]
A \gql selection set $\sset$ is \textit{non-redundant} if:
\begin{itemize}
\item it contains at most one field selection with a given response name,
\item it contains at most one inline fragment with a given type condition, and
\item all of its subselections are non-redundant.
\end{itemize}
\noindent A \gql query $\varphi$ is \textit{non-redundant} if its selection set is non-redundant.
\end{definition}
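In \gcoql, non-redundancy is likewise captured by a boolean predicate on selection sets. The following is a condensed sketch of such a check; the helpers \texttt{response\_name}, \texttt{type\_condition} and \texttt{subselections} are hypothetical projections, and we use explicit fuel to sidestep the termination argument (the actual definition in our development is structured differently):
\begin{minted}[bgcolor=coqbg]{coq}
(* Condensed sketch, not the exact definition in our development:
   field response names and fragment type conditions must be unique,
   and subselections must recursively be non-redundant. The helpers
   response_name, type_condition and subselections are hypothetical
   projections over the Selection type. *)
Fixpoint are_non_redundant (fuel : nat) (ss : seq Selection) : bool :=
  match fuel with
  | 0 => false
  | fuel.+1 =>
    uniq [seq response_name f | f <- ss & is_field f]
    && uniq [seq type_condition f | f <- ss & is_inline_fragment f]
    && all (fun sel => are_non_redundant fuel (subselections sel)) ss
  end.
\end{minted}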
\subsection{Defining Normalization}\label{subsec:normalization}
\begin{figure*}[t]
\small
\begin{flushright}
\fbox{$\tnorm{\cdot} ~\colon~ \textit{seq Selection} \rightarrow \textit{seq Selection}$}
\end{flushright}
\centering
\vspace{-1ex}
\begin{align*}
% Empty
&(1)\! \! \!& \tnorm{\cdot} &= [\cdot] \\
% SingleField
&(2)\! \! \!& \tnorm{\fld :: \sset} &=
\fld :: \tnorm{\filter{\sset}{\fkey}} \\
%
% Nested field
&(3)\! \! \!& \tnorm{\nfld{\overline{\beta}} :: \sset} &=
\begin{cases}
\nfld{\norm{\mathit{\ftype{ts}{\fkey}}}{\overline{\beta} \mdoubleplus \mathit{merge (\collect{ts}{\fkey}{\sset})}}} :: \tnorm{\filter{\sset}{\fkey}} \hfill \text{ if }
\mathit{is\_object\_type (\ftype{ts}{\fkey})}\\
%
\nfld{\mathit{map}\; (\lambda t_{i}.\; \ifrag{$t_{i}$}{\norm{t_{i}}{\overline{\beta} \mdoubleplus \mathit{merge (\collect{ts}{\fkey}{\sset})}}})\; \mathit{get\_possible\_types (ts)}}
:: \tnorm{\filter{\sset}{\fkey}} \\
\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\; \text{ otherwise }
\end{cases}\\
%inline fragment
&(4)\! \! \!& \tnorm{\ifrag{t}{\overline{\beta}} :: \sset} &=
\begin{cases}
\tnorm{\overline{\beta} \mdoubleplus \sset} &
\text{if }\mathit{fragment\_type\_applies(ts, \texttt{t})}\\
\tnorm{\sset} & \text{otherwise}
\end{cases}
\end{align*}
\caption{Normalization procedure for \gql selections.
\newline{\footnotesize
$\mathit{is\_object\_type}$ checks whether the given type is an object type in the schema; $\mathit{get\_possible\_types}$ gets all object subtypes of the given type.}
}
\label{fig:normalize}
\end{figure*}
The normalization procedure for selection sets is described in
Figure~\ref{fig:normalize}. It is parameterized by a type in scope,
noted $ts$, and acts as follows. Whenever a field selection is
encountered (rules 2--3), normalization removes from the remainder of
the selection set every field that shares the same response name. This
step ensures the non-redundancy of the resulting selection set. In
order not to lose information during this filtering, normalization
first collects the fields with the same response name and merges their
subselections into the first occurrence (rule 3). This also preserves
the order of selections.
Then, to obtain a selection set in ground-typed normal form,
normalization proceeds differently depending on whether the selection
is a field (rule 3) or an inline fragment (rule 4). For a field, the
process either directly normalizes the subselections, if the field's
return type is an object type, or wraps them in one inline fragment
per possible object subtype otherwise. For an inline fragment, the
process either lifts its subselections, if the fragment applies to the
type in scope, or removes the fragment altogether.
To normalize a query, we simply normalize its selection set, setting
the initial type in scope to the query root type of the underlying
schema.
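For illustration, the dispatch on the kind of the type in scope can be factored out as a standalone function, sketched below; \texttt{normalize\_dispatch} and \texttt{normalize\_selections\_obj} are hypothetical names, since in the actual development this dispatch is interleaved with the merging and filtering steps (the resulting definition is highly non-structural, but is easily expressed with the Equations library).
\begin{minted}[bgcolor=coqbg]{coq}
(* Sketch of the grounding dispatch (cf. rule (3) above): under an
   object type, selections are normalized directly; under an abstract
   type, they are wrapped in one inline fragment per possible object
   subtype. normalize_selections_obj is a hypothetical name for the
   worker that assumes the type in scope is an object type. *)
Definition normalize_dispatch (s : wfGraphQLSchema)
    (ts : Name) (ss : seq Selection) : seq Selection :=
  if is_object_type s ts then
    normalize_selections_obj s ts ss
  else
    [seq on t { normalize_selections_obj s t ss } |
        t <- get_possible_types s ts].
\end{minted}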
\paragraph{Example.} We illustrate the normalization procedure with
the example below.
\smallskip
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Query not in normal form
query {
  |$\ldots$| on Query {
    movie[id:2000] {
      title
    }
  }
  movie[id:2000] {
    title
} }
\end{minted}
\end{minipage}%
\begin{minipage}[t]{.25\textwidth}
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Normalized query
query {
  movie[id:2000] {
    |$\ldots$| on Animation {
      title
    }
    |$\ldots$| on Fiction {
      title
    }
} }
\end{minted}
\end{minipage}
\medskip
\noindent On the left we have the original query and on the right its normalized
version. Subselections from the fragment with type condition \texttt{Query} are lifted, and the two occurrences of the field \texttt{movie} are merged into a single one.
Since the return type of the field \texttt{movie} is the abstract type \texttt{Movie}, its subselections are wrapped in one fragment per concrete object subtype, namely \animation and \fiction. The resulting query is both non-redundant and in ground-typed normal form.
\subsection{Correctness and Semantic Preservation}
We now establish two fundamental results about the normalization
procedure. The first result states that the normalization procedure
is {\em correct} in that it does indeed produce queries in normal
form.
%
\begin{minted}[escapeinside=||,mathescape=true,bgcolor=coqbg]{coq}
Lemma normalized_selections_are_in_nf
  (s : wfGraphQLSchema) (ts : Name) (|$\sel$|s : seq Selection) :
  are_in_normal_form s (normalize_selections s ts |$\sel$|s).

Corollary normalized_query_is_in_nf
  (s : wfGraphQLSchema) (|$\query$| : query) :
  is_in_normal_form s (normalize s |$\query$|).
\end{minted}
%
\noindent The proof of the main lemma proceeds by well-founded
induction over the size of the selection set and relies on auxiliary
lemmas about subtyping. Each of the two conditions (groundedness and
non-redundancy) is established separately.
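Note that the size used as the induction measure is not merely the length of the top-level list: selections are trees, so the measure must also account for subselections. A hypothetical sketch of such a measure, with illustrative constructor names (aliased field constructors elided):
\begin{minted}[bgcolor=coqbg]{coq}
(* Hypothetical sketch of a size measure for well-founded induction:
   every selection node is counted, including nested subselections.
   Constructor names are illustrative, not those of our development. *)
Fixpoint selection_size (sel : Selection) : nat :=
  match sel with
  | SingleField _ _ => 1
  | NestedField _ _ subs =>
      (sumn [seq selection_size sub | sub <- subs]).+1
  | InlineFragment _ subs =>
      (sumn [seq selection_size sub | sub <- subs]).+1
  end.

Definition selections_size (ss : seq Selection) : nat :=
  sumn [seq selection_size sel | sel <- ss].
\end{minted}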
The second result states that the normalization procedure is {\em
semantics-preserving}, in that a normalized query has the same
evaluation semantics as the original query from which it was
derived. Formally, this requires proving that evaluating a query and
its normalized version from the root node of a graph yields the same
result. To establish this equivalence, however, we must first prove a
generalized statement that quantifies over every node of the graph.
\begin{minted}[escapeinside=||,mathescape=true,bgcolor=coqbg]{coq}
Lemma normalize_selections_preserves_semantics
  (s : wfGraphQLSchema) (g : conformedGraph s)
  (|$\sel$|s : seq Selection) (u : node) :
  u |\textbackslash|in g.(nodes) ->
  execute_selection_set s |\vscalar| g coerce
    u (normalize_selections s u.(ntype) |$\sel$|s) =
  execute_selection_set s |\vscalar| g coerce
    u |$\sel$|s.

Corollary normalize_preserves_query_semantics
  (s : wfGraphQLSchema) (g : conformedGraph s)
  (|$\query$| : query) :
  execute_query s |\vscalar| g coerce (normalize s |$\query$|) =
  execute_query s |\vscalar| g coerce |$\query$|.
\end{minted}
\noindent Similarly to the previous result, the proof of the main
lemma proceeds by well-founded induction over the size of the
selection set, together with auxiliary lemmas about graph conformance.
\begin{figure*}[t]
\small
\begin{flushright}
\fbox{$\seval{\cdot}{u} ~\colon~ \textit{seq Selection} \rightarrow \textit{GraphQLResponse}$}
\end{flushright}
\centering
\vspace{-1ex}
\begin{align*}
% Empty\sset
& (1) & \seval{\cdot}{u} &~=~ [\cdot] \\
% SingleField
& (2) & \sevalu{\fld :: \sset} &~=~
\resp{(\mathit{complete\_value(ftype(u.type, \fkey), u.property(\fld))})} :: \sevalu{\sset} \\
& (3) & % Nested field
\sevalu{\nfld{\overline{\beta}} :: \sset} &~=~
\begin{cases}
\resp{\texttt{[} \mathit{map}\; (\lambda v_{i}.\; \texttt{\{} \seval{\overline{\beta}}{v_{i}} \texttt{\}})\;
\mathit{u.neighbors_{\graph}(\fld)} \texttt{]}} :: \sevalu{\sset} \\
\hfill \text{ if }
\mathit{ftype(u.type, \fkey)} = list \text{ and } \{v_{1}, \ldots, v_{k}\} =
\{v_{i} \mid (u, \fld, v_{i}) \in \mathit{edges}(\graph)\} \\
%
\resp{\texttt{\{}\seval{\overline{\beta}}{v}\texttt{\}}} :: \sevalu{\sset}
\hfill \text{ if }
\mathit{ftype(u.type, \fkey)} \neq list \text{ and } (u, \fld, v) \in \mathit{edges}(\graph) \\
%
\resp{\nval} :: \sevalu{\sset}
\qquad \qquad \qquad \qquad \qquad
\text{ if } \mathit{ftype(u.type, \fkey)} \neq list \text{ and } \nexists v \text{ s.t. } (u, \fld, v) \in \mathit{edges}(\graph) \\
\end{cases}\\
%inline fragment
& (4) & \sevalu{\ifrag{t}{\overline{\beta}}:: \sset} &~=~ \begin{cases}
\sevalu{\overline{\beta} \mdoubleplus \sset} &
\text{ if }\mathit{fragment\_type\_applies (u.type, \texttt{t})}\\
\sevalu{\sset} & \text{ otherwise}
\end{cases}
\end{align*}
\caption{Simplified semantics for selections in normal form,
adapted from~\cite{gqlph}.}
\label{fig:simpl_semantics}
\end{figure*}
\subsection{Simplified Semantics of Normalized Queries}
\label{sec:simpl-semantics}
One of the main properties of queries in normal form is that they produce non-redundant responses.
This in turn permits defining a simplified evaluation function, which \HP crucially use to establish their complexity results.
However, \HP do not formally prove that this simplified semantics is equivalent to the original one when restricted to normalized queries.
We define the simplified semantics $\seval{\cdot}{}$ of \HP as shown in
Figure~\ref{fig:simpl_semantics} (aliased field cases are omitted for
space reasons); it evaluates selections directly, without the
collecting and merging steps of the general semantics. We then prove
that, for queries in normal form, $\seval{\varphi}{}$ and
$\eval{\varphi}{}$ produce the same response.
\begin{minted}[bgcolor=coqbg, escapeinside=||,mathescape=true]{coq}
Lemma exec_sel_eq_simpl_exec
  (s : wfGraphQLSchema) (g : conformedGraph s)
  (|$\sel$|s : seq Selection) (u : node) :
  are_in_normal_form s |$\sel$|s ->
  execute_selection_set s |\vscalar| g coerce
    u |$\sel$|s =
  simpl_execute_selection_set s |\vscalar| g coerce
    u |$\sel$|s.

Corollary exec_query_eq_simpl_exec
  (s : wfGraphQLSchema) (g : conformedGraph s)
  (|$\query$| : query) :
  is_in_normal_form s |$\query$| ->
  execute_query s |\vscalar| g coerce |$\query$| =
  simpl_execute_query s |\vscalar| g coerce |$\query$|.
\end{minted}
\noindent The proof once again proceeds by well-founded induction over the size of the selection set.
\subsection{Observations}\label{subsec:norm_lims}
Mechanizing normalization and the simplified semantics, as well as the associated properties and proofs, led us to identify some issues in \HP's definitions. While these are admittedly minor, they confirm the value of mechanized formalization.
First, some queries are considered non-redundant by \HP although they actually produce redundant results.
A simple example is the following query:
\begin{minted}{js}
query {
  movie[id:2000] { title }
  movie:movie[id:2000] { title } }
\end{minted}
which produces two repeated values, as illustrated below. This occurs because \HP's definition of non-redundancy does not consider the case in which an unaliased field and an aliased field share the same response name. This corner case is properly handled in \gcoql by grouping fields by their response names; we discovered it when proving the equivalence of the simplified semantics.
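Concretely, without the collection and merging of fields, the query above yields a response of the following shape, where the same response name appears twice (the actual values depend on the underlying graph and are elided here):
\begin{minted}[escapeinside=||, mathescape=true]{js}
// Redundant response: the key "movie" appears twice
{
  "movie": { "title": |$\ldots$| },
  "movie": { "title": |$\ldots$| }
}
\end{minted}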
Second, using \HP's equivalence rules, some queries cannot be normalized.
For instance:
\begin{minted}[escapeinside=||, mathescape=true]{js}
query {
  movie[id:2000] { title }
  |$\ldots$| on Query { movie[id:2000] { title } } }
\end{minted}
This happens because \HP provide no rule that explicitly considers the type in scope when transforming selections. In this example, taking the \texttt{Query} type into account, one could either lift the fragment's contents or wrap the first field in a fragment, after which the existing rules would apply; our procedure normalizes this query as in the example of \S\ref{subsec:normalization}.
%
We uncovered this issue while working on the semantics-preservation proof for the normalization function; at the time, we were formalizing the query semantics following \HP's approach. Fixing this issue and the others discussed in \S\ref{subsec:discussion} led us to change our approach to the query semantics and to adjust the definition of responses.