Internal SQL Library

Introduction

The Internal SQL library is a tool to query the content of Java arrays using the SQL language.

You call a java method, named ISQLQuery.query, to which you pass some arrays and an SQL query.
The arrays get grouped in virtual database tables, which are queried with the SQL query.
The result set is converted to arrays, returned by ISQLQuery.query.

The following picture shows how this is done:

In the figure:

Each input array has a name. The name is used to identify the array in the SQL query.
The SQL is similar to the one we use to query a database. One of the most important differences is in the from clause: Since there are no real tables (we are not working with a real database), we build our tables in the SQL query itself. [a1, a2] as a in the from clause means that the two input arrays a1 and a2 become the columns of the table a. In the rest of the SQL query a1 and a2 are always mentioned as a.a1 and a.a2 (it is not possible to use the column name without the table).
The output arrays c1 and c2 are the result set of the query. They are the aliases (as ...) in the select clause of the query.

When we call the query function:

The SQL query is parsed, building an internal structure. If it contain errors the call is stopped and the errors are returned as exceptions.
The input arrays are applied to the query internal structure.
The output arrays are filled with the return values of the query.

Dependencies

The Internal SQL library uses ANTLR v 3.x to parse the SQL query with the grammar defined in ISQLSelect.g.

How to use it in a program

After linked the library to your project, you need to create an instance of the class ISQLQuery.
The ISQLQuery constructor has two parameters:

The log parameter, which is an instance of the ILogger class. That's needed for this library to be able to log information about what it is doing.
On the other side the library cannot decide which log library you use (log4j, java.util.Logger...), so what you need to do is to make an adapter that wraps a Logger object of the log library you use in the ILogger interface.
An example of such adapter (just for test purposes) is SystemLogger, that prints the log information directly to the console.
The isDebug parameter. If this is true the library logs a lot of debug information that is very useful when testing but reduces substantially the performance.

Once you created the ISQLQuery object, you call its query method to apply the SQL query to the arrays.
The query method has the following parameters:

The sql parameter, which is the SQL query.
The mapArrayByName parameter, which contains the arrays that will be handled as table columns by the SQL query. They are in form of a map name -> array.
The name is the one that identifies the array in the query.
The array is an instance of the one of the following classes:

NumericArray (numbers)
TextArray (texts)
BooleanArray (boolean values)
DateArray (dates)

The arrays elements are objects (Double, String, Boolean, Timestamp). The reason why we did not use primitive arrays when possible (double[] and boolean[]) is practical: handle the arrays in a generic way, avoiding code duplication.
If your arrays are based on primitives (double[] or boolean[]) you can use ArrayBuilder to convert them to the equivalent object arrays (Double[] or Boolean[]).

The query method returns a map name -> result value.
Again, the name is the same that is used in the query as alias of the result item (after the as word).
The result value is an instance of ITypedValue. It means that it can be an array or a constant. It is a constant if resulting from an expression containing only a constant or an operation of constants.

Limitations

The SQL dialect to write the queries is not standard. These are the differences:

The FROM clause is used to build the tables to query, using the input arrays (like in the example we saw before, [a1, a2] as a).
All input columns have to be written as <tableName>.<columnName> (as a.a1 and b.b2 in the figure), everywhere in the SQL query. To write columnName without tableName is an error.
You need to set an alias (AS ...) for each output value in the SELECT clause. This is needed to connect the results with the output arrays.
Only aggregate functions are allowed in the SELECT clause, in the WHERE clause no functions at all are allowed. Operators (+,-,*,/,%) are allowed.
No sub-queries are allowed.
Only SELECT, FROM, WHERE, ORDER BY, GROUP BY clauses are available.
The LIKE operator works on regular expressions. Therefore don't write LIKE 'inter%' but LIKE 'inter.*' .

We have these limitations because:

Queries are not run against a database, but arrays in memory.
This is the first version of the library and reliability was preferred to completeness.

It is not excluded that in future versions some of this limitations will be removed.

Query Syntax

For who wants to understand precisely the SQL used for these queries, there is a grammar file, called ISQLSelect.g. Here I give a more human readable explanation. The query is in the following form:

SELECT <expression1> as <alias1>, ... <expressionN> as <aliasN>

FROM [<column1_1>, ... <columnN_1>] as <table1>, ... [<column1_M>, .. <columnP_M>] as <tableM>

WHERE <condition1> [AND|OR <condition2> ... AND|OR <conditionN>]

ORDER BY <column1>, ... <columnN>

GROUP BY <column1>, ... <columnN>

and composed by the following parts:

SELECT clause

The SELECT clause contains the result values of the query. There must be an alias (AS) for each value field.
Each result value can be:

a table column
a constant
an operation (+, -, *, /, %) between two expressions
an aggregate function (SUM, AVERAGE, MAX, MIN) Other functions (exp, log, ...) are not implemented. If you need to apply a function to an input or output array you can do it in the code that calls ISQLQuery.query, before or after the call.
a CASE expression:

CASE <column>
WHEN <column> = (or another operator) <constant1> THEN <expression1>
WHEN <column> = (or another operator) <constant2> THEN <expression2>
...
ELSE expressionN
END

In each WHEN condition you compare the column (always the same column) with a constant (constant1, constant2...). When one of the WHEN condition is true, the result value is set to the THEN expression. If none of them is true, the result value is set to the ELSE expression.

FROM clause

The FROM clause contains the definitions of the tables. Each definition is in the form [<column1>, <column2..>] AS <table>, like in the query in the figure. All columns in a table need to be of the same size. If they have different size, you can distribute them in two or more tables and join them using the WHERE clause.

WHERE clause (optional)

The WHERE clause is, as in the standard SQL, a set of predicates connected by AND and OR. Each predicate is in the form:

a comparison between two expressions, using standard operators ( =, <>, <, >) . We saw an example of it in the figure query.
a comparison between an expression and a regular expression, using the LIKE operator. For example:

WHERE a.a1 LIKE 'inter.*'

to search all the a.a1 values that start with 'inter'.
an IN clause: <expression> IN (<constant1>, <constant2>..., <constantN>). For example:

WHERE a.a1 IN (10, 12, 16 34)

to accept only values of the column a.a1 that are in the given list

An expressions in the WHERE clause can be:

a table column
a constant
an operation (+, -, *, /) between two expressions

ORDER BY clause (optional)

As in the standard SQL, the ORDER BY clause consists of the columns by which the query result has to be sorted. It is in the form: ORDER BY <column1>, <column2>, ... <columnN>

GROUP BY clause (optional)

Also the GROUP BY clause is the same as in the standard SQL. When you have some aggregate function in the SELECT clause (SUM, AVERAGE...) all the columns in the SELECT clause that are not parameters of the average functions must be mentioned in the GROUP BY clause. The GROUP BY clause is in the form: GROUP BY <column1>, <column2>, ... <columnN>

Some additional information.

Constants can be of 4 types, like the arrays.

Texts. They need to be surrounded by single quotes ('example')
Numbers (with or without decimals).
Dates. They need to be surrounded by sharp characters and written as ISO dates (#2007-10-11#).
Booleans (true or false).