Code Generation and "Short, Fat" Queries
Drill is a query engine for "Big Data" queries: scans of files with billions of records distributed over large clusters. As it turns out, sometimes people want to use Drill for smaller queries: queries that could run just as well in a typical database such as MySQL. The advantage of Drill, however, is the lack of an ETL step: no need to load data into a table. For this reason, people seek to use Drill even on small data files.
This write-up discusses work done to analyze the performance of one such query: a "short, fat" query that has just 6000 rows but 240+ columns, mostly float columns representing money amounts.
It turns out we can reduce run-time cost by 33% by using plain old Java, without byte-code fixup, and the JDK compiler.
The query profile showed that setup time for the selection vector remover operator accounts for over 50% of the run time on the first run. Most of that setup is code generation: generating the source, compiling it, and performing byte-code fixup.
To get a clear read, an option was added to turn off the code cache so that code is rebuilt on every run. The query was then run on the same (embedded) Drillbit 10 times with "plain old Java" compilation and 10 times with normal Drill compilation. ("Plain old Java" simply means having each generated class extend its template, then using the byte codes directly from the compiler, with no ASM-based byte-code manipulation. See DRILL-5052.)
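To make the distinction concrete, here is a minimal sketch of the "plain old Java" structure. The class and method names are illustrative, not Drill's actual templates; the point is only that the generated class extends its template directly, so the compiler's output can be loaded as-is.

```java
// Hand-written template shipped with the engine (illustrative names only).
public abstract class CopierTemplate {
  public void copyRecords(int count) {
    for (int i = 0; i < count; i++) {
      doCopy(i);                 // per-column work supplied by generated code
    }
  }

  protected abstract void doCopy(int index);
}

// "Plain old Java": the generated class simply extends the template, so the
// bytes produced by the Java compiler can be loaded directly. The traditional
// path instead compiles the generated members as a separate class and uses
// ASM to splice them into the template's bytecode.
class GeneratedCopier extends CopierTemplate {
  @Override
  protected void doCopy(int index) {
    // generated per-column copy statements would appear here
  }
}
```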
Results:
Traditional: 3259 ms
Straight-Java: 2488 ms
Savings: 771 ms, or about 24%
Even more interesting, we can time the actual code-gen and compile step; virtually the entire cost is the byte-code fixup. With the traditional approach, each code gen/compile/fixup cycle costs about 1500 ms. With plain old Java, the cost drops by 90%, to about 150 ms.
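For reference, the compile-and-load portion of that cost is easy to measure in isolation. The sketch below times Janino's SimpleCompiler on a stand-in for a generated source string; it is not Drill's instrumentation, just an illustration of the kind of measurement, and the generated class is hypothetical.

```java
import org.codehaus.janino.SimpleCompiler;

public class CompileTimingDemo {
  public static void main(String[] args) throws Exception {
    // Stand-in for a Drill-generated source string.
    String generatedSource =
        "public class GeneratedCopier {\n"
      + "  public void copyRow(int in, int out) { /* generated per-column copies */ }\n"
      + "}\n";

    long start = System.nanoTime();
    SimpleCompiler compiler = new SimpleCompiler();
    compiler.cook(generatedSource);                               // compile in memory
    Class<?> cls = compiler.getClassLoader().loadClass("GeneratedCopier");
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;

    System.out.println("Compiled " + cls.getName() + " in " + elapsedMs + " ms");
  }
}
```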
Next, we go all-in and convert all operators in this particular query to use "plain old Java". This reduces run time further, to 2232 ms, for another roughly 250 ms of savings (about 10%).
Going against conventional wisdom a bit more, we can select the JDK compiler instead of Janino. Result: 2183 ms, another 50 ms improvement. Not big, but it shows that the JDK compiler is not the slow path it is often assumed to be, perhaps because the generated source never has to be written to a file before compiling.
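To illustrate why the file-write concern does not really apply, here is a rough sketch of compiling a generated class with the JDK compiler from an in-memory string; no .java file ever touches disk. The class names are again illustrative, and for brevity the class files land in a temp directory, whereas a production code-gen path would capture them in memory with a custom JavaFileManager.

```java
import java.net.URI;
import java.nio.file.Files;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class JdkCompileDemo {

  /** Generated source held in memory; no .java file is ever written. */
  static final class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String className, String code) {
      super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
      this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
      return code;
    }
  }

  public static void main(String[] args) throws Exception {
    // Illustrative template plus generated subclass, both compiled from memory.
    String source =
        "abstract class CopierTemplate { protected abstract void doCopy(int index); }\n"
      + "public class GeneratedCopier extends CopierTemplate {\n"
      + "  @Override protected void doCopy(int index) { /* generated copies */ }\n"
      + "}\n";

    JavaCompiler javac = ToolProvider.getSystemJavaCompiler();    // null on a JRE-only install
    String outDir = Files.createTempDirectory("gen-classes").toString();

    // Class files go to a temp directory here for simplicity; the source itself
    // is never written to disk.
    boolean ok = javac.getTask(
        null, null, null,
        Arrays.asList("-d", outDir),
        null,
        Arrays.asList(new StringSource("GeneratedCopier", source))).call();

    System.out.println("compiled: " + ok + " -> " + outDir);
  }
}
```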
The net result is an improvement of 1076 ms per run, a 33% reduction.
As a sanity check, code caching was enabled again. The 10-run average is now 1817 ms. (The original 10-run average, with caching and "stock" settings, was 2124 ms.)
After these changes, selection vector remover setup accounts for 23% of total time and the JSON scan for 32%. That is an improvement, but it still leaves opportunities for further speed-ups.
The savings above will be smaller for queries with fewer columns; the flip side, of course, is that they will be larger for queries with more columns.
"That's all fine," I hear you say, "but we do byte-code fixup for a reason!" To check this, we ran the original external sort over a 2M-row, 300 MB data set. With original settings, the 10-run average is 15.2 seconds. Switching to "plain old Java" and the JDK compiler also produced an average run time of 15.2 seconds; any difference between the two cases is lost in the noise.
It turns out that the scalar replacement done during byte-code fixup saves nothing on modern Oracle JVMs: the JVM's JIT compiler already does scalar replacement for us via escape analysis.
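As a toy illustration of why that fixup is redundant, consider the pattern the generated code produces: a short-lived holder object per value. The holder class below is a stand-in, not one of Drill's actual holder classes. With escape analysis enabled (the default on modern HotSpot JVMs), the JIT scalar-replaces the allocation on its own, so rewriting the bytecode to remove it gains nothing.

```java
// Toy stand-in for the per-value holder objects that generated code creates
// (not an actual Drill class).
final class DoubleHolder {
  double value;
}

public class ScalarReplacementDemo {

  static void copyValue(double[] in, double[] out, int i) {
    DoubleHolder holder = new DoubleHolder();   // never escapes this method
    holder.value = in[i];
    out[i] = holder.value;
    // With -XX:+DoEscapeAnalysis (on by default), the JIT compiles this method
    // with no allocation at all: the holder's field lives in a register.
  }

  public static void main(String[] args) {
    double[] in = new double[6_000];
    double[] out = new double[in.length];
    for (int rep = 0; rep < 20_000; rep++) {      // warm up so the JIT kicks in
      for (int i = 0; i < in.length; i++) {
        copyValue(in, out, i);
      }
    }
    System.out.println(out[0]);
  }
}
```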
Net takeaway: using plain old Java (without byte-code fixup) and the JDK compiler produces better performance for queries with many columns and speeds up code generation. At the same time, the technique does not degrade performance for queries with large numbers of rows.