
Scanner error while joining two tables. #9

Open
ajaysant opened this issue May 30, 2019 · 0 comments
I am trying to read two tables from Kudu and join them in a single query.

I followed the example steps: read the table into a DataFrame and register it as a temp table. I repeated the same steps for a second table so that I could query both.

I then used the dbGetQuery() method to pass a query joining the two tables and return the result as a data frame.

I get the following error:

Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 8.0 failed 1 times, most recent failure: Lost task 19.0 in stage 8.0 (TID 163, localhost, executor driver): org.apache.kudu.client.NonRecoverableException: Scanner not found
    at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
    at org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:352)
    at org.apache.kudu.client.KuduScanner.nextRows(KuduScanner.java:58)
    at org.apache.kudu.spark.kudu.RowIterator.hasNext(KuduRDD.scala:120)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
    at org.apache.spark.schedule

The sample query is:
`test_query <- paste(
  "SELECT * FROM tbl1 n0 FULL OUTER JOIN tbl2 n1 on n0.id = n1.id ",
  "WHERE n0.id LIKE CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '", Sys.getenv("START"), "') AS STRING),'%') ",
  "AND n1.id LIKE CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '", Sys.getenv("START"), "') AS STRING),'%') ",
  "LIMIT 100",
  sep = "")

table_df <- dbGetQuery(sc, test_query)`
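
For anyone reproducing this, the query string above can also be assembled with `sprintf()`, which keeps the repeated Julian-date prefix in one place and makes quoting mistakes easier to rule out. This is only a sketch: `build_query` is a hypothetical helper name, not part of this package, and it is meant to produce the same SQL text as the `paste()` call above. It does not address the scanner error itself.

```r
# Hypothetical helper (not part of the package): builds the join query
# from a start timestamp such as the value of the START environment variable.
build_query <- function(start) {
  # LIKE pattern shared by both sides of the join; '%%' yields a literal '%'
  prefix <- sprintf(
    "CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '%s') AS STRING),'%%')",
    start
  )
  paste0(
    "SELECT * FROM tbl1 n0 FULL OUTER JOIN tbl2 n1 on n0.id = n1.id ",
    "WHERE n0.id LIKE ", prefix,
    " AND n1.id LIKE ", prefix,
    " LIMIT 100"
  )
}

test_query <- build_query(Sys.getenv("START"))
```

Printing `test_query` before passing it to `dbGetQuery(sc, test_query)` is a quick way to confirm that the SQL sent to Spark is what was intended.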
