
This issue was moved to a discussion.

SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0] #17347

Closed
3 tasks done
harshgadhia opened this issue Nov 5, 2021 · 3 comments
Labels: question & help wanted (Use Github discussions instead)

Comments

@harshgadhia

harshgadhia commented Nov 5, 2021

I have installed the elasticsearch-dbapi library and set up a connection to AWS Elasticsearch [running Kibana version 6.8.0] with the following connection string:

odelasticsearch+https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/

Superset connects to the ES endpoint successfully and retrieves the list of indexes correctly; the dropdown for tables is populated as expected. However, it is not able to parse index metadata or run SQL queries on ES.

Below I have provided as much detail as possible for both issues.

1. Superset is not able to parse the index metadata:

  • When I select any of the indexes (called table schemas in the Superset UI) from the list, I get a UI error at the bottom.
  • ERROR An error occurred while fetching table metadata
  • Presumably this is because Superset is not able to parse the index metadata coming from the AWS Elasticsearch endpoint.
  • I have verified with curl that the ES server responds with data. This is the same endpoint that Superset is hitting (see logs below).
curl --location --request GET 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/<INDEX-NAME>/_mapping?format=json'

Error

File "/usr/local/lib/python3.7/site-packages/es/opendistro/api.py", line 236, in get_valid_columns
response[index_real_name]["mappings"]["properties"], []
KeyError: 'properties'

Application Logs

2021-11-04 20:08:21,317:INFO:elasticsearch:GET https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/dummy_index_alias/_mapping?format=json [status:200 request:0.026s]
2021-11-04 20:08:21,317:DEBUG:elasticsearch:> None
2021-11-04 20:08:21,317:DEBUG:elasticsearch:< {"dummy_app":{"mappings":{"dummy_app":{"properties":{"param1":{"type":"boolean"},"param2":{"type":"float"},"param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param4":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param5":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param6":{"type":"date"},"deleted":{"type":"boolean"},"event_time":{"type":"date"},"gwGen":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}}
2021-11-04 20:08:21,318:ERROR:root:'properties'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask_appbuilder/api/__init__.py", line 85, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 85, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 82, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1429, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 241, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/databases/api.py", line 517, in table_metadata
    table_info = get_table_metadata(database, table_name, schema_name)
  File "/app/superset/databases/utils.py", line 66, in get_table_metadata
    columns = database.get_columns(table_name, schema_name)
  File "/app/superset/models/core.py", line 650, in get_columns
    return self.db_engine_spec.get_columns(self.inspector, table_name, schema)
  File "/app/superset/db_engine_specs/base.py", line 887, in get_columns
    return inspector.get_columns(table_name, schema)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 391, in get_columns
    self.bind, table_name, schema, info_cache=self.info_cache, **kw
  File "/usr/local/lib/python3.7/site-packages/es/opendistro/sqlalchemy.py", line 60, in get_columns
    result = connection.execute(query)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2235, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1003, in execute
    return self._execute_text(object_, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1178, in _execute_text
    parameters,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1514, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/es/baseapi.py", line 37, in wrap
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/es/opendistro/api.py", line 275, in execute
    return self.get_valid_columns(re_table_name[1])
  File "/usr/local/lib/python3.7/site-packages/es/opendistro/api.py", line 236, in get_valid_columns
    response[index_real_name]["mappings"]["properties"], []
KeyError: 'properties'
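
The DEBUG line in the log above shows that the mapping response nests "properties" under a mapping type name ("dummy_app") inside "mappings", while the failing line indexes ["mappings"]["properties"] directly. A minimal sketch of a lookup that tolerates both shapes follows; `extract_properties` is a hypothetical helper written for illustration and is not part of elasticsearch-dbapi.

```python
from typing import Any, Dict


def extract_properties(mappings: Dict[str, Any]) -> Dict[str, Any]:
    """Return the field properties from an index's "mappings" object.

    Handles the typeless shape ({"properties": {...}}) and the typed shape
    seen in the log above ({"<type_name>": {"properties": {...}}}).
    """
    if "properties" in mappings:
        return mappings["properties"]
    # Typed shape: look one level down for a mapping type that has properties.
    for type_mapping in mappings.values():
        if isinstance(type_mapping, dict) and "properties" in type_mapping:
            return type_mapping["properties"]
    return {}


# Abbreviated copy of the response body from the log above.
response = {
    "dummy_app": {
        "mappings": {
            "dummy_app": {
                "properties": {
                    "param1": {"type": "boolean"},
                    "param2": {"type": "float"},
                    "event_time": {"type": "date"},
                }
            }
        }
    }
}

for index_name, body in response.items():
    columns = extract_properties(body["mappings"])
    print(index_name, sorted(columns))
```

The sketch only illustrates why the direct key lookup fails on this response; whether the library should accept typed mappings is a question for its maintainers.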

2. Superset gets a 400 Bad Request error from the ES endpoint when running a SQL query against an ES index.

  • When running a SQL query in the SQL Lab editor, I get the error shown in screenshot 2 below.
  • The issue seems to be in Superset, as it appears to send a malformed POST request to the ES server.
  • Below is the example endpoint hit by Superset, which responds with a 400 Bad Request error.
https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/_opendistro/_sql
  • Using curl, I verified that the ES server responds correctly with the appropriate data:
curl --location --request POST 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/_opendistro/_sql' \
--header 'Content-Type: application/json' \
--data-raw '{
 "query": "SELECT * FROM dummy_index where some_id = 1 order by id DESC limit 1"
}'

Sample response from the above REST call:

{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":43464,"max_score":null,"hits":[{"_index":"dummy_index","_type":"dummy_index","_id":"dummy_index-app_1636096855000","_score":null,"_source":{"param1":"value1","param2":false,"param3":"1636096855000","param4":"2021-11-05T07:21:01.439Z"},"sort":[1636096861439]}]}}

However, elasticsearch-dbapi is not able to parse this response and fails with the error shown in screenshot 2 below.

Application Logs

[2021-11-05 07:35:25,181: ERROR/ForkPoolWorker-14] Query 127: <class 'es.exceptions.DataError'>
Traceback (most recent call last):
  File "/app/superset/sql_lab.py", line 266, in execute_sql_statement
    db_engine_spec.execute(cursor, sql, async_=True)
  File "/app/superset/db_engine_specs/base.py", line 1094, in execute
    raise cls.get_dbapi_mapped_exception(ex)
  File "/app/superset/db_engine_specs/base.py", line 1092, in execute
    cursor.execute(query)
  File "/usr/local/lib/python3.7/site-packages/es/baseapi.py", line 37, in wrap
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/es/opendistro/api.py", line 284, in execute
    "Missing columns field, maybe it's an elastic sql ep"
es.exceptions.DataError: Missing columns field, maybe it's an elastic sql ep
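
The sample response above has the raw search shape (a top-level "hits" object) rather than a tabular one, which is consistent with the "Missing columns field" error. A rough sketch of a check that tells the two shapes apart follows. `describe_sql_response` is a hypothetical helper, and the "schema" and "datarows" keys are my assumption about what a tabular (JDBC-style) response looks like; verify them against the SQL plugin documentation for your version.

```python
from typing import Any, Dict


def describe_sql_response(response: Dict[str, Any]) -> str:
    """Classify a SQL endpoint response by its top-level keys."""
    # Assumed tabular shape: column metadata plus rows of values.
    if "schema" in response and "datarows" in response:
        return "tabular"
    # Raw search shape, as in the sample response above.
    if "hits" in response:
        return "search-hits"
    return "unknown"


# Abbreviated copy of the sample response shown above.
sample = {
    "took": 4,
    "timed_out": False,
    "hits": {"total": 43464, "max_score": None, "hits": []},
}

print(describe_sql_response(sample))
```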

How to reproduce the bug

  1. As described above, set up a connection to Elasticsearch from Superset.
  2. To see the first error, click on any index name on the left side of the SQL Lab editor. This results in the error shown in screenshot 1 below.
  3. To see the second error, run a SQL query as explained in problem 2 above. This results in the error shown in screenshot 2 below.

Expected results

For problem 1, superset should correctly parse the index metadata.
For problem 2, superset should correctly form the query request for ES server.

Actual results

For problem 1: See Screenshot 1
For problem 2: See Screenshot 2

Screenshots

For problem 1:
[Screenshot 1]

For problem 2:

Run a query like: SELECT * FROM dummy_index where some_id = 1 order by id DESC limit 1

[Screenshot 2]

Environment

  • browser type and version: Google Chrome [Version 95.0.4638.69 (Official Build) (x86_64)]
  • superset version: 1.3.1
  • python version: 3.7
  • node.js version: node -v
  • any feature flags active: None
  • pip elasticsearch-dbapi version: 0.2.6
  • pip elasticsearch version: 7.13.4
  • Kibana version: 6.8.0

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

I looked into a similar issue, "Trouble connecting to AWS OpenSearch via Superset", but my issue seems to be different. In my case the connection to Elasticsearch succeeds; the problems are in parsing the data and in forming the right POST request in the library.
Any help is greatly appreciated.

@stockholmux

stockholmux commented Nov 8, 2021

@harshgadhia That version of ES is quite old. I know the SQL extensions have gone through some changes recently, but I'm not sure which versions Superset supports. Maybe someone from the Superset team can fill us in on which versions they test against.

@srinify
Contributor

srinify commented Nov 8, 2021

cc @dpgaspar who may know a bit more!

@harshgadhia
Author

@stockholmux @srinify Thank you very much for responding to my issue.

I have also reached out to the maintainers of the elasticsearch-dbapi plugin, as the fix for problem 2 above seems to be simple: versions of ES before 7.4 need a query parameter passed to the SQL endpoint to get the data in JDBC format.
Please see this thread.
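
As a rough illustration of the idea, the sketch below appends a format parameter to the SQL endpoint URL from problem 2. `with_query_param` is a hypothetical helper, and the parameter name and value (`format=jdbc`) are my reading of "get the data in JDBC format"; check the linked thread and the SQL plugin documentation for the exact spelling your version expects.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return url with key=value added to (or replaced in) its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[key] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


endpoint = (
    "https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443"
    "/_opendistro/_sql"
)

# Assumed parameter name and value; not confirmed by this thread.
print(with_query_param(endpoint, "format", "jdbc"))
```

The same URL can be used in the curl command from problem 2 to compare the two response shapes by hand.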

junlincc added the question & help wanted (Use Github discussions instead) label and removed the #bug (Bug report) label on Nov 11, 2021
apache locked and limited conversation to collaborators on Feb 2, 2022
geido converted this issue into discussion #18341 on Feb 2, 2022
