Column lineage does not capture the destination table's column names #4

Open
huajianyihujiu opened this issue Jul 24, 2020 · 2 comments

Comments

@huajianyihujiu

Hello, this is the SQL I ran:
insert overwrite table bi_tmp.t_tmp_000184_order
select concat(create_time,'123') -- order
from bi_tmp.t_tmp_000189_order
where order is not null
limit 1;
The column lineage information produced by the hook is as follows:
"columnLineage":[{"expression":["concat(bi_tmp.t_tmp_000189_order.create_time,'123')"],"qualifiedName":"bi_tmp.t_tmp_000184_order.","inputs":[{"qualifiedName":"bi_tmp.t_tmp_000189_order.create_time"}],"name":""}]
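As you can see, the value of qualifiedName is bi_tmp.t_tmp_000184_order. (note the trailing dot):
the output table's column name was not captured.
If I enable the alias order that is commented out in the SQL, the qualifiedName in the lineage becomes bi_tmp.t_tmp_000184_order.order
However, that is only a temporary alias in the SQL, not necessarily the actual physical column name in the output table.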
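
To make the symptom concrete, here is a minimal Python sketch that scans hook output for lineage entries whose output column name is missing. It assumes the fragment shown above is wrapped in braces to form a complete JSON object; the function name `find_unnamed_outputs` is hypothetical and not part of the hook.

```python
import json

# Lineage fragment copied from the hook output above, wrapped in braces
# so that it parses as a standalone JSON object (an assumption).
RAW = '''{"columnLineage":[{"expression":["concat(bi_tmp.t_tmp_000189_order.create_time,'123')"],"qualifiedName":"bi_tmp.t_tmp_000184_order.","inputs":[{"qualifiedName":"bi_tmp.t_tmp_000189_order.create_time"}],"name":""}]}'''


def find_unnamed_outputs(lineage_doc):
    """Return the lineage entries that have no output column name.

    An entry counts as unnamed when "name" is empty or when
    "qualifiedName" ends with a dot (table prefix with nothing after it).
    """
    unnamed = []
    for entry in lineage_doc.get("columnLineage", []):
        name = entry.get("name", "")
        qualified = entry.get("qualifiedName", "")
        if not name or qualified.endswith("."):
            unnamed.append(entry)
    return unnamed


doc = json.loads(RAW)
for entry in find_unnamed_outputs(doc):
    print("missing output column name:", entry["qualifiedName"], "<-", entry["expression"])
```

A check like this only detects the gap; it cannot tell which physical column of the output table the expression was written to.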

Will this be addressed in a future update? Looking forward to your reply!

@bill-cc
Owner

bill-cc commented Jul 25, 2020

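First, I recommend adding an alias to expression columns like this one, so that each maps to a specific output column name. The input columns are reported in the inputs attribute. For example, if you give the output column the alias order, then the input columns of order are the columns listed in inputs (as above, [{"qualifiedName":"bi_tmp.t_tmp_000189_order.create_time"}]). If no alias is defined for the output column, its name would have to be replaced by the entire expression, which is not a suitable representation when the expression is complex. If you have a better suggestion, please share it.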

@huajianyihujiu
Author

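Thanks for the reply.
Would it be possible to connect to the Hive metastore database, fetch the output table's columns and their order, and add them to the column lineage?
Or to get the output table's column names from some object inside Hive (although I do not yet know which class that would be)?
Also, is compiling Hive and stepping through it in a debugger currently the only way to understand the Hive source code better, or is there related material I can read somewhere?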
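
The first suggestion above (look up the output table's columns and their order, then add them to the lineage) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not how the hook works: it assumes the lineage entries arrive in select-list order with one entry per select expression, that the insert has no explicit column list so expressions map to target columns by position, and that the ordered column list has already been fetched from the metastore by some other means. The function `fill_output_columns` and the column name `order_time` are hypothetical, since the real schema of the target table is not given in this thread.

```python
import copy


def fill_output_columns(lineage_entries, target_table, target_columns):
    """Fill "name" and "qualifiedName" of each lineage entry by position.

    lineage_entries: entries in select-list order (assumption).
    target_table:    qualified table name, e.g. "db.table".
    target_columns:  ordered physical column names of the target table,
                     assumed to come from the metastore.
    Returns new entries; the input list is left unmodified.
    """
    if len(lineage_entries) > len(target_columns):
        raise ValueError("more lineage entries than target columns")
    filled = []
    for entry, column in zip(lineage_entries, target_columns):
        patched = copy.deepcopy(entry)
        # The physical column name wins over any SQL alias, because the
        # alias is not necessarily the name used in the target table.
        patched["name"] = column
        patched["qualifiedName"] = f"{target_table}.{column}"
        filled.append(patched)
    return filled


entries = [{
    "expression": ["concat(bi_tmp.t_tmp_000189_order.create_time,'123')"],
    "qualifiedName": "bi_tmp.t_tmp_000184_order.",
    "inputs": [{"qualifiedName": "bi_tmp.t_tmp_000189_order.create_time"}],
    "name": "",
}]

# "order_time" is a made-up column name used only for this example.
result = fill_output_columns(entries, "bi_tmp.t_tmp_000184_order", ["order_time"])
print(result[0]["qualifiedName"])
```

Positional matching breaks down if the hook omits entries for some select expressions (for example constants with no input columns), so a real implementation would need to confirm the ordering guarantee first.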
