MEMO: Upgrade embulk output orc to 0.4.0
Yukihiro Okada edited this page Jun 1, 2020 · 8 revisions
Upgrade embulk-output-orc to 0.4.0 by yuokada · Pull Request #22 · yuokada/embulk-output-orc
Notes taken while working through the dependency resolution.
- org.apache.hadoop : hadoop-hdfs : 3.2.1 - The Central Repository Search Engine
- Maven Repository: org.apache.hadoop » hadoop-aws » 3.2.1
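The only way forward is to resolve the conflicts while reading through the dependency graph.
- The embulk-output-orc code itself probably needs some changes as well.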
Hmm...
~/w/I/embulk-output-orc ❯❯❯ ./gradlew test
> Task :compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
/Users/yuokada/works/IdeaProjects/embulk-output-orc/src/main/scala/org/embulk/output/orc/OrcColumnVisitor.scala:8: Class org.apache.hadoop.io.Writable not found - continuing with a stub.
class OrcColumnVisitor(val reader: PageReader, val batch: VectorizedRowBatch, val i: Integer) extends ColumnVisitor {
^
one error found
> Task :compileScala FAILED
FAILURE: Build failed with an exception.
hive-storage-api appears to be the cause of the error, so let's check what depends on it.
~/w/I/embulk-output-orc ❯❯❯ ./gradlew dependencyInsight --configuration compile --dependency hive-storage-api
Starting a Gradle Daemon, 1 busy and 1 incompatible Daemons could not be reused, use --status for details
> Task :dependencyInsight
org.apache.hive:hive-storage-api:2.7.1
variant "runtime" [
org.gradle.status = release (not requested)
org.gradle.usage = java-runtime (not requested)
org.gradle.libraryelements = jar (not requested)
org.gradle.category = library (not requested)
]
org.apache.hive:hive-storage-api:2.7.1
\--- org.apache.orc:orc-core:1.5.10
\--- compile
A web-based, searchable dependency report is available by adding the --scan option.
BUILD SUCCESSFUL in 38s
1 actionable task: 1 executed
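The chain of dependents in a report like the one above can also be extracted programmatically. The following is a minimal Python sketch, not part of the original memo: `dependency_chain` is a hypothetical helper, and it assumes a single linear chain (as in this report) rather than a branching tree.

```python
def dependency_chain(report):
    """Return the dependents listed in a dependencyInsight report, in order.

    Assumption: the report contains one linear chain. A branching tree
    would require taking the indentation of each line into account.
    """
    chain = []
    for line in report.splitlines():
        text = line.strip()
        # Gradle prints tree edges as "\--- <node>" or "+--- <node>".
        if text.startswith(("\\---", "+---")):
            chain.append(text[4:].strip())
    return chain


# Tree portion of the report, copied from the output above.
report = """\
org.apache.hive:hive-storage-api:2.7.1
\\--- org.apache.orc:orc-core:1.5.10
     \\--- compile
"""

print(dependency_chain(report))
```

Here the result shows that hive-storage-api is pulled in only through orc-core, which in turn is declared in the compile configuration.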
Perhaps because it is declared as provided, a newer version looks usable.
~/w/I/embulk-output-orc ❯❯❯ ./gradlew dependencyInsight --configuration compile --dependency hive-storage-api
> Task :dependencyInsight
org.apache.hive:hive-storage-api:2.7.2
variant "runtime" [
org.gradle.status = release (not requested)
org.gradle.usage = java-runtime (not requested)
org.gradle.libraryelements = jar (not requested)
org.gradle.category = library (not requested)
]
Selection reasons:
- By conflict resolution : between versions 2.7.2 and 2.7.1
org.apache.hive:hive-storage-api:2.7.2
\--- compile
org.apache.hive:hive-storage-api:2.7.1 -> 2.7.2
\--- org.apache.orc:orc-core:1.6.3
\--- compile
A web-based, searchable dependency report is available by adding the --scan option.
BUILD SUCCESSFUL in 30s
1 actionable task: 1 executed
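The "By conflict resolution" line reflects Gradle's default strategy of selecting the highest of the competing versions. A minimal Python sketch of that rule follows; `version_key` and `resolve_conflict` are hypothetical names, and the comparison only handles purely numeric dotted versions, whereas Gradle's real ordering also deals with qualifiers such as `-SNAPSHOT`.

```python
def version_key(version):
    """Turn "2.7.2" into (2, 7, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_conflict(candidates):
    """Pick the highest version, mimicking Gradle's default behaviour.

    Assumption: every candidate is a dotted sequence of integers.
    """
    return max(candidates, key=version_key)


# 2.7.2 is declared directly in the compile configuration, while 2.7.1
# arrives transitively through orc-core:1.6.3.
selected = resolve_conflict(["2.7.1", "2.7.2"])
print(selected)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, "2.10.0" would sort below "2.9.0".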
Upgrading to Hadoop 3.x looks difficult, with ORC being the bottleneck...
- Maven Repository: org.apache.hadoop » hadoop-aws » 3.2.1
- Maven Repository: org.apache.hadoop » hadoop-aws » 2.10.0
- Maven Repository: org.apache.hadoop » hadoop-aws » 3.1.1
Is upgrading only hadoop-aws an option?
- apache spark - hadoop aws versions compatibility - Stack Overflow
- Hadoop 3: Comparison with Hadoop 2 and Spark | ActiveWizards: data science and engineering lab
- Hadoop 2 vs Hadoop 3 - Why You Should Work on Hadoop Latest Version - DataFlair
- S3の署名バージョン2 廃止に対応する方法を調べてみた(Embulk S3プラグイン編) – サーバーワークスエンジニアブログ (in Japanese: looking into how to handle the retirement of S3 Signature Version 2, Embulk S3 plugin edition)