Skip to content

Commit

Permalink
Merge pull request #25 from embulk/article-installing-maven-style-emb…
Browse files Browse the repository at this point in the history
…ulk-plugins

New article: Installing Maven-style Embulk plugins
  • Loading branch information
dmikurube authored Jun 13, 2024
2 parents 910e264 + 3d7c935 commit 8c938df
Showing 1 changed file with 197 additions and 0 deletions.
197 changes: 197 additions & 0 deletions _posts/2024-06-13-installing-maven-style-embulk-plugins.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,197 @@
---
layout: posts
title: "Installing Maven-style Embulk plugins"
date: 2024-06-13
description: "We recently started to provide a couple of methods to install the Maven-style Embulk plugins more easily, which was not very easy in the beginning of Maven-style plugins, indeed. This article is a brief introduction of the methods to install the Maven-style Embulk plugins."
author: "dmikurube"
---

Since [Embulk v0.11.0 was released a year ago](https://github.com/embulk/embulk/releases/tag/v0.11.0), we have pushed the new Maven-style Embulk plugins rather than the legacy RubyGems-style plugins.

See also: [Embulk v0.11 is coming soon: JRuby](https://www.embulk.org/articles/2023/04/13/embulk-v0.11-is-coming-soon.html#jruby)

We recently started to provide a couple of methods to install the Maven-style Embulk plugins more easily, which was not very easy in the beginning of Maven-style plugins, indeed.

This article is a brief introduction of the methods to install the Maven-style Embulk plugins.

## Revisit: Embulk home

Embulk now has a concept of the "Embulk home" directory, which is a directory to contain `embulk.properties` and Embulk plugin installations. The Maven-style Embulk plugins will also be installed in the Embulk home directory.

See again: [Embulk v0.11 is coming soon: Embulk home](https://www.embulk.org/articles/2023/04/13/embulk-v0.11-is-coming-soon.html#embulk-home)

## #1: Embulk's built-in subcommand `install`

[Embulk v0.11.3](https://github.com/embulk/embulk/releases/tag/v0.11.3) introduced a new Embulk subcommand: `embulk install`, instead of `embulk gem install` for RubyGems-style plugins. This subcommand takes a Maven artifact notation as its argument. The example below installs [`org.embulk:embulk-input-s3:0.6.0` from Maven Central](https://central.sonatype.com/artifact/org.embulk/embulk-input-s3/0.6.0).

```
$ java -jar embulk-0.11.3.jar install "org.embulk:embulk-input-s3:0.6.0"
...
...
2024-06-13 15:46:11.537 +0900 [INFO] (main): The path "/home/user/.embulk/lib/m2/repository" (m2_repo) does not exist. Creating it as a directory.
2024-06-13 15:46:11.619 +0900 [INFO] (main): No alternative remote Maven repositories are specified. Downloading artifacts from Maven Central.
2024-06-13 15:46:11.633 +0900 [INFO] (main): Downloading org.embulk:embulk-input-s3:pom:0.6.0 from https://repo.maven.apache.org/maven2
2024-06-13 15:46:12.725 +0900 [INFO] (main): Downloaded org.embulk:embulk-input-s3:pom:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.pom
2024-06-13 15:46:12.776 +0900 [INFO] (main): Downloading com.amazonaws:aws-java-sdk-s3:pom:1.11.466 from https://repo.maven.apache.org/maven2
2024-06-13 15:46:13.027 +0900 [INFO] (main): Downloaded com.amazonaws:aws-java-sdk-pom:pom:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-pom/1.11.466/aws-java-sdk-pom-1.11.466.pom
...
...
2024-06-13 15:46:14.857 +0900 [INFO] (main): Downloading org.embulk:embulk-input-s3:jar:0.6.0 from https://repo.maven.apache.org/maven2
2024-06-13 15:46:14.857 +0900 [INFO] (main): Downloading com.amazonaws:aws-java-sdk-s3:jar:1.11.466 from https://repo.maven.apache.org/maven2
...
...
2024-06-13 15:46:15.720 +0900 [INFO] (main): Downloaded org.embulk:embulk-input-s3:jar:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.jar
2024-06-13 15:46:15.721 +0900 [INFO] (main): Downloaded com.amazonaws:aws-java-sdk-s3:jar:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-s3/1.11.466/aws-java-sdk-s3-1.11.466.jar
...
...
2024-06-13 15:46:15.730 +0900 [INFO] (main): Installed org.embulk:embulk-input-s3:jar:0.6.0 at /home/user/.embulk/lib/m2/repository/org/embulk/embulk-input-s3/0.6.0/embulk-input-s3-0.6.0.jar
2024-06-13 15:46:15.730 +0900 [INFO] (main): Installed com.amazonaws:aws-java-sdk-s3:jar:1.11.466 at /home/user/.embulk/lib/m2/repository/com/amazonaws/aws-java-sdk-s3/1.11.466/aws-java-sdk-s3-1.11.466.jar
...
...
```

This subcommand downloads also the dependencies of the specified Maven artifact transitively as you can see in the example above.

Note that you can change the destination Embulk home directory by Embulk's standard options. See the example below.

```
$ java -jar embulk-0.11.3.jar -Xembulk_home=/tmp/foo install "org.embulk:embulk-input-s3:0.6.0"
...
$ env EMBULK_HOME=/tmp/bar java -jar embulk-0.11.3.jar install "org.embulk:embulk-input-s3:0.6.0"
...
```

It now supports only [Maven Central](https://central.sonatype.com/) as the remote repository, unfortunately.

## #2: Out-of-Embulk Embulk plugin installer

Embulk has had the `mkbundle` subcommand and the `-b` option so that users can maintain plugin installations by `Gemfile`, but it works only for RubyGems-style plugins, of course.

[The Gradle `org.embulk.runset` plugin](https://github.com/embulk/gradle-embulk-runset) is an alternative for Maven-style Embulk plugin. It works out of the Embulk package at all.

To use this, set up an environment for [Gradle](https://gradle.org/install/) at first. [Gradle 8.7](https://docs.gradle.org/8.7/userguide/userguide.html) is at least required. You may want to choose [the Gradle wrapper](https://docs.gradle.org/8.7/userguide/userguide.html) in typical use-cases.

Next, write `build.gradle` to declare the Maven-based Embulk plugins you wanted to install.

```
plugins {
id "org.embulk.runset" version "0.2.0" // Just apply this Gradle plugin.
}
repositories {
mavenCentral()
}
installEmbulkRunSet {
// Set your Embulk home directory (absolute path) to install the Embulk plugins and "embulk.properties".
embulkHome file("/home/user/my-embulk-home")
// Specify the Maven-style Embulk plugin by the "artifact" directive.
artifact "org.embulk:embulk-input-s3:0.6.0"
// You can specify multiple versions of the same Embulk plugin so that you can choose the version at runtime.
// You can also specify an artifact with the split-style notation.
artifact group: "org.embulk", name: "embulk-input-s3", version: "0.5.3"
// Specify this if you need JRuby.
// It downloads jruby-complete-9.1.15.0.jar, and set the "jruby" Embulk System Property in "embulk.properties".
jruby "org.jruby:jruby-complete:9.1.15.0"
// Specify this if you need to set some Embulk System Properties manually.
// It sets the "key" Embulk System Property to "value" in "embulk.properties".
embulkSystemProperty "key", "value"
}
```

Then, run `gradle installEmbulkRunSet` (`./gradlew` when you use the Gradle wrapper) to set up.

```
$ gradlew installEmbulkRunSet
> Configure project :
Supplied embulkHome "/home/user/my-embulk-home" does not exist, then will be created.
Setting to copy org.embulk:embulk-input-s3:0.6.0:jar into org/embulk/embulk-input-s3/0.6.0
Setting to copy com.amazonaws:aws-java-sdk-s3:1.11.466:jar into com/amazonaws/aws-java-sdk-s3/1.11.466
...
...
Setting to copy org.embulk:embulk-input-s3:0.5.3:jar into org/embulk/embulk-input-s3/0.5.3
...
...
Setting to copy org.embulk:embulk-input-s3:0.5.3:pom into org/embulk/embulk-input-s3/0.5.3
...
...
Setting to copy org.jruby:jruby-complete:9.1.15.0:jar into org/jruby/jruby-complete/9.1.15.0
BUILD SUCCESSFUL in 2s
1 actionable task: 1 executed
```

The Embulk System Properties file `embulk.properties` is automatically generated in the specified Embulk home, too.

```
#Generated by the "org.embulk.embulk-runset" Gradle plugin.
#Thu Jun 13 16:53:31 JST 2024
key=value
jruby=file\:///home/user/my-embulk-home/lib/m2/repository/org/jruby/jruby-complete/9.1.15.0/jruby-complete-9.1.15.0.jar
```

## Run!

In either style of installation, you can run Embulk with the installed Maven-style Embulk plugins.

See the example `s3_with_maven.yaml` below.

```yaml
in:
# The full-style type notation for Maven-style Embulk plugins.
type:
source: maven
group: org.embulk
name: s3
version: 0.6.0
bucket: ...
parser:
type: csv
...
out:
type: stdout
```
Then, run Embulk!
```
$ java -jar embulk-0.11.4.jar -Xembulk_home=/home/user/my-embulk-home run s3_with_maven.yml
2024-06-13 17:01:55.373 +0900 [INFO] (main): embulk_home is set from command-line: /home/user/my-embulk-home
2024-06-13 17:01:55.378 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /home/user/my-embulk-home/lib/m2/repository
2024-06-13 17:01:55.378 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /home/user/my-embulk-home/lib/gems
2024-06-13 17:01:55.378 +0900 [INFO] (main): gem_path is set empty.
2024-06-13 17:01:55.378 +0900 [DEBUG] (main): Embulk system property "default_guess_plugin" is set to: "gzip,bzip2,json,csv"
2024-06-13 17:01:55.634 +0900 [INFO] (main): Started Embulk v0.11.4
2024-06-13 17:01:55.811 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-s3 (maven:org.embulk:s3:0.6.0)
...
...
2024-06-13 17:01:55.948 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-stdout
...
...
2024-06-13 17:01:56.052 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-csv
...
...
2024-06-13 17:01:56.691 +0900 [INFO] (0001:transaction): Start listing file with prefix [******]
2024-06-13 17:01:57.577 +0900 [INFO] (0001:transaction): Found total [1] files
2024-06-13 17:01:57.721 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=16 / output tasks 8 = input tasks 1 * 8
2024-06-13 17:01:57.759 +0900 [INFO] (0001:transaction): {done: 0 / 1, running: 0}
...
...
1,foo
2,bar
3,baz
2024-06-13 17:01:58.602 +0900 [INFO] (0001:transaction): {done: 1 / 1, running: 0}
2024-06-13 17:01:58.603 +0900 [INFO] (0001:transaction): Incremental job, setting last_path to [******.csv]
2024-06-13 17:01:58.618 +0900 [INFO] (0001:transaction): Embulk system property "plugins.output.stdout" is not set.
2024-06-13 17:01:58.619 +0900 [INFO] (0001:transaction): Embulk system property "plugins.default.output.stdout" is not set.
2024-06-13 17:01:58.621 +0900 [INFO] (main): Committed.
2024-06-13 17:01:58.629 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"******.csv"},"out":{}}
```
We hope those installation methods will help you.

0 comments on commit 8c938df

Please sign in to comment.