Commit 2c34a94

Added CLI command to read schema from a file (#1244)
* Added CLI command to read schema from a file
* Excluded symfony commands from static analysis
1 parent cf4e25c commit 2c34a94

15 files changed: +519 -35 lines changed

composer.lock

Lines changed: 14 additions & 14 deletions
Some generated files are not rendered by default.

docs/components/cli/docs.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# Flow Command Line Interface
+
+
+## Installation
+
+```
+composer require flow-php/cli
+```
+
+In some cases it might make sense to install the CLI globally:
+
+```
+composer global require flow-php/cli
+```
+
+Now you can run the CLI using the `flow` command.
+
+## Usage
+
+```shell
+$ flow
+Flow PHP - Data processing framework
+
+Usage:
+  command [options] [arguments]
+
+Options:
+  -h, --help            Display help for the given command. When no command is given display help for the list command
+  -q, --quiet           Do not output any message
+  -V, --version         Display this application version
+      --ansi|--no-ansi  Force (or disable --no-ansi) ANSI output
+  -n, --no-interaction  Do not ask any interactive question
+  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
+
+Available commands:
+  completion             Dump the shell completion script
+  help                   Display help for a command
+  list                   List commands
+  run                    Execute ETL pipeline from a php/json file.
+ file
+  file:schema            Read data schema from a file.
+ parquet
+  parquet:read           [parquet:read:data] Read data from parquet file
+  parquet:read:metadata  Read metadata from parquet file
+```
+
+### `file:schema`
+
+```shell
+$ flow file:schema --help
+Description:
+  Read data schema from a file.
+
+Usage:
+  file:schema [options] [--] <source>
+  schema
+
+Arguments:
+  source                            Path to a file from which schema should be extracted.
+
+Options:
+      --pretty[=PRETTY]             Pretty print schema [default: false]
+      --table[=TABLE]               Pretty schema as ascii table [default: false]
+      --auto-cast[=AUTO-CAST]       When set Flow will try to automatically cast values to more precise data types, for example datetime strings will be casted to datetime type [default: false]
+  -h, --help                        Display help for the given command. When no command is given display help for the list command
+  -q, --quiet                       Do not output any message
+  -V, --version                     Display this application version
+      --ansi|--no-ansi              Force (or disable --no-ansi) ANSI output
+  -n, --no-interaction              Do not ask any interactive question
+  -if, --input-format=INPUT-FORMAT  Source file format. When not set file format is guessed from source file path extension
+  -v|vv|vvv, --verbose              Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
+```
+
+Example:
+
+```shell
+$ flow schema orders.csv --table --auto-cast
++------------+----------+----------+-------------+----------+
+| name       | type     | nullable | scalar_type | metadata |
++------------+----------+----------+-------------+----------+
+| order_id   | uuid     | false    |             | []       |
+| created_at | datetime | false    |             | []       |
+| updated_at | datetime | false    |             | []       |
+| discount   | scalar   | true     | string      | []       |
+| address    | json     | false    |             | []       |
+| notes      | json     | false    |             | []       |
+| items      | json     | false    |             | []       |
++------------+----------+----------+-------------+----------+
+7 rows
+```
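
The `--input-format` fallback described in the help output (guess the format from the file extension when the option is omitted) can be sketched in plain PHP. This is a minimal illustration rather than the command's actual code: `resolve_input_format()` and the `SUPPORTED_FORMATS` constant are hypothetical names, and the format list is an assumption based on the readers the command wires up.

```php
<?php

declare(strict_types=1);

// Hypothetical list of formats; assumed here to mirror the readers used by the command.
const SUPPORTED_FORMATS = ['csv', 'json', 'xml', 'parquet'];

/**
 * Returns the explicitly requested format, or falls back to the file extension.
 */
function resolve_input_format(?string $option, string $path) : string
{
    // An empty or missing option means "guess from the path".
    $format = \strtolower($option ?: \pathinfo($path, PATHINFO_EXTENSION));

    if (!\in_array($format, SUPPORTED_FORMATS, true)) {
        throw new \InvalidArgumentException(\sprintf('File format "%s" is not supported.', $format));
    }

    return $format;
}

echo resolve_input_format(null, 'orders.CSV'), PHP_EOL;   // extension is used when no option is given
echo resolve_input_format('json', 'orders.dat'), PHP_EOL; // explicit option overrides the extension
```

Lower-casing before the lookup keeps `orders.CSV` and `orders.csv` equivalent, and rejecting unknown formats up front gives the user a clear message before any reader is constructed.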

phpstan.neon

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ parameters:
         - examples/topics

     excludePaths:
+        - src/cli/src/Flow/CLI/Command/*
         - src/core/etl/src/Flow/ETL/Formatter/ASCII/ASCIITable.php
         - src/core/etl/src/Flow/ETL/Sort/ExternalSort/RowsMinHeap.php
         - src/adapter/etl-adapter-elasticsearch/src/Flow/ETL/Adapter/Elasticsearch/ElasticsearchPHP/SearchResults.php

psalm.xml

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@

             <file name="src/lib/parquet-viewer/src/Flow/ParquetViewer/Command/ReadMetadataCommand.php" />

+            <directory name="src/cli/src/Flow/CLI/Command" />
             <directory name="src/lib/parquet/src/Flow/Parquet/ThriftStream/" />
             <directory name="src/lib/parquet/src/Flow/Parquet/Thrift/" />
             <directory name="src/lib/parquet/src/Flow/Parquet/BinaryReader/" />

src/cli/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Flow Command Line Interface
+
+Flow CLI is a powerful command-line interface that provides a wide range of tools and utilities for managing and interacting with various data sources.
+
+> [!IMPORTANT]
+> This repository is a subtree split from our monorepo. If you'd like to contribute, please visit our main monorepo [flow-php/flow](https://github.com/flow-php/flow).
+
+- 📜 [Documentation](https://github.com/flow-php/flow/blob/1.x/docs/cli/docs.md)
+- ➡️ [Installation](https://github.com/flow-php/flow/blob/1.x/docs/installation.md)
+- 🛠️ [Contributing](https://github.com/flow-php/flow/blob/1.x/CONTRIBUTING.md)

src/cli/flow

File mode changed: 100644 → 100755
Lines changed: 8 additions & 3 deletions
@@ -1,13 +1,17 @@
 #!/usr/bin/env php
 <?php declare(strict_types=1);

+use Flow\CLI\Command\ConvertCommand;
 use Flow\CLI\Command\RunCommand;
+use Flow\CLI\Command\FileSchemaCommand;
 use Flow\CLI\FlowVersion;
 use Flow\ParquetViewer\Command\ReadDataCommand;
 use Flow\ParquetViewer\Command\ReadMetadataCommand;
 use Symfony\Component\Console\Application;

-if ('' !== Phar::running(false)) {
+$pharRuntime = ('' !== Phar::running(false));
+
+if ($pharRuntime) {
     require 'phar://flow.phar/vendor/autoload.php';
 } else {
     if (\is_file(__DIR__ . '/vendor/autoload.php')) {
@@ -35,10 +39,11 @@ $_ENV['FLOW_PHAR_APP'] = 1;

 \ini_set('memory_limit', -1);

-$application = new Application('Flow-PHP - Extract Transform Load - Data processing framework', FlowVersion::getVersion());
+$application = new Application('Flow PHP - Data processing framework', $pharRuntime ? FlowVersion::getVersion() : 'UNKNOWN');

-$application->add((new ReadDataCommand())->setName('parquet:read:data'));
+$application->add((new ReadDataCommand())->setName('parquet:read')->setAliases(['parquet:read:data']));
 $application->add((new ReadMetadataCommand())->setName('parquet:read:metadata'));
 $application->add((new RunCommand())->setName('run'));
+$application->add((new FileSchemaCommand())->setName('file:schema')->setAliases(['schema']));

 $application->run();
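
The version fallback in the entry script hinges on `Phar::running(false)`, which returns an empty string when the code is not executing from inside a PHAR archive. Below is a minimal sketch of the same check, assuming the `phar` extension is loaded. `app_version()` and its `$packagedVersion` parameter are hypothetical stand-ins for `FlowVersion::getVersion()`, whose implementation is not part of this diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: report the packaged version only when running from a PHAR.
 */
function app_version(string $packagedVersion) : string
{
    // Phar::running(false) returns '' outside of a PHAR archive.
    $pharRuntime = ('' !== \Phar::running(false));

    return $pharRuntime ? $packagedVersion : 'UNKNOWN';
}

echo app_version('1.2.3'), PHP_EOL;
```

Run as a plain script (not packaged into a `.phar`), the helper takes the `'UNKNOWN'` branch, matching what the entry script reports for a non-PHAR install.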
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Flow\CLI\Command;
+
+use function Flow\ETL\Adapter\CSV\from_csv;
+use function Flow\ETL\Adapter\JSON\from_json;
+use function Flow\ETL\Adapter\Parquet\from_parquet;
+use function Flow\ETL\Adapter\XML\from_xml;
+use function Flow\ETL\DSL\{config_builder, df, from_array, ref, schema_to_json, to_output};
+use function Flow\Filesystem\DSL\path_real;
+use Flow\ETL\Config;
+use Flow\Filesystem\Path;
+use Symfony\Component\Console\Command\Command;
+use Symfony\Component\Console\Exception\InvalidArgumentException;
+use Symfony\Component\Console\Input\{InputArgument, InputInterface, InputOption};
+use Symfony\Component\Console\Output\OutputInterface;
+use Symfony\Component\Console\Style\SymfonyStyle;
+
+final class FileSchemaCommand extends Command
+{
+    private ?Config $flowConfig = null;
+
+    private ?string $inputFormat = null;
+
+    private ?Path $sourcePath = null;
+
+    public function configure() : void
+    {
+        $this
+            ->setName('file:schema')
+            ->setDescription('Read data schema from a file.')
+            ->addArgument('source', InputArgument::REQUIRED, 'Path to a file from which schema should be extracted.')
+            ->addOption('input-format', 'if', InputOption::VALUE_REQUIRED, 'Source file format. When not set file format is guessed from source file path extension', null)
+            ->addOption('pretty', null, InputOption::VALUE_OPTIONAL, 'Pretty print schema', false)
+            ->addOption('table', null, InputOption::VALUE_OPTIONAL, 'Pretty schema as ascii table', false)
+            ->addOption('auto-cast', null, InputOption::VALUE_OPTIONAL, 'When set Flow will try to automatically cast values to more precise data types, for example datetime strings will be casted to datetime type', false);
+    }
+
+    protected function execute(InputInterface $input, OutputInterface $output) : int
+    {
+        $style = new SymfonyStyle($input, $output);
+
+        $autoCast = ($input->getOption('auto-cast') !== false);
+
+        $df = df($this->flowConfig)
+            ->read(match ($this->inputFormat) {
+                'csv' => from_csv($this->sourcePath),
+                'json' => from_json($this->sourcePath),
+                'xml' => from_xml($this->sourcePath),
+                'parquet' => from_parquet($this->sourcePath),
+            });
+
+        if ($autoCast) {
+            $df->autoCast();
+        }
+
+        $schema = $df->schema();
+
+        $prettyValue = $input->getOption('pretty');
+        $prettyPrint = ($prettyValue !== false);
+
+        $tableValue = $input->getOption('table');
+        $tablePrint = ($tableValue !== false);
+
+        if ($tablePrint) {
+            ob_start();
+            df()
+                ->read(from_array($schema->normalize()))
+                ->withEntry('type', ref('type')->unpack())
+                ->renameAll('type.', '')
+                ->rename('ref', 'name')
+                ->collect()
+                ->select('name', 'type', 'nullable', 'scalar_type', 'metadata')
+                ->write(to_output())
+                ->run();
+
+            $style->write(ob_get_clean());
+
+            return Command::SUCCESS;
+        }
+
+        $style->writeln(schema_to_json($schema, $prettyPrint ? JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR : JSON_THROW_ON_ERROR));
+
+        return Command::SUCCESS;
+    }
+
+    protected function initialize(InputInterface $input, OutputInterface $output) : void
+    {
+        $this->flowConfig = config_builder()->build();
+
+        $source = (string) $input->getArgument('source');
+
+        $sourcePath = path_real($source);
+
+        $fs = $this->flowConfig->fstab()->for($sourcePath);
+
+        if (!$fs->status($sourcePath)) {
+            throw new InvalidArgumentException(\sprintf('File "%s" does not exist.', $sourcePath->path()));
+        }
+
+        $supportedFormats = ['csv', 'json', 'xml', 'parquet']; // keep in sync with the match arms in execute()
+
+        $inputFormat = \mb_strtolower($input->getOption('input-format') ?: $sourcePath->extension());
+
+        if (!\in_array($inputFormat, $supportedFormats, true)) {
+            throw new InvalidArgumentException(\sprintf('File format "%s" is not supported. Input file format can be set with --input-format option', $inputFormat));
+        }
+
+        $this->sourcePath = $sourcePath;
+        $this->inputFormat = $inputFormat;
+    }
+}
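
PHP's `match` expression throws `\UnhandledMatchError` when no arm matches, so every format that passes validation in `initialize()` needs a corresponding arm in `execute()` (or a `default` arm). Here is a small, self-contained sketch of that behavior, requiring PHP 8.0 or later. `reader_for()` and the returned strings are hypothetical placeholders for the real `from_*()` extractors.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the match in execute(); returns a label instead of an extractor.
 */
function reader_for(string $format) : string
{
    return match ($format) {
        'csv' => 'from_csv',
        'json' => 'from_json',
        'xml' => 'from_xml',
        'parquet' => 'from_parquet',
    };
}

echo reader_for('csv'), PHP_EOL;

try {
    // 'txt' has no arm, so match throws instead of returning null.
    reader_for('txt');
} catch (\UnhandledMatchError $e) {
    echo 'Unhandled format: txt', PHP_EOL;
}
```

Keeping the validated format list and the `match` arms identical means an unsupported format is rejected with the command's own error message rather than surfacing as an engine-level error.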

src/cli/src/Flow/CLI/Command/RunCommand.php

Lines changed: 17 additions & 0 deletions
@@ -18,6 +18,23 @@ public function configure() : void
         $this
             ->setName('run')
             ->setDescription('Execute ETL pipeline from a php/json file.')
+            ->setHelp(
+                <<<'HELP'
+                <info>input-file</info> argument must point to a valid PHP file that returns a DataFrame instance.
+                <comment>Make sure not to execute run() or any other trigger function.</comment>
+
+                <fg=blue>Example of pipeline.php:</>
+                <?php
+                return df()
+                    ->read(from_array([
+                        ['id' => 1, 'name' => 'User 01', 'active' => true],
+                        ['id' => 2, 'name' => 'User 02', 'active' => false],
+                        ['id' => 3, 'name' => 'User 03', 'active' => true],
+                    ]))
+                    ->collect()
+                    ->write(to_output());
+                HELP
+            )
             ->addArgument('input-file', InputArgument::REQUIRED, 'Path to a php/json with DataFrame definition.');
     }
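
The help text relies on a plain PHP mechanism: a file included with `require` can `return` a value to the caller. The sketch below shows only that mechanism, with a closure standing in for the DataFrame. It is not how `RunCommand` is implemented (that code is not shown in this diff), and `load_pipeline()` is a hypothetical helper.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical loader: includes a PHP file and hands back whatever it returns.
 */
function load_pipeline(string $file) : mixed
{
    if (!\is_file($file)) {
        throw new \InvalidArgumentException(\sprintf('File "%s" does not exist.', $file));
    }

    // `require` evaluates to the value of the file's top-level `return` statement.
    return require $file;
}

// Write a throwaway "pipeline" that returns a closure instead of executing anything itself.
$file = \tempnam(\sys_get_temp_dir(), 'pipeline');
\file_put_contents($file, "<?php\nreturn static fn () : array => [['id' => 1], ['id' => 2]];\n");

$pipeline = load_pipeline($file);

// The caller, not the included file, decides when the work runs.
echo \count($pipeline()), PHP_EOL;

\unlink($file);
```

This is why the help text warns against calling `run()` inside the pipeline file: the file only describes the work and returns it, leaving execution to whoever loaded it.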