Commit 46d5547

[http] Add indirect HTTP GET examples (#44)
* Add indirect examples
* Bugfix
* Dedup host in curl example
* Improve examples based on review feedback
* Fix formatting
* Improve README
* Improve READMEs
1 parent 260ca77 commit 46d5547

9 files changed

+299
-0
lines changed

http/get_indirect/README.md

+3
@@ -22,3 +22,6 @@
This directory contains examples of HTTP clients and servers that use a two-step sequence to retrieve Arrow data:
1. The client sends a GET request to a server and receives a JSON response from the server containing one or more server URIs.
2. The client sends GET requests to each of those URIs and receives a response from each server containing an Arrow IPC stream of record batches (exactly as in the [simple GET examples](https://github.com/apache/arrow-experiments/tree/main/http/get_simple)).

> [!IMPORTANT]
> The structure of the JSON document in these examples is an illustration, not a recommendation. Developers should use JSON document structures appropriate to the needs of their applications.
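
As a rough illustration of step 1, the sketch below parses a listing document and pulls out the URIs. It assumes the `arrow_stream_files` / `uri` layout used by the examples in this directory; the function name `extract_uris` and the sample URIs are hypothetical, and a real application would adapt the parsing to its own document structure.

```python
import json


def extract_uris(document_text):
    """Return the URIs listed in a JSON document shaped like the examples here.

    Assumes the illustrative layout {"arrow_stream_files": [{"uri": ...}, ...]}.
    """
    document = json.loads(document_text)
    entries = document.get("arrow_stream_files", [])
    uris = [entry["uri"] for entry in entries]
    if not all(isinstance(uri, str) for uri in uris):
        raise ValueError("Every 'uri' value must be a string")
    return uris


# Hypothetical listing, shaped like the one the example server returns.
sample = (
    '{"arrow_stream_files": ['
    '{"uri": "http://localhost:8008/a.arrows"}, '
    '{"uri": "http://localhost:8008/b.arrows"}]}'
)
print(extract_uris(sample))
```

Keeping the parsing in one small function means a change to the document structure only touches one place in the client.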

http/get_indirect/curl/.gitignore

+18
@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

*.arrows
+24
@@ -0,0 +1,24 @@
<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# HTTP GET Arrow Data: Indirect curl Client Example

This directory contains an example of a series of shell commands that use `curl` and `jq` to:
1. Send a GET request to the server to get a JSON listing of the URIs of a set of `.arrows` files.
2. Send GET requests to download each of the `.arrows` files from the server to files in the current directory.
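
In step 2, curl's remote-name options save each download under the file part of its URL. The sketch below approximates that naming rule in Python so the resulting file names are easy to predict. `local_name` is a hypothetical helper, not part of curl or of this example, and it does not handle edge cases such as URLs that end in a slash.

```python
import posixpath
from urllib.parse import urlsplit


def local_name(uri):
    """Return the last path segment of a URI (query and fragment are dropped)."""
    path = urlsplit(uri).path
    return posixpath.basename(path)


# URIs shaped like the ones in the example server's JSON listing.
for uri in [
    "http://localhost:8008/random.arrows",
    "http://localhost:8008/arrow-commits.arrows",
]:
    print(local_name(uri))
```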
+28
@@ -0,0 +1,28 @@
#!/bin/sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


# Use curl to get a JSON document containing URIs to
# Arrow stream files, then use jq to extract the URIs
uris=$(curl -s -S http://localhost:8008/ | jq -r '.arrow_stream_files[].uri')

# Use curl to download the files from the URIs in parallel
if [ -n "$uris" ]; then
  curl --parallel --remote-name-all $uris
fi

http/get_indirect/python/.gitignore

+18
@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

*.arrows
+32
@@ -0,0 +1,32 @@
<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# HTTP GET Arrow Data: Indirect Python Client Example with Requests

This directory contains an example of an HTTP client implemented in Python using the [Requests](https://requests.readthedocs.io/) library. The client:
1. Sends a GET request to the server to get a JSON listing of the URIs of available `.arrows` files.
2. Sends GET requests to download each of the `.arrows` files from the server.
3. Loads the contents of each file into an in-memory PyArrow Table.

To run this example, first start one of the indirect server examples in the parent directory, then:

```sh
pip install requests pyarrow
python client.py
```
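
Before using a response, the client script checks that its `Content-Type` header starts with the expected media type. A slightly stricter variant, sketched here with only the standard library, parses the header and compares the media type exactly, so that parameters such as `charset` are ignored and look-alike types are rejected. The helper name `media_type` is hypothetical and is not part of the client in this directory.

```python
from email.message import Message

ARROW_STREAM_FORMAT = "application/vnd.apache.arrow.stream"


def media_type(content_type_header):
    """Return the lowercased media type of a Content-Type header value."""
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_type()


# An Arrow stream response matches exactly; parameters are stripped from JSON.
print(media_type("application/vnd.apache.arrow.stream") == ARROW_STREAM_FORMAT)
print(media_type("application/json; charset=utf-8"))
```

A prefix check is shorter, but an exact comparison avoids accepting a different media type that merely begins with the same characters.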
+70
@@ -0,0 +1,70 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

import requests
import json
import os
import pyarrow as pa


HOST = "http://localhost:8008/"

JSON_FORMAT = "application/json"
ARROW_STREAM_FORMAT = "application/vnd.apache.arrow.stream"

json_response = requests.get(HOST)

response_status = json_response.status_code
if response_status != 200:
    raise ValueError(f"Expected response status 200, got {response_status}")

content_type = json_response.headers.get("Content-Type", "")
if not content_type.startswith(JSON_FORMAT):
    raise ValueError(f"Expected content type {JSON_FORMAT}, got {content_type}")

print("Downloaded JSON file listing.")

parsed_data = json_response.json()
uris = [file["uri"] for file in parsed_data["arrow_stream_files"]]

if not all(uri.endswith(".arrows") for uri in uris):
    raise ValueError("Some listed files do not have extension '.arrows'")

print(f"Parsed JSON and found {len(uris)} Arrow stream files.")

tables = {}

for uri in uris:
    arrow_response = requests.get(uri)

    response_status = arrow_response.status_code
    if response_status != 200:
        raise ValueError(f"Expected response status 200, got {response_status}")

    content_type = arrow_response.headers.get("Content-Type", "")
    if not content_type.startswith(ARROW_STREAM_FORMAT):
        raise ValueError(f"Expected content type {ARROW_STREAM_FORMAT}, got {content_type}")

    filename = os.path.basename(uri)

    print(f"Downloaded file '{filename}'.")

    tablename = os.path.splitext(filename)[0]
    with pa.ipc.open_stream(arrow_response.content) as reader:
        tables[tablename] = reader.read_all()

    print(f"Loaded into in-memory Arrow table '{tablename}'.")
+53
@@ -0,0 +1,53 @@
<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# HTTP GET Arrow Data: Indirect Python Server Example

This directory contains an example of an HTTP server implemented in Python using the built-in [`http.server`](https://docs.python.org/3/library/http.server.html) module. The server:
1. Listens for HTTP GET requests from clients.
2. Upon receiving a GET request for the document root, serves a JSON document that lists the URIs of all the `.arrows` files in the current directory.
3. Upon receiving a GET request for a specific `.arrows` file, serves that file.
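
The listing in step 2 can be sketched as a standalone function that is easy to exercise without starting a server. `build_listing` and its `base_url` parameter are hypothetical names for this illustration, and the function sorts the file names for deterministic output, which the example server does not do. The document layout matches the example in this directory and is an illustration, not a recommendation.

```python
import json
import os
import tempfile


def build_listing(directory, base_url):
    """Return a JSON string listing the .arrows files found in `directory`."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.endswith(".arrows") and os.path.isfile(os.path.join(directory, name))
    )
    document = {"arrow_stream_files": [{"uri": f"{base_url}{name}"} for name in names]}
    return json.dumps(document, indent=4)


# Demonstrate with a temporary directory: one placeholder .arrows file
# and one unrelated file that the listing should skip.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("random.arrows", "notes.txt"):
        open(os.path.join(tmp, name), "wb").close()
    listing = build_listing(tmp, "http://localhost:8008/")
    print(listing)
```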

To run this example, first copy two `.arrows` files from the `data` section of this repository into the current directory:

```sh
cp ../../../../data/arrow-commits/arrow-commits.arrows .
cp ../../../../data/rand-many-types/random.arrows .
```

Then start the HTTP server:

```sh
python server.py
```

In this example, the JSON document listing the URIs of the `.arrows` files is structured as shown below. **This JSON structure is provided for example purposes only. It is not a recommendation.** Developers should use JSON document structures appropriate to the needs of their applications.

```json
{
    "arrow_stream_files": [
        {
            "uri": "http://127.0.0.1:8008/random.arrows"
        },
        {
            "uri": "http://127.0.0.1:8008/arrow-commits.arrows"
        }
    ]
}
```
+53
@@ -0,0 +1,53 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

from http.server import SimpleHTTPRequestHandler, HTTPServer
import json
import os
import mimetypes

mimetypes.add_type("application/vnd.apache.arrow.stream", ".arrows")

class MyServer(SimpleHTTPRequestHandler):
    def list_directory(self, path):
        host, port = self.server.server_address

        try:
            file_paths = [
                f for f in os.listdir(path)
                if f.endswith(".arrows") and os.path.isfile(os.path.join(path, f))
            ]
        except OSError:
            self.send_error(404, "No permission to list directory")
            return None

        file_uris = [f"http://{host}:{port}{self.path}{f}" for f in file_paths]
        uris_doc = {"arrow_stream_files": [{"uri": f} for f in file_uris]}
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(uris_doc, indent=4).encode("utf-8"))
        return None

server_address = ("localhost", 8008)
try:
    httpd = HTTPServer(server_address, MyServer)
    print(f"Serving on {server_address[0]}:{server_address[1]}...")
    httpd.serve_forever()
except KeyboardInterrupt:
    print("Shutting down server")
    httpd.server_close()
