I figured out how to do all of this from a bunch of good guides:
- https://www.oreilly.com/library/view/hadoop-the-definitive/9780596521974/
- https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/
- https://dev.to/awwsmm/building-a-raspberry-pi-hadoop-spark-cluster-8b2
- https://medium.com/@jasonicarter/how-to-hadoop-at-home-with-raspberry-pi-part-1-3b71f1b8ac4e
- https://www.youtube.com/watch?v=KZwb-QTmxks&list=PLkz1SCf5iB4dw3jbRo0SYCk2urRESUA3v
- https://hadoop.apache.org/docs/stable/
- https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/
- https://spark.apache.org/docs/latest/api/scala/index.html
- https://www.oreilly.com/library/view/hadoop-the-definitive/9780596521974/ (the Spark chapter)
- https://stackoverflow.com/questions/25836316/how-dag-works-under-the-covers-in-rdd/30685279#30685279
- https://www.youtube.com/watch?v=dmL0N3qfSc8
- http://www.russellspitzer.com/2017/09/01/Spark-Locality/
- https://blog.matthewrathbone.com/2016/09/01/a-beginners-guide-to-hadoop-storage-formats.html
- https://boristyukin.com/is-snappy-compressed-parquet-file-splittable/
- https://stackoverflow.com/questions/32382352/is-snappy-splittable-or-not-splittable
- https://nxtgen.com/hadoop-file-formats-when-and-what-to-use
- https://stackoverflow.com/questions/34243134/what-is-sequence-file-in-hadoop
- https://examples.javacodegeeks.com/enterprise-java/apache-hadoop/hadoop-sequence-file-example/
- https://sparkbyexamples.com/spark/read-write-avro-file-spark-dataframe/
- https://sparkbyexamples.com/spark/spark-read-write-dataframe-parquet-example/
- https://www.amazon.com.au/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 (the encoding chapter)
- https://www.oreilly.com/library/view/hadoop-the-definitive/9780596521974/ (the Avro and Parquet chapters)
- https://spark.apache.org/docs/latest/sql-data-sources-avro.html
- https://www.oreilly.com/library/view/hadoop-the-definitive/9780596521974/ (the Hive chapter)
- https://www.youtube.com/watch?v=2YUnuuzeXxs
- https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_hive_metastore_configure.html
- https://data-flair.training/blogs/apache-hive-metastore/
- https://www.youtube.com/watch?v=dQueAnZSJRM
- https://stackoverflow.com/questions/30921515/main-difference-between-dynamic-and-static-partitioning-in-hive
- https://www.youtube.com/watch?v=5C3_HZY2Ek4
- https://www.qubole.com/blog/5-tips-for-efficient-hive-queries/