Add option to use write_batch #7
Hi @shriv, thanks for raising an issue! I tried to quickly look into the differences between `write_batch()` and `write_table()`, however I didn't really get a full understanding of whether one is faster than the other, but they should at least produce the same results. I will have time during the weekend to look into it a bit more and do some testing, but if you want to do your own tests, you can fork the repo and just replace the code in the
I haven't tested it, so I can't guarantee that it will work, but if it does work and is indeed faster, let me know, or you can raise a PR with the modified code. Otherwise I will do some testing of my own during the weekend.
Thanks @jkausti !! :-) I looked up
Ok, I get your point. I think the only option for writing to multiple files would be to use `write_dataset()`; however, I'm not sure the `ParquetWriter` class supports that, and the implementation would look quite different. This might also be a limitation of how the Singer protocol inside Meltano passes data between the tap and target. Are you able to test with other targets to see if you experience similar slowness with them?
Good suggestion @jkausti ! I'll do some tests with the local
I did a couple of simple tests with
Looking at the code of
N.B. I am on a LAN with ~64.3 Mbps upload speed and 180 Mbps download speed.
Hope this benchmarking helps! If you are able to enable the batch file write option, that would be quite helpful! :-) Thanks!
Hi there,
I'm a new user of Meltano and Singer-style taps and targets! I'm currently working on a POC pipeline to move data from an Oracle database to S3 parquet files! I really appreciate this S3 parquet target you have written. :-)
One issue at the moment is that the pipeline is running rather slowly, and I can't figure out where the bottleneck is. Is it possible to include a batch write feature that would use `write_batch()` instead of `write_table()`? If the pipeline is still slow, I'll then see if I can add a batch query feature in the tap itself!

Let me know if you're busy and I can try to add this myself, though it will probably take me much longer! :-)
Thanks a lot..!