Consider mounting volumes outside of the container for data persistence #3
Persisting data from containers is a well-known and well-documented topic, so we can assume that those familiar with Docker know how to do it, be it using volumes, bind mounts, or third-party storage drivers. However, the documentation of this repo should at least describe all the places where data of the different services is stored (or declare them as volumes), so that users know where to mount drives for persistence. A simple bind mount example command would not hurt either. That said, I am not a big fan of monolithic containers that run too many services in a single container. This might work well when things are used as a portable desktop application, but for any serious, scalable setup every service should have its own container and be orchestrated using a stack file (or docker-compose).
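For illustration, a minimal sketch of such a bind mount in compose form, assuming a hypothetical image name and a hypothetical /data directory inside the container (the real paths would have to come from this repo's documentation); the equivalent CLI flag would be something like `docker run -v "$(pwd)/archive-data:/data" …`:

```yaml
# Minimal sketch only: the image name and the /data path are hypothetical,
# not taken from this repository.
services:
  archiver:
    image: example/archiver        # hypothetical monolithic image
    volumes:
      # bind mount: ./archive-data on the host survives container removal
      - ./archive-data:/data
```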
Mastodon solves this with docker-compose; that works quite nicely indeed, so we could "borrow" their setup.
That would be useful indeed. Currently we have the following places:
pywb automatically indexes what it finds in the linked directory, so the pywb index would not need to be stored persistently. This is (currently) not the case for the Elasticsearch index, but that could be changed relatively easily. It would then be enough to store the WARC files persistently, which would make sense to me. This would also allow you to just add WARC files you recorded with another system. What do you think? (As the different services currently "talk" to each other via the file system, separating them into different services would take some effort. I agree that this is the way to go for scalable setups, but a scalable setup is probably not needed for a one-person archiver.)
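As a rough illustration of the "persist only the WARCs" idea, a compose-style sketch with a placeholder image name and a placeholder container path for the WARC directory (the actual path would be whatever pywb is configured to read in this image):

```yaml
# Sketch only: the image name and the WARC path are placeholders, not from this repo.
services:
  archive:
    image: example/archiver          # hypothetical
    volumes:
      # only the WARC directory is persisted; per the comment above, pywb indexes
      # what it finds there, so WARCs recorded with another system can be dropped in
      - ./warcs:/webarchive/collections/main/archive   # container path is an assumption
```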
I think we don't need to put applications in deeper directories when running in containers because of the file-system isolation. I would perhaps suggest placing all the individual apps directly under the Alternatively, we should be able to modify the data directories of all these applications and place them under something like
If I understand it correctly, PyWB automatically indexes WARC files that are not already indexed (i.e., their CDXJ records are missing). @ikreymer, correct me if I am wrong here. If so, then persisting the PyWB index is also important; otherwise, CDXJ indexing needs to happen all over again each time a container is started. This might not be a big deal for small collections, but it will become important for larger ones.
If a stack/compose file is provided, it can define the necessary volumes and make them available to each service, so that the services can share the file system to deal with this.
If the intent of this project is only small single-user setups, then this assumption is fair enough.
Mastodon uses, e.g., Postgres and Redis as external services that run using their own images. In my
The See also the
There are many ways to achieve this. We can even declare volumes and networks as top-level objects in the compose file, then use those to deploy with docker compose for quick testing or with the built-in docker stack for a more robust long-running system. Shared volumes will allow file-based communication, and shared networks will allow container services to reach each other using service names.
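A sketch of that layout (service names, image names, and mount paths are placeholders, not taken from this repository): volumes and networks declared at the top level are attached to each service, so the services share files through the named volume and reach each other by service name over the shared network.

```yaml
# Sketch with placeholder names: a top-level volume and network shared by two services.
version: "3.7"                       # a 3.x version is needed for `docker stack deploy`
services:
  recorder:
    image: example/recorder          # hypothetical
    volumes:
      - warcs:/data/warcs            # file-based communication via the shared volume
    networks:
      - archive-net
  replay:
    image: example/pywb              # hypothetical
    volumes:
      - warcs:/data/warcs
    networks:
      - archive-net                  # reachable as "recorder" / "replay" by service name

volumes:
  warcs:                             # named volume persists independently of containers

networks:
  archive-net:
```

The same file could then be used with `docker compose up` for quick local testing or with `docker stack deploy -c docker-compose.yml <stack-name>` on a Swarm for a longer-running deployment.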
Docker allows one to mount directories outside of the container as volumes. Doing so would prevent the above scenario of the data disappearing when the container is gone.