You can use blob storage on your cloud provider of choice and retrieve those files from your FastAPI app. Edit: Also, at less than 5 GB, that may be fine for a Docker image to handle. I’ve used one before that was closer to 50 GB to hold NLP libraries. It’s obviously not ideal, but it can work.
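A minimal sketch of the blob storage route, assuming the model is reachable over a plain URL (for example a pre-signed link from your provider). The function name `ensure_local_model`, the cache directory, and the filename are all hypothetical; a provider SDK could replace the `urllib` call.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def ensure_local_model(url: str, cache_dir: str, filename: str) -> Path:
    """Download the model once and reuse the cached copy afterwards."""
    target = Path(cache_dir) / filename
    if target.exists():
        return target  # already cached, skip the download

    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream into a temp file in the same directory, then rename it into
    # place, so a crash mid-download never leaves a truncated model behind.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp, length=1024 * 1024)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

You would call this once when the app starts (for instance from a FastAPI lifespan handler) and load the model from the returned path, so requests never wait on the download.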


Ahh I see. Is it doable with the AWS free tier? Since it has so little memory, won't the server crash trying to load the 4 GB model file? Thank you so much for responding.


At the end of the day, a 4 GB model is 4 GB in size. Expect it to take about the same amount of memory. Otherwise, you can try sharding it into multiple smaller models. That gets difficult, though.
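As a rough sanity check, you can compare the file size against the memory the machine has before trying to load anything. This is only a sketch: the 1.25 headroom factor is an arbitrary assumption, the function names are hypothetical, and `SC_AVPHYS_PAGES` is only available on Linux.

```python
import os
from typing import Optional


def available_memory_bytes() -> int:
    """Physical memory currently available, read via sysconf (Linux only)."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_memory(model_bytes: int,
                   available_bytes: Optional[int] = None,
                   headroom: float = 1.25) -> bool:
    """True if the model plus some headroom fits in available memory."""
    if available_bytes is None:
        available_bytes = available_memory_bytes()
    return model_bytes * headroom <= available_bytes


# In practice the size would come from os.path.getsize(path_to_model).
print(fits_in_memory(4 * 10**9, available_bytes=2**30))
```

With a 4 GB file and a machine that has about 1 GiB of RAM (an assumption about the small instance in question), the check fails, which matches the point above.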


This is a common problem. For a long time my team and I just built the models into our Docker images. That made it hard to release models quickly. We then switched to syncing from GCS: inside TensorFlow Serving we referenced the model by a GCS URL. That caused a different problem: because the model wasn't cached locally, we had to download it at every startup. This was a problem during outages, because it would crash all of our infrastructure (5k req/s). Our final solution was a set of containers that run side by side and share a volume mount. One pulls the models down locally. Another updates the configuration once they've been pulled and hot-swaps them. That has been the most successful approach for us and has been working in production for a couple of years now. You can see the code here: https://github.com/cjmcgraw/mlmodel-manager
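To make the shared-volume idea concrete, here is a small single-process sketch. It is not the code from the linked repo: `publish_version`, `latest_version`, and `ModelHolder` are hypothetical names, and the numeric version directories are an assumption borrowed from how TensorFlow Serving lays out model versions. In a real setup the puller and the reloader would be separate containers looking at the same mounted directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional


def publish_version(shared_dir: Path, version: str, files: Dict[str, bytes]) -> Path:
    """Puller side: stage a version, then expose it with a single rename."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(dir=shared_dir, prefix=".staging-"))
    for name, payload in files.items():
        (staging / name).write_bytes(payload)
    final = shared_dir / version
    os.rename(staging, final)  # readers never see a half-written version
    return final


def latest_version(shared_dir: Path) -> Optional[str]:
    """Reloader side: newest published version, ignoring staging directories."""
    if not shared_dir.is_dir():
        return None
    versions = [p.name for p in shared_dir.iterdir()
                if p.is_dir() and p.name.isdigit()]
    return max(versions, key=int) if versions else None


class ModelHolder:
    """Serving side: keeps the active model and swaps in newer versions."""

    def __init__(self, shared_dir: Path, load_fn: Callable[[Path], object]):
        self.shared_dir = shared_dir
        self.load_fn = load_fn
        self.version: Optional[str] = None
        self.model: object = None

    def refresh(self) -> bool:
        """Load the newest version if it changed; return True on a swap."""
        newest = latest_version(self.shared_dir)
        if newest is None or newest == self.version:
            return False
        model = self.load_fn(self.shared_dir / newest)  # load fully first
        self.model, self.version = model, newest  # then swap the references
        return True
```

The serving process would call `refresh()` on a timer. Because the new model is loaded completely before the swap, requests keep using the old one until the new one is ready, and a storage outage only means no new versions arrive.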