How To Cache A Large Machine Learning Model In Flask?
Solution 1:
I suggest loading the model once, when you start your application. This can be done by simply loading the model inside the main block. The first start of the app will take some time, but every subsequent call to the predict API will be fast.
@app.route('/predict', methods=['POST', 'OPTIONS'])
def predict():
    global model
    # "do something" here: run the already loaded model on the incoming request data
    result = model.predict(...)  # placeholder for your actual inference call
    return jsonify(result)

if __name__ == '__main__':
    model = load_model('/model/files/model.h5')
    app.run(host='0.0.0.0', port=5000)
Solution 2:
Obviously, loading the model is time-intensive and blocking, and the cost grows with the size of the updated model. Flask handles requests in a blocking fashion by default, unless you run it behind something like uWSGI and deploy n threads or processes to get some amount of horizontal scalability. Assuming you have a single instance of the Flask application running, what makes sense is to load the updated model when you initialize the application (specifically, before app.run()). Keep it in a global variable so that it is available across the entire application instance.

You could also add a /reload-model endpoint, which accepts a model name and updates the in-memory reference to point to the updated model. Of course you would not call this endpoint very frequently, only from time to time.
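For illustration, here is a minimal sketch of that idea. The /model/files/ path, the JSON payload shape, and the load_model helper are assumptions, not part of the original answer:

from flask import Flask, jsonify, request

app = Flask(__name__)
model = None  # global reference shared by the whole application instance

def load_model(path):
    # placeholder: load the trained model with whatever framework you use (Keras, joblib, torch, ...)
    ...

@app.route('/reload-model', methods=['POST'])
def reload_model():
    global model
    # the caller names the model file to switch to; this payload shape is an assumption
    model_name = request.json['model_name']
    model = load_model('/model/files/' + model_name)
    return jsonify({'status': 'reloaded', 'model': model_name})

if __name__ == '__main__':
    model = load_model('/model/files/model.h5')  # load once, before app.run()
    app.run()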
A fancier solution would be:
- Write some sort of scheduler that runs alongside the main Flask app (look at APScheduler's BackgroundScheduler).
- This scheduler has a job that periodically polls the directory containing your trained model and checks whether the model file has changed recently (use something like https://pythonhosted.org/watchdog/). If the file has changed, the job simply reloads the model and updates the instance held in the global reference variable, as sketched below.
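A rough sketch of that scheduler job, using a plain mtime check instead of watchdog for brevity; the model path, the 60-second interval, and the load_model helper are assumptions:

import os
from apscheduler.schedulers.background import BackgroundScheduler

MODEL_PATH = '/model/files/model.h5'  # assumed location of the trained model
model = None
_last_mtime = None

def reload_if_changed():
    # reload the model only when the file on disk has a newer timestamp
    global model, _last_mtime
    mtime = os.path.getmtime(MODEL_PATH)
    if _last_mtime is None or mtime > _last_mtime:
        model = load_model(MODEL_PATH)  # whatever loader your framework provides
        _last_mtime = mtime

scheduler = BackgroundScheduler()
scheduler.add_job(reload_if_changed, 'interval', seconds=60)
scheduler.start()  # runs in a background thread next to the Flask app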
Solution 3:
This is how I do it:
- Create a class file for my model, and load everything in its def __init__(self, ...): method.
- I instantiate the class in the main flask server file in the global scope, where it becomes available for all my functions (I load it as a module from its own subdirectory)
Not sure it's the best way to architect this, but it's easy and works very well in my case, where I usually only need to expose a few routes for a data-driven model rather than design complex software according to SOLID principles!
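For illustration, a minimal sketch of this layout; the class, module, and file names are placeholders, not from the original answer:

# model_wrapper.py (lives in its own subdirectory/module)
class ModelWrapper:
    def __init__(self, path):
        # load everything once, at instantiation time
        self.model = load_model(path)  # whatever loader your framework provides

    def predict(self, data):
        return self.model.predict(data)

# app.py (main Flask server file)
from flask import Flask, jsonify

app = Flask(__name__)
wrapper = ModelWrapper('/model/files/model.h5')  # global scope: loaded once, visible to all routes

@app.route('/predict', methods=['POST'])
def predict():
    return jsonify(wrapper.predict(...))  # placeholder for the real request data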
Solution 4:
Instead of loading the model inside the function on every request, you can initialize it once at script level so that it stays in memory and does not have to be reloaded. Try this approach first before reaching for Flask-Cache.
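A minimal sketch of module-level initialization (the model path and loader are placeholders):

from flask import Flask, jsonify

app = Flask(__name__)
model = load_model('/model/files/model.h5')  # loaded once at import time with whatever loader you use, then kept in memory

@app.route('/predict', methods=['POST'])
def predict():
    return jsonify(model.predict(...))  # placeholder for your real input

Loading at module level (rather than under if __name__ == '__main__':) also covers the case where the app is served by gunicorn or uWSGI, since the __main__ block is not executed there.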
Solution 5:
Here is my solution.
from flask import Flask, current_app, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # the model was attached to the app before the server started
    result = current_app.model.predict_something()
    return jsonify(result)
if __name__ == '__main__':
    with app.app_context():
        current_app.model = load_your_model()
    app.run()
Using global variables is not a good idea.