Training a model on a dataset is usually a time-consuming and processing-hungry job, which makes it a costly task. In a development environment, or in production, what we want is to test or use the already trained model without training it multiple times.
If you have done an ML project, you will know how much time and processing power training takes, even on GPUs. Having an application train the model every time it runs is unacceptable, so we save the trained state of the model for later use instead of retraining it on the same dataset again and again.
We can accomplish this in Python using packages like:
- Pickle (Python's object serialization library)
- Joblib (a separate library that scikit-learn itself depends on, commonly used to save its models)
You might have come across Pickle while going through ML articles or doing projects. This library is popular for serialization (pickling) and deserialization (unpickling). Pickling is the process of converting a Python object hierarchy into a stream of bytes. Unpickling is the reverse process: converting the pickled stream of bytes back into the original Python object, following the object hierarchy.
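To make the "stream of bytes" idea concrete, here is a minimal sketch that does the round trip in memory with `pickle.dumps` and `pickle.loads` instead of going through a file. The `record` data and the variable names are only illustrative and are not part of the examples in this post.

```python
import pickle

# An ordinary Python object with some nesting (illustrative data only).
record = {"names": ["apple", "ball", "cat"], "count": 3}

# Pickling: the object hierarchy becomes a single bytes object.
payload = pickle.dumps(record)
print(type(payload))        # <class 'bytes'>

# Unpickling: the bytes are turned back into an equal but separate object.
restored = pickle.loads(payload)
print(restored == record)   # True
print(restored is record)   # False
```

The restored object compares equal to the original but is a new object, which is exactly what we want when a later run of the program has to rebuild something an earlier run saved.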
Serialization (Pickling)

    import pickle

    pickle_file = 'string_list_pickle.pkl'
    names = ['apple', 'ball', 'cat']

    store_pickle = open(pickle_file, 'wb')
    pickle.dump(names, store_pickle)
    store_pickle.close()
Deserialization (Unpickling)

    import pickle

    pickle_file = 'string_list_pickle.pkl'

    # pickle files are binary, so open with 'rb' rather than 'r'
    unpickling_list = open(pickle_file, 'rb')
    names_list = pickle.load(unpickling_list)
    unpickling_list.close()
    print("Names in pickled list: ", names_list)
OK, so that is the simple usage of how pickling is done with Pickle.
We will now work with an ML model for classification. A Decision Tree classifier is a good place to start.
    import pickle

    import pandas
    # train_test_split lives in sklearn.model_selection;
    # the old sklearn.cross_validation module has been removed
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # load dataset
    # load_tic_tac_toe_dataset = pandas.read_csv(
    #     "https://archive.ics.uci.edu/ml/machine-learning-databases/tic-tac-toe/tic-tac-toe.data",
    #     sep=',', header=None)
    load_balance_scale_dataset = pandas.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data",
        sep=',', header=None)

    print("Dataset length: ", len(load_balance_scale_dataset))
    print("Dataset Shape: ", load_balance_scale_dataset.shape)

    # column 0 is the class label, columns 1 to 4 are the features
    X = load_balance_scale_dataset.values[:, 1:5]
    Y = load_balance_scale_dataset.values[:, 0]

    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)

    decision_tree_model = DecisionTreeClassifier(
        criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
    decision_tree_model.fit(X_train, y_train)
    print("Decision tree classifier: ", decision_tree_model)

    # dumping the model
    decision_tree_pkl = 'decision_tree_classifier.pkl'
    decision_tree_model_pkl = open(decision_tree_pkl, 'wb')
    pickle.dump(decision_tree_model, decision_tree_model_pkl)
    decision_tree_model_pkl.close()

    # loading the model
    decision_tree_model_pkl = open(decision_tree_pkl, 'rb')
    decision_tree_model = pickle.load(decision_tree_model_pkl)
    decision_tree_model_pkl.close()
    print("Loaded model: ", decision_tree_model)
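One way to check that the saved file really holds the trained state is to compare predictions from the original model and the reloaded one. The sketch below is a variant of the script above built on a few assumptions. It swaps in scikit-learn's bundled iris dataset (`load_iris`) so it runs without a network connection, it writes to a temporary directory, and the file name `iris_tree.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled iris dataset stands in for the balance-scale data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)

# Same hyperparameters as the decision tree used in this post.
model = DecisionTreeClassifier(criterion="gini", random_state=100,
                               max_depth=3, min_samples_leaf=5)
model.fit(X_train, y_train)

# Hypothetical file name, written to a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "iris_tree.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload the model; the "with" block closes the file for us.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model should predict exactly what the original predicts.
same_predictions = np.array_equal(model.predict(X_test),
                                  loaded_model.predict(X_test))
print("Predictions match:", same_predictions)
print("Test accuracy of loaded model:", loaded_model.score(X_test, y_test))
```

Keep in mind that a pickled scikit-learn model is tied to the library version that created it, so it is safest to load it with the same scikit-learn version used for training.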
It's late at night already and I have been sleepy for a while, but I couldn't stop myself from writing down this post on pickle.
I will continue with classifying the balance scale dataset using the pickled model, and also with a classifier for the TIC TAC TOE dataset (if you were wondering what that load_tic_tac_toe_dataset variable meant) that classifies whether a board state is a winning state or a losing state for x.
Also, I haven’t forgotten about Joblib, oh no I haven’t. Wait for the next post. 😉 ;P
Good night guys