How to Rapidly Optimize Machine Learning Model Parameters
Author | Thomas Ciha
Translator | 劉旭坤
Editor | Jane
Produced by | AI科技大本營
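[Introduction] There is generally no shortcut to optimizing a machine learning model. Which architecture to use, and which optimization algorithm and hyperparameters to choose, depends partly on how well we understand the dataset and partly on repeated trial and error. The ability to build and test models quickly is therefore crucial to moving a project forward. In this article we build a pipeline for producing models that helps speed up hyperparameter optimization.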
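A deep learning model has the following tunable hyperparameters:
Number of hidden layers
Number of nodes in each layer
Activation functions
Optimization algorithm
Learning rate
Regularization method
Regularization parameter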
We start by writing these hyperparameters into a dictionary, model_info, that stores the model's configuration:
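# og_one_hot is the one-hot encoded data array loaded in the testing script further below
model_info = {}
model_info["Hidden layers"] = [100] * 6              # number of units in each hidden layer
model_info["Input size"] = og_one_hot.shape[1] - 1   # every column except the label
model_info["Activations"] = ["relu"] * 6             # one activation function per hidden layer
model_info["Optimization"] = "adadelta"
model_info["Learning rate"] = .005
model_info["Batch size"] = 32
model_info["Preprocessing"] = "Standard"
model_info["Lambda"] = 0
model_info["Regularization"] = "l2"
model_info["Reg param"] = 0.0005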
Our goal here is binary classification on a dataset, which you can download in CSV format from the link below.
https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset
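The most intuitive way to get to know a dataset is to visualize it. I used PCA and t-SNE for dimensionality reduction; judging from the plots in the original article, t-SNE separates the data best. (Personally, I find scikit-learn's StandardScaler perfectly adequate for preprocessing the data.)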
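As a rough illustration of the preprocessing and projection steps mentioned above, the sketch below standardizes features the way StandardScaler does (zero mean and unit variance per column) and then projects them onto the first two principal components using only NumPy. The synthetic array and the helper names `standardize` and `pca_project` are assumptions made for illustration. The author's own plotting code is not part of this article, and t-SNE is left out here.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # leave constant columns unchanged instead of dividing by zero
    return (X - mean) / std


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# synthetic stand-in data whose columns live on very different scales
X_demo = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0])
X_scaled = standardize(X_demo)
X_2d = pca_project(X_scaled, n_components=2)   # 2-D coordinates that could feed a scatter plot
```

Without the scaling step, the column with the largest scale would dominate the first principal component, which is one reason to standardize before reducing dimensions.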
Next we can use the parameters in model_info to build a deep learning model. The build_nn function below constructs and returns a compiled model based on the model_info it receives:
from tensorflow.keras import Input, optimizers
from tensorflow.keras.layers import Activation, BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2


def build_nn(model_info):
    """
    Build and compile a neural network from a dict of the model's parameters.
    :param model_info: dict of hyperparameters, as described above
    :return: compiled Keras Sequential model
    """
    reg = model_info.get("Regularization")           # None if no regularization is being used
    lambda_, batch_norm, keep_prob = 0, False, False
    if reg == "l2":                                  # L2 regularization
        lambda_ = model_info["Reg param"]            # get lambda parameter
    elif reg == "Batch norm":                        # batch normalization
        batch_norm = model_info["Reg param"]         # "before" or "after" the activation
        if batch_norm not in ["before", "after"]:    # ensure we have a valid reg param
            raise ValueError('Reg param must be "before" or "after" for batch norm')
    elif reg == "Dropout":                           # dropout regularization
        keep_prob = model_info["Reg param"]

    hidden, acts = model_info["Hidden layers"], model_info["Activations"]
    # some Keras versions reject spaces in names, so replace them
    model = Sequential(name=model_info["Name"].replace(" ", "_"))
    model.add(Input(shape=(model_info["Input size"],)))   # create input layer

    for lay, act in zip(hidden, acts):               # create all the hidden layers
        # Dense stays linear here; the activation is added as its own layer below
        regularizer = l2(lambda_) if lambda_ > 0 else None
        model.add(Dense(lay, kernel_regularizer=regularizer))

        if batch_norm == "before":
            model.add(BatchNormalization())          # normalize before the activation
        model.add(Activation(act))                   # activation layer is part of the hidden layer
        if batch_norm == "after":
            model.add(BatchNormalization())          # normalize after the activation
        if keep_prob:
            model.add(Dropout(1 - keep_prob))        # Keras expects the drop rate, not the keep probability

    # --------- Adding Output Layer -------------
    model.add(Dense(1))                              # add output layer
    if batch_norm == "before":                       # if we're using batch norm regularization
        model.add(BatchNormalization())
    model.add(Activation("sigmoid"))                 # apply output layer activation
    if batch_norm == "after":
        model.add(BatchNormalization())

    lr = model_info["Learning rate"]
    if model_info["Optimization"] == "adagrad":      # setting an optimization method
        opt = optimizers.Adagrad(learning_rate=lr)
    elif model_info["Optimization"] == "rmsprop":
        opt = optimizers.RMSprop(learning_rate=lr)
    elif model_info["Optimization"] == "adadelta":
        opt = optimizers.Adadelta()                  # uses the optimizer's default learning rate
    elif model_info["Optimization"] == "adam":
        opt = optimizers.Adam(learning_rate=lr)
    elif model_info["Optimization"] == "adamax":
        opt = optimizers.Adamax(learning_rate=lr)
    else:
        opt = optimizers.Nadam(learning_rate=lr)
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])  # compile model

    return model
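To make the layer ordering easier to follow without running Keras, here is a small sketch that lists, as plain strings, the order in which build_nn stacks layers for a given model_info. The helper name `layer_plan` and the string labels are assumptions for illustration. The dictionary keys and the "before"/"after" options come from the article.

```python
def layer_plan(model_info):
    """Return the ordered layer descriptions that build_nn would stack (hypothetical helper)."""
    reg = model_info.get("Regularization")     # None when no regularization is requested
    param = model_info.get("Reg param")
    use_bn_before = reg == "Batch norm" and param == "before"
    use_bn_after = reg == "Batch norm" and param == "after"

    plan = []
    for units, act in zip(model_info["Hidden layers"], model_info["Activations"]):
        plan.append(f"Dense({units}, l2={param})" if reg == "l2" else f"Dense({units})")
        if use_bn_before:
            plan.append("BatchNormalization")
        plan.append(f"Activation({act})")
        if use_bn_after:
            plan.append("BatchNormalization")
        if reg == "Dropout":
            plan.append(f"Dropout(keep_prob={param})")

    # output layer: a single unit with a sigmoid for binary classification
    plan.append("Dense(1)")
    if use_bn_before:
        plan.append("BatchNormalization")
    plan.append("Activation(sigmoid)")
    if use_bn_after:
        plan.append("BatchNormalization")
    return plan


demo_info = {
    "Hidden layers": [8, 4],
    "Activations": ["relu", "tanh"],
    "Regularization": "Batch norm",
    "Reg param": "before",
}
demo_plan = layer_plan(demo_info)
```

Printing `demo_plan` one entry per line gives a quick check that a configuration produces the intended architecture before any training time is spent.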
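With build_nn in place, we can pass it different model_info dicts to create models quickly. Below I use five different numbers of hidden layers to compare how well different architectures classify the data.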
def create_five_nns(input_size, hidden_size, act=None):
    """
    Create 5 neural networks to use as a baseline for how model depth and width affect performance.
    :param input_size: input layer size
    :param hidden_size: number of units in every hidden layer
    :param act: activation function to use for each layer
    :return: list of model_info dicts
    """
    act = [act] if act else ["relu"]                 # default activation = "relu"
    base_info = {                                    # settings shared by all five models
        "Input size": input_size,
        "Optimization": "adadelta",
        "Learning rate": .005,
        "Batch size": 32,
        "Preprocessing": "Standard",
    }
    depths = [("Shallow NN", 1),                     # build shallow nn
              ("Medium NN", 3),                      # build medium nn
              ("Deep NN 1", 6),                      # build deep nn
              ("Deep NN 2", 11),                     # build really deep nn
              ("Deep NN 3", 20)]                     # build an even deeper nn

    nns = []                                         # list of model_info dicts
    for name, depth in depths:
        model_info = base_info.copy()
        model_info["Name"] = name
        model_info["Hidden layers"] = [hidden_size] * depth
        model_info["Activations"] = act * depth
        nns.append(model_info)
    return nns
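To get a feel for how much the five depths differ in capacity, the sketch below counts the weights and biases of a fully connected network. The helper name `count_parameters` and the sizes used (30 inputs, 100 units per layer) are assumptions for illustration only, not figures from the article.

```python
def count_parameters(input_size, hidden_layers, output_size=1):
    """Count the weights and biases of a fully connected network."""
    total = 0
    previous = input_size
    for units in list(hidden_layers) + [output_size]:
        total += previous * units + units   # weight matrix plus one bias per unit
        previous = units
    return total


# depths taken from create_five_nns; the input and hidden sizes are assumed
depths = {"Shallow NN": 1, "Medium NN": 3, "Deep NN 1": 6, "Deep NN 2": 11, "Deep NN 3": 20}
input_size, hidden_size = 30, 100
sizes = {name: count_parameters(input_size, [hidden_size] * depth) for name, depth in depths.items()}
```

Each extra hidden layer of the same width adds a fixed number of parameters, so the count grows linearly with depth but quadratically with width.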
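Perhaps because our data is fairly nonlinear, I found that test performance improved as the number of hidden layers and nodes grew: the more hidden layers, the better the result. Every model built from a given set of parameters was evaluated with five-fold cross-validation. Put simply, five-fold cross-validation splits the dataset into five parts, trains the model on four of them, and tests it on the remaining one. Rotating through five rounds, each part serves as the test data exactly once, and the mean of the five test results is taken as the model's score. We measured both accuracy and AUC; the original article shows the results in a figure.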
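The rotation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the function names, the toy labels, and the majority-class "model" are all assumptions standing in for a real training run.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]          # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(evaluate, n_samples, k=5):
    """Average the score returned by evaluate(train_idx, test_idx) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


labels = [1 if i % 4 == 0 else 0 for i in range(100)]  # toy labels, 25% positive


def majority_baseline(train_idx, test_idx):
    """Stand-in for training a model: predict the most common training label."""
    positives = sum(labels[i] for i in train_idx)
    prediction = 1 if positives * 2 > len(train_idx) else 0
    return sum(labels[i] == prediction for i in test_idx) / len(test_idx)


mean_accuracy = cross_validate(majority_baseline, len(labels))
```

In a real pipeline, `evaluate` would build a model with build_nn, fit it on the training indices, and return the test accuracy or AUC. scikit-learn's KFold provides the same kind of split.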
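If cross-validation takes too long and there is enough data, we can instead split the dataset directly into training and test subsets, as in the code below: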
import os

from sklearn.metrics import auc, roc_curve
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard


def quick_nn_test(model_info, data_dict, save_path):
    model = build_nn(model_info)                     # use model info to build and compile a nn
    # stop training once accuracy has not improved for 5 consecutive epochs
    # (older Keras versions log this metric as "acc")
    stop = EarlyStopping(patience=5, monitor="accuracy", verbose=1)
    model_dir = os.path.join(save_path, model_info["Name"])   # folder for tensorboard logs and checkpoints
    os.makedirs(model_dir, exist_ok=True)
    tensorboard = TensorBoard(log_dir=model_dir, histogram_freq=0, write_graph=True, write_images=True)  # create tensorboard callback
    save_model = ModelCheckpoint(filepath=os.path.join(model_dir, model_info["Name"] + "_saved_.h5"))  # save model after every epoch

    model.fit(data_dict["Training data"], data_dict["Training labels"], epochs=150,   # fit model
              batch_size=model_info["Batch size"], callbacks=[save_model, stop, tensorboard])
    train_acc = model.evaluate(data_dict["Training data"], data_dict["Training labels"],  # [loss, accuracy] on training data
                               batch_size=model_info["Batch size"], verbose=0)
    test_acc = model.evaluate(data_dict["Test data"], data_dict["Test labels"],           # [loss, accuracy] on test data
                              batch_size=model_info["Batch size"], verbose=0)

    # Get Train AUC
    y_pred = model.predict(data_dict["Training data"]).ravel()               # predict on training data
    fpr, tpr, thresholds = roc_curve(data_dict["Training labels"], y_pred)   # compute fpr and tpr
    auc_train = auc(fpr, tpr)                                                # compute AUC metric
    # Get Test AUC
    y_pred = model.predict(data_dict["Test data"]).ravel()                   # same as above with test data
    fpr, tpr, thresholds = roc_curve(data_dict["Test labels"], y_pred)
    auc_test = auc(fpr, tpr)

    return train_acc, test_acc, auc_train, auc_test
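For readers less familiar with AUC, the sketch below computes it directly from its ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `auc_score` is an assumption. The pairwise loop is written for clarity rather than speed, and it yields the same value as the area under the ROC curve.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0      # positive ranked above negative
            elif p == n:
                correct += 0.5      # a tie counts as half
    return correct / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
example_auc = auc_score(y_true, y_score)   # 3 of the 4 pairs are ordered correctly
```

Unlike accuracy, this measure does not depend on a classification threshold, which makes it a useful second metric when the two classes are imbalanced.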
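Some books suggest grid search for hyperparameter optimization, but grid search is simply exhaustive enumeration: the number of combinations multiplies with every hyperparameter added, so it is rarely practical for deep models. A more common approach is to work from coarse to fine, progressively narrowing the range in which the best parameters lie.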
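A quick count shows why exhaustive search becomes impractical. The sketch below enumerates a modest grid and, for comparison, draws a fixed budget of random configurations from the same space. The candidate values are assumptions chosen for illustration (some borrowed from lists that appear elsewhere in this article), and `sample_configs` is a hypothetical helper.

```python
import itertools
import random

search_space = {
    "Hidden layers": [1, 3, 6, 11, 20],
    "Units per layer": [50, 100, 150, 200],
    "Activation": ["sigmoid", "relu", "tanh"],
    "Optimization": ["adagrad", "rmsprop", "adadelta", "adam", "adamax", "nadam"],
    "Batch size": [16, 32, 64, 128, 256, 512],
    "Learning rate": [0.0001, 0.001, 0.01, 0.1],
}

# grid search: every combination would need its own training run
grid = list(itertools.product(*search_space.values()))
grid_size = len(grid)


def sample_configs(space, n, seed=0):
    """Draw n random configurations instead of enumerating the whole grid."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]


random_configs = sample_configs(search_space, 20)   # a fixed budget of 20 training runs
```

Even this small space contains thousands of combinations, whereas random sampling lets the number of training runs be chosen up front, independent of how many hyperparameters are involved.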
"""This section of code lets us create and test many neural networks and save the results of a quick
test to a CSV file. Once that CSV file exists, later results are appended to it."""

import numpy as np
import pandas as pd

rapid_testing_path = "YOUR PATH HERE"
data_path = "YOUR DATA PATH"

column_list = ["Model", "Train Accuracy", "Test Accuracy", "Train AUC", "Test AUC", "Preprocessing",
               "Batch size", "Learn Rate", "Optimization", "Activations", "Hidden layers",
               "Regularization", "Reg Param"]

try:                                                 # try to load existing csv
    rapid_mlp_results = pd.read_csv(rapid_testing_path + "Results.csv")
    index = rapid_mlp_results.shape[0]               # next row index = number of rows already saved
except FileNotFoundError:                            # if no csv exists yet, create a DF
    rapid_mlp_results = pd.DataFrame(columns=column_list)
    index = 0

og_one_hot = np.array(pd.read_csv(data_path))        # load one hot data

model_info = {}                                      # create model_info dicts for all the models we want to test
model_info["Hidden layers"] = [100] * 6              # specifies the number of hidden units per layer
model_info["Input size"] = og_one_hot.shape[1] - 1   # input data size
model_info["Activations"] = ["relu"] * 6             # activation function for each layer
model_info["Optimization"] = "adadelta"              # optimization method
model_info["Learning rate"] = .005                   # learning rate for optimization method
model_info["Batch size"] = 32
model_info["Preprocessing"] = "Standard"             # specifies the preprocessing method to be used

model_0 = model_info.copy()                          # create model 0
model_0["Name"] = "Model0"

model_1 = model_info.copy()                          # create model 1
model_1["Hidden layers"] = [110] * 3
model_1["Activations"] = ["relu"] * 3                # keep one activation per hidden layer
model_1["Name"] = "Model1"

model_2 = model_info.copy()                          # try best model so far with several regularization parameter values
model_2["Hidden layers"] = [110] * 6
model_2["Name"] = "Model2"
model_2["Regularization"] = "l2"
model_2["Reg param"] = 0.0005

model_3 = model_info.copy()
model_3["Hidden layers"] = [110] * 6
model_3["Name"] = "Model3"
model_3["Regularization"] = "l2"
model_3["Reg param"] = 0.05

# .... create more models ....

# -------------- REGULARIZATION OPTIONS -------------
# L2 Regularization:    Regularization: "l2",         Reg param: lambda value
# Dropout:              Regularization: "Dropout",    Reg param: keep_prob
# Batch normalization:  Regularization: "Batch norm", Reg param: "before" or "after"

models = [model_0, model_1, model_2, model_3]        # make a list of model_info dicts

for model in models:                                 # test each model_info in the list and record the results
    # preprocess_data and split_data are the author's helper functions; they are not shown in this article
    train_data, labels = preprocess_data(og_one_hot, model["Preprocessing"], True)   # preprocess raw data
    # second argument assumed to be 0 (the source listing lost it)
    data_dict = split_data(0.9, 0, np.concatenate((train_data, labels.reshape(-1, 1)), axis=1))  # split data
    train_acc, test_acc, auc_train, auc_test = quick_nn_test(model, data_dict, save_path=rapid_testing_path)  # quickly assess model

    reg = model.get("Regularization", "None")        # regularization settings, if given
    reg_param = model.get("Reg param", "NA")

    val_lis = [model["Name"], train_acc[1], test_acc[1], auc_train, auc_test, model["Preprocessing"],
               model["Batch size"], model["Learning rate"], model["Optimization"], str(model["Activations"]),
               str(model["Hidden layers"]), reg, reg_param]

    df = pd.DataFrame(dict(zip(column_list, val_lis)), index=[index])   # one-row frame for this model
    rapid_mlp_results = pd.concat([rapid_mlp_results, df])              # DataFrame.append was removed in pandas 2.0
    rapid_mlp_results.to_csv(rapid_testing_path + "Results.csv", index=False)
    index += 1
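The script above calls a split_data helper that this article does not show. The sketch below is one guess at what such a helper might look like, under two assumptions: the label sits in the last column of the array (as the np.concatenate call suggests), and the result is a dict with the four keys that quick_nn_test reads. The name `split_data_sketch` and its `seed` parameter are hypothetical, and the unexplained second positional argument of the original call is not reproduced.

```python
import numpy as np


def split_data_sketch(train_fraction, data, seed=0):
    """Shuffle the rows, then split them into training and test parts (label in last column)."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(data.shape[0])]
    n_train = int(train_fraction * data.shape[0])
    train, test = shuffled[:n_train], shuffled[n_train:]
    return {
        "Training data": train[:, :-1],
        "Training labels": train[:, -1],
        "Test data": test[:, :-1],
        "Test labels": test[:, -1],
    }


# toy array: 10 rows, 4 feature columns, and a binary label in the last column
features = np.arange(40, dtype=float).reshape(10, 4)
toy_labels = (np.arange(10) % 2).reshape(10, 1)
demo_split = split_data_sketch(0.9, np.hstack([features, toy_labels]))
```

Shuffling before the split matters when the CSV rows are ordered in some way. Otherwise the test set could end up with a different class balance than the training set.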
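We first need a rough direction for the optimization and an approximate range for each parameter. Only then can we sample parameters at random within those ranges and use the results to narrow the ranges further. The code below introduces randomness into the generation of models (strictly speaking, of the model_info dicts used to build them):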
import numpy as np


def generate_random_model():
    optimization_methods = ["adagrad", "rmsprop", "adadelta", "adam", "adamax", "nadam"]  # possible optimization methods
    activation_functions = ["sigmoid", "relu", "tanh"]   # possible activation functions
    batch_sizes = [16, 32, 64, 128, 256, 512]            # possible batch sizes
    range_hidden_units = range(5, 250)                   # range of possible hidden units
    model_info = {}                                      # create the model_info dict
    same_units = np.random.choice([0, 1], p=[1/5, 4/5])      # dictates whether all hidden layers will have the same number of units
    same_act_fun = np.random.choice([0, 1], p=[1/10, 9/10])  # will each hidden layer have the same activation function?
    really_deep = np.random.rand()
    # NOTE: the source listing is cut off after "if really_deep" on the next two statements.
    # The 0.8 threshold and the deeper range(6, 20) alternative are assumptions added to keep the function runnable.
    if really_deep < 0.8:
        range_layers = range(1, 10)
        num_layers = int(np.random.choice(range_layers, p=[.1, .2, .2, .2, .05, .05, .05, .1, .05]))
    else:
        range_layers = range(6, 20)
        num_layers = int(np.random.choice(range_layers))

    if same_act_fun:                                     # choose activation functions
        model_info["Activations"] = [str(np.random.choice(activation_functions, p=[0.25, 0.5, 0.25]))] * num_layers
    else:
        model_info["Activations"] = [str(np.random.choice(activation_functions, p=[0.25, 0.5, 0.25])) for _ in range(num_layers)]
    if same_units:                                       # create hidden layers
        model_info["Hidden layers"] = [int(np.random.choice(range_hidden_units))] * num_layers
    else:
        model_info["Hidden layers"] = [int(np.random.choice(range_hidden_units)) for _ in range(num_layers)]
    model_info["Optimization"] = str(np.random.choice(optimization_methods))  # choose an optimization method at random
    model_info["Batch size"] = int(np.random.choice(batch_sizes))             # choose batch size
    model_info["Learning rate"] = 10 ** (-4 * np.random.rand())               # choose a learning rate on a logarithmic scale
    model_info["Training threshold"] = 0.5                                    # set threshold for training
    return model_info
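The narrowing step itself can be sketched for a single hyperparameter. The example below samples learning rates on a logarithmic scale over the same range as the `10 ** (-4 * rand)` expression above, that is, from 1e-4 to 1, keeps the best few, and shrinks the range around them. The function names, the number of rounds, and the toy scoring function are all assumptions for illustration. In practice the score would come from a quick training run.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def narrow_range(evaluate, low, high, rounds=3, samples=20, keep=5, seed=0):
    """Repeatedly sample, keep the best few values, and shrink the range around them."""
    rng = random.Random(seed)
    for _ in range(rounds):
        candidates = [sample_log_uniform(low, high, rng) for _ in range(samples)]
        best = sorted(candidates, key=evaluate, reverse=True)[:keep]
        low, high = min(best), max(best)       # the next round searches only this interval
    return low, high


def toy_score(learning_rate):
    """Stand-in for a validation score; it peaks at a learning rate of 0.01."""
    return -abs(math.log10(learning_rate) - math.log10(0.01))


final_low, final_high = narrow_range(toy_score, 1e-4, 1.0)
```

Sampling on a log scale gives each order of magnitude the same share of trials. Sampling uniformly between 1e-4 and 1 would instead spend about 90% of the trials above 0.1.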
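To sum up, our approach to rapid optimization comes down to two ideas: automate model building, and narrow the search step by step. Automated model building is handled by the build_nn function; step-by-step narrowing comes from judging parameter ranges and sampling at random within them. With this approach in hand, you should be able to rapidly optimize the parameters of machine learning models, deep learning models in particular.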
https://towardsdatascience.com/how-to-rapidly-test-dozens-of-deep-learning-models-in-python-cb839b518531
[End]