Code + Practice: TensorFlow Estimator of Deep CTR: DeepFM/NFM/AFM/FNN/PNN

Knowledge 02-18

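Note from AI 研習社 (AI Yanxishe): this article was written by lambdaJi and first published on Zhihu. AI Yanxishe republishes it with the author's permission.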

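Deep learning is being applied to CTR prediction more and more, and new models keep appearing. After sorting out how the various models relate to each other in "Designing f(x) for the CTR prediction problem: the DNN part" (https://zhuanlan.zhihu.com/p/28202287), I kept thinking about how to bring them into industrial use. After a few months of investigation, I found the following problems:

Most open-source implementations come from academia and are still far from production use.

Model implementations call low-level APIs heavily and differ widely from one another; the code is bloated and hard to read, so migration is costly.

They are single-machine only and cannot cope with industrial-scale workloads.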

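To address these problems I explored a workable approach with the following features:

Data is read with the Dataset API, with support for parallel reads and prefetching.

f(x) is implemented through the Estimator model_fn, so moving to another algorithm is easy: only the f(x) part of model_fn needs to be rewritten.

Both distributed training and single-machine multi-threaded training are supported.

The model can be exported and then served online with TensorFlow Serving.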

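By industry convention, a complete machine learning project has five parts: a feature framework, a training framework, a serving framework, an evaluation framework, and a monitoring framework. Only the first three are discussed here.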

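Feature framework -- logs in, samples out

The experiments use the criteo dataset. Feature engineering follows: https://github.com/PaddlePaddle/models/blob/develop/deep_fm/preprocess.py

#1 Continuous features: remove outliers and normalize

#2 Discrete features: drop low-frequency values, then encode them in one unified ID space (the feature encoding must be saved, because online prediction needs it)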
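The two preprocessing steps can be illustrated with a minimal, standard-library sketch. It is not the referenced preprocess.py: the function names, the clipping threshold, the frequency cutoff, and the toy columns are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter

def clip_and_scale(values, upper):
    """Clip each value to [0, upper], then scale it into [0, 1]."""
    return [min(max(v, 0.0), upper) / upper for v in values]

def build_vocab(column, min_count, start_id):
    """Assign consecutive ids to categories seen at least min_count times."""
    counts = Counter(column)
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    return {c: start_id + i for i, c in enumerate(kept)}

# Toy data (hypothetical): one continuous column and one discrete column.
clicks_7d = [3.0, 250.0, 12.0, 40.0]
city = ["bj", "sh", "bj", "gz"]

scaled = clip_and_scale(clicks_7d, upper=100.0)     # 250.0 is treated as an outlier
vocab = build_vocab(city, min_count=2, start_id=14)  # ids 1..13 assumed taken

print(scaled)  # [0.03, 1.0, 0.12, 0.4]
print(vocab)   # {'bj': 14}
```

The `vocab` dictionary is the piece that has to be persisted: the online service must map raw categories to the same ids that were used in training.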

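Modeling large-scale discrete features is where DNNs have an edge in CTR prediction. Papers mostly focus on how to embed ID features and rarely discuss how to handle continuous features. There are roughly three options:

--No embedding

|1--concat[continuous, emb_vec], then feed into fc

--With embedding

|2--Discretize, then embed

|3--As in the second-order part of FM, embed everything uniformly, with val=1.0 for discrete features

To keep the model design simple and uniform, option 3 is used here; interested readers can try the first two and compare the results.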
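As a rough illustration of option 3, the sketch below looks up one embedding vector per feature id and scales it by the feature value, so a continuous feature contributes v * x and a discrete feature contributes v * 1.0. The embedding table, the ids, and the dimension K = 2 are made-up toy values, not taken from the trained model.

```python
# Toy embedding table: feature id -> embedding vector (K = 2). Values are made up.
EMB = {
    1: [0.5, -1.0],   # a continuous feature (one id for the whole field)
    14: [0.2, 0.4],   # one category of a discrete field
}

def scaled_embeddings(feat_ids, feat_vals):
    """Look up each id's vector and multiply it by the feature value."""
    return [[w * val for w in EMB[fid]] for fid, val in zip(feat_ids, feat_vals)]

# Continuous feature 1 has value 0.5; discrete feature 14 uses val = 1.0.
rows = scaled_embeddings([1, 14], [0.5, 1.0])
print(rows)  # [[0.25, -0.5], [0.2, 0.4]]
```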

Training framework -- samples in, model out

So far DeepFM/wide_n_deep/NFM/AFM/FNN/PNN have been implemented. Taking DeepFM as an example, here is how to implement input_fn and model_fn with the TensorFlow Estimator and Datasets APIs:

import tensorflow as tf

# Sample line in libsvm format ("label id:value id:value ..."):
# 1 1:0.5 2:0.03519 3:1 4:0.02567 7:0.03708 8:0.01705 9:0.06296 10:0.18185 11:0.02497 12:1 14:0.02565 15:0.03267 17:0.0247 18:0.03158 20:1 22:1 23:0.13169 24:0.02933 27:0.18159 31:0.0177 34:0.02888 38:1 51:1 63:1 132:1 164:1 236:1
def input_fn(filenames, batch_size=32, num_epochs=1, perform_shuffle=False):
    print("Parsing", filenames)

    def decode_libsvm(line):
        columns = tf.string_split([line], " ")
        # The first token is the label; the rest are "id:value" pairs
        labels = tf.string_to_number(columns.values[0], out_type=tf.float32)
        splits = tf.string_split(columns.values[1:], ":")
        id_vals = tf.reshape(splits.values, splits.dense_shape)
        feat_ids, feat_vals = tf.split(id_vals, num_or_size_splits=2, axis=1)
        feat_ids = tf.string_to_number(feat_ids, out_type=tf.int32)
        feat_vals = tf.string_to_number(feat_vals, out_type=tf.float32)
        return {"feat_ids": feat_ids, "feat_vals": feat_vals}, labels

    # Extract lines from input files using the Dataset API, can pass one filename or filename list
    dataset = tf.data.TextLineDataset(filenames).map(decode_libsvm, num_parallel_calls=10).prefetch(500000)  # multi-thread pre-process then prefetch

    # Randomizes input using a window of 256 elements (read into memory)
    if perform_shuffle:
        dataset = dataset.shuffle(buffer_size=256)

    # Repeat after shuffling so that separate epochs do not blend together.
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)  # Batch size to use

    iterator = dataset.make_one_shot_iterator()
    batch_features, batch_labels = iterator.get_next()
    return batch_features, batch_labels
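To make the output of decode_libsvm easier to picture, here is a plain-Python sketch of the same split logic. It only mirrors the parsing steps for a single line; the helper name parse_libsvm is hypothetical, and the input is a shortened version of the sample line.

```python
def parse_libsvm(line):
    """Split 'label id:val id:val ...' into (label, feat_ids, feat_vals)."""
    tokens = line.split()
    label = float(tokens[0])
    feat_ids, feat_vals = [], []
    for token in tokens[1:]:
        fid, val = token.split(":")
        feat_ids.append(int(fid))
        feat_vals.append(float(val))
    return label, feat_ids, feat_vals

label, ids, vals = parse_libsvm("1 1:0.5 2:0.03519 3:1 132:1")
print(label, ids, vals)  # 1.0 [1, 2, 3, 132] [0.5, 0.03519, 1.0, 1.0]
```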

# FLAGS and batch_norm_layer are not defined in this excerpt; they come from the full script.
def model_fn(features, labels, mode, params):
    """Build model function f(x) for Estimator."""
    #------hyperparameters----
    field_size = params["field_size"]
    feature_size = params["feature_size"]
    embedding_size = params["embedding_size"]
    l2_reg = params["l2_reg"]
    learning_rate = params["learning_rate"]
    # list(...) so that len() and indexing also work on Python 3
    layers = list(map(int, params["deep_layers"].split(",")))
    dropout = list(map(float, params["dropout"].split(",")))

    #------build weights------
    FM_B = tf.get_variable(name="fm_bias", shape=[1], initializer=tf.constant_initializer(0.0))
    FM_W = tf.get_variable(name="fm_w", shape=[feature_size], initializer=tf.glorot_normal_initializer())
    FM_V = tf.get_variable(name="fm_v", shape=[feature_size, embedding_size], initializer=tf.glorot_normal_initializer())

    #------build feature-------
    feat_ids = features["feat_ids"]
    feat_ids = tf.reshape(feat_ids, shape=[-1, field_size])
    feat_vals = features["feat_vals"]
    feat_vals = tf.reshape(feat_vals, shape=[-1, field_size])

    #------build f(x)------
    with tf.variable_scope("First-order"):
        feat_wgts = tf.nn.embedding_lookup(FM_W, feat_ids)  # None * F
        y_w = tf.reduce_sum(tf.multiply(feat_wgts, feat_vals), 1)  # None

    with tf.variable_scope("Second-order"):
        embeddings = tf.nn.embedding_lookup(FM_V, feat_ids)  # None * F * K
        feat_vals = tf.reshape(feat_vals, shape=[-1, field_size, 1])
        embeddings = tf.multiply(embeddings, feat_vals)  # vij * xi
        sum_square = tf.square(tf.reduce_sum(embeddings, 1))
        square_sum = tf.reduce_sum(tf.square(embeddings), 1)
        y_v = 0.5 * tf.reduce_sum(tf.subtract(sum_square, square_sum), 1)  # None

    with tf.variable_scope("Deep-part"):
        if FLAGS.batch_norm:
            train_phase = (mode == tf.estimator.ModeKeys.TRAIN)

        deep_inputs = tf.reshape(embeddings, shape=[-1, field_size * embedding_size])  # None * (F*K)
        for i in range(len(layers)):
            # Alternative kept from the original (BN before the fully connected layer):
            #if FLAGS.batch_norm:
            #    deep_inputs = batch_norm_layer(deep_inputs, train_phase=train_phase, scope_bn="bn_%d" % i)
            #normalizer_params.update({"scope": "bn_%d" % i})
            deep_inputs = tf.contrib.layers.fully_connected(
                inputs=deep_inputs, num_outputs=layers[i],
                #normalizer_fn=normalizer_fn, normalizer_params=normalizer_params,
                weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg), scope="mlp%d" % i)
            if FLAGS.batch_norm:
                # BN is placed after ReLU: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md#bn----before-or-after-relu
                deep_inputs = batch_norm_layer(deep_inputs, train_phase=train_phase, scope_bn="bn_%d" % i)
            if mode == tf.estimator.ModeKeys.TRAIN:
                # Apply dropout after all BN layers; dropout=0.8 here is keep_prob (drop_ratio=0.2)
                deep_inputs = tf.nn.dropout(deep_inputs, keep_prob=dropout[i])
                #deep_inputs = tf.layers.dropout(inputs=deep_inputs, rate=dropout[i], training=mode == tf.estimator.ModeKeys.TRAIN)

        y_deep = tf.contrib.layers.fully_connected(inputs=deep_inputs, num_outputs=1, activation_fn=tf.identity, weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg), scope="deep_out")
        y_d = tf.reshape(y_deep, shape=[-1])

    with tf.variable_scope("DeepFM-out"):
        # Warning: do not build the bias from labels (tf.ones_like(labels)); predict/export then fail
        # while train/evaluate work. Preliminary guess: the Estimator does not pass labels when they are not needed.
        #y_bias = FM_B * tf.ones_like(labels, dtype=tf.float32)
        y_bias = FM_B * tf.ones_like(y_d, dtype=tf.float32)  # None
        y = y_bias + y_w + y_v + y_d
        pred = tf.sigmoid(y)

    predictions = {"prob": pred}
    # Assumption: the right-hand side is missing in the source text; this is the usual
    # default serving signature, matching the "serving_default" name used by the client.
    export_outputs = {tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: tf.estimator.export.PredictOutput(predictions)}
    # Provide an estimator spec for `ModeKeys.PREDICT`
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions, export_outputs=export_outputs)

    #------build loss------
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=labels)) + l2_reg * tf.nn.l2_loss(FM_W) + l2_reg * tf.nn.l2_loss(FM_V)

    # Provide an estimator spec for `ModeKeys.EVAL`
    eval_metric_ops = {
        "auc": tf.metrics.auc(labels, pred)
    }
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions, loss=loss, eval_metric_ops=eval_metric_ops)

    #------build optimizer------
    if FLAGS.optimizer == "Adam":
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-8)
    elif FLAGS.optimizer == "Adagrad":
        optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate, initial_accumulator_value=1e-8)
    elif FLAGS.optimizer == "Momentum":
        optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.95)
    elif FLAGS.optimizer == "ftrl":
        optimizer = tf.train.FtrlOptimizer(learning_rate)

    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    # Provide an estimator spec for `ModeKeys.TRAIN` modes
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions, loss=loss, train_op=train_op)
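The "Second-order" scope relies on the standard FM identity: the sum of all pairwise dot products between the scaled embeddings equals 0.5 * sum over k of ((sum_i e_ik)^2 - sum_i e_ik^2). The small pure-Python check below compares the direct pairwise form with the sum_square/square_sum form on made-up vectors; the function names and numbers are illustrative only.

```python
from itertools import combinations

def pairwise_sum(vectors):
    """Direct form: sum over i < j of the dot product <e_i, e_j>."""
    return sum(sum(a * b for a, b in zip(u, v)) for u, v in combinations(vectors, 2))

def sum_square_trick(vectors):
    """Same quantity as 0.5 * sum_k((sum_i e_ik)^2 - sum_i e_ik^2)."""
    total = 0.0
    for k in range(len(vectors[0])):
        column = [v[k] for v in vectors]
        total += sum(column) ** 2 - sum(x * x for x in column)
    return 0.5 * total

# Three scaled embeddings e_i = v_i * x_i with K = 2 (toy values).
e = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
print(pairwise_sum(e), sum_square_trick(e))  # 7.0 7.0
```

The second form needs only one pass over the F fields per dimension instead of all F*(F-1)/2 pairs, which is why the model code uses it.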

Once the model is wrapped as an Estimator, calling it is very simple:

#train

python DeepFM.py --task_type=train --learning_rate=0.0005 --optimizer=Adam --num_epochs=1 --batch_size=256 --field_size=39 --feature_size=117581 --deep_layers=400,400,400 --dropout=0.5,0.5,0.5 --log_steps=1000 --num_threads=8 --model_dir=./model_ckpt/criteo/DeepFM/ --data_dir=../../data/criteo/

#predict

python DeepFM.py --task_type=infer --learning_rate=0.0005 --optimizer=Adam --num_epochs=1 --batch_size=256 --field_size=39 --feature_size=117581 --deep_layers=400,400,400 --dropout=0.5,0.5,0.5 --log_steps=1000 --num_threads=8 --model_dir=./model_ckpt/criteo/DeepFM/ --data_dir=../../data/criteo/

Full code: lambdaji/tf_repos

https://github.com/lambdaji/tf_repos/tree/master/deep_ctr/Model_pipeline

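Serving framework -- request in, pctr out

TensorFlow Serving is a high-performance open-source library for serving machine learning models. It deploys trained models online and accepts external calls through a gRPC interface. Even better, it supports hot model updates and automatic model version management. Once TensorFlow Serving is deployed, the online service needs far less attention and you can concentrate on offline model training.

First, export model files in a format that TF-Serving recognizes:

python DeepFM.py --task_type=export --learning_rate=0.0005 --optimizer=Adam --batch_size=256 --field_size=39 --feature_size=117581 --deep_layers=400,400,400 --dropout=0.5,0.5,0.5 --log_steps=1000 --num_threads=8 --model_dir=./model_ckpt/criteo/DeepFM/ --servable_model_dir=./servable_model/

By default, versions are managed by timestamp. The generated files look like this: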

$ ls -lh servable_model/1517971230

|--saved_model.pb

|--variables

    |--variables.data-00000-of-00001

    |--variables.index
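Because each export lands in its own numeric sub-directory, and TensorFlow Serving's default version policy serves the largest version number, picking "the current model" amounts to taking the numerically largest directory name. The sketch below imitates that selection on a temporary directory; the helper name and the second timestamp are hypothetical.

```python
import os
import tempfile

def latest_version(base_dir):
    """Return the numerically largest version sub-directory name, or None."""
    versions = [d for d in os.listdir(base_dir)
                if d.isdigit() and os.path.isdir(os.path.join(base_dir, d))]
    return max(versions, key=int) if versions else None

# Build a throwaway layout with two timestamped exports.
base = tempfile.mkdtemp()
for version in ("1517971230", "1517980000"):
    os.makedirs(os.path.join(base, version, "variables"))

print(latest_version(base))  # 1517980000
```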

Then write a client to send requests; here it is written in C++:

PredictRequest predictRequest;
PredictResponse response;
ClientContext context;

predictRequest.mutable_model_spec()->set_name(model_name);
predictRequest.mutable_model_spec()->set_signature_name(model_signature_name);  // serving_default
google::protobuf::Map<std::string, tensorflow::TensorProto>& inputs = *predictRequest.mutable_inputs();

// feature to tfrequest: 39 feature ids and their 39 values
std::vector<int64_t> ids_vec = {1,2,3,4,5,6,7,8,9,10,11,12,13,15,555,1078,17797,26190,26341,28570,35361,35613,35984,48424,51364,64053,65964,66206,71628,84088,84119,86889,88280,88283,100288,100300,102447,109932,111823};
std::vector<float> vals_vec = {0.05,0.006633,0.05,0,0.021594,0.008,0.15,0.04,0.362,0.1,0.2,0,0.04,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};

tensorflow::TensorProto feat_ids;
for (uint32_t i = 0; i < ids_vec.size(); i++) {
    feat_ids.add_int64_val(ids_vec[i]);
}
feat_ids.mutable_tensor_shape()->add_dim()->set_size(1);  // batch_size
feat_ids.mutable_tensor_shape()->add_dim()->set_size(feat_ids.int64_val_size());  // sample size
feat_ids.set_dtype(tensorflow::DataType::DT_INT64);
inputs["feat_ids"] = feat_ids;

tensorflow::TensorProto feat_vals;
for (uint32_t i = 0; i < vals_vec.size(); i++) {
    feat_vals.add_float_val(vals_vec[i]);
}
feat_vals.mutable_tensor_shape()->add_dim()->set_size(1);  // batch_size
feat_vals.mutable_tensor_shape()->add_dim()->set_size(feat_vals.float_val_size());  // sample size
feat_vals.set_dtype(tensorflow::DataType::DT_FLOAT);
inputs["feat_vals"] = feat_vals;

Status status = _stub->Predict(&context, predictRequest, &response);
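For quick experiments, the same request can also be expressed as JSON. This is only a sketch and not part of the original pipeline: it assumes a TensorFlow Serving release that exposes the REST predict endpoint (POST /v1/models/<model_name>:predict) and an exported signature that takes feat_ids and feat_vals. The helper name is hypothetical and the three-feature input is a toy; a real request would carry all 39 fields.

```python
import json

def build_predict_body(feat_ids, feat_vals, signature_name="serving_default"):
    """Build a row-format request body with a single instance (batch size 1)."""
    if len(feat_ids) != len(feat_vals):
        raise ValueError("feat_ids and feat_vals must have the same length")
    return json.dumps({
        "signature_name": signature_name,
        "instances": [{"feat_ids": feat_ids, "feat_vals": feat_vals}],
    })

# Toy input with three features instead of the model's 39 fields.
body = build_predict_body([1, 2, 555], [0.05, 0.006633, 1.0])
print(body)
```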

Full code: lambdaji/tf_repos

https://github.com/lambdaji/tf_repos/tree/master/deep_ctr/Serving_pipeline

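Production environments place high demands on latency and performance, and a DNN requires far more computation than the simple table lookups of LR, so there is usually a trade-off between model quality and performance. This stage is a real test of engineering skill. The figure of real data from the wide_n_deep model in the online environment shows:

Intercept, 15 ms: parsing the request packet, querying redis/tair, converting the feature format, writing logs, and so on

Slope, 0.5 ms: the time for one forward pass of a single sample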
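Read as a straight line, these two numbers give a rough latency estimate for a request that scores n samples: about 15 ms of fixed overhead plus 0.5 ms per sample. The sketch below simply evaluates that line; the function name is hypothetical, and the linear form is an assumption drawn from the intercept/slope wording, not a measured model.

```python
def estimated_latency_ms(num_samples, intercept_ms=15.0, slope_ms=0.5):
    """Linear latency model: fixed overhead plus per-sample forward cost."""
    return intercept_ms + slope_ms * num_samples

for n in (1, 50, 200):
    print(n, estimated_latency_ms(n))
# 1 15.5
# 50 40.0
# 200 115.0
```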

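An interesting observation: as traffic was ramped up further, the average latency went down rather than up. I suspect TF-Serving does some cache-like optimization internally.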

Model Performance

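I had planned to tune the parameters before releasing this, but gave up after crashing the machine three times :(

The results in the figure are not good. Possible reasons:

--Feature engineering was not done well (continuous features may not be suited to embedding, negative sampling, shuffle, and so on)

--The model design has problems (I am not sure whether there are bugs)

--Tuning: the model did not converge to a good enough solution

Interested readers can fork the repository and experiment; parallelizing across people is much faster than one person working behind closed doors.

Project: https://github.com/lambdaji/tf_repos

Finally, an early wish that everyone enjoys their model "alchemy" in the new year!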

References:

https://github.com/wnzhang/deep-ctr

https://github.com/Atomu2014/product-nets

https://github.com/hexiangnan/attentional_factorization_machine

https://github.com/hexiangnan/neural_factorization_machine

https://github.com/ChenglongChen/tensorflow-DeepFM

https://zhuanlan.zhihu.com/p/32563337

https://zhuanlan.zhihu.com/p/28202287
