Resource | A Probabilistic Programming Toolkit: The Official Introduction to TensorFlow Probability

Technology 04-23

Source: Medium

Authors: Josh Dillon, Mike Shwe, Dustin Tran

Compiled by 機器之心 (Synced)

Contributors: 白妤昕, 李澤南

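At the 2018 TensorFlow Developer Summit, Google announced TensorFlow Probability, a probabilistic programming toolkit that machine learning researchers and practitioners can use to quickly and reliably build sophisticated models that take advantage of state-of-the-art hardware.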

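TensorFlow Probability is a good fit when:

You want to build a generative model of data and reason about its hidden processes.

You need to quantify the uncertainty in your predictions rather than predict a single value.

Your training set has a large number of features relative to the number of data points.

Your data is structured (for example, with groups, space, graphs, or language semantics) and you want to capture this structure with prior information.

You have an inverse problem. See the TFDS'18 talk (https://www.youtube.com/watch?v=Bb1_zlrjo1c) on reconstructing fusion plasmas from measurements.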

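TensorFlow Probability can address these problems. It inherits the strengths of TensorFlow, such as automatic differentiation and the ability to scale performance across a variety of platforms (CPUs, GPUs, and TPUs).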

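What can TensorFlow Probability do?

Google's machine learning tools for probability provide modular abstractions for probabilistic reasoning and statistical analysis in the TensorFlow ecosystem.

A structural overview of TensorFlow Probability. The probabilistic programming toolbox benefits users ranging from data scientists and statisticians to all TensorFlow users.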

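Layer 0: TensorFlow

Numerical operations. In particular, the LinearOperator class enables matrix-free implementations that exploit special structure (diagonal, low-rank, etc.) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of tf.linalg in core TensorFlow.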
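To illustrate what "matrix-free" means here, the following is a minimal sketch in plain Python. It is not the tf.linalg API: the class name `DiagOperator` and its methods are hypothetical, chosen only to show how a diagonal structure lets you multiply, solve, and take a log-determinant without ever building the dense matrix.

```python
import math


class DiagOperator:
    """Hypothetical stand-in for a structured operator: stores only the diagonal."""

    def __init__(self, diag):
        self.diag = list(diag)

    def matvec(self, x):
        # O(n) work instead of the O(n^2) dense matrix-vector product.
        return [d * xi for d, xi in zip(self.diag, x)]

    def solve(self, b):
        # Solving D x = b reduces to element-wise division.
        return [bi / d for d, bi in zip(self.diag, b)]

    def log_abs_determinant(self):
        # log|det D| is the sum of the logs of the absolute diagonal entries.
        return sum(math.log(abs(d)) for d in self.diag)


op = DiagOperator([1.0, 2.0, 4.0])
y = op.matvec([1.0, 1.0, 1.0])
x = op.solve(y)
log_det = op.log_abs_determinant()
```

In TensorFlow itself, the corresponding structured operator is tf.linalg.LinearOperatorDiag; the sketch only mirrors the idea, not its interface.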

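Layer 1: Statistical Building Blocks

Distributions (tf.contrib.distributions, tf.distributions): A large collection of probability distributions and related statistics, with batch and broadcasting semantics.

Bijectors (tf.contrib.distributions.bijectors): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples such as the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.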
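The log-normal case can be worked through by hand. The sketch below, in plain Python rather than TFP, applies the change-of-variables rule that a bijector encapsulates: the density of Y = exp(X) is the base density evaluated at log(y) plus the log-determinant of the inverse Jacobian. All function names are illustrative assumptions, not library calls.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of a Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def exp_inverse(y):
    """Inverse of the forward map y = exp(x)."""
    return math.log(y)


def exp_inverse_log_det_jacobian(y):
    # d/dy log(y) = 1 / y, so log|J| = -log(y).
    return -math.log(y)


def log_normal_log_prob(y, loc=0.0, scale=1.0):
    """Log density of Y = exp(X), X ~ Normal(loc, scale), via change of variables."""
    return (normal_log_prob(exp_inverse(y), loc, scale)
            + exp_inverse_log_det_jacobian(y))


value = log_normal_log_prob(2.0)
```

The same three pieces (forward map, inverse map, inverse log-det-Jacobian) are what a transformed distribution needs from any invertible transformation.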

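Layer 2: Model Building

Edward2 (tfp.edward2): A probabilistic programming language for specifying flexible probabilistic models as programs.

Probabilistic Layers (tfp.layers): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.

Trainable Distributions (tfp.trainable_distributions): Probability distributions parameterized by a single Tensor, making it easier to build neural networks that output probability distributions.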

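Layer 3: Probabilistic Inference

Markov chain Monte Carlo (tfp.mcmc): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo (HMC), random-walk Metropolis-Hastings, and the ability to build custom transition kernels.

Variational Inference (tfp.vi): Algorithms for approximating integrals via optimization.

Optimizers (tfp.optimizer): Stochastic optimization methods, extending TensorFlow Optimizers. Includes Stochastic Gradient Langevin Dynamics.

Monte Carlo (tfp.monte_carlo): Tools for computing Monte Carlo expectations.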
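As a rough illustration of "approximating integrals via sampling", here is a sketch of random-walk Metropolis-Hastings in plain Python, followed by a Monte Carlo expectation over the resulting samples. It does not use tfp.mcmc or tfp.monte_carlo; the function names, the standard normal target, the step size, and the burn-in length are all assumptions made for the example.

```python
import math
import random


def target_log_prob(x):
    # Unnormalized log density of a standard normal (assumed target).
    return -0.5 * x * x


def random_walk_metropolis(log_prob, num_samples, step_size=1.0, seed=0):
    """Draws correlated samples from exp(log_prob) with a Gaussian random walk."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(num_samples):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept_ratio = log_prob(proposal) - log_prob(x)
        # Accept with probability min(1, exp(log_accept_ratio)).
        if rng.random() < math.exp(min(0.0, log_accept_ratio)):
            x = proposal
        samples.append(x)
    return samples


def monte_carlo_expectation(f, samples):
    """Approximates E[f(X)] by the sample average of f."""
    return sum(f(s) for s in samples) / len(samples)


# Discard the first 1000 draws as burn-in, then estimate E[X^2].
samples = random_walk_metropolis(target_log_prob, 20000, step_size=2.0)[1000:]
second_moment = monte_carlo_expectation(lambda x: x * x, samples)
```

For a standard normal target the estimate of E[X^2] should land near 1, with some error from the finite, autocorrelated chain.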

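Layer 4: Pre-made Models and Inference (analogous to TensorFlow's pre-made Estimators)

Bayesian structural time series (coming soon): High-level interface for fitting time-series models (i.e., similar to R's BSTS package).

Generalized Linear Mixed Models (coming soon): High-level interface for fitting mixed-effects regression models (i.e., similar to R's lme4 package).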

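The TensorFlow Probability team is committed to supporting users and contributors with cutting-edge features, continuous code updates, and bug fixes. Google says it will continue to add end-to-end examples and tutorials.

Let's look at some examples!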

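Linear Mixed Effects Models with Edward2

A linear mixed effects model is a simple approach for modeling structured relationships in data. Also known as a hierarchical linear model, it shares statistical strength across groups of data points in order to improve inferences about any individual data point.

For demonstration, consider the InstEval data set from the popular lme4 package in R, which consists of university courses and their evaluation ratings. Using TensorFlow Probability, we specify the model as an Edward2 probabilistic program (tfp.edward2), which extends Edward. The program below expresses the model in terms of its generative process.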

import tensorflow as tf
from tensorflow_probability import edward2 as ed

def model(features):
  # Set up fixed effects and other parameters.
  intercept = tf.get_variable("intercept", [])
  service_effects = tf.get_variable("service_effects", [])
  student_stddev_unconstrained = tf.get_variable(
      "student_stddev_pre", [])
  instructor_stddev_unconstrained = tf.get_variable(
      "instructor_stddev_pre", [])

  # Set up random effects.
  student_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_students),
      scale_identity_multiplier=tf.exp(
          student_stddev_unconstrained),
      name="student_effects")
  instructor_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_instructors),
      scale_identity_multiplier=tf.exp(
          instructor_stddev_unconstrained),
      name="instructor_effects")

  # Set up likelihood given fixed and random effects.
  ratings = ed.Normal(
      loc=(service_effects * features["service"] +
           tf.gather(student_effects, features["students"]) +
           tf.gather(instructor_effects, features["instructors"]) +
           intercept),
      scale=1.,
      name="ratings")
  return ratings

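The model takes as input a features dictionary with the keys "service", "students", and "instructors"; each is a vector in which every element describes an individual course. The model regresses on these inputs, posits latent random variables, and returns a distribution over the courses' evaluation ratings. Running a TensorFlow session on this output returns one generated draw of the ratings.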
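The generative process described here can be mimicked without TensorFlow. The following plain-Python sketch draws one set of ratings from the same structure (fixed intercept and service effect, random per-student and per-instructor effects, unit-scale Normal noise). The function name `simulate_ratings`, the parameter values, and the toy features are hypothetical and were picked only for illustration; they are not fitted to InstEval.

```python
import random


def simulate_ratings(features, num_students, num_instructors,
                     intercept=3.0, service_effect=-0.1,
                     student_stddev=0.5, instructor_stddev=0.3, seed=0):
    """Draws one rating per course from a linear mixed effects model."""
    rng = random.Random(seed)
    # Random effects: one zero-mean Normal draw per student and per instructor.
    student_effects = [rng.gauss(0.0, student_stddev)
                       for _ in range(num_students)]
    instructor_effects = [rng.gauss(0.0, instructor_stddev)
                          for _ in range(num_instructors)]
    ratings = []
    for service, student, instructor in zip(features["service"],
                                            features["students"],
                                            features["instructors"]):
        # Likelihood mean: fixed effects plus the gathered random effects.
        loc = (service_effect * service
               + student_effects[student]
               + instructor_effects[instructor]
               + intercept)
        ratings.append(rng.gauss(loc, 1.0))
    return ratings


# Toy features: four courses, three students, two instructors (made up).
features = {
    "service": [0.0, 1.0, 0.0, 1.0],
    "students": [0, 0, 1, 2],
    "instructors": [0, 1, 1, 0],
}
ratings = simulate_ratings(features, num_students=3, num_instructors=2)
```

Courses that share a student or an instructor share the corresponding random effect, which is how the model pools information across groups.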

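See the "Linear Mixed Effects Models" tutorial for details on how to train the model using the tfp.mcmc.HamiltonianMonteCarlo algorithm, and on how to explore and interpret the model using posterior predictions.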

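Gaussian Copulas with TFP Bijectors

A copula is a multivariate probability distribution in which the marginal probability distribution of each variable is uniform. To build a copula using TFP intrinsics, you can use Bijectors and TransformedDistribution. These abstractions make it easy to create complex distributions, for example: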

import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.distributions.bijectors

# Example: Log-Normal Distribution
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Exp())

# Example: Kumaraswamy Distribution
Kumaraswamy = tfd.TransformedDistribution(
    distribution=tfd.Uniform(low=0., high=1.),
    bijector=tfb.Kumaraswamy(
        concentration1=2.,
        concentration0=2.))

# Example: Masked Autoregressive Flow
# https://arxiv.org/abs/1705.07057
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[512, 512],
    event_shape=[28 * 28])
maf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=shift_and_log_scale_fn))

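The "Gaussian Copula" tutorial creates a few custom Bijectors and then shows how to easily build several different copulas. For more background on distributions, see "Understanding TensorFlow Distributions Shapes", which explains how to manage shapes for sampling, batch training, and modeling events.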
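For intuition about what a Gaussian copula is, the sketch below samples from a two-dimensional one in plain Python: draw a pair of correlated standard normals, then push each coordinate through the standard normal CDF so that each marginal becomes uniform on [0, 1] while the dependence is preserved. This is not the tutorial's Bijector-based construction; the function names and the correlation value are assumptions.

```python
import math
import random


def standard_normal_cdf(x):
    """CDF of the standard normal, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_gaussian_copula(correlation, num_samples, seed=0):
    """Draws (u1, u2) pairs with uniform marginals and Gaussian dependence."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # Correlated standard normals built from two independent draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = (correlation * z1
              + math.sqrt(1.0 - correlation ** 2) * rng.gauss(0.0, 1.0))
        # The normal CDF maps each coordinate to a uniform marginal.
        samples.append((standard_normal_cdf(z1), standard_normal_cdf(z2)))
    return samples


pairs = sample_gaussian_copula(correlation=0.8, num_samples=5000)
```

Applying any inverse CDF to each coordinate afterwards yields a joint distribution with those marginals and the same underlying dependence structure.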

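Variational Autoencoder with TFP Utilities

A variational autoencoder is a machine learning model that uses one learned system to represent data in a low-dimensional space and a second learned system to map the low-dimensional representation back to something close to the original input. Because TensorFlow supports automatic differentiation, black-box variational inference is a breeze!

Example: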

import tensorflow as tf
import tensorflow_probability as tfp

# Assumes user supplies `likelihood`, `prior`, `surrogate_posterior`
# functions and that each returns a
# tf.distribution.Distribution-like object.
elbo_loss = tfp.vi.monte_carlo_csiszar_f_divergence(
    f=tfp.vi.kl_reverse,  # Equivalent to "Evidence Lower BOund"
    p_log_prob=lambda z: likelihood(z).log_prob(x) + prior().log_prob(z),
    q=surrogate_posterior(x),
    num_draws=1)
train = tf.train.AdamOptimizer(
    learning_rate=0.01).minimize(elbo_loss)

For more details, check out our variational autoencoder example!
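To make the ELBO concrete, here is a plain-Python sketch of a Monte Carlo ELBO estimate for a toy conjugate model (an assumption made for this example, not the VAE itself): prior z ~ Normal(0, 1), likelihood x | z ~ Normal(z, 1), and a Normal surrogate posterior. In this model the exact posterior is Normal(x / 2, sqrt(1 / 2)) and the log evidence is known in closed form, so the bound can be checked directly. The function names are hypothetical.

```python
import math
import random


def normal_log_prob(x, loc, scale):
    """Log density of a Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(x, q_loc, q_scale, num_draws=1000, seed=0):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        z = rng.gauss(q_loc, q_scale)  # draw from the surrogate posterior
        log_joint = normal_log_prob(x, z, 1.0) + normal_log_prob(z, 0.0, 1.0)
        total += log_joint - normal_log_prob(z, q_loc, q_scale)
    return total / num_draws


x_observed = 1.0
# Marginally, x ~ Normal(0, sqrt(2)) in this toy model.
log_evidence = normal_log_prob(x_observed, 0.0, math.sqrt(2.0))
# A poor surrogate gives a looser bound than the exact posterior does.
elbo_poor = elbo_estimate(x_observed, 0.0, 1.0, num_draws=5000)
elbo_exact = elbo_estimate(x_observed, 0.5, math.sqrt(0.5), num_draws=50)
```

Minimizing the reverse KL divergence between the surrogate and the posterior is equivalent to maximizing this quantity, which is why the loss in the TFP snippet is described as equivalent to the ELBO.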

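Bayesian Neural Networks with TFP Probabilistic Layers

A Bayesian neural network is a neural network with a prior distribution over its weights and biases. Through these priors it provides better-calibrated uncertainty about its predictions. A Bayesian neural network can also be interpreted as an infinite ensemble of neural networks: the probability assigned to each neural network configuration is determined by the prior.

For demonstration, consider the CIFAR-10 dataset, which has features (images of shape 32 x 32 x 3) and labels (values from 0 to 9). To fit the neural network, we will use variational inference, a suite of methods for approximating the neural network's posterior distribution over weights and biases. Specifically, we use the recently published Flipout estimator in the TensorFlow Probabilistic Layers module (tfp.layers).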

import tensorflow as tf
import tensorflow_probability as tfp

def neural_net(inputs):
  net = tf.reshape(inputs, [-1, 32, 32, 3])
  net = tfp.layers.Convolution2DFlipout(filters=64,
                                        kernel_size=5,
                                        padding="SAME",
                                        activation=tf.nn.relu)(net)
  net = tf.keras.layers.MaxPooling2D(pool_size=2,
                                     strides=2,
                                     padding="SAME")(net)
  # SAME-padded convolution keeps 32 x 32; 2 x 2 pooling halves it to 16 x 16.
  net = tf.reshape(net, [-1, 16 * 16 * 64])
  net = tfp.layers.DenseFlipout(units=10)(net)
  return net

# Build loss function for training.
logits = neural_net(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
kl = sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)

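The neural_net function composes neural network layers on the input tensor, and it performs stochastic forward passes with respect to the probabilistic convolutional layer and the probabilistic densely connected layer. The function returns an output tensor of shape (batch size, 10). Each row of the tensor holds the logits (unconstrained probability values) that the corresponding data point belongs to each of the 10 classes.

For training, we build a loss function that comprises two terms: the expected negative log-likelihood and the KL divergence. We approximate the expected negative log-likelihood via Monte Carlo. The KL divergence is added through regularizer terms, which are arguments to the layers.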
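The two-term loss can be sketched in plain Python for a single data point. The example below assumes, purely for illustration, that each weight has a Normal variational posterior and a standard Normal prior, so the KL term has a closed form; the function names are hypothetical and this is not how tfp.layers computes its regularization losses internally.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]


def kl_normal_normal(q_loc, q_scale, p_loc=0.0, p_scale=1.0):
    """KL(Normal(q_loc, q_scale) || Normal(p_loc, p_scale)) in closed form."""
    return (math.log(p_scale / q_scale)
            + (q_scale ** 2 + (q_loc - p_loc) ** 2) / (2.0 * p_scale ** 2)
            - 0.5)


def variational_loss(logits, label, weight_posteriors):
    """Negative log-likelihood plus the summed KL over all weights."""
    nll = softmax_cross_entropy(logits, label)
    kl = sum(kl_normal_normal(loc, scale) for loc, scale in weight_posteriors)
    return nll + kl


# Two classes with equal logits; one weight posterior matches the prior exactly.
loss_value = variational_loss([0.0, 0.0], 0, [(0.0, 1.0), (0.5, 0.8)])
```

The logits passed in would come from one stochastic forward pass, so a single evaluation is a one-sample Monte Carlo estimate of the expected negative log-likelihood.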

tfp.layers can also be used with eager execution via the tf.keras.Model class.

class MNISTModel(tf.keras.Model):

  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tfp.layers.DenseFlipout(units=10)
    self.dense2 = tfp.layers.DenseFlipout(units=10)

  def call(self, input):
    """Run the model."""
    result = self.dense1(input)
    result = self.dense2(result)
    # reuse variables from dense2 layer
    result = self.dense2(result)
    return result

model = MNISTModel()