Installing Python Packages in Jupyter

Knowledge · 12-09

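Summary: For anyone learning a programming language, choosing the right tools matters a great deal. Getting those tools to work with Python libraries has long been a headache for many people.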

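If you use the Jupyter notebook, you will often run into the following question:

I installed package X and now I can't import it in the notebook. Help!

This is the first stumbling block for almost every beginner, whatever the language. Today we look at how to solve this kind of problem in the Jupyter notebook.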

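Fundamentally, the problem is rooted in the fact that Jupyter kernels are disconnected from Jupyter's shell. In other words, the installer points to a different Python version than the one used in the notebook. In the simplest setups the problem doesn't arise. When it does, debugging it requires understanding the intricacies of the operating system, of Python package installation, and of Jupyter itself.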

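After seeing a couple of recent online threads (A, B) and some discussion of this topic, I decided to address the question in depth here. This post covers three things:

· First, I give a quick, bare-bones answer to the general question: how can I install a Python package with pip or conda so that it works with my Jupyter notebook?

· Second, I dig into what the Jupyter notebook abstraction is doing and how it interacts with the complexities of the operating system. I also look at where that abstraction leaks, so you can better understand what is happening when things stop working.

· Third, I discuss some ideas the community might consider to smooth over these issues. These include changes that the Jupyter, pip, and conda developers might make to ease the cognitive load on users.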

This post focuses on two approaches to installing Python packages: pip and conda.

1. How to install packages from Jupyter

Pip vs. conda

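For many users, the choice between pip and conda can be confusing. I would summarize the essential difference between the two as follows:

pip installs Python packages in any environment.
conda installs any package in conda environments.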

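· If you installed Python using Anaconda or Miniconda, use conda to install Python packages. If conda tells you the package you want doesn't exist, use pip instead. If you installed Python any other way (from source, with pyenv, virtualenv, and so on), use pip.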

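Also, never use sudo pip install. Even if it seems to solve a problem in the short term, it will lead to problems in the long term. For example, if pip install gives you a permission error, it probably means you are trying to install or update packages in a system Python, such as /usr/bin/python. Doing so can have bad consequences, because the operating system itself often depends on particular versions of the packages in that Python installation. For day-to-day Python use, isolate your packages from the system Python with either virtual environments or Anaconda.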

1.1: How to use conda from Jupyter

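If you are in the Jupyter notebook and want to install a package with conda, you might be tempted to use the ! notation to run conda directly as a shell command from the notebook: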

# DON'T DO THIS!
!conda install --yes numpy
Fetching package metadata ...........
Solving package specifications: .
# All requested packages already installed.
# packages in environment at /Users/jakevdp/anaconda/envs/python3.6:
#
numpy 1.13.3 py36h2cdce51_0

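The output shows that numpy was installed into /Users/jakevdp/anaconda/envs/python3.6. For reasons I outline more fully below, this approach will not generally work if you want to use the installed packages from the current notebook, although it may work in the simplest cases.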

Here is a short snippet that should work in general:

# Install a conda package in the current Jupyter kernel
import sys
!conda install --yes --prefix {sys.prefix} numpy
Fetching package metadata ...........
Solving package specifications: .
# All requested packages already installed.
# packages in environment at /Users/jakevdp/anaconda:
#
numpy 1.13.3 py36h2cdce51_0

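That bit of extra boilerplate makes certain that conda installs the package into the environment of the currently running Jupyter kernel. sys.prefix is the root directory of that environment.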
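Without the ! syntax, the same idea can be written in plain Python with subprocess. The sketch below is illustrative only. conda_install_argv and install_with_conda are hypothetical helper names, and the code assumes a conda executable is reachable as conda on $PATH or is passed in explicitly.

```python
import subprocess
import sys


def conda_install_argv(packages, conda="conda", prefix=None):
    """Build a `conda install` command that targets one environment.

    prefix defaults to sys.prefix, the root of the environment running this
    code (the Jupyter kernel, when called from a notebook).
    """
    target = sys.prefix if prefix is None else prefix
    return [conda, "install", "--yes", "--prefix", target, *packages]


def install_with_conda(packages, conda="conda"):
    """Run the command; raises CalledProcessError if conda fails."""
    subprocess.run(conda_install_argv(packages, conda=conda), check=True)


if __name__ == "__main__":
    # Only prints the command; install_with_conda(["numpy"]) would run it.
    print(conda_install_argv(["numpy"]))
```

Keeping the command builder separate from the call that runs it means you can inspect exactly which environment will be modified before anything is installed.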

1.2: How to use pip from Jupyter

If you are in the Jupyter notebook and want to install a package with pip, you might similarly be inclined to run pip directly in the shell:

# DON'T DO THIS
!pip install numpy
Requirement already satisfied: numpy in /Users/jakevdp/anaconda/envs/python3.6/lib/python3.6/site-packages

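For the same reasons, this will not generally work if you want to use the installed packages from the current notebook. Here is a short snippet that should work in general: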

# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install numpy
Requirement already satisfied: numpy in /Users/jakevdp/anaconda/lib/python3.6/s

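That extra boilerplate makes certain that you run the pip associated with the current Python kernel, so the installed packages can be used in the current notebook. Relatedly, even outside Jupyter it is better to install packages with: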

$ python -m pip install <package>

rather than:

$ pip install <package>

because the former is more explicit about which Python the package will be installed for (more on this below).
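The rule behind {sys.executable} -m pip is simple: run pip as a module of this exact interpreter. Here is a minimal sketch of the same thing in plain Python. It assumes pip is installed for the current interpreter, and pip_install_argv and pip_install are hypothetical names.

```python
import subprocess
import sys


def pip_install_argv(packages, python=None):
    """Build `<python> -m pip install ...` for a specific interpreter.

    Defaults to sys.executable, the interpreter running this code, so the
    packages land in its own site-packages rather than in whichever `pip`
    happens to come first on $PATH.
    """
    exe = sys.executable if python is None else python
    return [exe, "-m", "pip", "install", *packages]


def pip_install(packages):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_argv(packages), check=True)


if __name__ == "__main__":
    print(pip_install_argv(["numpy"]))
```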

2. Why is installing from Jupyter so messy?

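The solutions above should work in all cases, but why is the extra boilerplate necessary? In short, it is because the shell environment and the Python executable are disconnected in Jupyter. To understand why that matters, you need a basic understanding of three concepts:

How your operating system locates executable programs.
How Python installs and locates packages.
How Jupyter decides which Python executable to use.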

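Note: the discussion below assumes Linux, Unix, macOS, or a similar operating system. Windows has a somewhat different architecture, so some details will differ.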

2.1 How does your operating system locate executables?

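When you type a command such as python, jupyter, ipython, pip, or conda in a terminal, your operating system uses a well-defined mechanism to find the executable file that the name refers to.

On Linux and Mac systems, the shell first checks for an alias matching the command. If there is none, it consults the $PATH environment variable: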

!echo $PATH
/Users/jakevdp/anaconda/envs/python3.6/bin:/Users/jakevdp/anaconda/envs/python3.

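$PATH lists, in order, the directories that will be searched for any executable. For example, if I type python with the $PATH above, the system first looks for /Users/jakevdp/anaconda/envs/python3.6/bin/python. If that doesn't exist, it looks for /Users/jakevdp/anaconda/bin/python, and so on. To see which file a command actually resolves to, use the shell's type command (for example, !type python).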
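The $PATH search can be imitated in a few lines of Python. This sketch ignores aliases and shell built-ins, which the shell checks first, and only looks for executable files. find_on_path is a hypothetical name. The standard library's shutil.which performs essentially the same search.

```python
import os
import shutil


def find_on_path(name, path=None):
    """Return the first executable called `name` in the $PATH directories.

    Directories are scanned left to right, so an earlier entry shadows a
    later one. Returns None if nothing matches.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Both lines should normally print the same location.
    print(find_on_path("python3"))
    print(shutil.which("python3"))
```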

2.2 How Python locates packages

Python uses a similar mechanism to locate imported packages. The list of paths Python searches on import is stored in sys.path.

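By default, the first place Python looks for a module is an empty path, meaning the current working directory. If the module isn't found there, Python goes down the list of locations until it finds the module. You can see which location was used through the __path__ attribute of an imported package (plain modules have __file__ instead):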

import numpy
numpy.__path__
['/Users/jakevdp/anaconda/lib/python3.6/site-packages/numpy']
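You can print the search list itself, and ask which location would satisfy an import without importing anything, using the standard library's sys.path and importlib.util.find_spec. In this small sketch, json is just a stand-in for any installed package, and where_is is a hypothetical helper name.

```python
import importlib.util
import sys

# The ordered list of locations searched on import: '' (the current working
# directory) or the script's directory first, then any $PYTHONPATH entries,
# the standard library, and site-packages.
for entry in sys.path:
    print(repr(entry))


def where_is(module_name):
    """Return the file an import of module_name would load, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print(where_is("json"))                  # .../json/__init__.py
print(where_is("surely_not_installed"))  # None
```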

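In most cases, a Python package you install with pip or conda is placed in a directory called site-packages. The important thing to realize is that each Python executable has its own site-packages. When you install a package, it is associated with a particular Python executable and, by default, can only be used with that Python installation.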

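We can see this by printing the sys.path variable of each python executable available on my system. Jupyter's delightful ability to mix Python and bash commands in a single code block makes this easy: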

paths = !type -a python
for path in set(paths):
    path = path.split()[-1]
    print(path)
    !{path} -c "import sys; print(sys.path)"
    print()

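The full details here are not particularly important. What matters is that each Python executable has its own distinct paths. Unless you modify sys.path (which should be done only with great care), you cannot import packages installed in a different Python environment.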
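You can make the same comparison without IPython's ! syntax by asking each interpreter about itself in a subprocess. This sketch assumes the interpreters of interest are named python or python3 somewhere on $PATH, and it needs Python 3.7+ for subprocess.run(capture_output=...). python_executables, describe, and PROBE are hypothetical names. The sketch also reports each interpreter's package directory through the standard sysconfig module. On Debian's system Python that directory is called dist-packages rather than site-packages.

```python
import json
import os
import subprocess
import sys

# Run inside each interpreter: report where it lives and where it installs.
PROBE = (
    "import json, sys, sysconfig; "
    "print(json.dumps({'prefix': sys.prefix, "
    "'site_packages': sysconfig.get_paths()['purelib'], 'path': sys.path}))"
)


def python_executables(names=("python", "python3")):
    """Matching executables on $PATH, in search order (similar to `type -a`)."""
    found = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        for name in names:
            candidate = os.path.join(directory, name)
            if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                    and candidate not in found):
                found.append(candidate)
    return found


def describe(executable):
    """Ask another interpreter for its prefix, site-packages and sys.path."""
    out = subprocess.run([executable, "-c", PROBE], capture_output=True,
                         text=True, check=True).stdout
    return json.loads(out)


if __name__ == "__main__":
    for exe in python_executables() or [sys.executable]:
        info = describe(exe)
        print(exe)
        print("  prefix:       ", info["prefix"])
        print("  site-packages:", info["site_packages"])
```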

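To emphasize the key point: the shell environment in the Jupyter notebook matches the Python version that was used to launch the notebook, which is not necessarily the Python the kernel runs.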
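A quick diagnostic for this mismatch, run from inside a notebook, is to compare the python the shell would find with the environment the kernel is running in. The ! shell inherits the kernel process's environment variables, so shutil.which sees the same $PATH that !python would. This sketch compares sys.prefix values, so it checks whether both use the same environment, not the same binary file. shell_python_prefix and shell_python_matches_kernel are hypothetical helper names.

```python
import shutil
import subprocess
import sys


def shell_python_prefix(command="python", path=None):
    """sys.prefix of the interpreter the shell would run for `command`, or None."""
    exe = shutil.which(command, path=path)
    if exe is None:
        return None
    result = subprocess.run([exe, "-c", "import sys; print(sys.prefix)"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def shell_python_matches_kernel(command="python", path=None):
    """True if `!{command}` would use the same environment as this kernel."""
    return shell_python_prefix(command, path) == sys.prefix


if __name__ == "__main__":
    print("kernel python:", sys.executable)
    print("shell python: ", shutil.which("python"))
    print("same environment:", shell_python_matches_kernel())
```

If this prints False, a bare !pip install or !conda install will most likely target a different environment than the one your imports come from.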

2.3: How Jupyter executes code: Jupyter kernels

The next relevant question is how Jupyter chooses which Python to execute code with. This brings us to the concept of a Jupyter kernel.

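A Jupyter kernel is a set of files that point Jupyter to some means of executing code within the notebook. For Python kernels, this points to a particular Python version, but Jupyter is designed to be much more general. Dozens of Jupyter kernels are available, for languages including Python 2, Python 3, Julia, R, Ruby, Haskell, and even C++ and Fortran.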

If you are using the Jupyter notebook, you can change the kernel at any time with the Kernel → Change kernel menu item.

To see the kernels available on your system, run the following command in the shell:

!jupyter kernelspec list
Available kernels:
python3 /Users/jakevdp/anaconda/envs/python3.6/lib/python3.6/site-packages/ipykernel/resources
conda-root /Users/jakevdp/Library/Jupyter/kernels/conda-root
python2.7 /Users/jakevdp/Library/Jupyter/kernels/python2.7
python3.5 /Users/jakevdp/Library/Jupyter/kernels/python3.5
python3.6 /Users/jakevdp/Library/Jupyter/kernels/python3.6

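Each listed kernel is a directory containing a file called kernel.json. Among other things, this file specifies which language and executable the kernel should use. For example: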

!cat /Users/jakevdp/Library/Jupyter/kernels/conda-root/kernel.json
{
"argv": [
"/Users/jakevdp/anaconda/bin/python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
],
"display_name": "python (conda-root)",
"language": "python"
}
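Because a kernel spec is just JSON, it is easy to check which interpreter each kernel will launch. This minimal sketch reads a kernel.json like the one above. It assumes the first argv entry is the executable, as in this example, and kernel_executable is a hypothetical helper name.

```python
import json
import os


def kernel_executable(kernel_dir):
    """Return (display_name, executable) from <kernel_dir>/kernel.json."""
    with open(os.path.join(kernel_dir, "kernel.json"), encoding="utf-8") as f:
        spec = json.load(f)
    # argv[0] is the program Jupyter runs; the remaining entries are its
    # arguments, with {connection_file} filled in by Jupyter at launch.
    return spec.get("display_name"), spec["argv"][0]


if __name__ == "__main__":
    # Hypothetical location; substitute a directory from `jupyter kernelspec list`.
    kdir = os.path.expanduser("~/Library/Jupyter/kernels/conda-root")
    if os.path.isdir(kdir):
        print(kernel_executable(kdir))
```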

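If you want to create a new kernel, you can do so with ipykernel's install command. For example, I created the kernels above for my conda environments using the following as a template: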

$ source activate myenv
$ python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

3. Some proposals

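So, in summary, installing packages from Jupyter is difficult because Jupyter's shell environment and Python kernel are fundamentally mismatched. As a result, you have to do more than simply run pip install or conda install to make things work.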

I have a few ideas, some of which might even be useful:

3.1: Potential changes to Jupyter

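As I mentioned, the fundamental issue is a mismatch between Jupyter's shell environment and its compute kernel. So, could we massage the kernel specifications so that they force the two to match?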

Perhaps. For example, this GitHub issue shows an approach to modifying shell variables as part of kernel startup.

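Basically, you can add a script called kernel-startup.sh to your kernel directory that looks something like this. Make sure you change its permissions so that it is executable: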

#!/usr/bin/env bash
# activate anaconda env
source activate myenv
# this is the critical part, and should be at the end of your script
# ("$@" is quoted so arguments containing spaces are passed through intact):
exec python -m ipykernel "$@"
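The script only takes effect if Jupyter runs it in place of python. That means the kernel's kernel.json also needs its argv pointed at the script, followed by "-f" and "{connection_file}". Once that is done, switching to the kernel should activate the environment, so !pip install and !conda install resolve to it. The sketch below writes both files into a directory you choose. install_startup_kernel, the environment name, and the paths are hypothetical. The kernel's environment still needs ipykernel installed. Newer conda releases generally use conda activate, which requires conda's shell hook to be initialized inside the script.

```python
import json
import os
import stat

STARTUP_SCRIPT = """#!/usr/bin/env bash
# activate anaconda env
source activate {env}
# this is the critical part, and should be at the end of your script:
exec python -m ipykernel "$@"
"""


def install_startup_kernel(kernel_dir, env, display_name):
    """Write kernel-startup.sh and a kernel.json that launches it."""
    os.makedirs(kernel_dir, exist_ok=True)
    script = os.path.join(kernel_dir, "kernel-startup.sh")
    with open(script, "w", encoding="utf-8") as f:
        f.write(STARTUP_SCRIPT.format(env=env))
    # Equivalent of `chmod u+x`: Jupyter must be able to execute the script.
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    spec = {
        # Jupyter substitutes {connection_file} when it starts the kernel.
        "argv": [script, "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    with open(os.path.join(kernel_dir, "kernel.json"), "w", encoding="utf-8") as f:
        json.dump(spec, f, indent=1)
    return script
```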

3.2 New Jupyter magic functions

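We could simplify the user experience by adding %pip and %conda magic functions to Jupyter. These would detect the current kernel and make certain that packages are installed in the correct location.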

Pip Magic

For example, here is how you could define a %pip magic function that works in the current kernel:

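import shlex
import subprocess
import sys
from IPython.core.magic import register_line_magic
@register_line_magic
def pip(args):
    """Use pip from the current kernel"""
    # pip has no supported in-process API (`from pip import main` broke in pip 10)
    result = subprocess.run([sys.executable, "-m", "pip"] + shlex.split(args),
                            capture_output=True, text=True)
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)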

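Note that Jupyter developer Matthias Bussonnier has published essentially this in his pip_magic repository. You can install it as shown below, assuming you install pip_magic into the right Python. Recent IPython releases also ship built-in %pip and %conda magics.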

$ python -m pip install pip_magic

Conda Magic

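Similarly, we can define a conda magic that does the right thing when you type %conda install XXX. This is a bit more involved than the pip magic. It must first confirm that the environment is conda-compatible. Then, because there was no python -m conda install equivalent at the time of writing, it must call a subprocess to execute the appropriate shell command: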

from IPython.core.magic import register_line_magic
import codecs
import os
import sys
from subprocess import Popen, PIPE

def is_conda_environment():
    """Return True if the current Python executable is in a conda env"""
    # TODO: make this work with Conda.exe in Windows
    conda_exec = os.path.join(os.path.dirname(sys.executable), "conda")
    conda_history = os.path.join(sys.prefix, "conda-meta", "history")
    return os.path.exists(conda_exec) and os.path.exists(conda_history)

@register_line_magic
def conda(args):
    """Use conda from the current kernel"""
    # TODO: make this work with Conda.exe in Windows
    # TODO: fix string encoding to work with Python 2
    if not is_conda_environment():
        raise ValueError("The python kernel does not appear to be a conda environment. "
                         "Please use ``%pip install`` instead.")
    conda_executable = os.path.join(os.path.dirname(sys.executable), "conda")
    args = [conda_executable] + args.split()
    # Add --prefix to point conda installation to the current environment
    # (the len() guards avoid an IndexError on a bare `%conda`)
    if len(args) > 1 and args[1] in ["install", "update", "upgrade", "remove", "uninstall", "list"]:
        if "-p" not in args and "--prefix" not in args:
            args.insert(2, "--prefix")
            args.insert(3, sys.prefix)
    # Because the notebook does not allow us to respond "yes" during the
    # installation, we need to insert --yes in the argument list for some commands
    if len(args) > 1 and args[1] in ["install", "update", "upgrade", "remove", "uninstall", "create"]:
        if "-y" not in args and "--yes" not in args:
            args.insert(2, "--yes")
    # Call conda from command line with subprocess & send results to stdout & stderr
    with Popen(args, stdout=PIPE, stderr=PIPE) as process:
        # Read stdout byte by byte, as it includes real-time progress updates;
        # an incremental decoder keeps multi-byte characters from being split
        decoder = codecs.getincrementaldecoder(sys.stdout.encoding or "utf-8")("replace")
        for c in iter(lambda: process.stdout.read(1), b""):
            sys.stdout.write(decoder.decode(c))
        # Read stderr line by line, because real-time does not matter
        for line in iter(process.stderr.readline, b""):
            sys.stderr.write(line.decode(sys.stderr.encoding or "utf-8", "replace"))
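The argument rewriting in the magic above is the part most worth checking on its own. Here it is pulled out into a pure function so it can be tested without running conda. The name rewrite_conda_args is hypothetical, and the command lists mirror the ones in the magic.

```python
PREFIX_COMMANDS = {"install", "update", "upgrade", "remove", "uninstall", "list"}
YES_COMMANDS = {"install", "update", "upgrade", "remove", "uninstall", "create"}


def rewrite_conda_args(conda_executable, line, prefix):
    """Mirror the %conda magic: add --prefix and --yes where appropriate."""
    args = [conda_executable] + line.split()
    # Point environment-changing commands at the kernel's own environment.
    if len(args) > 1 and args[1] in PREFIX_COMMANDS:
        if "-p" not in args and "--prefix" not in args:
            args[2:2] = ["--prefix", prefix]
    # The notebook cannot answer conda's confirmation prompt, so pre-answer it.
    if len(args) > 1 and args[1] in YES_COMMANDS:
        if "-y" not in args and "--yes" not in args:
            args.insert(2, "--yes")
    return args


if __name__ == "__main__":
    print(rewrite_conda_args("conda", "install numpy", "/opt/env"))
```

Under this logic, %conda install numpy becomes conda install --yes --prefix <kernel prefix> numpy, while explicit -p or -y flags from the user are left alone.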

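I have offered some simple solutions you can use today and explained in detail why they are necessary. It comes down to this: in Jupyter, the kernel is disconnected from the shell. The kernel environment can be changed at runtime, while the shell environment is determined when the notebook is launched.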

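Finally, heartfelt thanks to the developers of Jupyter, conda, pip, and the related tools that form the foundation of the Python data science ecosystem. This post was written entirely in a Jupyter notebook. You can view a static version here or download the full notebook here.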

Translation organized by the Alibaba Cloud Yunqi Community.

Original title: "installing-python-packages-from-jupyter", by Jake VanderPlas.

Personal blog: http://jakevdp.github.io/pages/about.html . Author of the Python Data Science Handbook, which can be read for free on his blog.

Translator: 虎說八道; reviewer:
