Trying Amazon Polly from Python


In the previous entry I called Amazon Polly from the AWS CLI; this time I want to call it from Python.

uepon.hatenadiary.com

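Polly has documentation and samples for every supported language, so this should go smoothly. It can also be used from .NET, and I did think for a moment that I should have started there in my case.

[Polly SDK page]

Developer Resources - Amazon Polly | AWS

Amazon Polly (Python edition)

Written like that, it sounds as if other languages will show up in this entry too, but it is Python only (lol).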

Installation

github.com

Boto is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2. Boto provides an easy to use, object-oriented API as well as low-level direct service access.

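In other words, Boto is the AWS SDK for Python, and besides the object-oriented API it also gives low-level direct access to the services.

For now, following the installation instructions as written...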

PS C:\> python --version
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
PS C:\> pip -V
pip 9.0.1 from C:\Users\xxx\Anaconda3\lib\site-packages (python 3.6)
PS C:\> pip install boto
Requirement already satisfied: boto in c:\users\xxx\anaconda3\lib\site-packages

Going by the message, boto is already included in the Anaconda environment. But wasn't it boto3 that the sample source imports? With that in mind, I launched [Anaconda Prompt] from the Start menu and installed it as follows.

(C:\Users\xxx\Anaconda3) C:\Users\xxx>pip install boto3
Collecting boto3
  Downloading boto3-1.4.7-py2.py3-none-any.whl (128kB)
    100% |████████████████████████████████| 133kB 2.2MB/s
Collecting s3transfer<0.2.0,>=0.1.10 (from boto3)
  Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    100% |████████████████████████████████| 61kB ...
Collecting jmespath<1.0.0,>=0.7.1 (from boto3)
  Downloading jmespath-0.9.3-py2.py3-none-any.whl
Collecting botocore<1.8.0,>=1.7.0 (from boto3)
  Downloading botocore-1.7.0-py2.py3-none-any.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 225kB/s
Requirement already satisfied: docutils>=0.10 in c:\users\xxx\anaconda3\lib\site-packages (from botocore<1.8.0,>=1.7.0->boto3)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in c:\users\xxx\anaconda3\lib\site-packages (from botocore<1.8.0,>=1.7.0->boto3)
Requirement already satisfied: six>=1.5 in c:\users\xxx\anaconda3\lib\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.8.0,>=1.7.0->boto3)
Installing collected packages: jmespath, botocore, s3transfer, boto3
Successfully installed boto3-1.4.7 botocore-1.7.0 jmespath-0.9.3 s3transfer-0.1.10

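As expected, boto3 was not installed... even though the sample source uses it as a matter of course.

The page mentioned earlier has the following note, which seems to mean that boto3 is the one to look at from now on.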

Boto3, the next version of Boto, is now stable and recommended for general use. It can be used side-by-side with Boto in the same project, so it is easy to start using Boto3 in your existing projects as well as new projects. Going forward, API updates and all new feature work will be focused on Boto3.

github.com

C:\Users\xxx> python
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from boto3 import Session
>>>

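For what it's worth, the import also worked in a REPL started from PowerShell rather than from the Anaconda Prompt, so everything is fine.

Trying it out

The documentation is here:

Boto 3 Documentation — Boto 3 Docs 1.4.7 documentation

Amazon Polly documentation

Boto is an SDK for controlling AWS services in general, so the documentation also covers S3, EC2, and so on (in fact those are the main part).

Find the Polly section from the link above.

Polly — Boto 3 Docs 1.4.7 documentation

The official documentation shows the client being created as follows.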

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

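The ~/.aws/credentials file mentioned here holds the information entered when running aws configure. credentials contains the authentication information, and ~/.aws/config in the same directory contains the region setting. The profile that aws configure creates by default is named default, so following the documentation the change would be:

[Before]

session = Session(profile_name="adminuser")

[After]

session = Session(profile_name="default")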

However, the default profile can apparently be omitted, so

import boto3
polly = boto3.client("polly")

should work as well. [Reference] Session — Boto 3 Docs 1.4.7 documentation
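To make the profile idea a bit more concrete, here is a minimal sketch that uses only the standard library. The file contents are made-up placeholders in the layout that aws configure writes, and the helper name read_profile is hypothetical. boto3 does this lookup internally; this is only an illustration of the file structure, not boto3's actual implementation.

```python
import configparser

# Placeholder contents in the layout of ~/.aws/credentials.
# The values are dummies, not real credentials.
CREDENTIALS_TEXT = """
[default]
aws_access_key_id = EXAMPLE_DEFAULT_KEY_ID
aws_secret_access_key = example-default-secret

[adminuser]
aws_access_key_id = EXAMPLE_ADMIN_KEY_ID
aws_secret_access_key = example-admin-secret
"""


def read_profile(text, profile_name="default"):
    """Return the key/value pairs stored under one profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[profile_name])


# Omitting the profile name falls back to the [default] section,
# which mirrors why profile_name="default" can be left out.
default_profile = read_profile(CREDENTIALS_TEXT)
admin_profile = read_profile(CREDENTIALS_TEXT, "adminuser")
print(default_profile["aws_access_key_id"])
print(admin_profile["aws_access_key_id"])
```

Each section name is a profile name, which is why Session(profile_name="adminuser") only works if an [adminuser] section exists in the file.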

If the region differs between services (for example, when S3 is used at the same time in a different region), specify it explicitly:

from boto3 import Session
...
session = Session(region_name="us-west-2")
polly = session.client("polly")

Passing the region as an argument works without problems. The basic shape ends up roughly like this:

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
...
# Personally I think it is safer to specify the region explicitly.
session = Session(region_name="us-west-2")
polly = session.client("polly")
try:
    # Call the TTS feature here; the CLI arguments map to these keyword arguments.
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3", VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    ...
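The basic shape above could also be wrapped in a small function. What follows is only a sketch under some assumptions: the helper name save_speech and the FakePolly class are hypothetical, and the fake stands in for the real client so that the snippet runs without AWS access. With a real client you would pass session.client("polly") instead. It relies on the response carrying the audio under the "AudioStream" key, as in the documentation sample.

```python
import io
import os
import tempfile
from contextlib import closing


def save_speech(polly, text, voice_id, path, output_format="mp3"):
    """Call synthesize_speech on the given client and write the audio to path."""
    response = polly.synthesize_speech(
        Text=text, OutputFormat=output_format, VoiceId=voice_id
    )
    if "AudioStream" not in response:
        raise RuntimeError("The response did not contain audio data")
    # closing() makes sure the stream's close() is called when the block ends.
    with closing(response["AudioStream"]) as stream:
        with open(path, "wb") as file:
            file.write(stream.read())
    return path


class FakePolly:
    """Hypothetical stand-in for the Polly client (no network access)."""

    def synthesize_speech(self, **kwargs):
        self.last_request = kwargs
        return {"AudioStream": io.BytesIO(b"fake mp3 bytes")}


fake = FakePolly()
target = os.path.join(tempfile.mkdtemp(), "speech.mp3")
save_speech(fake, "Hello world!", "Joanna", target)
with open(target, "rb") as f:
    saved = f.read()
print(saved)
```

Passing the client in as an argument keeps the function testable without credentials, and error handling for BotoCoreError and ClientError can stay at the call site.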

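While reading through the sample code and experimenting, I edited it into the following. I think it can do without a main section.

[Documentation sample, transcribed and modified]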

"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir
# Create a client using the default credentials from the AWS credentials file
# (~/.aws/credentials), with the region specified explicitly.
session = Session(region_name="us-west-2")
polly = session.client("polly")
try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3", VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)
# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important as the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        # output = os.path.join(gettempdir(), "speech.mp3")  # The sample writes to the temp folder, which is hard to find, so it is changed to the line below
        output = "speech.mp3"
        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
        print("create OK>>" + output)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

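What threw me a little is that the return value of synthesize_speech() carries the audio as a stream (AudioStream), so you have to write it to a file yourself. The with closing(...) part appears to be the same idea as C#'s using (FileStream fs = new FileStream(…)) for objects that implement the IDisposable interface.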
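As a small illustration of what closing() does, here is a sketch with a dummy object. DummyStream is a hypothetical stand-in for the AudioStream object and only records whether close() was called.

```python
from contextlib import closing


class DummyStream:
    """Hypothetical stand-in for AudioStream; it only tracks close()."""

    def __init__(self, data):
        self.data = data
        self.closed = False

    def read(self):
        return self.data

    def close(self):
        self.closed = True


stream = DummyStream(b"audio")
# closing() calls stream.close() when the with block ends,
# even if the body raises an exception.
with closing(stream) as s:
    payload = s.read()
print(payload, stream.closed)
```

Any object with a close() method can be used this way, whether or not it supports the with statement on its own.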

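Also, the sample puts the file in the temporary area with output = os.path.join(gettempdir(), "speech.mp3"), but on Windows 10 that ends up somewhere quite hard to find: C:\Users\[user name]\AppData\Local\Temp. That is too obscure for my taste, so this time I skip the temporary directory and save to the current directory instead.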
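If you are not sure where each variant ends up on your machine, a quick check like the following prints both locations. This is just a sketch; the variable names are mine.

```python
import os
from tempfile import gettempdir

# Where the documentation sample would write the file.
temp_output = os.path.join(gettempdir(), "speech.mp3")
# Where the modified version writes it: the current working directory.
local_output = os.path.abspath("speech.mp3")

print(temp_output)
print(local_output)
```

The temporary directory depends on the OS and on environment variables, so the first line will differ from machine to machine.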

It looks verbose because of the comments; stripping them out and simplifying gives the following.

[Simple sample, Japanese version]

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess

session = Session(region_name="us-west-2")
polly = session.client("polly")
try:
    response = polly.synthesize_speech(Text="ひとっ走り付き合えよ", OutputFormat="mp3", VoiceId="Mizuki")
except (BotoCoreError, ClientError) as error:
    print(error)
    sys.exit(-1)
if "AudioStream" in response:
    with closing(response["AudioStream"]) as stream:
        output = "speech.mp3"
        try:
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            print(error)
            sys.exit(-1)
        print("synthesize_speech OK ->>" + output)
else:
    print("Could not stream audio")
    sys.exit(-1)

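I think that is reasonably easy to follow. From here, calling something like mpg321 as an external command, or passing the mp3 to pygame, should be enough to get audio output (described later).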
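For the external command route, a minimal sketch could look like this. It assumes an mpg321 binary that takes the file path as its argument; the helper names build_play_command and play are hypothetical, and shutil.which requires Python 3.3 or later.

```python
import shutil
import subprocess


def build_play_command(path, player="mpg321"):
    """Return the argument list that would play path with the given player."""
    return [player, path]


def play(path, player="mpg321"):
    """Play path with an external player; return True if it exited normally."""
    if shutil.which(player) is None:
        # The player is not installed (or not on PATH), so do nothing.
        print(player + " was not found on PATH")
        return False
    return subprocess.call(build_play_command(path, player)) == 0


command = build_play_command("speech.mp3")
print(command)
```

Calling play("speech.mp3") after the file has been written would then start playback, provided the player is installed.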

[Reference] uepon.hatenadiary.com

Bonus [Amazon Polly via Python on Raspberry Pi]

Let's use it from a Raspberry Pi as well.

Installation

$ pip install boto3
Collecting boto3
  Downloading boto3-1.4.7-py2.py3-none-any.whl (128kB)
    100% |????????????????????????????????| 133kB 1.0MB/s
Collecting botocore<1.8.0,>=1.7.0 (from boto3)
  Downloading botocore-1.7.1-py2.py3-none-any.whl (3.6MB)
    100% |????????????????????????????????| 3.6MB 40kB/s
Collecting jmespath<1.0.0,>=0.7.1 (from boto3)
  Using cached jmespath-0.9.3-py2.py3-none-any.whl
Collecting s3transfer<0.2.0,>=0.1.10 (from boto3)
  Using cached s3transfer-0.1.10-py2.py3-none-any.whl
Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.8.0,>=1.7.0->boto3)
  Using cached python_dateutil-2.6.1-py2.py3-none-any.whl
Collecting docutils>=0.10 (from botocore<1.8.0,>=1.7.0->boto3)
  Using cached docutils-0.14-py2-none-any.whl
Collecting futures<4.0.0,>=2.2.0; python_version == "2.6" or python_version == "2.7" (from s3transfer<0.2.0,>=0.1.10->boto3)
  Using cached futures-3.1.1-py2-none-any.whl
Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.8.0,>=1.7.0->boto3)
  Using cached six-1.10.0-py2.py3-none-any.whl
Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, futures, s3transfer, boto3
Successfully installed boto3-1.4.7 botocore-1.7.1 docutils-0.14 futures-3.1.1 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0

Running it

$ python polly_simple.py
synthesize_speech OK ->>speech.mp3
$ ls speech.mp3
speech.mp3

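Note, however, that this runs on Python 2.7, so unless the source has a magic comment declaring the encoding (I add a shebang along with it), it dies on the Japanese text. If you see an error like the following, you need to add it.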

$ python polly_simple.py
  File "polly_simple.py", line 11
SyntaxError: Non-ASCII character '\xe3' in file polly_simple.py on line 11, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

[Python 2 code]

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys

session = Session(region_name="us-west-2")
polly = session.client("polly")
try:
    response = polly.synthesize_speech(Text="ひとっ走り付き合えよ", OutputFormat="mp3", VoiceId="Mizuki")
except (BotoCoreError, ClientError) as error:
    print(error)
    sys.exit(-1)
if "AudioStream" in response:
    with closing(response["AudioStream"]) as stream:
        output = "speech.mp3"
        try:
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            print(error)
            sys.exit(-1)
        print("synthesize_speech OK ->>" + output)
else:
    print("Could not stream audio")
    sys.exit(-1)

Running on Python 3

To run on Python 3, install with pip3 instead of pip, and run with python3 instead of python.

[Installation]

$ pip3 install boto3
Collecting boto3
  Using cached boto3-1.4.7-py2.py3-none-any.whl
Collecting s3transfer<0.2.0,>=0.1.10 (from boto3)
  Using cached s3transfer-0.1.10-py2.py3-none-any.whl
Collecting jmespath<1.0.0,>=0.7.1 (from boto3)
  Using cached jmespath-0.9.3-py2.py3-none-any.whl
Collecting botocore<1.8.0,>=1.7.0 (from boto3)
  Using cached botocore-1.7.1-py2.py3-none-any.whl
Collecting docutils>=0.10 (from botocore<1.8.0,>=1.7.0->boto3)
  Downloading docutils-0.14-py3-none-any.whl (543kB)
    100% |????????????????????????????????| 552kB 489kB/s
Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.8.0,>=1.7.0->boto3)
  Using cached python_dateutil-2.6.1-py2.py3-none-any.whl
Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.8.0,>=1.7.0->boto3)
  Using cached six-1.10.0-py2.py3-none-any.whl
Installing collected packages: docutils, jmespath, six, python-dateutil, botocore, s3transfer, boto3
Successfully installed boto3-1.4.7 botocore-1.7.1 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0

$ python3 polly_simple3.py
synthesize_speech OK ->>speech.mp3

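[Python 3 source] ... This source goes as far as playing the audio with pygame. pygame can report playback state through events (pygame.event.get()), but that fails with an error when there is no display (so it cannot be used from the console). For that reason, pygame.mixer.music.get_busy() is used instead to check whether playback is still in progress. The limitation on events is described here: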

Pygame handles all it’s event messaging through an event queue. The routines in this module help you manage that event queue. The input queue is heavily dependent on the pygame display module. If the display has not been initialized and a video mode not set, the event queue will not really work. pygame.event — Pygame v1.9.2 documentation
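The polling idea itself can be sketched separately from pygame. The helper name wait_while_busy is hypothetical, and the fake "player" below stands in for pygame.mixer.music.get_busy so that the snippet runs without audio hardware.

```python
import time


def wait_while_busy(is_busy, poll_interval=0.1, timeout=None):
    """Poll is_busy() until it returns False.

    Returns True when the work finished, or False if timeout (in seconds)
    expired first.
    """
    start = time.monotonic()
    while is_busy():
        if timeout is not None and time.monotonic() - start > timeout:
            return False
        # Sleep between checks instead of spinning the CPU.
        time.sleep(poll_interval)
    return True


# Fake player that reports "busy" three times and then "idle".
remaining = [True, True, True, False]
finished = wait_while_busy(lambda: remaining.pop(0), poll_interval=0.01)
print(finished)
```

With pygame, the call would presumably become wait_while_busy(pygame.mixer.music.get_busy), which avoids depending on the event queue.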

[Python 3 source]

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import pygame.mixer

session = Session(region_name="us-west-2")
polly = session.client("polly")
try:
    response = polly.synthesize_speech(Text="ひとっ走り付き合えよ", OutputFormat="mp3", VoiceId="Mizuki")
except (BotoCoreError, ClientError) as error:
    print(error)
    sys.exit(-1)
if "AudioStream" in response:
    with closing(response["AudioStream"]) as stream:
        output = "speech.mp3"
        try:
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            print(error)
            sys.exit(-1)
        print("synthesize_speech OK ->>" + output)
else:
    print("Could not stream audio")
    sys.exit(-1)
pygame.init()
pygame.mixer.init()
pygame.mixer.music.load("speech.mp3")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
    # Wait a little between checks instead of spinning the CPU
    pygame.time.wait(100)
pygame.mixer.music.stop()
pygame.mixer.quit()
pygame.quit()
