カメラで画像認識してみる

・Googleは様々なプログラムを安価で提供してくれていますが、ここではGoogle Cloud Vision APIを使って画像認識するロボットを作ってみます。
・Vision APIを使う方法はREST方式とCloud SDKを使う方法がありますが、ここではより簡単に始められるREST方式を説明します。

事前の準備

・カメラ、スピーカー、マイクを使いますので、すべて正常に動作している状態にしておいてください。

Google Developers Consoleでの設定

・googleのアカウントを持っていない場合は作ってください。[Google アカウントの作成]
・ [Google Developers Console]にログインしてください。
・ここでプロジェクトを作成し、課金を有効化し、「Cloud Vision API」を有効化し、APIキーを生成します。
・日本語でとても分かりやすく説明しているページがありますので、こちらを参考に進めると良いのではないかと思います。[syncer.jp : Cloud Vision APIの使い方まとめ]
・ここで作成したAPIキーはこのあと使うのでコピーしておいてください。
・登録開始から６０日間は３万円まで無料で使えます。

APIキーの埋め込み

・先ほど取得したAPIキーをサンプルプログラムに書き込みます。
・サンプルプログラムをテキストエディターで開いてください。

$ sudo nano sample_vision1.py

1	$ sudo nano sample_vision1.py

・１５行目にある「API_KEY = ” 」のクォーテーションの間にAPIキーをペーストしてください。
・エディターを保存終了してください。

サンプルプログラムの実行

・ディスプレイは必須ではありませんが、カメラの視界に対象物をしっかり入れるためには、あったほうがよいです。

$ python sample_vision1.py

1	$ python sample_vision1.py

・「これなあに？」「これなんだ？」「これは何？」などと話しかけると、ロボットが「どれどれ」と言うので、カメラの前に対象物を掲げてください。
・最大３つの解答候補が英語で発声および画面に文字表示されます。
・Ctrl+Cで終了させることができます。

ソースコード

#!/usr/bin/python
# -*- coding: utf-8 -*-
# demo Code for Raspberry Pi : Label Recognition of Google Vision API

from time import sleep
import subprocess
import picamera
import requests
import base64
import json
import bezelie

# Variables
GOOGLE_CLOUD_VISION_API_URL = 'https://vision.googleapis.com/v1/images:annotate?key='
API_KEY = ''  # Google Cloud Platform Consoleで登録したAPIキー
jpgFile = '/home/pi/Pictures/capture.jpg'  # キャプチャー画像ファイル

# Functions
def request_cloud_vison_api(image_base64):
    api_url = GOOGLE_CLOUD_VISION_API_URL + API_KEY
    req_body = json.dumps({
        'requests': [{
            'image': {
                'content': image_base64.decode('utf-8') # base64でencodeする。
            },
            'features': [{
                'type': 'LABEL_DETECTION',
#                'type': 'TEXT_DETECTION',
#                'type': 'LOGO_DETECTION',
                'maxResults': 3,
            }]
        }]
    })
    res = requests.post(api_url, data=req_body)
    return res.json()

# Setting
bez = bezelie.Control()               # べゼリー操作インスタンスの生成
bez.moveCenter()                      # サーボをセンタリング

# Main Loop
def main():
  try:
    print "カメラの前に何かをかざしてください"
    with picamera.PiCamera() as camera:
      camera.resolution = (640, 480)   # Change this number for your display
      camera.rotation = 180            # comment out if your screen upside down
      camera.start_preview()
      sleep(1)
      while True:
        print "--------------------"
        camera.stop_preview()
        camera.capture(jpgFile)
        sleep(0.1)
        with open(jpgFile, 'rb') as img:
          img_byte = img.read()
        img_base64 = base64.b64encode(img_byte)
        result = request_cloud_vison_api(img_base64)
        for i in range(3):
          try:
            answer = result['responses'][0]['labelAnnotations'][i]['description'].encode('utf-8')
            print (answer)
            subprocess.call('flite -voice "kal16" -t "'+ answer +'"', shell=True)
            # Other English Voices :kal awb_time kal16 awb rms slt
          except:
            print ("no answer")
        camera.start_preview()
        sleep (3)

  except KeyboardInterrupt:
    # CTRL+Cで終了
    print "  終了しました"

if __name__ == "__main__":
    main()

#!/usr/bin/python

# -*- coding: utf-8 -*-

# demo Code for Raspberry Pi : Label Recognition of Google Vision API

from time import sleep

import subprocess

import picamera

import requests

import base64

import json

import bezelie

# Variables

GOOGLE_CLOUD_VISION_API_URL = 'https://vision.googleapis.com/v1/images:annotate?key='

API_KEY = '' # Google Cloud Platform Consoleで登録したAPIキー

jpgFile = '/home/pi/Pictures/capture.jpg' # キャプチャー画像ファイル

# Functions

def request_cloud_vison_api(image_base64):

api_url = GOOGLE_CLOUD_VISION_API_URL + API_KEY

req_body = json.dumps({

'requests': [{

'image': {

'content': image_base64.decode('utf-8') # base64でencodeする。

'features': [{

'type': 'LABEL_DETECTION',

# 'type': 'TEXT_DETECTION',

# 'type': 'LOGO_DETECTION',

'maxResults': 3,

}]

})

res = requests.post(api_url, data=req_body)

return res.json()

# Setting

bez = bezelie.Control() # べゼリー操作インスタンスの生成

bez.moveCenter() # サーボをセンタリング

# Main Loop

def main():

try:

print "カメラの前に何かをかざしてください"

with picamera.PiCamera() as camera:

camera.resolution = (640, 480) # Change this number for your display

camera.rotation = 180 # comment out if your screen upside down

camera.start_preview()

sleep(1)

while True:

print "--------------------"

camera.stop_preview()

camera.capture(jpgFile)

sleep(0.1)

with open(jpgFile, 'rb') as img:

img_byte = img.read()

img_base64 = base64.b64encode(img_byte)

result = request_cloud_vison_api(img_base64)

for i in range(3):

try:

answer = result['responses'][0]['labelAnnotations'][i]['description'].encode('utf-8')

print (answer)

subprocess.call('flite -voice "kal16" -t "'+ answer +'"', shell=True)

# Other English Voices :kal awb_time kal16 awb rms slt

except:

print ("no answer")

camera.start_preview()

sleep (3)

except KeyboardInterrupt:

# CTRL+Cで終了

print " 終了しました"

if __name__ == "__main__":

main()

応用

・[公式ドキュメント（日本語）]
・このサンプルでは画像に映っている物のカテゴリー名を表示する「LABEL_DETECTION」という機能を使っていますが、４２行目あたりでコメントアウトしてある「TEXT_DETECTION」「LOGO_DETECTION」も、ほぼおなじプログラムで使うことができます。「TEXT_DETECTION」は文字認識（OCR）、「LOGO_DETECTION」はロゴ認識です。
・このサンプルでは認識結果を３つ言わせていますが、１つでいいやという場合は７２行目あたりにある「for i in range(3):」の数字を１にしてください。