by 김범준 Jul 21. 2017

Image Captioning APIs

CloudSight, Clarifai

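What I originally wanted was to implement Show and Tell in TensorFlow, but after going through several repositories on GitHub I could not find any code that actually ran properly, so I gave up. While looking for another route, I found that you can skip training a model altogether: send an image to an API and get the result back right away. It is not the best approach for someone studying machine learning, of course, but it is convenient. :)

The MS Azure Computer Vision API and the Google Cloud Vision API are the most commonly used. With Azure you send an image by URL and get the result back, but for some reason I could not find an explanation of how to send a local image, and my free trial of the Google Cloud Vision API had run out. So I looked for other APIs, and it turns out there are more options in this space than I expected. Here I introduce two image captioning APIs that are not well known in Korea. Everything below is written for Python. For the record, I'm a beginner who has never properly learned Python, so sharing this code is pretty embarrassing. :)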

1) CloudSight AI

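This API returns the result as a sentence when you send it an image. The free tier supports only up to 500 image requests, which is a bit of a shame. Log in to get an API key, and install with pip install cloudsight. API documentation is available, and there is also an example on GitHub. The code I used is shared below.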

import csv
import os

import cloudsight

auth = cloudsight.SimpleAuth('{api-key}')
api = cloudsight.API(auth)


def test(each_file, falselist, directory):
    # Upload one image, then wait up to 30 seconds for CloudSight to respond.
    print(each_file)
    with open(directory + each_file, 'rb') as f:
        response = api.image_request(f, directory + each_file, {
            'image_request[locale]': 'en-US',
        })
    status = api.wait(response['token'], timeout=30)
    return status


def allfiles(path):
    # Collect bare file names under path. os.walk also descends into
    # subfolders, but only names are kept, so images should sit directly in path.
    res = []
    for root, dirs, files in os.walk(path):
        for file in files:
            res.append(file)
    return res


falselist = []
totallist = []
directory = 'C:/Users/kmbmjn/Downloads/tokyo street/to100/'  # change this to your own folder

file_list = allfiles(directory)
num_image = len(file_list)

for i in range(0, num_image):
    totallist.append(test(file_list[i], falselist, directory))
    print(i)

# The original wrote an undefined name (total_conv); totallist is what the loop fills.
# Each row holds one raw response; extract the caption text first if that is all you need.
with open(r'cloudsight_result.csv', 'w', newline='') as write_file:
    write = csv.writer(write_file)
    write.writerows([r] for r in totallist)
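The loop above collects whatever api.wait returns for each image, so the CSV needs one more step to become a clean file/caption table. Below is a minimal sketch of that step. It assumes each finished response is a dict with a 'status' key equal to 'completed' and the caption under 'name'; those key names are my assumption about the response shape, and caption_rows and write_rows are hypothetical helper names, not part of the cloudsight package.

```python
import csv


def caption_rows(file_names, responses):
    """Pair each file name with its caption, or an empty string on failure."""
    rows = []
    for name, resp in zip(file_names, responses):
        # Assumption: a finished response looks like
        # {'status': 'completed', 'name': '<caption text>', ...}.
        if isinstance(resp, dict) and resp.get('status') == 'completed':
            rows.append([name, resp.get('name', '')])
        else:
            # Timeouts and errors keep their row so the file list stays aligned.
            rows.append([name, ''])
    return rows


def write_rows(path, rows):
    """Write rows to a CSV file in text mode."""
    # newline='' prevents blank lines between rows on Windows.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)
```

With the variables from the script, this would be called as write_rows('cloudsight_result.csv', caption_rows(file_list, totallist)). Keeping failed images as empty rows makes it easy to see which ones to retry without spending quota on the rest.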

Tidied up into a CSV, the results look like this.

2) Clarifai

In this API, "caption" means something closer to detection: it analyzes an image and returns 20 words. It also supports retraining on a custom dataset, as well as search. Sign up to get an API key; the free tier reportedly allows 5,000 requests per month. Install with pip install clarifai. API documentation and a GitHub example are both available. The code I used is as follows.

import csv
import os

from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

app = ClarifaiApp(api_key='{api-key}')


def test(each_file, falselist, directory):
    # Send one local image to the general model and return its concept names.
    print(each_file)
    model = app.models.get('general-v1.3')
    # Keep the file open until predict() has finished with it.
    with open(directory + each_file, 'rb') as f:
        image = ClImage(file_obj=f)
        result = model.predict([image])
    # print(result)  # uncomment on the first run to inspect the raw response
    mylist = []
    for concept in result['outputs'][0]['data']['concepts']:
        mylist.append(str(concept['name']))
    return mylist


def allfiles(path):
    # Collect bare file names under path. os.walk also descends into
    # subfolders, but only names are kept, so images should sit directly in path.
    res = []
    for root, dirs, files in os.walk(path):
        for file in files:
            res.append(file)
    return res


falselist = []
totallist = []
directory = '/Users/kmbmjn/Downloads/tokyo street/to100/'  # change this to your own folder

file_list = allfiles(directory)
num_image = len(file_list)

for i in range(0, num_image):
    mylist = test(file_list[i], falselist, directory)
    totallist.append(mylist)
    print(i)

# Python 3's csv module needs a text-mode file; "wb" raises a TypeError on write.
with open("clarifai_result.csv", "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerows(totallist)
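The script keeps only the concept names, but it can be handy to keep a score next to each keyword and drop the weak ones. The sketch below does that for a response with the same nesting the script indexes (result['outputs'][0]['data']['concepts']). It assumes each concept also carries a numeric 'value' confidence between 0 and 1; that key, the 0.9 default threshold, and the helper name top_concepts are assumptions for illustration.

```python
def top_concepts(result, min_value=0.9):
    """Return (name, value) pairs whose confidence is at least min_value."""
    # Assumption: each concept dict has 'name' and a numeric 'value' score.
    concepts = result['outputs'][0]['data']['concepts']
    picked = []
    for concept in concepts:
        value = concept.get('value', 0.0)  # a missing score counts as 0
        if value >= min_value:
            picked.append((concept['name'], value))
    return picked
```

Inside test(), calling top_concepts(result, 0.8) instead of building mylist would keep only the keywords scored 0.8 or higher, in the order the API returned them.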

Here are the results. Let's take a closer look.

- CloudSight: Tokyo Street LED screen over people walking inside building

- Clarifai: stock, commerce, shopping, business, indoors, illuminated, bar, city, horizontal plane, billboard, shop, market, restaurant, neon, people, competition, travel, nightlife, exhibition, tourist

- CloudSight: kanji script lighter tower

- Clarifai: gambling, tourism, outdoors, neon, design, traditional, building, light, business, tourist, lantern, art, street, theme, fun

- CloudSight: blonde haired woman in blue denim jacket, yellow-white-brown floral dress during daytime

- Clarifai: people, street, wear, city, woman, adult, festival, portrait, fashion, outdoors, parade, road, urban, shopping, man, costume, exhibition, celebration, strange, umbrella

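Both APIs produce reasonably good descriptions. CloudSight explains the image in a sentence that is intuitive to understand, while Clarifai extracts a set of fitting keywords. When you have too many images to manage and cannot check each one by hand, these APIs look quite useful for building meaningful categories.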
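As a rough illustration of that last point, keyword lists like the Clarifai output can be turned into a simple index from keyword to image files. This is only a sketch in plain Python with no API calls; build_keyword_index and files_with_all are hypothetical helper names, and the inputs are assumed to be a list of file names and a parallel list of keyword lists, one per file.

```python
from collections import defaultdict


def build_keyword_index(file_names, keyword_lists):
    """Map each keyword to the sorted list of files tagged with it."""
    index = defaultdict(set)
    for name, keywords in zip(file_names, keyword_lists):
        for kw in keywords:
            # Normalize so 'Neon' and ' neon' land in the same bucket.
            index[kw.strip().lower()].add(name)
    return {kw: sorted(files) for kw, files in index.items()}


def files_with_all(index, *keywords):
    """Return the files tagged with every one of the given keywords."""
    sets = [set(index.get(kw.strip().lower(), [])) for kw in keywords]
    return sorted(set.intersection(*sets)) if sets else []
```

For example, files_with_all(index, 'neon', 'people') would list the images tagged with both keywords, which is one simple way to carve a large folder into categories.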
