AI Part 9-5. Building a RAG Chatbot from a PDF File (5/10)

by Master Seo

<16> Building a RAG Chatbot from a PDF File



1

Source download


Chapter 7

https://github.com/chatgpt-kr/openai-api-tutorial/blob/main/ch07/ch07_PDF_CHATBOT.ipynb




2

# Creating a virtual environment

In a command prompt window:


C:\0ai\07-ai\ch07>python -m venv ch07_env


c:\0ai\07-ai\ch07> cd ch07_env\Scripts


C:\0ai\07-ai\ch07\ch07_env\Scripts>activate.bat


(ch07_env) c:\0ai\07-ai\ch07\ch07_env\Scripts>
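
Once the prompt shows the `(ch07_env)` prefix, the environment is active. To double-check this from Python itself, one option is to compare `sys.prefix` with `sys.base_prefix`, which differ inside a venv. This is only a minimal sketch; the helper name `in_virtualenv` is made up for illustration and is not part of the original notebook.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter in use:", sys.executable)
```

If this prints `False` inside VS Code, the notebook kernel is probably still pointing at the system Python rather than `ch07_env`.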




3

# Open the folder in VS Code



# Chroma DB installation failure

building 'hnswlib' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]
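
The message says that the `hnswlib` extension, which the Chroma installation tried to compile, needs a C++ compiler, and it points to the Microsoft C++ Build Tools as the fix. Because the install list in this post also includes `faiss-cpu`, it can be handy to check which vector store package is actually importable before going further. The sketch below does only that check; the function name `pick_vector_store_backend` is hypothetical, and `faiss` is assumed to be the module name that `faiss-cpu` installs.

```python
from importlib.util import find_spec


def pick_vector_store_backend(candidates=("chromadb", "faiss")):
    """Return the first importable module name from candidates, or None."""
    for module_name in candidates:
        # find_spec looks the module up without actually importing it
        if find_spec(module_name) is not None:
            return module_name
    return None


if __name__ == "__main__":
    backend = pick_vector_store_backend()
    print("usable vector store package:", backend)
```

A result of `None` means neither package was installed successfully, so the build tools (or a different Python version with prebuilt wheels) would need to be sorted out first.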




!pip install openai

!pip install langchain_openai

!pip install langchain

!pip install langchain-community

!pip install gradio


!pip install pypdf

!pip install pymupdf

!pip install chromadb

!pip install faiss-cpu

!pip install tiktoken




import os

from langchain_openai import ChatOpenAI

from langchain.prompts import PromptTemplate

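# Chroma, OpenAIEmbeddings and PyPDFLoader are imported from the packages installed above (langchain-community, langchain_openai) instead of the deprecated langchain.* paths
from langchain_community.vectorstores import Chroma

from langchain_openai import OpenAIEmbeddings

from langchain.chains import RetrievalQA

from langchain_community.document_loaders import PyPDFLoader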

import urllib.request

import gradio as gr




os.environ['OPENAI_API_KEY'] = "your own API key"
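
Writing the key directly into the notebook works, but it is easy to leak when the file is shared. As an alternative, here is a small sketch that reads the key from an environment variable set outside the notebook and fails early when it is missing. The helper `load_api_key` is a made-up name for this example, not something from the original source.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    # Default to the real process environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; define it before starting the notebook")
    return key
```

With this in place, `os.environ['OPENAI_API_KEY'] = load_api_key()` would replace the hardcoded string, assuming the variable was set in the command prompt beforehand.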



urllib.request.urlretrieve("https://github.com/chatgpt-kr/openai-api-tutorial/raw/main/ch07/2020_%EA%B2%BD%EC%A0%9C%EA%B8%88%EC%9C%B5%EC%9A%A9%EC%96%B4%20700%EC%84%A0_%EA%B2%8C%EC%8B%9C.pdf", filename="2020_경제금융용어 700선_게시.pdf")
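
Re-running the cell above downloads the PDF again every time. A possible refinement, shown here only as a sketch, is to skip the download when a non-empty file is already on disk. The function `download_if_missing` and its `fetch` parameter are assumptions introduced for this example.

```python
import os
import urllib.request


def download_if_missing(url, filename, fetch=urllib.request.urlretrieve):
    """Download url to filename unless a non-empty file is already there.

    Returns True when a download happened, False when it was skipped.
    """
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return False
    # fetch defaults to urllib.request.urlretrieve, the same call used above
    fetch(url, filename)
    return True
```

Calling it with the same URL and `filename` as in the `urlretrieve` line keeps the behavior on the first run and makes later runs faster.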



# Use the same relative filename that urlretrieve saved above (the /content/ prefix only exists on Colab)
loader = PyPDFLoader("2020_경제금융용어 700선_게시.pdf")

texts = loader.load_and_split()



print('Number of documents:', len(texts))


# Document 0 is the preface

print(texts[0].page_content)




print(texts[5].page_content)



# Documents up to index 12 are the table of contents

print(texts[12].page_content)


# From document 13 onward, the financial terms are explained

print(texts[13].page_content)
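
Printing whole pages one at a time is a slow way to find where the table of contents ends. The following sketch shows a one-line preview of several documents at once, which makes the boundary at index 13 easier to spot. The helper `preview` is hypothetical; it only assumes that each item has a `page_content` attribute, as the documents printed above do.

```python
def preview(docs, indices, width=80):
    """Return (index, first `width` characters) for each requested document."""
    rows = []
    for i in indices:
        # Collapse newlines and runs of spaces so each preview fits on one line
        text = " ".join(docs[i].page_content.split())
        rows.append((i, text[:width]))
    return rows
```

For example, looping over `preview(texts, range(10, 16))` and printing each pair shows the last few table-of-contents pages next to the first term pages.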




texts = texts[13:]

print('Length of texts after trimming:', len(texts))



print('First document:', texts[0])



print(texts[-1])





Next

https://brunch.co.kr/@topasvga/4160

