by Master Seo Nov 24. 2024

AI Part 9-5. Building a RAG Chatbot from a PDF File (5/10)

<16> Building a RAG Chatbot from a PDF File



1

Download the source


Chapter 7

https://github.com/chatgpt-kr/openai-api-tutorial/blob/main/ch07/ch07_PDF_CHATBOT.ipynb




2

# Create a virtual environment

In a Command Prompt window:


C:\0ai\07-ai\ch07>python -m venv ch07_env


C:\0ai\07-ai\ch07>cd ch07_env\Scripts


C:\0ai\07-ai\ch07\ch07_env\Scripts>activate.bat


(ch07_env) c:\0ai\07-ai\ch07\ch07_env\Scripts>
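To confirm from Python that a script or notebook kernel is really running inside ch07_env, one option is to compare sys.prefix with sys.base_prefix. This is a minimal sketch; the helper name in_virtualenv is hypothetical and not part of the original notebook.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Running inside a virtual environment:", in_virtualenv())
```

If this prints False in VS Code, the selected interpreter is probably not the one under ch07_env\Scripts.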




3

# Open the folder in VS Code



# Chroma DB installation failed

building 'hnswlib' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]
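The error message itself points to the fix: install Microsoft C++ Build Tools and retry. Because the install list also includes faiss-cpu, it can help to check which vector store package is actually importable before going further. The following is a sketch under that assumption; pick_vector_backend is a hypothetical helper, not something the original notebook defines.

```python
import importlib.util

def pick_vector_backend(candidates=("chromadb", "faiss")):
    """Return the first importable package name from candidates, or None."""
    for name in candidates:
        # find_spec returns None when the top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return None

print("Available vector store backend:", pick_vector_backend())
```

The faiss-cpu distribution is imported under the name faiss, which is why that name appears in the candidate list.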




!pip install openai

!pip install langchain_openai

!pip install langchain

!pip install langchain-community

!pip install gradio


!pip install pypdf

!pip install pymupdf

!pip install chromadb

!pip install faiss-cpu

!pip install tiktoken




import os

from langchain_openai import ChatOpenAI

from langchain.prompts import PromptTemplate

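# The langchain.vectorstores / embeddings / document_loaders paths are deprecated;
# import from langchain_community and langchain_openai instead

from langchain_community.vectorstores import Chroma

from langchain_openai import OpenAIEmbeddings

from langchain.chains import RetrievalQA

from langchain_community.document_loaders import PyPDFLoader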

import urllib.request

import gradio as gr




os.environ['OPENAI_API_KEY'] = "your API key"
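Hardcoding the key makes it easy to leak when the notebook is shared. As an alternative, the key can be taken from an existing environment variable and only asked for when it is missing. This is a minimal sketch; ensure_api_key and its parameters are hypothetical names chosen for illustration.

```python
import os
from getpass import getpass

def ensure_api_key(env=os.environ, prompt=getpass):
    """Set OPENAI_API_KEY in env if it is missing, asking the user once."""
    if not env.get("OPENAI_API_KEY"):
        # getpass hides the typed key instead of echoing it to the screen
        env["OPENAI_API_KEY"] = prompt("OpenAI API key: ")
    return env["OPENAI_API_KEY"]
```

Calling ensure_api_key() once before creating ChatOpenAI or OpenAIEmbeddings objects would then replace the hardcoded assignment.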



urllib.request.urlretrieve("https://github.com/chatgpt-kr/openai-api-tutorial/raw/main/ch07/2020_%EA%B2%BD%EC%A0%9C%EA%B8%88%EC%9C%B5%EC%9A%A9%EC%96%B4%20700%EC%84%A0_%EA%B2%8C%EC%8B%9C.pdf", filename="2020_경제금융용어 700선_게시.pdf") 
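Re-running the cell downloads the PDF again every time. A small guard that skips the download when the file is already present avoids that. The helper below is a sketch; download_if_missing is a hypothetical name, and it assumes the same urllib.request.urlretrieve call used above.

```python
import os
import urllib.request

def download_if_missing(url: str, filename: str) -> str:
    """Download url to filename unless the file already exists; return its absolute path."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return os.path.abspath(filename)
```

The returned absolute path can be passed straight to the PDF loader, which sidesteps any confusion about the current working directory.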



# Load the PDF from the relative path it was downloaded to
# ("/content/..." only exists on Google Colab)
loader = PyPDFLoader("2020_경제금융용어 700선_게시.pdf")
texts = loader.load_and_split()



print('Number of documents:', len(texts))


# Document 0 is the preface

print(texts[0].page_content)




print(texts[5].page_content)



# Documents up to index 12 are the table of contents

print(texts[12].page_content)


# From document 13 onward, the financial term explanations begin

print(texts[13].page_content)




texts = texts[13:]

print('Length of texts after trimming:', len(texts))
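The slice keeps everything from index 13 onward and drops the preface and table of contents. The effect is easy to see on stand-in data. This is a sketch only: drop_front_matter is a hypothetical helper, and plain strings stand in for the Document objects returned by the loader.

```python
def drop_front_matter(docs, first_content_index=13):
    """Return docs without the preface and table-of-contents entries."""
    return docs[first_content_index:]

# Stand-in data: 20 fake pages instead of real Document objects
pages = [f"page {i}" for i in range(20)]
body = drop_front_matter(pages)

print(len(pages), "->", len(body))
print("First kept entry:", body[0])
```

The index 13 is specific to this PDF; a different document would need its own cutoff, found by printing a few entries as done above.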



print('First document:', texts[0])



print(texts[-1])





Next

https://brunch.co.kr/@topasvga/4160

