6. Embedding Content
Maeser currently works with locally saved FAISS vectorstores.
For complete information about creating vectorstores with LangChain, see the LangChain documentation.
The example directory contains two scripts that show how an individual Wikipedia page can be vectorized: create_byu_vectorstore.py and create_maeser_vectorstore.py.
The principles in these scripts could be applied to any content you would like to vectorize:
Preprocess the content and turn it into plaintext.
# Extract the text from the Karl G. Maeser Wikipedia page
import wikipediaapi

wiki_wiki = wikipediaapi.Wikipedia(
    user_agent='Maeser AI Example',
    language='en',
    extract_format=wikipediaapi.ExtractFormat.WIKI
)
p_wiki = wiki_wiki.page("Karl G. Maeser")
text = p_wiki.text
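The same preprocessing step applies to other sources. As a rough sketch, the snippet below pulls plaintext out of a local PDF with the pypdf library; the filename is a placeholder, and pypdf is not part of the Maeser example.

# Hypothetical example: extract plaintext from a local PDF instead of Wikipedia
from pypdf import PdfReader

reader = PdfReader("course_notes.pdf")  # placeholder path, not a Maeser file
text = "\n".join(page.extract_text() or "" for page in reader.pages)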
Chunk the data strategically.
# Split the text into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.create_documents([text])
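Before embedding, one simple check is to confirm the splitter produced a sensible number of chunks and that individual chunks look coherent:

# Optional sanity check: count the chunks and preview the first one
print(f"Created {len(documents)} chunks")
print(documents[0].page_content[:200])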
Vectorize each chunk and save the result as a FAISS database.
# Save the vectorized text to a local FAISS vectorstore
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

db = FAISS.from_documents(documents, OpenAIEmbeddings())
db.save_local("example/vectorstores/maeser")
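Once saved, the vectorstore can be loaded back for retrieval, assuming the same embedding model is used. A minimal sketch follows; the query string is illustrative, and allow_dangerous_deserialization is required by recent LangChain releases because FAISS stores are pickled, so only enable it for files you created yourself.

# Load the saved vectorstore and run a similarity search
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

db = FAISS.load_local(
    "example/vectorstores/maeser",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,  # stores are pickled; load only trusted files
)
results = db.similarity_search("Where did Karl G. Maeser teach?", k=2)
for doc in results:
    print(doc.page_content[:100])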