class documentation

A class for storing and querying document embeddings using FAISS and SentenceTransformer.

Class Method       from_bytes   Deserializes a VectorStore from bytes.
Method             __init__     Initializes the VectorStore with optional documents.
Method             query        Query the vector store for the most similar document chunks.
Instance Variable  documents    Undocumented
Instance Variable  ids          Undocumented
Instance Variable  metadatas    Undocumented
Instance Variable  model        Undocumented
Instance Variable  store        Undocumented
Property           as_bytes     Serializes the vector store to bytes.
Property           metas_json   Returns a JSON string of ids, metas, and documents.
Method             _setup       Splits documents into chunks of maximum 1000 characters.
@classmethod
def from_bytes(cls, raw: bytes) -> VectorStore:

Deserializes a VectorStore from bytes.

Args:
    raw (bytes): Serialized vector store.

Returns:
    VectorStore: The deserialized instance.

def __init__(self, documents: list[dict[str, str]] = None):

Initializes the VectorStore with optional documents.

Args:
    documents (list[dict[str, str]], optional): List of documents with 'title' and 'content'.

def query(self, query: str) -> list[dict]:

Query the vector store for the most similar document chunks.

Args:
    query (str): The query string.

Returns:
    list[dict]: List of results with document, metadata, and distance.

Raises:
    ValueError: If the vector store is not initialized.
    RuntimeError: If the query fails.
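To make the shape of the returned results concrete, here is a minimal pure-Python sketch of a nearest-neighbour lookup. It assumes the store behaves like a flat L2 index (an exhaustive search that reports squared Euclidean distances, as `faiss.IndexFlatL2` does). The helper names `l2_search` and `build_results`, the result keys `document`, `metadata`, and `distance`, and the toy two-dimensional vectors are all hypothetical; the real method would embed the query with the SentenceTransformer model first.

```python
def l2_search(query_vec, vectors, k=3):
    """Return (index, squared L2 distance) pairs for the k nearest vectors."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and one stored vector.
        dist = sum((q - v) ** 2 for q, v in zip(query_vec, vec))
        scored.append((i, dist))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]


def build_results(hits, documents, metadatas):
    """Pair each hit with its chunk text and metadata (key names are assumed)."""
    return [
        {"document": documents[i], "metadata": metadatas[i], "distance": d}
        for i, d in hits
    ]


# Toy embeddings standing in for real sentence embeddings.
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
documents = ["chunk a", "chunk b", "chunk c"]
metadatas = [{"title": "A"}, {"title": "B"}, {"title": "C"}]

results = build_results(l2_search([1.0, 0.0], vectors, k=2), documents, metadatas)
for row in results:
    print(row)
```

A smaller distance means a closer match, so the results are ordered from most to least similar.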

documents =

Undocumented

ids =

Undocumented

metadatas =

Undocumented

model =

Undocumented

store =

Undocumented

@property
as_bytes: bytes =

Serializes the vector store to bytes.

Returns:
    bytes: Serialized vector store.
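The pairing of as_bytes and from_bytes suggests a round trip. The sketch below illustrates that pattern with a hypothetical stand-in class, `TinyStore`, which keeps only the documents, ids, and metadatas and serializes them with `pickle`. This is an assumption about the general approach, not the actual implementation: the real class would also need to persist its FAISS index (for example via `faiss.serialize_index` and `faiss.deserialize_index`) and may use a different wire format entirely.

```python
import pickle


class TinyStore:
    """Hypothetical stand-in for VectorStore, without the FAISS index or model."""

    def __init__(self, documents=None, ids=None, metadatas=None):
        self.documents = documents or []
        self.ids = ids or []
        self.metadatas = metadatas or []

    @property
    def as_bytes(self) -> bytes:
        # Collect the state into a plain dict so the format stays explicit.
        state = {
            "documents": self.documents,
            "ids": self.ids,
            "metadatas": self.metadatas,
        }
        return pickle.dumps(state)

    @classmethod
    def from_bytes(cls, raw: bytes) -> "TinyStore":
        # Rebuild an instance from the dict produced by as_bytes.
        return cls(**pickle.loads(raw))


original = TinyStore(["hello"], ["intro-0"], [{"title": "intro"}])
restored = TinyStore.from_bytes(original.as_bytes)
print(restored.ids, restored.documents, restored.metadatas)
```

If a pickle-based format is used, bytes from untrusted sources should not be deserialized, because unpickling can execute arbitrary code.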

@property
metas_json: str =

Returns a JSON string of ids, metas, and documents.

Returns:
    str: JSON representation.
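As an illustration of what such a JSON string might look like, the following sketch bundles the three lists into one object. The function name `metas_to_json` and the top-level keys `ids`, `metas`, and `documents` are assumptions drawn from the summary above; the actual layout may differ.

```python
import json


def metas_to_json(ids, metadatas, documents):
    """Serialize ids, metadata dicts, and chunk texts into one JSON string."""
    # Key names are assumed for illustration only.
    return json.dumps({"ids": ids, "metas": metadatas, "documents": documents})


payload = metas_to_json(["intro-0"], [{"title": "intro"}], ["hello"])
restored = json.loads(payload)
print(payload)
```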

def _setup(self, documents: list[dict[str, str]]) -> tuple[faiss.IndexFlatL2, list[str], list[str], list[dict[str, str]]]:

Splits documents into chunks of at most 1000 characters. For each chunk, creates a unique ID string made of the document title and chunk index, and a metadata dict containing the document title. Creates a FAISS index for the chunk embeddings.

Args:
    documents (list[dict[str, str]]): List of documents.

Returns:
    tuple: (FAISS index, list of chunks, list of ids, list of metas)
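The chunking step described above can be sketched without FAISS or an embedding model. The function name `chunk_documents`, the `max_chars` parameter, and the `title-index` ID format are hypothetical; the source only states that each ID combines the document title and the chunk index. Embedding the chunks and building the index are left out of this sketch.

```python
def chunk_documents(documents, max_chars=1000):
    """Split each document's content into chunks of at most max_chars characters."""
    chunks, ids, metas = [], [], []
    for doc in documents:
        title, content = doc["title"], doc["content"]
        # Walk the content in fixed-size windows; the last chunk may be shorter.
        for index, start in enumerate(range(0, len(content), max_chars)):
            chunks.append(content[start:start + max_chars])
            ids.append(f"{title}-{index}")   # assumed ID format: title plus chunk index
            metas.append({"title": title})   # metadata carries the document title
    return chunks, ids, metas


docs = [
    {"title": "intro", "content": "a" * 2500},
    {"title": "faq", "content": "short"},
]
chunks, ids, metas = chunk_documents(docs)
print(ids)
print([len(c) for c in chunks])
```

Splitting on a fixed character count is simple but can cut words or sentences in half; a variant that splits on whitespace or sentence boundaries would keep the same interface.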