Indexes

This document outlines how to scrape text from web sources and document files, split it into chunks, and upsert the resulting documents into a vector database using the Prompt Engineers platform.

Step 1: Scraping Text

Select Source: From the 'Select Loader' dropdown, choose 'Files' or another source such as 'Text', 'YouTube', or 'URLs'.
Upload Files: Click 'Choose Files' to upload documents (supports .txt, .pdf, .csv).
View Document: Uploaded PDFs can be previewed in the interface.

Step 2: Text Splitting

Select Splitter: From the 'Select Splitter' dropdown, choose a method for splitting text (e.g., 'Spacy', 'Python', 'Markdown').
Set Overlap: Configure 'Chunk Overlap' to set how much text each chunk shares with the chunk before it. Overlap keeps context that spans a chunk boundary available in both chunks.
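
To illustrate what 'Chunk Overlap' controls, here is a minimal sketch of fixed-size splitting with overlap. It is not the platform's splitter implementation: the function name `split_with_overlap`, the `chunk_size` parameter, and the choice of characters as the unit are all assumptions made for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that share an overlap.

    Hypothetical helper: the platform's own splitters may measure size in
    tokens or sentences rather than characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

With a chunk size of 4 and an overlap of 2, each chunk repeats the last two characters of the previous one. A larger overlap preserves more context across boundaries but produces more chunks to store.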

Step 3: Upserting Documents

Collect Text: Click 'Collect' to start the scraping process. The system may display progress, such as "Splitting page 12 into chunks."
Review Chunks: Once the document is processed, review the extracted text chunks.
Upsert Documents:
- New Index: Select 'New', enter an 'Index Name', and click 'Create'.
- Existing Index: Select 'Existing', choose the index, and upsert the text.
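
An upsert inserts a record if its ID is new and overwrites it if the ID already exists. The sketch below shows that behavior with a plain dictionary standing in for an index. It is only an assumption-laden illustration: `chunk_id`, `upsert`, and the ID scheme (a hash of source name and chunk position) are hypothetical, and the embedding vectors that a real vector database stores alongside each chunk are left out.

```python
import hashlib


def chunk_id(source: str, position: int) -> str:
    """Derive a stable ID from the source name and chunk position (assumed scheme)."""
    digest = hashlib.sha256(f"{source}:{position}".encode("utf-8")).hexdigest()
    return digest[:16]


def upsert(index: dict[str, dict], source: str, chunks: list[str]) -> None:
    """Insert each chunk, or overwrite the record that already has the same ID."""
    for position, text in enumerate(chunks):
        index[chunk_id(source, position)] = {"source": source, "text": text}


if __name__ == "__main__":
    index: dict[str, dict] = {}                      # stand-in for a vector index
    upsert(index, "report.pdf", ["first chunk", "second chunk"])
    upsert(index, "report.pdf", ["first chunk (revised)", "second chunk"])
    print(len(index))                                # record count after both calls
```

Because the IDs are deterministic, upserting the same document again updates its records instead of duplicating them.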

Notes

Choose a loader that matches the document type and a splitter that matches the desired chunking strategy.
The 'Upsert' process allows for adding new documents or updating existing ones in the vector database.
Monitor the progress bar or messages to ensure that the scraping and upserting are proceeding correctly.

This workflow enables users to efficiently process and store large amounts of text data, preparing it for advanced search and retrieval in vector databases.

