- Create a chat engine with LlamaIndex to answer questions based on a set of pre-selected documents.
- Leverage Streamlit for file uploads and interactive communication with the engine.
- Clone the repo.
- Run the docker-compose command to launch the app with docker containers, and then type a question in the chat interface:

```shell
docker-compose up --build
```
- Start a Xinference cluster:

```shell
xinference --log-level debug
```
- Launch an embedding model and an LLM, and get their model_uids. For example, launching `bge-large-zh` (embedding) and `chatglm3` (LLM):

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
model_uid = client.launch_model(model_name="bge-large-zh", model_type="embedding")
model_uid2 = client.launch_model(
    model_name="chatglm3",
    quantization=None,
    model_format="pytorch",
    model_size_in_billions=6,
)
print(model_uid, model_uid2)
```
- Modify `docker-compose.yml` using the above model_uids, for example:

```yaml
version: "2"
services:
  app:
    build: .
    network_mode: "host"
    ports:
      - "8501:8501"
    volumes:
      - ./app:/app/app
    environment:
      - LLM=xinference
      - EMBEDDING=xinference
      - XINFERENCE_SERVER_ENDPOINT=http://127.0.0.1:9997
      - XINFERENCE_EMBEDDING_MODEL_UID=<model_uid>
      - XINFERENCE_LLM_MODEL_UID=<model_uid2>
      - HISTORY_KEEP_CNT=10
```
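As a rough sketch of how the app might consume these environment variables inside the container (the variable names come from `docker-compose.yml`; the defaults and the `HISTORY_KEEP_CNT` trimming logic shown here are assumptions for illustration, not this repo's actual code):

```python
import os

# Endpoint and model UIDs injected via docker-compose (names from docker-compose.yml).
XINFERENCE_SERVER_ENDPOINT = os.environ.get(
    "XINFERENCE_SERVER_ENDPOINT", "http://127.0.0.1:9997"
)
EMBEDDING_MODEL_UID = os.environ.get("XINFERENCE_EMBEDDING_MODEL_UID", "")
LLM_MODEL_UID = os.environ.get("XINFERENCE_LLM_MODEL_UID", "")

# HISTORY_KEEP_CNT plausibly bounds how much chat history is retained.
HISTORY_KEEP_CNT = int(os.environ.get("HISTORY_KEEP_CNT", "10"))


def trim_history(messages):
    """Keep only the most recent HISTORY_KEEP_CNT messages (hypothetical helper)."""
    return messages[-HISTORY_KEEP_CNT:]
```

With `HISTORY_KEEP_CNT=10`, a 25-message history would be trimmed to its last 10 entries before being sent to the model.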
- Deploy this application:

```shell
docker-compose up --build
```
If you want to run a local dev environment, the following commands let you test the application with the OpenAI API:

```shell
poetry install
LLM=openai EMBEDDING=openai streamlit run app/main.py
```

- If you want to use OpenAI, check that you've created an `.env` file that contains your valid (and working) API keys.
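For reference, here is a minimal sketch of what loading such an `.env` file amounts to (libraries like `python-dotenv` do this more robustly; the key name `OPENAI_API_KEY` is the usual convention, but check the app's configuration for the exact names it expects):

```python
import os


def load_dotenv(path=".env"):
    """Minimal .env loader: copy KEY=VALUE lines into os.environ.

    Skips blank lines and comments; does not override variables that
    are already set. A simplified stand-in for python-dotenv.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

After calling `load_dotenv()`, the app (and the OpenAI client it creates) can read the key via `os.environ["OPENAI_API_KEY"]`.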