GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A programming framework for agentic AI 🤖
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
This project shares technical principles and hands-on experience with large language models (LLM engineering and production deployment of LLM applications).
Official inference library for Mistral models
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, training and vector search.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Standardized Serverless ML Inference Platform on Kubernetes
Sparsity-aware deep learning inference runtime for CPUs
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Code examples and resources for DBRX, a large language model developed by Databricks
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
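Several of the serving tools listed above expose an OpenAI-compatible HTTP API, so a client built for one works against the others. A minimal sketch of the request shape such endpoints accept; the base URL, model name, and helper function here are illustrative assumptions, not tied to any specific project:

```python
import json

# Hypothetical local endpoint; an OpenAI-compatible server typically
# serves chat completions at <base_url>/chat/completions.
BASE_URL = "http://localhost:8000/v1"  # assumption, adjust for your server

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build a chat-completions payload in the OpenAI-compatible format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Model name is a placeholder; use whatever your server has loaded.
payload = build_chat_request("llama-3.1-8b-instruct", "Explain paged attention briefly.")
print(json.dumps(payload, indent=2))
```

In practice you would POST this payload as JSON to `{BASE_URL}/chat/completions` with any HTTP client, or point an existing OpenAI SDK at the server by overriding its base URL.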