Neural Magic
Pinned Repositories
- OmniQuant (Public; forked from OpenGVLab/OmniQuant): [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
- compressed-tensors (Public): A safetensors extension to efficiently store sparse quantized tensors on disk.
- nm-vllm-certs (Public): General information, model certifications, and benchmarks for nm-vllm enterprise distributions.
- nm-vllm (Public; forked from vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs.
- transformers (Public; forked from huggingface/transformers): 🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
- quant_kernel_benchmarks (Public): Benchmarking code for running quantized kernels from vLLM and other libraries.
- flash-attention (Public; forked from vllm-project/flash-attention): Fast and memory-efficient exact attention.
- lm-evaluation-harness (Public; forked from EleutherAI/lm-evaluation-harness): A framework for few-shot evaluation of language models.