all-hands_openhands-lm-7b-v0.1-GGUF

Maintained By
bartowski

OpenHands LM 7B GGUF Quantizations

  • Original Model: OpenHands LM 7B v0.1
  • Quantization Author: bartowski
  • Size Range: 2.78GB - 15.24GB
  • Format: GGUF (llama.cpp compatible)

What is all-hands_openhands-lm-7b-v0.1-GGUF?

This is a comprehensive collection of GGUF quantizations of the OpenHands LM 7B model, covering compression levels from the high-precision BF16 format down to the heavily compressed IQ2_M variant. This range lets users trade file size against output quality to fit their hardware constraints.

Implementation Details

The quantizations were created using llama.cpp release b5010 with imatrix optimization. The collection features multiple quantization methods including standard K-quants (Q2-Q8) and newer I-quants (IQ2-IQ4), with special variants utilizing Q8_0 for embedding and output weights to maintain quality in critical model components.
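For readers new to the workflow, imatrix quantization in llama.cpp follows a convert → calibrate → quantize pipeline. A minimal sketch is shown below; the file names and calibration text are placeholders, not the exact inputs bartowski used.

```bash
# Convert the original Hugging Face checkpoint to a full-precision GGUF
# (paths are hypothetical)
python convert_hf_to_gguf.py ./openhands-lm-7b-v0.1 --outfile model-f16.gguf

# Build an importance matrix from a calibration text file
llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize with the imatrix; Q4_K_M shown as one example target type
llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The imatrix weights quantization error by how much each tensor element matters on the calibration data, which is what preserves quality at the aggressive IQ2/IQ3 levels.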

  • Includes 24 different quantization variants
  • Implements online repacking for ARM and AVX CPU optimization
  • Supports both standard and advanced quantization techniques
  • Uses a ChatML-style prompt format with im_start/im_end tokens (see the template after this list)
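The template below follows the standard im_start/im_end (ChatML) convention used by Qwen-family models; the exact layout is inferred from that convention rather than quoted from the upstream card:

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```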

Core Capabilities

  • Multiple compression levels suitable for different hardware configurations
  • Optimized performance on both CPU and GPU systems
  • Special quantizations for enhanced ARM/AVX performance
  • Compatible with LM Studio and any llama.cpp-based project (see the download/run sketch after this list)
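As one example of llama.cpp-based usage, the sketch below fetches a single variant and runs it interactively. The repository name is taken from this card's title, and the file name is assumed from bartowski's usual naming scheme:

```bash
# Download only the Q4_K_M file from the quantization repo
huggingface-cli download bartowski/all-hands_openhands-lm-7b-v0.1-GGUF \
  --include "*Q4_K_M*" --local-dir ./

# Start an interactive chat session with llama.cpp's CLI
llama-cli -m ./all-hands_openhands-lm-7b-v0.1-Q4_K_M.gguf -cnv \
  -p "You are a helpful coding assistant."
```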

Frequently Asked Questions

Q: What makes this model unique?

This collection stands out for its comprehensive range of quantization options, from high-quality Q8_0 down to the heavily compressed IQ2_M format, letting users pick the size/quality trade-off that fits their needs. The availability of both K-quants and I-quants, along with variants that keep embedding and output weights at Q8_0, provides solid options across different hardware configurations.

Q: What are the recommended use cases?

For most users, the Q4_K_M variant (4.68GB) is recommended as a balanced option. Users with limited RAM should consider IQ3_XXS (3.11GB) or IQ2_M (2.78GB). For maximum quality, the Q6_K_L variant (6.52GB) is recommended. GPU users should choose a variant with a file size 1-2GB smaller than their available VRAM.
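As a concrete example of that sizing rule, the 4.68GB Q4_K_M file fits fully on a 6GB GPU with room left for context. A minimal llama.cpp invocation might look like the following; the flag values are illustrative:

```bash
# Offload all layers to the GPU; lower -ngl if model + KV cache exceed VRAM
llama-cli -m ./all-hands_openhands-lm-7b-v0.1-Q4_K_M.gguf -ngl 99 -c 4096 -cnv
```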
