# SicariusSicariiStuff X-Ray Alpha GGUF
| Property | Value |
|---|---|
| Original Model | X-Ray Alpha |
| Author | bartowski |
| Quantization Method | llama.cpp imatrix |
| Size Range | 1.54GB - 7.77GB |
## What is SicariusSicariiStuff_X-Ray_Alpha-GGUF?
This is a comprehensive collection of GGUF quantizations of the X-Ray_Alpha model, created using llama.cpp's imatrix quantization technique. The collection offers various compression levels to accommodate different hardware capabilities and performance requirements, ranging from full BF16 weights (7.77GB) to highly compressed IQ2_M format (1.54GB).
## Implementation Details
The model uses a specific prompt format: `<bos><start_of_turn>user {prompt}<end_of_turn> <start_of_turn>model <end_of_turn>`. Note that the model does not support system prompts. The quantizations were created using llama.cpp release b4925.
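The template above can be applied programmatically. A minimal sketch in Python, assuming the standard Gemma-style newline placement between turns (the exact whitespace is not spelled out in this card), and treating the trailing `<end_of_turn>` as the token the model emits to close its own turn rather than part of the input:

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the model's chat template.

    The model does not support system prompts, so only the user
    turn is emitted. Newline placement follows the common
    Gemma-style template and is an assumption here.
    """
    return (
        "<bos><start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Example: build the input string passed to the inference runtime.
prompt = build_prompt("Explain GGUF quantization in one sentence.")
```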
- Multiple quantization options offering different quality-size tradeoffs
- Special versions with Q8_0 for embed and output weights
- Support for online repacking for ARM and AVX CPU inference
- Optimized versions for different hardware configurations
## Core Capabilities
- High-quality compression with Q6_K_L and Q5_K variants
- Efficient memory usage with IQ3 and IQ4 variants
- Automatic weight repacking for ARM and AVX systems
- Flexible deployment options across different hardware configurations
## Frequently Asked Questions
**Q: What makes this model unique?**
This collection stands out for its comprehensive range of quantization options, letting users balance model size against output quality for their specific hardware constraints. The use of imatrix quantization and the special handling of embedding/output weights in select variants help preserve quality at smaller sizes.
**Q: What are the recommended use cases?**
For most users, the Q4_K_M variant (2.49GB) is recommended as the default choice, offering good quality and reasonable size. For high-end systems, Q6_K_L (3.35GB) provides near-perfect quality, while users with limited RAM can opt for Q3_K_M (2.10GB) or IQ3_M (1.99GB) variants.