Commit Graph

26 Commits

Author SHA1 Message Date
Mathieu Croquelois
17a42e5877
Add BF16 to GGUF (#2877) 2025-05-19 00:06:23 -04:00
DenOfEquity
5e1dcd35a8
some extra lora support, inc. new glora (#2715)
* support new glora (via ComfyUI)
* support BFL FluxTools loras (mostly via ComfyUI)
* also support using loras (like Hyper, Turbo) with FluxTools models
2025-03-04 00:26:43 +00:00
Lucas Freire Sangoi
75120d02f3
Restore lines for DoRA TE keys fix (#2240) 2024-11-06 20:20:57 +00:00
layerdiffusion
44eb4ea837 Support T5 & CLIP Text Encoder LoRA from OneTrainer
requested in #1727
and some cleanups and license updates
PS: LoRA requests must provide a download URL to at least one LoRA
2024-09-08 01:39:29 -07:00
layerdiffusion
a8a81d3d77 fix offline quant lora precision 2024-08-31 13:12:23 -07:00
layerdiffusion
3a9cf1f8e5 Partially revert "use safer codes" 2024-08-31 11:07:28 -07:00
layerdiffusion
70a555906a use safer codes 2024-08-31 10:55:19 -07:00
layerdiffusion
4c9380c46a Speed up quant model loading and inference ...
... based on three observations:
1. torch.Tensor.view on one big tensor is slightly faster than calling torch.Tensor.to on multiple small tensors.
2. But torch.Tensor.to with a dtype change is significantly slower than torch.Tensor.view.
3. "Baking" the model on the GPU is significantly faster than computing on the CPU at model load time.

Mainly influences inference of Q8_0, Q4_0/1/K and the loading of all quants (sketch below).
2024-08-30 00:49:05 -07:00
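A minimal sketch (not the repository's code) of the three observations above; the buffer sizes, the uint8-to-fp16 reinterpretation, and the "baking" step are invented purely for illustration.

```python
import torch

# Hypothetical raw GGUF payload: many small uint8 blocks.
raw_blocks = [torch.randint(0, 256, (4096,), dtype=torch.uint8) for _ in range(256)]

# Slower pattern (observation 2): a dtype-changing .to() per small tensor.
slow = [b.to(torch.float16) for b in raw_blocks]

# Faster pattern (observation 1): concatenate once, then .view() reinterprets
# the single big buffer without copying.
big = torch.cat(raw_blocks)           # one contiguous uint8 buffer
as_fp16 = big.view(torch.float16)     # zero-copy reinterpretation of the bytes

# Observation 3: move the small quantized payload to the GPU first, then do the
# expensive dequantization ("baking") there instead of on the CPU at load time.
if torch.cuda.is_available():
    big_gpu = big.to("cuda", non_blocking=True)   # cheap transfer: raw uint8
    baked = big_gpu.view(torch.float16).float()   # dequant-like work on the GPU
```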
layerdiffusion
0abb6c4686 Second Attempt for #1502 2024-08-28 08:08:40 -07:00
layerdiffusion
25662974f8 try to test #1502 2024-08-27 18:42:00 -07:00
layerdiffusion
acf99dd74e fix for old versions of PyTorch 2024-08-26 06:51:48 -07:00
layerdiffusion
82dfc2b15b Significantly speed up Q4_0, Q4_1, Q4_K
by precomputing every possible 4-bit dequantized value into a lookup table and using PyTorch indexing to fetch results, rather than actually computing the bit operations.
This should give performance very similar to native CUDA kernels while being LoRA friendly and more flexible (sketch below).
2024-08-25 16:49:33 -07:00
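A minimal sketch of the lookup-table idea, assuming a simplified Q4_0-like layout (two 4-bit codes per byte, one scale per block, codes centred at 8); the real GGUF block formats carry more fields, so the shapes and the dequant_q4_like helper are assumptions, not the commit's code.

```python
import torch

# Precompute every possible byte -> (low nibble, high nibble) pair once.
byte_vals = torch.arange(256, dtype=torch.uint8)
LUT = torch.stack([(byte_vals & 0x0F).float() - 8.0,
                   (byte_vals >> 4).float() - 8.0], dim=1)   # shape [256, 2]

def dequant_q4_like(packed: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """packed: uint8 [n_blocks, 16] (two 4-bit codes per byte -> 32 values);
    scales: [n_blocks, 1] per-block scale."""
    lut = LUT.to(packed.device)
    # One gather replaces the shifts and masks.
    codes = lut[packed.long()]                  # [n_blocks, 16, 2]
    codes = codes.reshape(packed.shape[0], -1)  # [n_blocks, 32]
    return codes * scales                       # broadcast the per-block scale
```

Because the dequantized weight is produced by ordinary tensor indexing and multiplies, a LoRA delta can simply be added to the result, which is what keeps the scheme LoRA friendly.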
layerdiffusion
e60bb1c96f Make Q4_K_S as fast as Q4_0
by baking the layer when the model loads
2024-08-25 15:02:54 -07:00
layerdiffusion
868f662eb6 fix 2024-08-25 14:44:01 -07:00
layerdiffusion
13d6f8ed90 revise GGUF by precomputing some parameters
rather than computing them in each diffusion iteration (sketch below)
2024-08-25 14:30:09 -07:00
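A small sketch of the load-time precomputation idea, using a hypothetical GGUFLinearSketch module: the scale conversion is hoisted into __init__ so each diffusion step only pays for the cheap dequant multiply.

```python
import torch

class GGUFLinearSketch(torch.nn.Module):
    """Illustrative only; shapes and the dequant formula are placeholders."""
    def __init__(self, packed: torch.Tensor, scales_fp16: torch.Tensor):
        super().__init__()
        # Precomputed once at load time instead of in every diffusion iteration.
        self.register_buffer("packed", packed)               # raw quantized codes [out, in]
        self.register_buffer("scales", scales_fp16.float())  # scales already converted [out, 1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = (self.packed.float() - 128.0) * self.scales      # cheap per-step dequant
        return x @ w.t()
```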
layerdiffusion
8fd889dcad fix #1336 2024-08-20 08:04:09 -07:00
layerdiffusion
4e8ba14dd0 info 2024-08-19 05:13:28 -07:00
layerdiffusion
d38e560e42 Implement some rethinking about LoRA system
1. Add an option that lets users run the UNet in fp8/GGUF but the LoRA in fp16.
2. FP16 LoRAs do not need patching. Others are only re-patched when the LoRA weights change.
3. FP8 UNet + fp16 LoRA is now available in Forge (and, for now, pretty much only in Forge). This also solves some "LoRA too subtle" problems.
4. Significantly speed up all GGUF models (in async mode) by using an independent thread (CUDA stream) to compute and dequantize at the same time, even when the low-bit weights are already on the GPU (sketch below).
5. Treat "online LoRA" as a module similar to ControlLoRA so that it is moved to the GPU together with the model when sampling, achieving a significant speedup and proper low-VRAM management at the same time.
2024-08-19 04:31:59 -07:00
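A minimal sketch of point 4 (compute and dequantize at the same time), assuming CUDA is available; dequantize and apply_linear are hypothetical stand-ins for the real per-format kernels and layer ops.

```python
import torch

def dequantize(packed: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real GGUF dequantization.
    return (packed.float() - 128.0) * scale

def apply_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Stand-in for the layer's actual computation.
    return x @ w.t()

def forward_layers(layers, x):
    """layers: list of (uint8 packed weight already on the GPU, per-tensor scale)."""
    side = torch.cuda.Stream()               # independent stream for dequantization
    prefetched = dequantize(*layers[0])      # first layer: nothing to overlap yet
    for i in range(len(layers)):
        w = prefetched
        if i + 1 < len(layers):
            # Dequantize the NEXT layer's weight on the side stream while the
            # default stream computes with the current one.
            with torch.cuda.stream(side):
                prefetched = dequantize(*layers[i + 1])
        x = apply_linear(x, w)               # default-stream compute
        # Ensure the prefetched weight is ready before the next iteration uses it.
        torch.cuda.current_stream().wait_stream(side)
    return x
```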
layerdiffusion
e5f213c21e upload some GGUF support 2024-08-19 01:09:50 -07:00
layerdiffusion
243952f364 wip qx_1 loras 2024-08-15 17:07:41 -07:00
layerdiffusion
1bd6cf0e0c Support LoRAs for Q8/Q5/Q4 GGUF Models
what a crazy night of math (sketch below)
2024-08-15 05:34:46 -07:00
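One plausible reading of the result, sketched rather than taken from the commit: dequantize the quantized weight on the fly, add the usual low-rank delta up @ down scaled by alpha over rank, and run the linear with the patched weight. The dequantize_fn argument and the function names are hypothetical.

```python
import torch

def lora_delta(down: torch.Tensor, up: torch.Tensor, alpha: float) -> torch.Tensor:
    """Standard LoRA update: (alpha / rank) * (up @ down), where rank = down.shape[0]."""
    return (alpha / down.shape[0]) * (up @ down)

def forward_with_lora(x, dequantize_fn, packed, scales, down, up, alpha):
    # Dequantize the GGUF weight to fp16, then add the LoRA delta before the matmul.
    w = dequantize_fn(packed, scales).to(torch.float16)
    w = w + lora_delta(down, up, alpha).to(w.dtype)
    return x @ w.t()
```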
layerdiffusion
fd0d25ba8a fix type hints 2024-08-15 03:08:25 -07:00
layerdiffusion
2690b654fd reimplement q8/q5/q4, reviewed and matched against the official gguf implementation 2024-08-15 02:41:15 -07:00
layerdiffusion
358277e7a0 remove unused files 2024-08-15 01:47:59 -07:00
layerdiffusion
3acb50c40e integrate llama3's GGUF 2024-08-15 01:45:29 -07:00
layerdiffusion
00f1cd36bd multiple lora implementation sources 2024-08-13 07:13:32 -07:00