CUDA error: device kernel image is invalid (Cause and Solution)

Today a member of my chat group hit an error like the following while running the AI image generator Stable Diffusion:

RuntimeError: CUDA error: device kernel image is invalid

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

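What the message means

Runtime error: CUDA reports that the device kernel image is invalid.

CUDA kernel errors may be reported asynchronously at some later API call, so the stack trace shown may not point at the real source of the failure.

For debugging, consider setting CUDA_LAUNCH_BLOCKING=1.

Building PyTorch with TORCH_USE_CUDA_DSA enables device-side assertions.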
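The debugging hint in the message can be applied by setting the environment variable before CUDA is initialized. The snippet below is a minimal sketch that assumes you start the program from your own Python script; a packaged launcher such as the one in the log may have its own place for environment variables, which is not covered here.

```python
import os

# Set the variable before the first CUDA call (safest: before importing torch),
# so kernel launches run synchronously and the traceback is more likely to
# point at the call that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import PyTorch only after the variable has been set
print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow everything down, so this setting is meant for debugging only.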


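Cause

The error text itself names the main cause: the device kernel image is invalid. In other words, the compiled GPU code that the program tried to load could not be used on this device.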
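One common way a kernel image ends up unusable, which the log in this post does not confirm on its own, is that the installed PyTorch build contains no kernels for the GPU's compute capability. The sketch below shows one way to check this, assuming PyTorch is importable. The helper name `arch_supported` is hypothetical. The matching rule is the usual CUDA one: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be compiled at load time for any equal or newer device.

```python
def arch_supported(capability, arch_list):
    """Return True if any entry in arch_list can run on a GPU with `capability`.

    capability: (major, minor) tuple, e.g. (8, 6)
    arch_list:  strings such as "sm_80" or "compute_80"
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            continue  # skip entries this simple sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for newer devices
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch built for CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("GPU compute capability:", cap)
            print("Architectures in this build:", archs)
            print("Build can run on this GPU:", arch_supported(cap, archs))
```

If the last line prints False, the mismatch is between the PyTorch build and the GPU, and a different PyTorch build may be needed in addition to the steps described in this post.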

 

Later I asked this friend for the full console output, shown below:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI

Launching Web UI with arguments: --medvram --theme dark --xformers --api --autolaunch
AUTOMATIC1111/stable-diffusion-webui packed by bilibili@秋葉aaaki 
本整合包完全免费,严禁倒卖。若您付费获得本软件请立刻举报商家。
[AddNet] Updating model hashes...
[AddNet] Updating model hashes...
Loading weights [7f96a1a9ca] from G:\SD\sd-webui-aki-v4\models\Stable-diffusion\anything-v5-PrtRE.safetensors
Creating model from config: G:\SD\sd-webui-aki-v4\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: G:\SD\sd-webui-aki-v4\models\VAE\animevae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(1): EasyNegative
Model loaded in 71.2s (load weights from disk: 1.6s, create model: 1.1s, apply weights to model: 55.3s, apply half(): 5.6s, load VAE: 6.8s, hijack: 0.1s, load textual inversion embeddings: 0.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 98.8s (import torch: 4.3s, import gradio: 3.9s, import ldm: 1.8s, other imports: 4.8s, setup codeformer: 0.3s, load scripts: 3.1s, load SD checkpoint: 71.3s, create ui: 8.2s, gradio launch: 0.8s, scripts app_started_callback: 0.1s).
Error completing request
Arguments: ('task(q4qks2qdumkiupn)', '', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'MultiDiffusion', False, 10, 1, 1, 64, False, True, 1024, 1024, 96, 96, 48, 1, 'None', 2, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, False, True, True, False, 1536, 96, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.external_code.ControlNetUnit object at 0x0000012F44116620>, <scripts.external_code.ControlNetUnit object at 0x0000012F44116650>, <scripts.external_code.ControlNetUnit object at 0x0000012F441166E0>, <scripts.external_code.ControlNetUnit object at 0x0000012F44116770>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, None, False, None, False, None, False, None, False, 50) {}
Traceback (most recent call last):
  File "G:\SD\sd-webui-aki-v4\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "G:\SD\sd-webui-aki-v4\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 642, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 587, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "G:\SD\sd-webui-aki-v4\modules\prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "G:\SD\sd-webui-aki-v4\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward
    return self.text_model(
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 708, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 223, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack.py", line 234, in forward
    inputs_embeds = self.wrapped(input_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
    return F.embedding(
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device kernel image is invalid
提示:Python 运行时抛出了一个异常。请检查疑难解答页面。
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Error completing request
Arguments: ('task(12np73tygyxboil)', '猪', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'MultiDiffusion', False, 10, 1, 1, 64, False, True, 1024, 1024, 96, 96, 48, 1, 'None', 2, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, False, True, True, False, 1536, 96, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.external_code.ControlNetUnit object at 0x0000012F44116620>, <scripts.external_code.ControlNetUnit object at 0x0000012F44116650>, <scripts.external_code.ControlNetUnit object at 0x0000012F441166E0>, <scripts.external_code.ControlNetUnit object at 0x0000012F44116770>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, None, False, None, False, None, False, None, False, 50) {}
Traceback (most recent call last):
  File "G:\SD\sd-webui-aki-v4\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "G:\SD\sd-webui-aki-v4\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 642, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "G:\SD\sd-webui-aki-v4\modules\processing.py", line 587, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "G:\SD\sd-webui-aki-v4\modules\prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "G:\SD\sd-webui-aki-v4\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward
    return self.text_model(
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 708, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\transformers\models\clip\modeling_clip.py", line 223, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\modules\sd_hijack.py", line 234, in forward
    inputs_embeds = self.wrapped(input_ids)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
    return F.embedding(
  File "G:\SD\sd-webui-aki-v4\py310\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device kernel image is invalid
提示:Python 运行时抛出了一个异常。请检查疑难解答页面。
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

 

 

Solution

Since the message says "CUDA error: device kernel image is invalid", the approach here is to install CUDA by hand.

The steps are as follows:

 

1. Run nvidia-smi on the command line to check the highest CUDA version your current graphics driver supports. Updating the driver may let you use a newer CUDA version.

 

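2. Then go to the NVIDIA CUDA website and download the matching version.

NVIDIA CUDA official download page: https://developer.nvidia.com/cuda-downloads

NVIDIA CUDA archive with all previous versions: https://developer.nvidia.com/cuda-toolkit-archive

Note:

Pick the newest release that your driver supports. For example, mine reports 12.0, so I download 12.0.1 (the trailing .1 means update 1 of the 12.0 release). In short, the major.minor version you download must not be higher than the one nvidia-smi reports, which is 12.0 in my case.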

 

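3. Select your operating system and version, then choose the offline installer "exe [local]".

(I forgot to capture this screenshot, so I used a similar one from the web; the only difference is that it shows version 11.7.1.)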

 

4. After that, just run the installer.
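Steps 1 and 2 can be sketched in code. This is only an illustration. It assumes the nvidia-smi header contains a field of the form "CUDA Version: 12.0", as recent drivers print it. The helper names `parse_cuda_version` and `pick_toolkit`, the sample header, and the candidate list are all hypothetical.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return (major, minor) from the nvidia-smi header, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def version_tuple(text):
    """Turn '12.0.1' into (12, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_toolkit(driver_max, candidates):
    """Newest candidate whose major.minor does not exceed driver_max."""
    allowed = [c for c in candidates if version_tuple(c)[:2] <= driver_max]
    return max(allowed, key=version_tuple) if allowed else None


# Illustrative header line; only the "CUDA Version" field matters here.
sample = "| NVIDIA-SMI ...   Driver Version: ...   CUDA Version: 12.0 |"
# Example list only; check the archive page for what is actually available.
candidates = ["11.7.1", "11.8.0", "12.0.1", "12.1.0"]

driver_max = parse_cuda_version(sample)
print(driver_max, pick_toolkit(driver_max, candidates))

# On a machine with an NVIDIA driver installed, read the real value instead.
if shutil.which("nvidia-smi"):
    real = parse_cuda_version(
        subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    )
    if real is not None:
        print(real, pick_toolkit(real, candidates))
```

With the sample header, the sketch selects 12.0.1, which matches the choice described in the note for a driver that reports 12.0.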

 

If anything is still unclear, see the CUDA download and installation tutorial for the detailed procedure.

付杰