[mmdetection error] That darn mmcv version..

The mm series is convenient, but one annoyance is when a SOTA model is not supported in the official code and instead ships its own code built on top of an mm-series library.

The usable mmcv versions differ, so even though both are based on mmdetection, the same Docker image doesn't work for both.

[Situation]

I was using the official mmdetection code (latest as of this writing) and wanted to run CBNetV2, which is not officially supported there.

The CBNetV2 code is also based on mmdetection, but probably because it was implemented on an older version, it only supports mmcv 1.3.8 to 1.4.0. (The latest version right now is 1.6.1.)
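A quick way to sanity-check an environment before rebuilding anything is to compare the installed mmcv version against that range. This is only a minimal sketch: `parse_version` and `mmcv_supported` are hypothetical helper names, the bounds are the ones quoted in this post, and it assumes plain numeric versions such as `1.4.0` (no `rc` suffixes).

```python
def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0); any '+local' suffix is ignored."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


# Range quoted in this post for CBNetV2 (an assumption for any other repo).
MMCV_MIN = "1.3.8"
MMCV_MAX = "1.4.0"


def mmcv_supported(installed, low=MMCV_MIN, high=MMCV_MAX):
    """True if `installed` lies inside the inclusive range [low, high]."""
    return parse_version(low) <= parse_version(installed) <= parse_version(high)


if __name__ == "__main__":
    for candidate in ("1.3.8", "1.4.0", "1.6.1"):
        print(candidate, mmcv_supported(candidate))
```

Inside a container, the string to check would come from `mmcv.__version__`.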

=> So I'm rebuilding the Docker image.

GitHub - VDIGPKU/CBNetV2


apex is not installed

libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.

libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs6

libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs8

Fix 1

- git clone apex and install it

The "apex is not installed" message no longer appears, but the libibverbs warnings below it still show up.

Fix 2

I followed this post: https://ndb796.tistory.com/744

Match the CUDA versions of torch and torchvision (they must not differ).
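One rough way to check the match is to compare the CUDA tag in the two version strings. This is a sketch that assumes pip wheels whose versions carry a local tag like `+cu110`; builds installed another way may not have that tag, in which case the check simply reports no match. `cuda_tag` and `same_cuda_build` are hypothetical helper names, and the example strings are illustrative.

```python
import re


def cuda_tag(version_string):
    """Return the build tag of a version like '1.7.1+cu110', else None."""
    match = re.search(r"\+(cu\d+|cpu)$", version_string)
    return match.group(1) if match else None


def same_cuda_build(torch_version, torchvision_version):
    """True only if both strings carry the same build tag."""
    a = cuda_tag(torch_version)
    b = cuda_tag(torchvision_version)
    return a is not None and a == b


if __name__ == "__main__":
    # Illustrative strings; in a real environment use the installed packages.
    print(same_cuda_build("1.7.1+cu110", "0.8.2+cu110"))
    print(same_cuda_build("1.7.1+cu110", "0.8.2+cu101"))
    try:
        import torch
        import torchvision
        print(torch.__version__, torchvision.__version__)
        print(same_cuda_build(torch.__version__, torchvision.__version__))
    except ImportError:
        pass  # torch / torchvision not installed here
```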

But doing just that still throws an error, because the CUDA 10.1 build doesn't work with the A100.

NVIDIA A100-SXM-80GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.

The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.

If you want to use the NVIDIA A100-SXM-80GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
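The error message above boils down to "the GPU's compute capability is not in the list this PyTorch build was compiled for". A minimal sketch of that check follows; `arch_supported` is a hypothetical helper, and the arch list is copied from the message. It only looks for an exact `sm_XY` entry and ignores PTX forward compatibility, so treat it as a rough check. The live part assumes a PyTorch version that provides `torch.cuda.get_arch_list()`.

```python
def arch_supported(capability, arch_list):
    """True if the (major, minor) capability has a matching 'sm_XY' entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


# Arch list copied from the error message; the A100 reports sm_80, i.e. (8, 0).
BUILD_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]

if __name__ == "__main__":
    print(arch_supported((8, 0), BUILD_ARCHS))  # False, matching the error
    try:
        import torch
        if torch.cuda.is_available():
            live = arch_supported(
                torch.cuda.get_device_capability(0),
                torch.cuda.get_arch_list(),
            )
            print("current GPU supported by this build:", live)
    except ImportError:
        pass  # torch not installed here
```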

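Bumping the CUDA version would work too... In my case, I just took a different Docker image I had already been using (torch 1.7.1), installed only apex and mmcv==1.4.0 on it, and got everything running normally.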

Anyway, here is the Docker environment that ended up working.

It may contain a lot of unnecessary packages.

pip list

One entry is missing from the above: apex. It is in the list as well.

apex 0.1 /raid/sghong/Detection/apex

nvcc --version
