Unexpectedly long `PodInitializing` state after `Started container notebook` – Binder

Hi all,

I am running into a problem where singleuser pod for one particular binderhub-built image takes forever to complete PodInitializing state (Message: Started container notebook), for example:

jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj8bc29h6   0/1     PodInitializing   0             54m

Eventually after a looong wait, it changes to running, but a latency of 1hr+ is not really acceptable. I am not running into this with other binder-built images. Any idea how to debug this problem to find the culprit? I’d be glad if you could help. Below are the logs and pod descriptions. Thank you!

Logs:

Error from server (BadRequest): container "notebook" in pod "jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4" is waiting to start: PodInitializing

Logs (-v 8):

0516 14:30:11.714967  770558 request.go:1181] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"notebook\" in pod \"jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4\" is waiting to start: PodInitializing","reason":"BadRequest","code":400}
I0516 14:30:11.715427  770558 helpers.go:219] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "container \"notebook\" in pod \"jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4\" is waiting to start: PodInitializing",
  "reason": "BadRequest",
  "code": 400
}]
F0516 14:30:11.715502  770558 helpers.go:118] Error from server (BadRequest): container "notebook" in pod "jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4" is waiting to start: PodInitializing

Describe pod:

Name:         jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4
Namespace:    binderhub
Priority:     0
Node:         neurolibre-node1/192.168.73.213
Start Time:   Tue, 16 May 2023 13:56:50 -0400
Labels:       app=jupyterhub
              chart=jupyterhub-1.1.2
              component=singleuser-server
              heritage=jupyterhub
              hub.jupyter.org/network-access-hub=true
              hub.jupyter.org/servername=
              hub.jupyter.org/username=simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4
              release=binderhub
Annotations:  hub.jupyter.org/username: simexp-fmriprep-noise-benchmark-j2h9ifm4
Status:       Pending
IP:           10.244.1.42
IPs:
  IP:  10.244.1.42
Init Containers:
  block-cloud-metadata:
    Container ID:  docker://f2df1a86f1cb342ce70ab408828dea61886cfa8a09ccbc040117daa5c58b97eb
    Image:         jupyterhub/k8s-network-tools:1.1.2
    Image ID:      docker-pullable://jupyterhub/k8s-network-tools@sha256:5238f5fc8281cb6e27ca22b5887fe6af97780a3696bf304fad35d207b2114c66
    Port:          <none>
    Host Port:     <none>
    Command:
      iptables
      -A
      OUTPUT
      -d
      169.254.169.254
      -j
      DROP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 16 May 2023 13:56:56 -0400
      Finished:     Tue, 16 May 2023 13:56:56 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Containers:
  notebook:
    Container ID:  
    Image:         binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
    Image ID:      
    Port:          8888/TCP
    Host Port:     0/TCP
    Args:
      python3
      -c
      import os
      import sys
      
      try:
          import jupyterlab
          major = int(jupyterlab.__version__.split(".", 1)[0])
      except Exception:
          have_lab = False
      else:
          have_lab = major >= 3
      
      if have_lab and "NotebookApp.default_url" not in " ".join(sys.argv):
          # if recent-enough lab is available, make it the default UI
          sys.argv.insert(1, "--NotebookApp.default_url=/lab/")
      
      # launch the notebook server
      os.execvp("jupyter-notebook", sys.argv)
      
      --ip=0.0.0.0
      --port=8888
      --NotebookApp.base_url=/jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      --NotebookApp.token=1TTJnFrDRLmqU1arwoNLkg
      --NotebookApp.trust_xheaders=True
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  8589934592
    Requests:
      cpu:     1
      memory:  4294967296
    Environment:
      BINDER_LAUNCH_HOST:             https://test.conp.cloud/
      BINDER_PERSISTENT_REQUEST:      v2/gh/SIMEXP/fmriprep-denoise-benchmark/26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      BINDER_REF_URL:                 https://github.com/SIMEXP/fmriprep-denoise-benchmark/tree/26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      BINDER_REPO_URL:                https://github.com/SIMEXP/fmriprep-denoise-benchmark
      BINDER_REQUEST:                 v2/gh/SIMEXP/fmriprep-denoise-benchmark/main
      CPU_GUARANTEE:                  1.0
      JPY_API_TOKEN:                  ead19dfcbebe4699bb5a04116fdd4a2b
      JUPYTERHUB_ACTIVITY_URL:        http://hub:8081/jupyter/hub/api/users/simexp-fmriprep-noise-benchmark-j2h9ifm4/activity
      JUPYTERHUB_ADMIN_ACCESS:        1
      JUPYTERHUB_API_TOKEN:           ead19dfcbebe4699bb5a04116fdd4a2b
      JUPYTERHUB_API_URL:             http://hub:8081/jupyter/hub/api
      JUPYTERHUB_BASE_URL:            /jupyter/
      JUPYTERHUB_CLIENT_ID:           jupyterhub-user-simexp-fmriprep-noise-benchmark-j2h9ifm4
      JUPYTERHUB_HOST:                
      JUPYTERHUB_OAUTH_CALLBACK_URL:  /jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/oauth_callback
      JUPYTERHUB_OAUTH_SCOPES:        ["access:servers!server=simexp-fmriprep-noise-benchmark-j2h9ifm4/", "access:servers!user=simexp-fmriprep-noise-benchmark-j2h9ifm4"]
      JUPYTERHUB_SERVER_NAME:         
      JUPYTERHUB_SERVICE_PREFIX:      /jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      JUPYTERHUB_SERVICE_URL:         http://0.0.0.0:8888/jupyter/user/simexp-fmriprep-noise-benchmark-j2h9ifm4/
      JUPYTERHUB_USER:                simexp-fmriprep-noise-benchmark-j2h9ifm4
      JUPYTER_IMAGE:                  binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      JUPYTER_IMAGE_SPEC:             binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29
      MEM_GUARANTEE:                  4294967296
      MEM_LIMIT:                      8589934592
    Mounts:
      /home/jovyan/data from shared-data (ro)
      /mnt/books from book-data (rw)
      /mnt/data from repo2data (rw)
      /usr/local/share/fill_submission_metadata.bash from files (rw,path="fill_submission_metadata")
      /usr/local/share/jb_build.bash from files (rw,path="jb_build")
      /usr/local/share/repo2data.bash from files (rw,path="repo2data")
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  files:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  singleuser
    Optional:    false
  shared-data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA
    HostPathType:  
  book-data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA/book-artifacts
    HostPathType:  
  repo2data:
    Type:          HostPath (bare host directory volume)
    Path:          /DATA
    HostPathType:  
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       hub.jupyter.org/dedicated=user:NoSchedule
                   hub.jupyter.org_dedicated=user:NoSchedule
                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                      Message
  ----    ------     ----  ----                      -------
  Normal  Scheduled  24m   binderhub-user-scheduler  Successfully assigned binderhub/jupyter-simexp-2dfmriprep-2dnoise-2dbenchmark-2dj2h9ifm4 to neurolibre-node1
  Normal  Pulled     24m   kubelet                   Container image "jupyterhub/k8s-network-tools:1.1.2" already present on machine
  Normal  Created    24m   kubelet                   Created container block-cloud-metadata
  Normal  Started    24m   kubelet                   Started container block-cloud-metadata
  Normal  Pulled     24m   kubelet                   Container image "binder-registry.conp.cloud/binder-registry.conp.cloud/binder-simexp-2dfmriprep-2ddenoise-2dbenchmark-d0ef34:26e5cfea22cbd763feca0fbeed29ee0db96c0b29" already present on machine
  Normal  Created    24m   kubelet                   Created container notebook
  Normal  Started    24m   kubelet                   Started container notebook

Read more here: Source link