Hello.
I recently set up an EKS cluster with TorchServe following the tutorial at github.com/pytorch/serve/tree/master/kubernetes/EKS, but I am having trouble uploading a model.
When I try to register a model with either of these commands:
curl -X POST "http://$HOST:8081/models?url=http%3A//54.190.129.247%3A8222/model_ubuntu_2dd0aac04a22d6a0.mar"
curl -X POST "http://$HOST:8081/models?url=http://54.190.129.247/8222/model_ubuntu_2dd0aac04a22d6a0.mar"
I am getting the following error:
{
"code": 400,
"type": "DownloadArchiveException",
"message": "Failed to download archive from: http://54.190.129.247:8222/model_ubuntu_2dd0aac04a22d6a0.mar"
}
This happens even though http://54.190.129.247:8222/model_ubuntu_2dd0aac04a22d6a0.mar is a valid URL.
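To rule out a hand-encoding mistake in the query string, the registration URL can also be built programmatically. The following is only a minimal sketch using the Python standard library: the helper name `build_register_url` is hypothetical, and it assumes that the management API reads `url` and `model_name` as query parameters of `POST /models`, as in the requests above. It does not test whether the pod can actually reach the archive host.

```python
from urllib.parse import urlencode


def build_register_url(host, mar_url, model_name=None, port=8081):
    """Return a management-API URL for registering a model archive.

    `build_register_url` is a hypothetical helper for illustration only.
    """
    params = {"url": mar_url}
    if model_name is not None:
        params["model_name"] = model_name
    # urlencode percent-encodes each value (":" -> %3A, "/" -> %2F)
    # but leaves the "&" that separates the parameters unencoded.
    return f"http://{host}:{port}/models?{urlencode(params)}"


if __name__ == "__main__":
    # "HOST" stands in for the value of $HOST used in the curl commands.
    print(build_register_url(
        "HOST",
        "http://54.190.129.247:8222/model_ubuntu_2dd0aac04a22d6a0.mar",
    ))
```

The printed URL can then be passed to `curl -X POST` in place of the hand-written one.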
kubectl describe pod -n default torchserve-6d4d5c8c89-zmnp9:
Name: torchserve-6d4d5c8c89-zmnp9
Namespace: default
Priority: 0
Node: ip-192-168-57-45.us-west-2.compute.internal/192.168.57.45
Start Time: Thu, 26 Aug 2021 13:13:21 -0700
Labels: app=torchserve
pod-template-hash=6d4d5c8c89
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.38.125
IPs:
IP: 192.168.38.125
Controlled By: ReplicaSet/torchserve-6d4d5c8c89
Containers:
torchserve:
Container ID: docker://a64f5ef418c569249c1c05fe3056d808c2e22b79c203aed05017580bea132cc0
Image: pytorch/torchserve:latest
Image ID: docker-pullable://pytorch/torchserve@sha256:3c290c60cb89bca38fbf1d6a36ea99554b3dbb9d32cb89ed434828c5b3fd2c73
Ports: 8080/TCP, 8081/TCP, 8082/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
torchserve
--start
--model-store
/home/model-server/shared/model-store/
--ts-config
/home/model-server/shared/config/config.properties
State: Running
Started: Thu, 26 Aug 2021 13:13:22 -0700
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 4Gi
nvidia.com/gpu: 0
Requests:
cpu: 1
memory: 1Gi
nvidia.com/gpu: 0
Environment: <none>
Mounts:
/home/model-server/shared/ from persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-z8vb9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: model-store-claim
ReadOnly: false
default-token-z8vb9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-z8vb9
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned default/torchserve-6d4d5c8c89-zmnp9 to ip-192-168-57-45.us-west-2.compute.internal
Normal Pulled 17m kubelet Container image "pytorch/torchserve:latest" already present on machine
Normal Created 17m kubelet Created container torchserve
Normal Started 17m kubelet Started container torchserve
config.properties:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
Output of access.log:
2021-08-30 01:28:19,091 [INFO ] epollEventLoopGroup-3-12 ACCESS_LOG - /192.168.24.51:2472 "POST /models?url=http://34.219.222.97:8221/model_ubuntu_09888c953c68c1fa.mar%26model_name=aivanou HTTP/1.1" 400 6
2021-08-30 01:28:19,091 [INFO ] epollEventLoopGroup-3-12 TS_METRICS - Requests4XX.Count:1|#Level:Host|#hostname:torchserve-69494c8469-8f8z8,timestamp:null
2021-08-30 01:28:20,568 [INFO ] epollEventLoopGroup-3-13 ACCESS_LOG - /192.168.32.146:61380 "POST /models?url=http://34.219.222.97:8221/model_ubuntu_09888c953c68c1fa.mar&model_name=aivanou HTTP/1.1" 400 7
2021-08-30 01:28:20,568 [INFO ] epollEventLoopGroup-3-13 TS_METRICS - Requests4XX.Count:1|#Level:Host|#hostname:torchserve-69494c8469-8f8z8,timestamp:null
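The two logged requests differ in one detail: the first contains `%26model_name`, so the ampersand was percent-encoded, while the second has a literal `&`. A small sketch of how a standard query-string parser splits the two request targets is shown here. It uses only the Python standard library, the helper name `query_params` is hypothetical, and it is an assumption that the server's own parser behaves the same way.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(request_target):
    """Parse the query string of a request target such as '/models?url=...'.

    `query_params` is a hypothetical helper for illustration only.
    """
    return parse_qs(urlsplit(request_target).query)


if __name__ == "__main__":
    mar = "http://34.219.222.97:8221/model_ubuntu_09888c953c68c1fa.mar"
    # First logged request: "%26" decodes to "&" inside the url value,
    # so model_name is not seen as a separate parameter.
    print(query_params(f"/models?url={mar}%26model_name=aivanou"))
    # Second logged request: a literal "&" yields two parameters.
    print(query_params(f"/models?url={mar}&model_name=aivanou"))
```

Under this parsing, the first request asks for an archive whose URL ends in `.mar&model_name=aivanou`, whereas the second carries `url` and `model_name` as separate parameters. Both requests were answered with 400 in the log above.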
TorchServe log output:
2021-08-30 03:00:50,425 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2021-08-30 03:00:50,609 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.4.2
TS Home: /usr/local/lib/python3.6/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 2
Max heap size: 2048 M
Python executable: /usr/bin/python3
Config file: /home/model-server/shared/config/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/model-server/shared/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/shared/model-store
Model config: N/A
2021-08-30 03:00:50,618 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2021-08-30 03:00:50,660 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-08-30 03:00:50,740 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2021-08-30 03:00:50,740 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2021-08-30 03:00:50,742 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2021-08-30 03:00:50,742 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-08-30 03:00:50,743 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2021-08-30 03:03:28,587 [DEBUG] epollEventLoopGroup-3-18 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2021-08-30 03:03:28,588 [DEBUG] epollEventLoopGroup-3-18 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2021-08-30 03:03:28,588 [INFO ] epollEventLoopGroup-3-18 org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2021-08-30 03:06:44,068 [DEBUG] epollEventLoopGroup-3-13 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1 for model tiny_image_net_aivanou_8df333374e4d115f
2021-08-30 03:06:44,069 [DEBUG] epollEventLoopGroup-3-13 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1 for model tiny_image_net_aivanou_8df333374e4d115f
2021-08-30 03:06:44,069 [INFO ] epollEventLoopGroup-3-13 org.pytorch.serve.wlm.ModelManager - Model tiny_image_net_aivanou_8df333374e4d115f loaded.