Hi, I wanted to raise awareness of this issue (please redirect me if this is not the right place). I created a SageMaker endpoint and passed an image through it, which produces the error shown below. The CloudWatch log (image attached) indicates that a function is missing from the pynvml library. I added a requirements.txt that installs nvgpu and pynvml, but the log shows that both were already installed. The only related thread I could find is here: github.com/pytorch/serve/issues/1813. For completeness, I checked the logs and the TorchServe version is 0.7.1. The last activity on that GitHub issue was last year, so I am curious whether anyone has found a solution. I appreciate any help!
I created the endpoint in SageMaker as follows:
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=model_bucket,
    role=role,
    entry_point='inference.py',
    source_dir='code',
    py_version="py39",
    framework_version="1.13",
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
I then call the endpoint to predict:
import base64

# Load and encode the image
with open('zebra.jpg', 'rb') as img:
    image = img.read()
image_base64 = base64.b64encode(image).decode('utf-8')

response = predictor.predict(image_base64, initial_args={'ContentType': 'application/x-image'})
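Since the image is base64-encoded before it is sent, whatever handles the request on the server side has to reverse that step. The following is only a minimal sketch of that decoding, assuming the payload reaches the handler as the same base64 text; `decode_image_payload` is a hypothetical helper name, and the actual contents of inference.py are not shown in this post.

```python
import base64


def decode_image_payload(payload):
    """Hypothetical helper: turn a base64 payload back into raw image bytes.

    Accepts either str or bytes, since the request body may arrive as either.
    """
    if isinstance(payload, (bytes, bytearray)):
        payload = bytes(payload).decode("utf-8")
    # validate=True raises binascii.Error on characters outside the base64 alphabet
    return base64.b64decode(payload, validate=True)


if __name__ == "__main__":
    # Local round-trip check with placeholder bytes instead of a real image file
    original = b"\xff\xd8\xff\xe0 placeholder image bytes"
    encoded = base64.b64encode(original).decode("utf-8")
    print(decode_image_payload(encoded) == original)
```

Running the round trip locally at least rules out the encoding step as a source of problems before looking at the endpoint itself.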
The specific error message I receive is the following, which directs me to CloudWatch:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body.
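To narrow down which pynvml and nvgpu builds the container actually uses, one option is to log the installed versions from inside the container, for example near the top of inference.py, so that they show up in CloudWatch. This is a rough diagnostic sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the libraries mentioned above.

```python
from importlib import metadata


def installed_versions(packages=("pynvml", "nvgpu", "torchserve")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Printed output from the serving container should end up in the endpoint logs
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing these versions against the ones discussed in the GitHub issue may show whether the container ships a pynvml release that lacks the function named in the log.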