Demo
Prerequisite: Note the ARN details from AWS SageMaker
- You will need to note down the ARN of your model package in SageMaker. A sample ARN looks like this:
arn:aws:sagemaker:us-east-1:679004966033:model-package/gpt-neo
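An ARN follows the general format arn:aws:sagemaker:&lt;region&gt;:&lt;account-id&gt;:model-package/&lt;model-name&gt;; in the sample above, the region is us-east-1 and the model package is gpt-neo.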
Steps to import the model file from AWS SageMaker
- Navigate to your desired workspace in Inferless and click the "Add a custom model" button at the top right. An import wizard will open.
Step 1: Choose the framework on which your model was trained.
- Choose your **model training framework.**
- If your desired framework is not available, we have provided detailed how-to guides for your reference.
Step 2: Choose the source of your model.
- Since we are using a model from AWS SageMaker in this example, select "File" as the method of upload.
- To proceed with the upload, you will need to connect your AWS SageMaker account. This step is mandatory, as it allows us to download the file from your SageMaker account.
- Select Model Repository -> AWS SageMaker.
- If you are using Inferless for the first time and have not added any providers before, click the "Add Provider" (+) button.
- This opens a new tab with "My Integrations". Click "Connect Account" under AWS SageMaker.
- Once you have connected the account, come back to this page and click the refresh button. You will then see the added accounts.
Step 3: Enter the model details
- Model Name: The name you wish to give the model.
- Sample Input: An example of how the input to the model should be formatted, in JSON (see the sketch below).
- Sample Output: An example of how the model's output is formatted, in JSON.
- After clicking next, you may notice a “Please Wait” status, which means we are validating your model for requirements. If the validation is successful, the next step will open up. If there are any validation issues, you will be notified.
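For illustration, a text-generation model such as gpt-neo might use sample input and output along these lines. The field names and Triton-style schema below are assumptions for the sketch, not a format prescribed by this guide; enter whatever structure your model actually expects:

```json
{
  "inputs": [
    {
      "name": "prompt",
      "shape": [1],
      "datatype": "BYTES",
      "data": ["Write a short poem about the sea."]
    }
  ]
}
```

A matching sample output could look like:

```json
{
  "outputs": [
    {
      "name": "generated_text",
      "shape": [1],
      "datatype": "BYTES",
      "data": ["The sea rolls on beneath a silver sky..."]
    }
  ]
}
```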
Step 4: Model Configuration
After successful validation, you will move to the next step where you will be asked to choose the configuration for the model.
GPU and Machine Type selection
- GPU Type: Choose the GPU you require from the dropdown list of available GPU types.
- Machine Type: Choose between our Shared and Dedicated GPU options. Opt for a Shared GPU to optimize costs; it provides you with 50% of the vRAM. Alternatively, a Dedicated GPU provides exclusive resources, making it the ideal choice for deploying large models.
Choose the Minimum and Maximum replicas
- Min replica: The number of inference workers to keep running at all times.
- Max replica: The maximum number of inference workers to allow at any point in time.
If you would like to set up an Automatic rebuild for your model, enable it. You will need to set up a webhook for this method. Click here for more details.
Using the Custom Runtime
In the Advanced configuration, you have the option to select a custom runtime. First, click on "Add runtime" to upload the config.yaml file (a sketch follows below), give it a name, and save it. Then choose the runtime from the drop-down menu and click on continue.
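As a rough sketch, a config.yaml for a custom runtime typically lists the system and Python packages the model needs. The keys and versions below are illustrative assumptions; consult the Inferless runtime documentation for the exact schema:

```yaml
build:
  cuda_version: "12.1.1"     # illustrative CUDA version
  system_packages:
    - "libssl-dev"           # example OS-level dependency
  python_packages:
    - "torch==2.1.0"         # example Python dependencies
    - "transformers==4.36.0"
```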
Using Volume
Volumes allow you to store model files on NFS-like writable volumes, which can persist model parameters to help with cold starts. Click on "Add volume" to create a volume, then select it from the dropdown menu.
Step 5: Review your model details
- Once you click “Continue,” you will be able to review the details added for the model.
- If you would like to make any changes, you can go back and make them.
- Once you have reviewed everything, click “Submit” to start the model import process.
Step 6: Run your model
- Once you click submit, the model import process will start.
- The import may take some time to complete; during this time, you will be redirected to your workspace, where you can see the status of the import under the "In Progress/Failed" tab.
- If you encounter any errors during the model import process, or if you want to view the build logs for any reason, click the three-dots menu and select “View build logs”. This shows a detailed log of the import process, which can help you troubleshoot any issues.
- After the upload, the model will be available under “My Models”.
- You can then select the model and go to API -> Inference Endpoint details. Here you will find the API endpoints that can be called. Click the copy button on the right to copy an endpoint, modify the input details, and call your model (see the sketch below).
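Below is a minimal sketch of calling the endpoint with Python's requests library, assuming a Bearer-token Authorization header and the Triton-style payload from Step 3. The URL and key placeholders are hypothetical; copy the real endpoint from the API page and the key from “Workspace API keys” (see the next section):

```python
import requests

# Hypothetical placeholders: copy the real endpoint URL from
# API -> Inference Endpoint details, and the key from
# Settings -> Workspace API keys.
URL = "https://<your-inference-endpoint>"
API_KEY = "<your-workspace-api-key>"

headers = {
    "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
    "Content-Type": "application/json",
}

# The payload must match the Sample Input format defined in Step 3.
payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Write a short poem about the sea."],
        }
    ]
}

response = requests.post(URL, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors early
print(response.json())
```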
Extra Step: API key details
In case you need help with API keys:
- Click on Settings, available at the top, next to your workspace name.
- Click on “Workspace API keys”.
- You can view the details of your key or generate a new one.