Import Models from GitHub

The GitHub model definition allows you to import models from a GitHub repository. Your model files can be tracked with Git LFS or DVC.

#Feature

Currently, Instill Model supports importing models from:

  • ✅ Public GitHub repository
  • 🚧 Private GitHub repository (coming soon)

#Release Stage

Alpha

#Configuration

| Field | Type | Note |
| --- | --- | --- |
| repository* | string | Name of a public GitHub repository, e.g., instill-ai/model-yolov7 |
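
As a quick sanity check before filling in the field, the sketch below tests that a value has the owner/name shape of the example above rather than a full URL. The helper name `is_repo_name` and the character pattern are assumptions made for illustration; they are not Instill Model's actual validation rule.

```shell
# Hypothetical helper: succeed when the value looks like "<owner>/<repo>".
# The allowed characters are an assumption, not Instill Model's real rule.
is_repo_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$'
}

if is_repo_name "instill-ai/model-yolov7"; then
  echo "looks like a repository name"
fi
```

A full URL such as https://github.com/instill-ai/model-yolov7 fails this check, which is a reminder to strip the host part before using the value.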

#Getting Started

#Requirements

  • A public GitHub repository where model files are stored
  • The repository has at least one tag
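
One way to confirm the tag requirement without cloning is to ask the remote for its tags. This is a minimal sketch; the function name `count_remote_tags` is hypothetical, and the example URL is simply the sample repository name from this page expanded to its GitHub address.

```shell
# Count the tags a remote repository exposes.
# --refs hides the peeled "^{}" entries that annotated tags would add.
count_remote_tags() {
  git ls-remote --tags --refs "$1" | wc -l | tr -d ' '
}

# Example (needs network access):
# count_remote_tags https://github.com/instill-ai/model-yolov7.git
```

A result of 0 means the repository does not yet meet the second requirement.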

#Prepare a GitHub Repository and Track Large Model Files by Git LFS

GitHub limits the size of files allowed in repositories to 100 MB, but model files are often larger. To track model files beyond that limit, you can use Git LFS.

Assuming Git LFS is installed, this guide walks through publishing model files to a repository on GitHub.

INFO

In general, importing models via this approach is not recommended. GitHub limits the storage and bandwidth available for Git LFS files, and usage counts against the repository owner's quota, so the owner may need to purchase more once the cap is reached.

Instead, consider the GitHub DVC or ArtiVC approaches.

Step 1: Create a GitHub repository

Go to GitHub, create a new public repository, and set it up on the command line:


# Create a folder
mkdir model-yolov7
cd model-yolov7
# Initialize the repository and set a new remote
git init
git branch -M main
git remote add origin https://github.com/user/repo.git

Replace https://github.com/user/repo.git with your repository's remote URL.

Step 2: Download sample model data

Having initialized the project, let's download the sample model files:


# Download sample model
curl -o yolov7.zip https://artifacts.instill.tech/vdp/sample-models/yolov7.zip
unzip yolov7.zip
rm yolov7.zip

The extracted model files should look like:


.
├── README.md
├── model.onnx // <--- large model file
└── model.py

In this case, we use the Object Detection model YOLOv7 as sample data. Among the model files, model.onnx is 141 MB, which exceeds GitHub's file size limit.
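
To spot such files before committing, you can list everything above the limit. The sketch below assumes a hypothetical helper named `list_large_files` and treats 100 MB as 100 × 1024 × 1024 bytes, which is an assumption about how the limit is counted.

```shell
# List files under a directory that are larger than a byte limit,
# skipping Git's own object store.
# Default limit: 104857600 bytes (100 * 1024 * 1024), an assumed reading of "100 MB".
list_large_files() {
  dir=${1:-.}
  limit=${2:-104857600}
  find "$dir" -type f ! -path '*/.git/*' -size +"${limit}"c
}

# Usage, from the repository root:
# list_large_files .
```

For the sample model, this should list only model.onnx, which is the file to hand over to Git LFS in the next step.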

Step 3: Track large files with Git LFS

To associate a file type with Git LFS, enter git lfs track followed by the name of the file extension.


# Set up Git LFS for your user account (run once)
git lfs install
# Associate onnx files to Git LFS
git lfs track "*.onnx"
# List the currently tracked paths
git lfs track
# Output
Listing tracked patterns
*.onnx (.gitattributes)
Listing excluded patterns

This command amends the repository's .gitattributes file and associates every .onnx file with Git LFS:


*.onnx filter=lfs diff=lfs merge=lfs -text
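
To double-check that a given path matches the pattern, you can ask Git which filter it would apply. This is a minimal sketch using `git check-attr`; the wrapper name `is_lfs_tracked` is hypothetical.

```shell
# Succeed when Git would apply the "lfs" filter to the given path.
# Run inside the repository; the file does not need to exist yet.
is_lfs_tracked() {
  case "$(git check-attr filter -- "$1")" in
    *": filter: lfs") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_lfs_tracked model.onnx && echo "model.onnx goes through Git LFS"
```

Because the check reads .gitattributes rather than the index, it works before the first commit.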

Then, let's commit and push all the files to GitHub as you normally would:


# Stage and commit all files
git add --all
git commit -m "feat: add model files"
# List Git LFS tracked paths
git lfs ls-files
# Output
1881fe9c50 * model.onnx
# Push to the remote
git push -u origin main

INFO

As the official GitHub Docs suggest, commit the local .gitattributes file to your repository.

After all files are uploaded, go to your GitHub repository. You should see all the model files, with the .onnx file stored in Git LFS.

[Figure: YOLOv4 tracked by Git LFS]

Step 4: Create a Git tag

Git tags mark specific points in the repository's history. They identify deployable software iterations for sharing and reuse. When importing a model from a GitHub repository, Instill Model creates one model according to the specified tag.


git tag <tagname>
git push origin --tags

🎉 This repository is ready. Follow the Import section to import the repository into Instill Model.

#Prepare a GitHub Repository and Manage Large Model Files by DVC

Besides Git LFS, a good alternative is to use DVC within a GitHub repository.

By using DVC, you can be sure not to bloat your repositories with large volumes of data or huge models. These large assets reside in the cloud or other remote storage locations. You will simply track their version info in Git.

From the DVC documentation

Supported DVC remote storage

  • ✅ Public Google Cloud Storage (GCS)

Assuming DVC is installed, this guide walks through publishing a repository on GitHub and uploading the tracked large model files to remote storage with DVC.

Steps 1 and 2 are the same as in the Prepare a GitHub Repository and Track Large Model Files by Git LFS guide:

Step 1: Create a GitHub repository

Go to GitHub, create a new public repository, and set it up on the command line:


# Create a folder
mkdir model-yolov7
cd model-yolov7
# Initialize the repository and set a new remote
git init
git branch -M main
git remote add origin https://github.com/user/repo.git

Replace https://github.com/user/repo.git with your repository's remote URL.

Step 2: Download sample model data

Having initialized the project, let's download the sample model files:


# Download sample model
curl -o yolov7.zip https://artifacts.instill.tech/vdp/sample-models/yolov7.zip
unzip yolov7.zip
rm yolov7.zip

Step 3: Initialize DVC in the repository


dvc init

A few DVC internal directories and files are created. Let's track them with Git.


git status
# Output
...
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .dvc/.gitignore
new file: .dvc/config
new file: .dvcignore
# Commit the staged DVC files
git commit -m "chore: initialize DVC"

Step 4: Track large model files with DVC

Let's use dvc add to track the ONNX model file:


dvc add model.onnx
# Output
To track the changes with git, run:
git add model.onnx.dvc .gitignore
...

DVC stores the model file information in a .dvc metadata file and lists it in .gitignore. The .dvc file is a placeholder for the original large file.


cat model.onnx.dvc
# Output
outs:
- md5: 304ebe08d8a00e9cb03147b756e26d3a
  size: 147727196
  path: model.onnx

dvc add moves the large file into .dvc/cache:


.dvc/cache
└── 30
    └── 4ebe08d8a00e9cb03147b756e26d3a

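The hash value of the ONNX file we just added (304ebe0...) determines the above cache path: the first two characters name the directory and the remaining characters name the file.

Follow the instructions printed by dvc add and track these files with Git: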
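
The mapping from hash to cache path can be sketched in a few lines of shell. The helper name `dvc_cache_path` is hypothetical, and the layout mirrors the one shown in this guide; other DVC versions may nest the cache differently.

```shell
# Derive the cache path from an md5 hash, following the layout shown above:
# the first two characters become the directory, the rest the file name.
dvc_cache_path() {
  md5=$1
  prefix=$(printf '%s' "$md5" | cut -c1-2)
  rest=$(printf '%s' "$md5" | cut -c3-)
  printf '.dvc/cache/%s/%s\n' "$prefix" "$rest"
}

dvc_cache_path 304ebe08d8a00e9cb03147b756e26d3a
# prints .dvc/cache/30/4ebe08d8a00e9cb03147b756e26d3a
```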


git add model.onnx.dvc .gitignore
git commit -m "feat: add model file"

Step 5: Push the large model files to DVC remote storage

Currently, Instill Model supports fetching models from public Google Cloud Storage (GCS). Let's set up the remote storage location with a public GCS bucket:

INFO

Prepare the GCS bucket:

  • Create a storage bucket before adding DVC remote
  • Run gcloud auth application-default login, or authenticate in another way, so that DVC can access GCS.

# Create a new data remote
dvc remote add -d myremote gs://my-public-bucket/yolov7
# Record changes
git add .dvc/config
git commit -m "chore: set up dvc remote storage"

Instead of storing the DVC-tracked large files in the repository, we can store them remotely (usually with a cloud storage service) with dvc push.


dvc push

dvc push copies the local cached data to the remote storage we set up earlier. The remote bucket directory should look like:


.../yolov7
└── 30
    └── 4ebe08d8a00e9cb03147b756e26d3a
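
Since the bucket must be public, it can help to confirm that the pushed object is reachable without credentials. The sketch below builds the object's public URL from the remote and the hash. It assumes the bucket layout shown above and the standard storage.googleapis.com address form for public objects; the helper name `gcs_public_url` is hypothetical.

```shell
# Build the public HTTPS URL of a DVC-pushed object in a GCS remote.
# Assumes the "<first 2 hash chars>/<rest>" layout shown in this guide.
gcs_public_url() {
  remote=$1                # e.g. gs://my-public-bucket/yolov7
  md5=$2                   # md5 value from the .dvc file
  path=${remote#gs://}     # drop the scheme
  path=${path%/}           # drop a trailing slash, if any
  printf 'https://storage.googleapis.com/%s/%s/%s\n' \
    "$path" \
    "$(printf '%s' "$md5" | cut -c1-2)" \
    "$(printf '%s' "$md5" | cut -c3-)"
}

url=$(gcs_public_url gs://my-public-bucket/yolov7 304ebe08d8a00e9cb03147b756e26d3a)
echo "$url"

# To check anonymous access (needs network):
# curl -sfI "$url"
```

If the curl check fails with a permission error, the bucket or object is likely not publicly readable.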

Let's push all files, including the .dvc files, to GitHub:


git add --all
git commit -m "feat: add model files"
git push -u origin main

Step 6: Create a Git tag

Follow Step 4 of the Prepare a GitHub repository and track large model files by Git LFS guideline and tag the current model.


git tag <tagname>
git push origin --tags

🎉 If you've followed the steps above to store the model in remote storage and version it in a GitHub repository using DVC, follow the Import section below and Instill Model will import the model accordingly.

TIP

Use dvc pull to retrieve DVC-tracked files from remote storage. See the DVC documentation for more information.

#Import

#No-code Setup

To import a model from GitHub in the Console, do the following:

  1. Go to the Model page and click Add new model
  2. In the Set Up New Model page, fill in an ID for your model. This will be the unique identifier of the model
  3. [Optional] Give a short description of your model in the Description field
  4. Click the Model source ▾ drop-down and choose GitHub
  5. Fill in the GitHub repository URL and the Git tag that stores the model files, then click Set up
  6. Go to the Model page. The corresponding model should be listed there. Note that it may take some time for the model to be deployed online.

#Low-code Setup

  1. Send an HTTP request to the Instill Model model-backend to import a model from a GitHub repository.

curl -X POST http://localhost:8080/model/v1alpha/users/admin/models \
  --header 'Authorization: Bearer instill_sk_***' \
  --data '{
    "id": "yolov7-v1-cpu",
    "model_definition": "model-definitions/github",
    "configuration": {
      "repository": "instill-ai/model-yolov7-dvc",
      "tag": "v1.0-cpu"
    }
  }'

  2. Deploy the imported model yolov7-v1-cpu.

curl -X POST http://localhost:8080/model/v1alpha/users/admin/models/yolov7-v1-cpu/deploy \
  --header 'Authorization: Bearer instill_sk_***'

  3. Perform an inference to test the model.

curl -X POST http://localhost:8080/model/v1alpha/users/admin/models/yolov7-v1-cpu/trigger \
  --header 'Authorization: Bearer instill_sk_***' \
  --data '{
    "task_inputs": [
      {
        "classification": {
          "image_url": "https://artifacts.instill.tech/imgs/dog.jpg"
        }
      },
      {
        "classification": {
          "image_url": "https://artifacts.instill.tech/imgs/bear.jpg"
        }
      }
    ]
  }'
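
If you run these requests repeatedly, it can be convenient to keep the host, model ID, and import body in one place. The following is a minimal sketch built only from the endpoints and fields shown above. The names `API_HOST`, `MODEL_ID`, `INSTILL_API_TOKEN`, and `import_payload` are hypothetical, and the helper does no JSON escaping, so it suits simple values only.

```shell
# Placeholder settings; adjust to your deployment.
API_HOST=${API_HOST:-http://localhost:8080}
MODEL_ID=yolov7-v1-cpu
models_url="$API_HOST/model/v1alpha/users/admin/models"

# Print the JSON body for importing a model from GitHub.
# Arguments: model ID, repository name, Git tag (no JSON escaping is done).
import_payload() {
  printf '{"id": "%s", "model_definition": "model-definitions/github", "configuration": {"repository": "%s", "tag": "%s"}}\n' \
    "$1" "$2" "$3"
}

echo "import:  POST $models_url"
echo "deploy:  POST $models_url/$MODEL_ID/deploy"
echo "trigger: POST $models_url/$MODEL_ID/trigger"
import_payload "$MODEL_ID" instill-ai/model-yolov7-dvc v1.0-cpu

# To send the import request, pipe the body to curl, which reads it from stdin:
# import_payload "$MODEL_ID" instill-ai/model-yolov7-dvc v1.0-cpu |
#   curl -X POST "$models_url" \
#     --header "Authorization: Bearer $INSTILL_API_TOKEN" \
#     --data @-
```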

#Limitations

The current implementation does not support real-time GitHub sync: after you import a model from a specific tag of a GitHub repository, later changes to that tag won't be synced to Instill Model.

Last updated: 4/30/2024, 7:31:30 AM