The GitHub model definition allows you to import models from a GitHub repository. Your model files can be tracked with Git LFS or DVC.
#Feature
Currently, Instill Model supports importing models from
- ✅ Public GitHub repository
- 🚧 Private GitHub repository (coming soon)
#Release Stage
Alpha
#Configuration
Field | Type | Note |
---|---|---|
repository * | string | Name of a public GitHub repository, e.g., instill-ai/model-yolov7 |
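The repository field takes the short owner/name form shown in the example rather than a full URL. As a minimal sketch, a value could be sanity-checked before use like this. The helper name `is_repo_slug` and the allowed character set are assumptions for illustration, not Instill Model's actual validation rules.

```shell
# Hypothetical helper: succeeds if $1 looks like "owner/name".
# The allowed characters are an approximation, not an official rule.
is_repo_slug() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$'
}
```

For example, `is_repo_slug "instill-ai/model-yolov7"` succeeds, while a full `https://github.com/...` URL is rejected.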
#Getting Started
#Requirements
- A public GitHub repository where model files are stored
- The repository has at least one tag
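Both requirements can be checked from the command line before importing. The sketch below relies on `git ls-remote --tags`, which lists the tags of a repository without cloning it. The helper name `has_tags` is hypothetical.

```shell
# Hypothetical helper: succeeds if the repository at $1 has at least one tag.
# $1 can be a remote URL or a local path; git ls-remote accepts both.
has_tags() {
  [ -n "$(git ls-remote --tags "$1" 2>/dev/null)" ]
}
```

For a public repository, `has_tags https://github.com/user/repo.git` (with your own remote URL) should succeed once a tag has been pushed. A failure means either that no tag exists or that the repository cannot be read anonymously.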
#Prepare a GitHub Repository and Track Large Model Files by Git LFS
GitHub limits the size of files allowed in repositories to 100 MB, but model files are often larger. To track model files beyond that limit, you can use Git LFS.
Assuming Git LFS is installed, this guide walks through publishing model files to a repository on GitHub.
In general, importing models via this approach is not recommended. GitHub sets limited storage and bandwidth quotas for Git LFS files, and usage counts against the repository owner's quotas, so the owner may need to purchase more once the cap is reached.
Instead, consider using the DVC approach described below or ArtiVC.
Step 1: Create a GitHub repository
Go to GitHub, create a new public repository, and set it up on the command line:
    # Create a folder
    mkdir model-yolov7
    cd model-yolov7
    # Set a new remote
    git init
    git branch -M main
    git remote add origin https://github.com/user/repo.git
Replace https://github.com/user/repo.git with your repository's remote URL.
Step 2: Download sample model data
Having initialized the project, let's download the sample model files:
    # Download sample model
    curl -o yolov7.zip https://artifacts.instill.tech/vdp/sample-models/yolov7.zip
    # Extract the archive (bsdtar can read zip files; with GNU tar, use unzip yolov7.zip instead)
    tar -xvf yolov7.zip
    rm yolov7.zip
The extracted model files should look like:
    .
    ├── README.md
    ├── model.onnx // <--- large model file
    └── model.py
Here we use the object detection model YOLOv7 as sample data. Among the model files, model.onnx is 141 MB, which exceeds GitHub's file size limit.
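To find out which files in a model folder need this treatment, you can list everything above the size limit. This is a minimal sketch: the helper name `list_large_files` is hypothetical, and the `M` size suffix assumes GNU or BSD `find`, where it counts mebibytes, a close approximation of the 100 MB figure above.

```shell
# Hypothetical helper: lists files under directory $1 (default: .) that are
# larger than $2 MiB (default: 100), skipping Git's own .git directory.
list_large_files() {
  find "${1:-.}" -path '*/.git' -prune -o -type f -size +"${2:-100}"M -print
}
```

Running `list_large_files .` in the sample folder should report only the ONNX file, which is the one to hand over to Git LFS or DVC.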
Step 3: Track large files with Git LFS
To associate a file type with Git LFS, enter git lfs track followed by the name of the file extension.
    # Install Git LFS
    git lfs install
    # Associate onnx files to Git LFS
    git lfs track "*.onnx"
    # List the currently tracked paths
    git lfs track
    # Output
    Listing tracked patterns
        *.onnx (.gitattributes)
    Listing excluded patterns
This command amends the repository's .gitattributes file and associates every .onnx file with Git LFS:
    *.onnx filter=lfs diff=lfs merge=lfs -text
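Before committing, you can confirm that a given path is covered by the pattern. The sketch below uses `git check-attr`, which reports the attributes Git resolves for a path from .gitattributes. The helper name `is_lfs_tracked` is hypothetical, and it must be run inside the repository.

```shell
# Hypothetical helper: succeeds if path $1 is routed through the LFS filter
# according to the .gitattributes rules of the current repository.
is_lfs_tracked() {
  [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}
```

With the rule above in place, `is_lfs_tracked model.onnx` succeeds and `is_lfs_tracked model.py` fails, because only the former matches the `*.onnx` pattern.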
Then, commit all files and push them to GitHub as you normally would:
    # Stage and commit all files
    git add --all
    git commit -m "feat: add model files"
    # List Git LFS tracked paths
    git lfs ls-files
    # Output
    1881fe9c50 * model.onnx
    # Update remote
    git push -u origin main
As the official GitHub Docs suggest, make sure the local .gitattributes file is committed to your repository.
After all files are uploaded successfully, go to your GitHub repository. You should see all model files there, with the .onnx file stored in Git LFS.
Step 4: Create a Git tag
Git tags mark specific points in the repository's history, typically versions that are ready to be shared and reused. When importing a model from a GitHub repository, Instill Model creates one model according to the specified tag.
    git tag <tagname>
    git push origin --tags
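Git rejects some tag names, for example names containing spaces or two consecutive dots. As a small sketch, a candidate name can be checked up front with `git check-ref-format`. The helper name `is_valid_tag_name` and the tag names used below are illustrative only.

```shell
# Hypothetical helper: succeeds if $1 is a well-formed Git tag name.
is_valid_tag_name() {
  git check-ref-format "refs/tags/$1"
}
```

For instance, `is_valid_tag_name "v1.0-cpu" && git tag "v1.0-cpu"` creates the tag only when the name is acceptable to Git.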
🎉 The repository is ready. Follow the Import section to import it into Instill Model.
#Prepare a GitHub Repository and Manage Large Model Files by DVC
Besides Git LFS, a good alternative is to use DVC within a GitHub repository.
"By using DVC, you can be sure not to bloat your repositories with large volumes of data or huge models. These large assets reside in the cloud or other remote storage locations. You will simply track their version info in Git."
(From the DVC documentation)
Supported DVC remote storage:
- ✅ Public Google Cloud Storage (GCS)
Assuming DVC is installed, this guide walks through publishing a repository on GitHub and uploading the tracked large model files to remote storage with DVC.
Steps 1-2 are the same as in the Prepare a GitHub Repository and Track Large Model Files by Git LFS guide.
Step 1: Create a GitHub repository
Go to GitHub, create a new public repository, and set it up on the command line:
    # Create a folder
    mkdir model-yolov7
    cd model-yolov7
    # Set a new remote
    git init
    git branch -M main
    git remote add origin https://github.com/user/repo.git
Replace https://github.com/user/repo.git with your repository's remote URL.
Step 2: Download sample model data
Having initialized the project, let's download the sample model files:
    # Download sample model
    curl -o yolov7.zip https://artifacts.instill.tech/vdp/sample-models/yolov7.zip
    # Extract the archive (bsdtar can read zip files; with GNU tar, use unzip yolov7.zip instead)
    tar -xvf yolov7.zip
    rm yolov7.zip
Step 3: Initialize DVC in the repository
    dvc init
A few DVC internal directories and files are created. Let's track them with Git.
    git status
    # Output
    ...
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   .dvc/.gitignore
            new file:   .dvc/config
            new file:   .dvcignore
    git commit -m "chore: initialize DVC"
Step 4: Track large model files with DVC
Let's use dvc add to track the ONNX model file:
    dvc add model.onnx
    # Output
    To track the changes with git, run:
        git add model.onnx.dvc .gitignore
    ...
DVC stores the model file information in a .dvc metadata file and adds the original file to .gitignore. The .dvc file is a placeholder for the original large file.
    cat model.onnx.dvc
    # Output
    outs:
    - md5: 304ebe08d8a00e9cb03147b756e26d3a
      size: 147727196
      path: model.onnx
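dvc add moves the large file into .dvc/cache:
    .dvc/cache
    └── 30
        └── 4ebe08d8a00e9cb03147b756e26d3a
The hash value of the ONNX file we just added (304ebe0...) determines the above cache path: the first two characters of the hash form the directory name and the remaining characters form the file name.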
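The mapping from hash to cache path can be reproduced with a few lines of shell. This is a minimal sketch that assumes the cache layout shown in this guide; newer DVC releases may organize the cache differently. The helper name `dvc_cache_path` is hypothetical.

```shell
# Hypothetical helper: prints the cache path for the first md5 listed in the
# .dvc file $1, using the <first two hex chars>/<remaining chars> layout
# shown in this guide.
dvc_cache_path() {
  md5=$(sed -n 's/^.*md5: *\([0-9a-f]\{32\}\).*$/\1/p' "$1" | head -n 1)
  [ -n "$md5" ] || return 1
  printf '.dvc/cache/%s/%s\n' \
    "$(printf '%s' "$md5" | cut -c1-2)" "$(printf '%s' "$md5" | cut -c3-)"
}
```

Calling `dvc_cache_path model.onnx.dvc` on the file shown earlier prints `.dvc/cache/30/4ebe08d8a00e9cb03147b756e26d3a`, matching the tree listing.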
Follow the instruction in the output and track these files with Git:
    git add model.onnx.dvc .gitignore
    git commit -m "feat: add model file"
Step 5: Push the large model files to DVC remote storage
Currently, Instill Model supports fetching models from public Google Cloud Storage (GCS). Let's set up the remote storage location with a public GCS bucket:
Prepare the GCS bucket:
- Create a storage bucket before adding the DVC remote
- Make sure to run gcloud auth application-default login, or authenticate with GCS in another way
    # Create a new data remote
    dvc remote add -d myremote gs://my-public-bucket/yolov7
    # Record changes
    git add .dvc/config
    git commit -m "chore: set up dvc remote storage"
Instead of storing the DVC-tracked large files in the repository, we can store them remotely (usually with a cloud storage service) with dvc push.
    dvc push
dvc push copies the local cached data to the remote storage we set up earlier. The remote bucket directory should look like:
    .../yolov7
    └── 30
        └── 4ebe08d8a00e9cb03147b756e26d3a
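Because only public GCS storage is supported, it can help to confirm that the pushed object is readable without credentials. The sketch below builds the public HTTPS URL of an object from the remote URL and the file hash, following the bucket layout shown in this guide. The helper name `gcs_public_url` is hypothetical, and the bucket name is the placeholder used in this guide.

```shell
# Hypothetical helper: maps a gs://bucket/prefix remote URL ($1) and an md5
# hash ($2) to the HTTPS URL of the corresponding public object.
gcs_public_url() {
  path=${1#gs://}
  printf 'https://storage.googleapis.com/%s/%s/%s\n' \
    "${path%/}" "$(printf '%s' "$2" | cut -c1-2)" "$(printf '%s' "$2" | cut -c3-)"
}
```

A request such as `curl -sfI "$(gcs_public_url gs://my-public-bucket/yolov7 304ebe08d8a00e9cb03147b756e26d3a)"` should then succeed only if the object exists and anonymous read access is granted on the bucket.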
Let's push all files, including the .dvc files, to GitHub:
    git add --all
    git commit -m "feat: add model files"
    git push -u origin main
Step 6: Create a Git tag
Follow Step 4 of the Prepare a GitHub Repository and Track Large Model Files by Git LFS guide and tag the current model:
    git tag <tagname>
    git push origin --tags
🎉 If you've followed the above steps to store the model in remote storage and version it within a GitHub repository using DVC, follow the Import section below and Instill Model will import the model accordingly.
Use dvc pull to retrieve DVC-tracked files from remote storage. See the DVC documentation for more information.
#Import
#No-code Setup
To import a model from GitHub in the Console, do the following:
- Go to the Model page and click Add new model
- On the Set Up New Model page, enter an ID for your model; this will be the model's unique identifier
- [Optional] Give a short description of your model in the Description field
- Click the Model source ▾ drop-down and choose GitHub
- Enter the GitHub repository URL and the Git tag that stores the model files, then click Set up
- Go back to the Model page; the corresponding model should be listed. Note that it may take some time for the model to be deployed online.
#Low-code Setup
- Send an HTTP request to the Instill Model model-backend to import a model from a GitHub repository.
- Deploy the imported model yolov7-v1-cpu.
- Perform an inference to test the model.
#Limitations
The current implementation does not support real-time GitHub sync: after you import a model from a specific tag of a GitHub repository, later changes to that tag on GitHub are not synced to Instill Model.
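Since changes are not synced automatically, one way to notice that a tag has moved is to record the commit it points to at import time and compare it later. This is a minimal sketch built on `git ls-remote`. The helper name `remote_tag_commit` is hypothetical, and this is not something Instill Model does on your behalf.

```shell
# Hypothetical helper: prints the commit that tag $2 currently points to in
# the repository at $1 (remote URL or local path); prints nothing if absent.
remote_tag_commit() {
  # For annotated tags, git also lists a peeled "<ref>^{}" entry holding the
  # commit; it follows the tag entry, so the last matching line wins.
  git ls-remote --tags "$1" |
    awk -v ref="refs/tags/$2" \
      '$2 == ref || $2 == ref "^{}" { sha = $1 } END { if (sha) print sha }'
}
```

If the value printed today differs from the one recorded when the model was imported, the imported model no longer reflects what the tag points to.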