2025-03-22
@syui
openai
Build Your Own AI
Even with OpenAI's ChatGPT, usage ultimately costs money, so this time we'll build an AI of our own.
We'll start by running a model locally, and work up to finetuning with LoRA at the end.
The recommended model is gemma3.
model
- gemma3:1b
- deepseek-r1:14b
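Of these, gemma3:1b is light enough to run on most machines; deepseek-r1 at this size needs considerably more memory, so start with the smaller model.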
$ brew install ollama
$ brew services restart ollama
$ ollama pull gemma3:1b
$ ollama run gemma3:1b "hello"
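Once the model answers, you can also call it from a script. A minimal sketch using Ollama's local REST API (this assumes the default port 11434 and the gemma3:1b model pulled above):
import json
import urllib.request

# Ask the local Ollama server for a single, non-streamed completion.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "gemma3:1b",
        "prompt": "hello",
        "stream": False,  # one JSON object instead of a chunked stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as res:
    print(json.loads(res.read())["response"])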
n8n
With n8n you can build AI agents.
# https://github.com/n8n-io/n8n/
$ docker volume create n8n_data
$ docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
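The editor UI then comes up at http://localhost:5678. One caveat: when pointing n8n's Ollama credentials at the Ollama instance running on the host, use http://host.docker.internal:11434 rather than localhost, since localhost inside the container refers to the container itself (this works on Docker Desktop; plain Docker on Linux needs an extra host flag).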
webui
$ winget install ollama.ollama
$ ollama serve
$ ollama run gemma3:1b
$ winget install --id Python.Python.3.11 -e
$ python --version
$ python -m venv webui
$ cd webui
$ .\Scripts\activate
$ pip install open-webui
$ open-webui serve
http://localhost:8080
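Open WebUI should pick up the local Ollama instance automatically; if it doesn't, set the OLLAMA_BASE_URL environment variable (e.g. http://localhost:11434) before running open-webui serve.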
LoRA
To run LoRA (finetuning) on Apple Silicon, use mlx_lm.
$ brew install --cask anaconda
$ brew info anaconda
$ cd /opt/homebrew/Caskroom/anaconda/*
$ ./Anaconda3*.sh
Accept the license for google/gemma-3-1b-it on Hugging Face beforehand.
$ pip install -U "huggingface_hub[cli]"
# https://huggingface.co/settings/tokens
# Repositories permissions: Read access to contents of selected repos
$ huggingface-cli login
$ conda create -n finetuning python=3.11
$ conda activate finetuning
$ pip install mlx-lm
# model: https://huggingface.co/google/gemma-3-1b-it
# data:  https://github.com/ml-explore/mlx-examples/tree/main/lora/data
$ git clone https://github.com/ml-explore/mlx-examples
$ model=google/gemma-3-1b-it
$ data=mlx-examples/lora/data
$ mlx_lm.lora --train --model $model --data $data --batch-size 3
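Training writes the adapter weights to ./adapters by default. You can then generate with the adapter applied, or fuse it into the base model, using mlx_lm's bundled commands:
$ mlx_lm.generate --model $model --adapter-path adapters --prompt "hello"
$ mlx_lm.fuse --model $model --adapter-path adapters --save-path fused_model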
unsloth
To run LoRA (finetuning) on Windows, use unsloth.
$ nvidia-smi
$ nvcc --version
# https://github.com/unslothai/notebooks/blob/main/unsloth_windows.ps1
- cuda: 12.4
- python: 3.11
$ winget install --scope machine nvidia.cuda --version 12.4.1
$ winget install curl.curl
# https://docs.unsloth.ai/get-started/installing-+-updating/windows-installation
$ curl -sLO https://raw.githubusercontent.com/unslothai/notebooks/refs/heads/main/unsloth_windows.ps1
$ powershell.exe -ExecutionPolicy Bypass -File .\unsloth_windows.ps1
$ vim custom.py
The above runs unsloth from pwsh, but using WSL is the better option.
# https://docs.unsloth.ai/get-started/fine-tuning-guide
from unsloth import FastModel
import torch
fourbit_models = [
# 4bit dynamic quants for superior accuracy and low memory use
# https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-gemma-3
# https://huggingface.co/unsloth/gemma-3-4b-it
"unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
"unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
"unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
"unsloth/gemma-3-27b-it-unsloth-bnb-4bit",
# Other popular models!
"unsloth/Llama-3.1-8B",
"unsloth/Llama-3.2-3B",
"unsloth/Llama-3.3-70B",
"unsloth/mistral-7b-instruct-v0.3",
"unsloth/Phi-4",
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastModel.from_pretrained(
model_name = "unsloth/gemma-3-4b-it",
max_seq_length = 2048, # Choose any for long context!
load_in_4bit = True, # 4 bit quantization to reduce memory
load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
full_finetuning = False, # [NEW!] We have full finetuning now!
# token = "hf_...", # use one if using gated models
)
model = FastModel.get_peft_model(
model,
finetune_vision_layers = False, # Turn off for just text!
finetune_language_layers = True, # Should leave on!
finetune_attention_modules = True, # Attention good for GRPO
finetune_mlp_modules = True, # Should leave on always!
r = 8, # Larger = higher accuracy, but might overfit
lora_alpha = 8, # Recommended alpha == r at least
lora_dropout = 0,
bias = "none",
random_state = 3407,
)
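custom.py so far only loads the model and attaches the LoRA adapters; actual training needs a trainer on top. A minimal sketch following the unsloth fine-tuning guide, using trl's SFTTrainer (the data file data.json with a "text" field and all hyperparameters here are placeholders):
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Placeholder dataset: one JSON record per line with a "text" field.
dataset = load_dataset("json", data_files="data.json", split="train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # effective batch size of 8
        max_steps = 60,                  # short demo run
        learning_rate = 2e-4,
        logging_steps = 1,
        output_dir = "outputs",
    ),
)
trainer.train()

# Save only the LoRA adapter weights.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")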