2025-03-22 / @syui

openai

Build your own AI

Even OpenAI's ChatGPT incurs usage charges, so this time let's build an AI that's all our own.

We'll start by running a model locally, and finish with fine-tuning it, for example with LoRA.

For the model, I recommend gemma3.

model

  1. gemma3:1b
  2. deepseek-r1:12b

$ brew install ollama
$ brew services restart ollama
$ ollama pull gemma3:1b
$ ollama run gemma3:1b "hello"
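
Once the model answers on the command line, the same server can be called over HTTP. A minimal sketch in Python, assuming Ollama's default port 11434:

# Query the local Ollama server's /api/generate endpoint.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "gemma3:1b",
        "prompt": "hello",
        "stream": False,  # return one JSON object instead of a stream
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as res:
    print(json.loads(res.read())["response"])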

n8n

With n8n, you can build AI agents.

# https://github.com/n8n-io/n8n/
$ docker volume create n8n_data
$ docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
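
With the editor up at http://localhost:5678, a workflow that starts from a Webhook trigger node can be invoked from outside. A minimal sketch, assuming a Webhook node configured with the (hypothetical) path my-agent:

# POST to the workflow's Webhook trigger node.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:5678/webhook/my-agent",  # "my-agent" is a hypothetical path
    data=json.dumps({"message": "hello"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as res:
    print(res.read().decode("utf-8"))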

webui

$ winget install ollama.ollama
$ ollama serve
$ ollama run gemma3:1b

$ winget install --id Python.Python.3.11 -e
$ python --version
$ python -m venv webui
$ cd webui
$ .\Scripts\activate
$ pip install open-webui
$ open-webui serve

http://localhost:8080
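
Open WebUI talks to the local Ollama server behind the scenes. Ollama also exposes an OpenAI-compatible endpoint, so the openai client library can be pointed at it directly. A minimal sketch, again assuming the default port 11434 (the api_key is required by the client but ignored by Ollama):

# Use Ollama through its OpenAI-compatible /v1 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
res = client.chat.completions.create(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "hello"}],
)
print(res.choices[0].message.content)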

LoRA

To run LoRA fine-tuning on Apple silicon, use mlx_lm.

$ brew install --cask anaconda
$ brew info anaconda
$ cd /opt/homebrew/Caskroom/anaconda/*
$ ./Anaconda3*.sh

Accept the license for google/gemma-3-1b-it on Hugging Face in advance.

$ pip install -U "huggingface_hub[cli]"
# https://huggingface.co/settings/tokens
# Repositories permissions:Read access to contents of selected repos

$ huggingface-cli login
$ conda create -n finetuning python=3.11
$ conda activate finetuning
$ pip install mlx-lm
$ echo "{ \"model\": \"https://huggingface.co/google/gemma-3-1b-it\", \"data\": \"https://github.com/ml-explore/mlx-examples/tree/main/lora/data\" }"|jq .
$ git clone https://github.com/ml-explore/mlx-examples
$ model=google/gemma-3-1b-it
$ data=mlx-examples/lora/data
$ mlx_lm.lora --train --model $model --data $data --batch-size 3
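
By default, mlx_lm.lora writes the trained adapter weights to an adapters directory. A minimal sketch of checking the result with the mlx_lm Python API, assuming that default output path:

# Load the base model plus the trained LoRA adapter and generate.
from mlx_lm import load, generate

model, tokenizer = load("google/gemma-3-1b-it", adapter_path="adapters")
print(generate(model, tokenizer, prompt="hello"))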

unsloth

To run LoRA fine-tuning on Windows, use unsloth.

$ nvidia-smi
$ nvcc --version

# https://github.com/unslothai/notebooks/blob/main/unsloth_windows.ps1
# cuda: 12.4
# python: 3.11
$ winget install --scope machine nvidia.cuda --version 12.4.1
$ winget install curl.curl
# https://docs.unsloth.ai/get-started/installing-+-updating/windows-installation
$ curl -sLO https://raw.githubusercontent.com/unslothai/notebooks/refs/heads/main/unsloth_windows.ps1
$ powershell.exe -ExecutionPolicy Bypass -File .\unsloth_windows.ps1
$ vim custom.py

The above is how to use unsloth from pwsh, but you are better off using WSL.

# https://docs.unsloth.ai/get-started/fine-tuning-guide
from unsloth import FastModel
import torch

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    # https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-gemma-3
    # https://huggingface.co/unsloth/gemma-3-4b-it
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",

    # Other popular models!
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/Llama-3.3-70B",
    "unsloth/mistral-7b-instruct-v0.3",
    "unsloth/Phi-4",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4b-it",
    max_seq_length = 2048, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # Turn off for just text!
    finetune_language_layers   = True,  # Should leave on!
    finetune_attention_modules = True,  # Attention good for GRPO
    finetune_mlp_modules       = True,  # Should leave on always!

    r = 8,           # Larger = higher accuracy, but might overfit
    lora_alpha = 8,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)
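
get_peft_model only attaches the LoRA adapters; the actual training loop comes next. A minimal sketch of that step, assuming trl's SFTTrainer and a local train.jsonl whose rows have a "text" field (both are assumptions here, following the pattern of the unsloth fine-tuning guide):

# Train the LoRA adapters with trl's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model = model,          # the PEFT model built above
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 30,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()

# Save only the adapter weights, not the full model.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")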