Code Generation Pipelines: Local LLMs in Dev Workflows
Introduction
I’ve been tinkering with using a locally hosted code-capable LLM to speed up my development workflow—everything from autocompleting boilerplate to generating documentation and even suggesting refactors. Instead of relying on cloud-based code assistants (which can be pricey for heavy use), I’m running a small model locally. Here’s how I integrate it into my IDE and CI.
Why I Went Local for Code Generation
- Security: No proprietary code ever leaves my machine or CI servers.
- Cost Savings: I call the model as often as I like without per-token billing.
- Customization: I can fine-tune or prompt-engineer the model to follow my team’s style guides.
Pipeline Overview
- Choose or Fine-Tune a Code Model – Grab a code-specialized LLM (e.g., CodeGen, StarCoder) and optionally fine-tune on your codebase.
- Set Up a Local API Service – Wrap the model in a simple REST or gRPC server.
- IDE Integration – Connect VSCode or JetBrains via plugin or LSP.
- CI Integration – Use the service in pre-commit hooks or CI jobs to enforce docs coverage or style.
I’ll break down each step with config snippets and handy tips.
1. Choosing & Fine-Tuning the Code Model
I started with StarCoder since it’s open and supports Python, JavaScript, and more. If you want to fine-tune on your code, I recommend using LoRA to keep the process lightweight.
pip install transformers peft accelerate datasets
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, Trainer,
    TrainingArguments, DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # batch padding needs a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA setup
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=4, lora_alpha=16, lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)

# Load your code dataset
ds = load_dataset('path/to/your/code', split='train')

def tokenize_fn(examples):
    return tokenizer(examples['code'], truncation=True)

ts = ds.map(tokenize_fn, batched=True, remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir='starcoder-finetuned',
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=100,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ts,
    # pads each batch and copies input_ids into labels for causal-LM training
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Merge the LoRA weights into the base model so the result loads as a
# plain checkpoint, then save the model and tokenizer together.
model = model.merge_and_unload()
model.save_pretrained('starcoder-finetuned')
tokenizer.save_pretrained('starcoder-finetuned')
Now I have a model that has picked up my codebase’s patterns and naming conventions, merged into a standalone checkpoint that the service below can load directly.
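Before wiring anything up, a quick sanity check that the checkpoint generates sensible completions. This is a minimal sketch; the prompt is a made-up example, so swap in a signature from your own codebase.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained('starcoder-finetuned')
mdl = AutoModelForCausalLM.from_pretrained('starcoder-finetuned')
gen = pipeline('text-generation', model=mdl, tokenizer=tok)

# Hypothetical prompt, just to eyeball the output quality.
out = gen("def parse_config(path: str) -> dict:", max_new_tokens=64, do_sample=False)
print(out[0]['generated_text'])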
2. Setting Up a Local API Service
I wrap the model in a FastAPI server so IDEs and CI can hit it via HTTP.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained('starcoder-finetuned')
model = AutoModelForCausalLM.from_pretrained('starcoder-finetuned')
# device=0 targets the first GPU; use device=-1 to run on CPU
predictor = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)

class CodeRequest(BaseModel):
    prompt: str
    max_length: int = 128

@app.post('/generate')
def generate_code(req: CodeRequest):
    out = predictor(req.prompt, max_length=req.max_length, do_sample=False)
    return {'code': out[0]['generated_text']}
Running uvicorn service:app --host 0.0.0.0 --port 8000 spins up the API. I secure it behind my company’s VPN.
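To smoke-test the service, here’s a minimal Python client, assuming it’s running locally on port 8000 (the prompt is just an illustration):

import requests

resp = requests.post(
    'http://localhost:8000/generate',
    json={'prompt': 'def fibonacci(n):', 'max_length': 64},
)
print(resp.json()['code'])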
3. IDE Integration
VSCode Example: I used the REST Client extension to test, then switched to writing a small VSCode extension that calls my service on Ctrl+Space. In package.json:
"contributes": {
"commands": [{
"command": "extension.generateCode",
"title": "Generate Code from LLM"
}],
"menus": {
"editor/context": [{
"command": "extension.generateCode",
"when": "editorTextFocus"
}]
}
}
In extension.js, I fetch from http://localhost:8000/generate and insert the returned code at the cursor.
4. CI Integration
Pre-commit Hook: I added a hook that checks for missing docstrings by generating doc stubs: if a function has no docstring, call the /generate endpoint with a prompt like "Write a docstring for this function: <code>" and insert the result.
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: code-doc-gen
        name: Generate Docstrings via LLM
        entry: python hooks/doc_gen.py
        language: python
In doc_gen.py, I parse files for functions missing docstrings and call my API to generate them.
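Here’s a rough sketch of what doc_gen.py could look like. It’s simplified: it uses ast to find functions without docstrings, prints suggested docstrings rather than editing files in place, and assumes the API from step 2 is running on localhost:8000.

import ast
import sys
import requests

API = 'http://localhost:8000/generate'

def missing_docstrings(path):
    """Yield (lineno, source) for each function in `path` lacking a docstring."""
    source = open(path).read()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                yield node.lineno, ast.get_source_segment(source, node)

def main():
    ok = True
    for path in sys.argv[1:]:  # pre-commit passes the staged file paths
        for lineno, code in missing_docstrings(path):
            prompt = f"Write a docstring for this function: {code}"
            resp = requests.post(API, json={'prompt': prompt, 'max_length': 128})
            print(f"{path}:{lineno} suggested docstring:\n{resp.json()['code']}")
            ok = False  # fail the hook so the developer reviews the suggestion
    sys.exit(0 if ok else 1)

if __name__ == '__main__':
    main()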
Wrapping Up
With this pipeline, I’ve got a private, customizable code assistant that lives on my machine. No more worrying about sensitive code in the cloud, and I can fine-tune on new patterns as my code evolves. Next up: deploying this as a Docker container so I can run it anywhere.