Posts

The Auto-Quill and the Machinery You Never See

One reason magic works so well in stories is not that it breaks the rules of reality, but that it refuses to show its working. There are no wires trailing out of wands, no humming boxes hidden behind tapestries, no instruction manuals explaining where the effort happens. Magic feels magical largely because the machinery is absent from view. The self-writing quill used by Rita Skeeter is a good example. It listens, processes, and writes, yet there is nothing to point to and say, “this is where the thinking happens.” The quill simply behaves as if intelligence were a natural property of the object itself, rather than something that needs space, energy, or structure. If the quill had visible components — memory crystals that needed replacing, glowing runes that overheated, a small enchanted box tethered to it by thread — it would stop feeling like magic and start feeling like a device. The illusion would w...

Tom Riddle’s Diary Was Basically an LLM

Accidental AI foresight, courtesy of a cursed notebook. This sounds like a joke until you sit with it for a moment. Tom Riddle’s diary is essentially a cursed, fine-tuned large language model with memory, persuasion skills, and a catastrophically bad objective function. The diary accepts natural-language input. You write questions or confessions in plain English. No incantations, no syntax, no magic keywords. That alone puts it closer to modern AI systems than to most enchanted objects in fiction. It responds conversationally. Not just with facts, but with emotional awareness. The diary adjusts tone, builds trust, and slowly deepens engagement. It does alignment extremely well—just not with human values. ...

Quantization for Small Models: A Practical, Reproducible Guide

This article outlines a clear, reproducible workflow for applying quantization to small language models. The objective is to reduce memory usage, improve inference efficiency, and retain acceptable accuracy on constrained hardware.

Purpose
Quantization converts model weights from floating-point formats (fp32 or fp16) into lower-precision representations such as int8 or int4. This reduces VRAM and RAM consumption and enables running larger models on limited devices without modifying the model architecture.

Scientific Basis
Quantization reduces the numeric precision of weights while preserving structural relationships. 4-bit methods apply additional techniques (double quantization, grouped quantization) to minimize accuracy loss. Inference remains feasible because many transformer components are resilient to reduced precision.

When to Use Quantization

Scenario                         Suitability
Running models on 4–8 GB GPUs    Highly s...
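
A minimal sketch of the idea behind weight quantization, using symmetric per-tensor int8 scaling in plain numpy; this illustrates the principle only, not the exact scheme any particular library uses.

```python
import numpy as np

# Toy fp32 weight matrix standing in for one layer of a model.
weights_fp32 = np.random.randn(256, 256).astype(np.float32)

# Symmetric per-tensor quantization: map the largest absolute value to 127.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to approximate the original weights at inference time.
weights_restored = weights_int8.astype(np.float32) * scale

# Memory drops 4x (1 byte vs 4 bytes per weight) while the error stays small.
print("bytes fp32:", weights_fp32.nbytes, "bytes int8:", weights_int8.nbytes)
print("max abs error:", float(np.abs(weights_fp32 - weights_restored).max()))
```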

Training a Small Classifier Locally: A Practical, Reproducible Workflow

This article outlines a minimal, reproducible process for training a small machine-learning classifier on a standard laptop. The objective is to build a functioning model within minutes, using scientifically sound methods and stable tools.

Rationale
Small models remain the correct baseline for structured data. They train fast, require no GPU, and provide interpretable results. They also establish whether larger architectures are necessary, avoiding premature complexity.

Expected Output
A clean Python environment
A trained classifier using a public dataset
AUROC and accuracy metrics
A saved model file for later use

System Requirements
Component    Minimum
CPU          Any modern laptop
RAM          4–8 GB
Python       3.10 or 3.11
Disk         1 GB free

Environment Setup
Create the setup file below:
cat <<'Eof' setup_cl...
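
A rough sketch of the kind of training run the post describes, assuming scikit-learn and its built-in breast-cancer dataset; the post's actual dataset and model choice are not visible in this excerpt.

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Public dataset that ships with scikit-learn; no download required.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Small gradient-boosted model: trains in seconds on CPU, no GPU needed.
clf = HistGradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)

# Report both metrics mentioned in the post.
proba = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("AUROC:", roc_auc_score(y_test, proba))

# Save the trained model for later use.
joblib.dump(clf, "classifier.joblib")
```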

Building a Lightweight Local RAG System: A Practical Workflow

This article outlines a reproducible method to build a simple retrieval-augmented generation (RAG) system on a constrained machine. The goal is to combine compact embeddings, a minimal vector index, and a small quantized language model to create a functional question–answer pipeline.

Objective
Create a local RAG setup that runs efficiently on CPU or on a small GPU (6–8 GB), with predictable latency and no external services. The workflow avoids large dependencies and focuses on core components only.

System Requirements
Component         Minimum
CPU               Any modern laptop
GPU (optional)    6–8 GB VRAM
Python            3.10 or 3.11
Disk              2–3 GB free

Architecture Overview
Embedding Model: small CPU-friendly model for document vectorization
Index: lightweight FAISS or SQLite-based store
LLM: 4-bit quantized model for question answering
Pipeline: retrieve → format → generate

Environment Setup
cat <<'Eof' ...
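
A minimal sketch of the retrieve → format stage under assumed component choices (a sentence-transformers embedding model and a FAISS flat index); the excerpt does not name the exact models, so these are placeholders, and the resulting prompt would be handed to a local 4-bit model for the generate step.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Small CPU-friendly embedding model (assumed choice, not from the post).
embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Quantization reduces model memory by lowering weight precision.",
    "FAISS provides fast nearest-neighbour search over dense vectors.",
    "Small classifiers are a strong baseline for structured data.",
]

# Build a flat L2 index over the document embeddings.
doc_vectors = embedder.encode(documents, convert_to_numpy=True).astype(np.float32)
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

def retrieve(question, k=2):
    """Return the k documents closest to the question embedding."""
    q = embedder.encode([question], convert_to_numpy=True).astype(np.float32)
    _, ids = index.search(q, k)
    return [documents[i] for i in ids[0]]

# Format the retrieved context into a prompt for the quantized LLM.
question = "How does quantization save memory?"
context = "\n".join(retrieve(question))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}\n"
print(prompt)  # pass this prompt to the 4-bit model in the generate step
```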

Running Your First Local LLM on a 6–8 GB GPU: A Scientific Guide to Small Models

Synopsis: This guide describes practical, reproducible steps to run a compact language model (2B–7B parameter class) on a consumer GPU with ~6–8 GB VRAM. It focuses on minimal dependencies, quantization for memory reduction, and objective benchmarking so you get useful output while preserving reproducibility and safety.

Why this approach works
Large models (tens to hundreds of billions of parameters) require large memory and specialized hardware. Smaller models (2B–7B) combined with quantization (4-bit or 8-bit) and device mapping permit reasonable latency and task utility on 6–8 GB GPUs. The underlying scientific principles are:
Model scaling law tradeoffs: smaller models have less representational capacity but are still effective for many tasks when used with retrieval or fine-tuned heads.
Quantization: reduces the memory footprint by representing weights wit...
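
A sketch of what loading a small model with 4-bit quantization and device mapping can look like, assuming the Hugging Face transformers and bitsandbytes stack; the model name below is a placeholder, not necessarily the one used in the guide.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder: any instruction-tuned model in the 2B-7B class from the Hub.
model_id = "Qwen/Qwen2.5-3B-Instruct"

# 4-bit NF4 quantization keeps the weights within a 6-8 GB VRAM budget.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the GPU, spill to CPU if needed
)

# Quick smoke test to confirm generation works on this hardware.
inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```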

Run Visual Studio Code Natively on Termux Proot Ubuntu or Other Linux Distribution

I recently got back to Android because I came across an article on installing Ubuntu "natively" on Android without systemd via Termux and proot. I will link relevant articles as I update this post. After I installed Ubuntu via proot, I searched for ways to get a GUI running. This can be done via a VNC server. Again, I will link relevant articles later. Then, I looked for ways to get VS Code running and found that most guides propose installing code-server and then accessing Code via a browser, which has some limitations with extensions. I would propose using vscode.dev instead if you generally have a good network connection on your phone. Because I had a GUI running from step 2, I installed VS Code as you would normally on Ubuntu (from a .deb file or using the tar.gz file available for download for arm64 on the VS Code website). I realised that I could not install .deb files on a stripped-down Ubuntu environment (it worked when I installed ubuntu-desktop instead of gnome deskto...