Local Ollama

Learn how to build an intelligent note-taking application using RAG (Retrieval-Augmented Generation) with Spring Boot 4.1, Spring AI 2.0, and the Qdrant vector database. Includes Tool Calling for accurate expense calculations.

From Sticky Notes to Smart Answers: RAG Implementation with Spring Boot 4.1 & Spring AI 2.0

In previous articles, we integrated local LLMs for test generation and set up semantic search with Qdrant. Now it’s time to tie these ideas together and build a proper RAG application. We’ll create a service that can retrieve relevant notes, pass them into an LLM context, and — crucially — use Tool Calling to perform exact calculations. Using monthly expense tracking as our example, we’ll see how Spring AI 2.0 and Spring Boot 4.1 help unify search, generation, and external tools into a cohesive, reliable solution. ...

Lessons Learned: Spring AI vs Handcrafted Integration

WebClient Was Fast. Spring AI Was Easy. I Chose .... Here's What Happened.

This note continues the previous part. The motivation was to compare approaches to implementing the integration with the AI stack: Ollama, which calculates embedding vectors, and a vector database (Qdrant in our case), which stores and searches notes.

Let’s start with a summary. The conclusion was obvious even before touching the code, and experimenting with both approaches confirmed it: if you need this kind of integration, it is better to start with Spring AI. It provides a good abstraction layer, including: ...

Semantic search: Qdrant + qwen3-embedding:4b + Local Ollama

Building a Local Semantic Search Engine with Qdrant, Qwen3-Embedding (4b), and Spring Boot

Today we will help Dipper manage his notes and search through them using the Qdrant vector database. Embeddings will be computed by a local Ollama instance running the qwen3-embedding:4b model. A Spring Boot WebFlux application will let us create new notes and later find the entries relevant to a given request.

How semantic search works

Some LLMs process prompts, while others generate embeddings. An embedding is just a vector: a sequence of numbers computed from your input (note content in our example) ...
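To make the idea of comparing embeddings concrete, here is a minimal sketch of how two vectors can be scored against a query. It assumes cosine similarity as the metric; the excerpt does not say which metric the article configures, and in a real setup Qdrant performs this comparison internally. The class name, the method name, and the tiny three-dimensional vectors are invented for illustration; real embedding vectors have far more dimensions.

```java
// Hypothetical helper, not taken from the article: scores two embedding vectors.
class CosineSimilarity {

    // Returns the cosine of the angle between a and b (1.0 = same direction).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction, so treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up toy vectors standing in for real embeddings.
        double[] query = {0.9, 0.1, 0.0};
        double[] noteA = {0.8, 0.2, 0.1};
        double[] noteB = {0.0, 0.1, 0.9};

        System.out.println("query vs noteA: " + cosine(query, noteA));
        System.out.println("query vs noteB: " + cosine(query, noteB));
    }
}
```

With these toy values, noteA points in nearly the same direction as the query and scores much higher than noteB, which is why a search would rank it as the more relevant note.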

Part 1: Running Local LLM for Java Tests — Ollama + Gemma 4b + Devoxx Genie

Maintaining high code coverage is essential but often tedious — especially when you need to cover controllers, services, repositories, and edge cases individually. AI‑powered test generation can automate most of this work, cutting hours of manual effort (in some cases, up to 40–50% of testing time). However, many enterprises — especially in finance, healthcare, or regulated industries — are not willing to share their codebase with third‑party LLM providers like OpenAI or Anthropic. Security policies, intellectual property concerns, and compliance requirements (e.g., GDPR, SOC2) often mandate that no data leaves the corporate network. ...