2 posts tagged with "Spring"

· 7 min read
Huseyin BABAL

What is a Vector Database?

A vector database is a specialized type of database optimized for storing, retrieving, and performing operations on vector data. Vectors, in this context, are typically arrays of numerical values that represent data in a multi-dimensional space. These are widely used in machine learning and AI for tasks like similarity search, where the goal is to find data points that are close to a given query point in this multi-dimensional space. Vector databases provide efficient indexing and querying capabilities for such operations, often leveraging advanced mathematical and computational techniques to ensure fast and accurate results.
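To make the idea of similarity search concrete, here is a minimal sketch that scans a tiny in-memory map of vectors and returns the entries closest to a query point. The class name, the sample keys, and the two-dimensional vectors are illustrative assumptions; a real vector database replaces this linear scan with an index.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a brute-force scan over a tiny in-memory set of vectors.
final class SimilaritySearchSketch {

    // Straight-line (Euclidean) distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the keys of the k stored vectors closest to the query, nearest first.
    static List<String> nearest(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> euclidean(e.getValue(), query)))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the article.
        Map<String, double[]> store = Map.of(
            "cat", new double[] {0.9, 0.1},
            "dog", new double[] {0.8, 0.2},
            "car", new double[] {0.1, 0.9});
        System.out.println(nearest(store, new double[] {1.0, 0.0}, 2)); // prints [cat, dog]
    }
}
```

The scan compares the query against every stored vector, which is exact but grows linearly with the data set; the index types discussed later trade a little accuracy for much less work per query.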

What is PGVector?

pgvector is an extension for PostgreSQL that adds support for storing and querying vector data. It allows users to leverage PostgreSQL's powerful database capabilities while adding specialized functionality for vector operations. With pgvector, you can store high-dimensional vectors, perform similarity searches, and integrate vector operations seamlessly with your existing PostgreSQL databases.
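As a rough illustration of what sits underneath the Spring AI abstraction, the sketch below formats a Java array in pgvector's text form and builds a nearest-neighbour query using pgvector's cosine distance operator (<=>). The table name items and its columns id, content, and embedding are hypothetical, as is the class itself; the article's application never writes this SQL by hand.

```java
import java.util.StringJoiner;

// Illustrative only: shows the shape of the SQL that a pgvector similarity search boils down to.
final class PgVectorSqlSketch {

    // pgvector's text format is a comma-separated list in square brackets, e.g. [1.0,2.0,3.0].
    static String toVectorLiteral(float[] vector) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : vector) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    // Orders rows by cosine distance (<=>) to a bound parameter.
    // "items", "id", "content", and "embedding" are assumed names for this sketch.
    static String nearestNeighborSql(int limit) {
        return "SELECT id, content FROM items ORDER BY embedding <=> ?::vector LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(toVectorLiteral(new float[] {1f, 2f, 3f})); // prints [1.0,2.0,3.0]
        System.out.println(nearestNeighborSql(5));
    }
}
```

pgvector also provides <-> for Euclidean distance and <#> for negative inner product, so the same query shape works with a different operator.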

In this article, we will use PostgreSQL as our database. You can host it wherever you like; for a convenient deployment option, consider a cloud-based solution like Rapidapp, which offers managed PostgreSQL databases and simplifies setup and maintenance.

tip

Create a free database with pgvector support in Rapidapp in seconds here

How Spring Integrates with Vector Databases

Spring, a popular framework for building Java applications, provides robust support for integrating with various types of databases, including vector databases like pgvector. Using Spring AI PGVector Store, developers can easily manage data access and integrate vector operations into their applications. Spring AI offers additional capabilities to enhance machine learning and AI integrations, making it a powerful choice for applications that require advanced data handling and analytics.

Creating a Spring Project

To get started, we'll create a new Spring project. This can be done using Spring Initializr or any other method you prefer. For simplicity, we'll use Spring Initializr here.

  1. Navigate to Spring Initializr: Open your browser and go to Spring Initializr.
  2. Project Settings: Set the following options:
    • Project: Maven Project
    • Language: Java
    • Spring Boot: (select the latest stable version)
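    • Dependencies: Add Spring Web

Download the project and unzip it. The dependencies below declare no version, so they assume the Spring AI BOM (or an equivalent dependencyManagement entry) is imported in the same pom.xml. Open the pom.xml file and add the following dependencies to the dependencies section: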
pom.xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-transformers-spring-boot-starter</artifactId>
</dependency>

Application YAML Configuration

Next, we need to configure our application to use PostgreSQL as a vector store. Update your application.yaml file as follows:

application.yaml
spring:
  datasource:
    url: jdbc:postgresql://<host>:<port>/<db>?application_name=rapidapp_spring_ai
    username: <user>
    password: <password>
  ai:
    ollama:
      embedding:
        enabled: false
    vectorstore:
      pgvector:
        index-type: hnsw
        distance-type: cosine_distance
        dimensions: 384

index-type: Specifies the type of index to be used for vector data. Common options include ivfflat and hnsw.

dimensions: Indicates the dimensionality of the vectors being stored. It must match the output size of the embedding model you use (384 in this setup).

distance-type: Defines the distance metric used for similarity search. Spring AI's pgvector store accepts cosine_distance, euclidean_distance (L2), and negative_inner_product.

Index Types

The index types used in pgvector are described briefly below; for more detail, refer to the pgvector documentation.

HNSW

HNSW (Hierarchical Navigable Small World) is an advanced indexing algorithm designed for efficient approximate nearest neighbor search in high-dimensional spaces. It builds a graph structure where each node represents a vector, and edges represent connections to other vectors. The graph is navigable through multiple layers, allowing for fast and scalable searches by traversing the most relevant nodes. HNSW is known for its high accuracy and low search latency, making it suitable for real-time applications requiring quick similarity searches.

IVF Flat

IVF Flat (Inverted File Flat) is a popular indexing method that partitions the vector space into clusters using a coarse quantizer. Each vector is assigned to a cluster, and an inverted list is maintained for each cluster containing the vectors assigned to it. During a search, only the clusters closest to the query vector are examined, significantly reducing the number of comparisons needed. IVF Flat provides a good balance between search speed and accuracy, and it is especially effective when dealing with large datasets, as it limits the scope of the search to relevant clusters.
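The clustering idea behind IVF Flat can be sketched in a few lines. The toy index below assumes the centroids are already known (in practice they are learned from the data, typically with k-means), assigns each vector to its nearest centroid, and scans only the closest lists at query time. The class and the probes parameter are illustrative names for this sketch, not pgvector's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative only: a toy inverted-file index with fixed, pre-computed centroids.
final class IvfFlatSketch {
    private final double[][] centroids;        // assumed to be given; normally learned with k-means
    private final List<List<double[]>> lists;  // one inverted list of vectors per centroid

    IvfFlatSketch(double[][] centroids) {
        this.centroids = centroids;
        this.lists = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            lists.add(new ArrayList<>());
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Centroid indexes ordered from closest to farthest from the given vector.
    private List<Integer> centroidsByDistance(double[] vector) {
        return IntStream.range(0, centroids.length).boxed()
            .sorted(Comparator.comparingDouble(i -> squaredDistance(centroids[i], vector)))
            .collect(Collectors.toList());
    }

    // Stores the vector in the list of its nearest centroid.
    void add(double[] vector) {
        lists.get(centroidsByDistance(vector).get(0)).add(vector);
    }

    // Scans only the `probes` closest lists; returns null if those lists are empty.
    double[] search(double[] query, int probes) {
        double[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c : centroidsByDistance(query).subList(0, Math.min(probes, centroids.length))) {
            for (double[] candidate : lists.get(c)) {
                double distance = squaredDistance(candidate, query);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = candidate;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        IvfFlatSketch index = new IvfFlatSketch(new double[][] {{0, 0}, {10, 10}});
        index.add(new double[] {1, 1});
        index.add(new double[] {0, 2});
        index.add(new double[] {9, 9});
        index.add(new double[] {6, 6});
        double[] oneProbe = index.search(new double[] {4, 4}, 1);
        double[] twoProbes = index.search(new double[] {4, 4}, 2);
        System.out.println(oneProbe[0] + "," + oneProbe[1]);   // prints 1.0,1.0
        System.out.println(twoProbes[0] + "," + twoProbes[1]); // prints 6.0,6.0
    }
}
```

With a single probe the search misses the true nearest neighbour (6, 6) because it lives in the other cluster; probing both lists finds it. That is the speed versus accuracy trade-off described above.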

Distance Types

Distance types are metrics used to measure the similarity or dissimilarity between vectors in a vector database. Different applications and data types may require different distance metrics to ensure accurate and meaningful results. Here are some commonly used distance types:

Euclidean Distance (L2)

This is the most widely used distance metric, measuring the straight-line distance between two points in a multi-dimensional space. It's calculated as the square root of the sum of the squared differences between corresponding elements of the vectors. Euclidean distance is suitable for general-purpose similarity searches and is often used in clustering algorithms.

Cosine Similarity

This metric measures the cosine of the angle between two vectors, providing a value between -1 and 1. Cosine similarity is particularly useful when the magnitude of the vectors is not important, focusing instead on the direction. It's commonly used in text mining and natural language processing to measure the similarity of documents or word embeddings.

Inner Product (Dot Product)

This metric calculates the sum of the products of corresponding elements of two vectors. It's often used in neural networks and machine learning models to measure the alignment between vectors. Inner product similarity is useful when comparing vectors where higher values indicate greater similarity.

Manhattan Distance (L1)

Also known as the city block distance, it measures the sum of the absolute differences between corresponding elements of two vectors. Manhattan distance is useful in scenarios where differences in individual dimensions are more significant than the overall geometric distance, such as in certain types of image processing.

Hamming Distance

This metric counts the number of positions at which the corresponding elements of two vectors are different. It's mainly used for binary vectors or strings of equal length, making it suitable for applications in error detection and correction, as well as DNA sequence analysis.

Choosing the right distance type depends on the specific requirements of your application and the nature of your data. Each distance metric has its strengths and weaknesses, and understanding these can help optimize the performance and accuracy of similarity searches in your vector database.
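The definitions above translate directly into code. The following sketch implements each metric for plain arrays so the formulas can be compared side by side; the class and method names are chosen for illustration. It also includes cosine distance, defined as 1 minus cosine similarity, which is the form pgvector uses and the one selected by the cosine_distance setting earlier.

```java
// Illustrative only: straightforward implementations of the metrics described above.
final class DistanceMetricsSketch {

    // Square root of the sum of squared element-wise differences.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Sum of element-wise products.
    static double innerProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine of the angle between the vectors, in the range [-1, 1].
    static double cosineSimilarity(double[] a, double[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    // Distance form: 0 for vectors pointing the same way, 2 for opposite directions.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    // Sum of absolute element-wise differences.
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Number of positions at which the two arrays differ.
    static int hamming(int[] a, int[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 6, 3};
        System.out.println(euclidean(a, b));    // prints 5.0
        System.out.println(manhattan(a, b));    // prints 7.0
        System.out.println(innerProduct(a, b)); // prints 25.0
        System.out.println(hamming(new int[] {1, 0, 1, 1}, new int[] {1, 1, 0, 1})); // prints 2
    }
}
```

All methods assume both inputs have the same length, and the cosine variants assume neither vector is all zeros.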

Implementing a Document Controller

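Create a new controller that manages vector data operations. To keep the example short, the controller injects Spring AI's VectorStore directly instead of going through a separate service class. CreateDocumentRequest is a simple request type that carries the document text and its metadata.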

DocumentController.java
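import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

// Request body shape inferred from request.text() and request.meta(); adjust to your own model.
record CreateDocumentRequest(String text, Map<String, Object> meta) {}

@RestController
@RequestMapping("/documents")
class DocumentController {

    @Autowired
    private VectorStore vectorStore;

    // Embeds the text and stores it, together with its metadata, in the vector store.
    @PostMapping
    public void create(@RequestBody CreateDocumentRequest request) {
        vectorStore.add(List.of(new Document(request.text(), request.meta())));
    }

    // Returns the five stored documents most similar to the query text.
    @GetMapping
    public String list(@RequestParam("query") String query) {
        List<Document> results = vectorStore.similaritySearch(SearchRequest.query(query).withTopK(5));
        return results.toString();
    }
}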

Create Documents

# Document 1
curl \
-H "Content-Type: application/json" \
-d '{"text": "Prometheus collects metrics from targets by scraping metrics HTTP endpoints. Since Prometheus exposes data in the same manner about itself, it can also scrape and monitor its own health.", "meta": {"category": "getting-started"}}' \
http://localhost:8080/documents

# Document 2
curl \
-H "Content-Type: application/json" \
-d '{"text": "Prometheus local time series database stores data in a custom, highly efficient format on local storage.", "meta": {"category": "storage"}}' \
http://localhost:8080/documents

Search Documents

curl "http://localhost:8080/documents?query=scrape"
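If you prefer to call the search endpoint from Java rather than curl, the request can be built with the JDK's HTTP client. The sketch below only constructs the request and shows how to URL-encode a multi-word query; the class and method names are assumptions for illustration, and the base URL is the local address used in the curl commands.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds (but does not send) a request to the /documents search endpoint.
final class DocumentSearchClientSketch {

    // Encodes the query so that spaces and special characters survive in the URL.
    static URI searchUri(String baseUrl, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/documents?query=" + encoded);
    }

    static HttpRequest searchRequest(String baseUrl, String query) {
        return HttpRequest.newBuilder(searchUri(baseUrl, query)).GET().build();
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/documents?query=local+storage
        System.out.println(searchUri("http://localhost:8080", "local storage"));
    }
}
```

To execute the request against a running application, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).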

Conclusion

Integrating Spring AI with vector databases like pgvector provides powerful capabilities for handling vector data and performing advanced similarity searches. By leveraging Spring's robust framework and pgvector's specialized vector operations, developers can build sophisticated applications that effectively manage and analyze high-dimensional data. Rapidapp further enhances this setup with its user-friendly interface and built-in vector store support, making it easier than ever to develop and maintain vector-based applications.

tip

You can find the complete source code for this project on GitHub.

· 7 min read
Huseyin BABAL

Introduction

In today's fast-paced development landscape, creating robust and scalable applications quickly is essential. Leveraging JHipster, PostgreSQL, and Elasticsearch can streamline this process. This article walks you through the steps of building a demo project, showcasing the integration of these powerful tools in just 10 minutes.

Why JHipster?

JHipster accelerates application development by providing a complete stack, including front-end and back-end technologies. It generates high-quality code, follows best practices, and offers extensive tooling, making it a go-to solution for developers seeking efficiency and reliability.

Prerequisites

PostgreSQL

In this article, we will use PostgreSQL as our database. You can host it wherever you like; for a convenient deployment option, consider a cloud-based solution like Rapidapp, which offers managed PostgreSQL databases and simplifies setup and maintenance.

tip

Create a free database in Rapidapp in seconds here

Elasticsearch

We will be using Elasticsearch for the search engine. Elasticsearch is an open-source, distributed, and scalable search engine which you can deploy on-premises or in the cloud. You can use Elastic Cloud if you don't want to maintain your own instance.

JHipster CLI

To get started with JHipster, you'll need to install the JHipster CLI, for example with npm install -g generator-jhipster.

Getting Started

Run the jhipster command in your terminal and follow the prompts as shown below. Remember to supply your own values for fields such as the application name and the package name.

? What is the base name of your application? demo
? Which *type* of application would you like to create? Monolithic application (recommended for simple projects)
? What is your default Java package name? com.huseyinbabal.demo
? Would you like to use Maven or Gradle for building the backend? Maven
? Do you want to make it reactive with Spring WebFlux? No
? Which *type* of authentication would you like to use? JWT authentication (stateless, with a token)
? Besides JUnit, which testing frameworks would you like to use?
? Which *type* of database would you like to use? SQL (H2, PostgreSQL, MySQL, MariaDB, Oracle, MSSQL)
? Which *production* database would you like to use? PostgreSQL
? Which *development* database would you like to use? PostgreSQL
? Which cache do you want to use? (Spring cache abstraction) Ehcache (local cache, for a single node)
? Do you want to use Hibernate 2nd level cache? Yes
? Which other technologies would you like to use? Elasticsearch as search engine
? Which *framework* would you like to use for the client? React
? Besides Jest/Vitest, which testing frameworks would you like to use?
? Do you want to generate the admin UI? Yes
? Would you like to use a Bootswatch theme (https://bootswatch.com/)? Default JHipster
? Would you like to enable internationalization support? No
? Please choose the native language of the application English

This will generate a full-stack application in which PostgreSQL and Elasticsearch are configured and enabled during application startup. Now that you have the basic setup, you can start configuring the datasource.

PostgreSQL Configuration

Once you open the project folder in your favourite IDE, you can see the generated application*.yaml files under the src/main/resources/config folder. Since we are doing local development for now, open application-dev.yaml and configure the datasource as follows.

application-dev.yaml
spring:
  datasource:
    url: jdbc:postgresql://<host>:<port>/<db_name>?sslmode=require&application_name=rapidapp_jhipster # You can find details on Rapidapp db details page.
    username: <username>
    password: <password>

Line 3: You can find the DB connection details on Rapidapp db details page.

Elasticsearch Configuration

We will configure Elasticsearch as a search engine in our application. Once you create your own Elasticsearch instance, or create one in Elastic Cloud, note your Elasticsearch credentials to use them in the following configuration section.

application-dev.yaml
spring:
  elasticsearch:
    uris: https://elastic:<password>@<host>:<port>

Running the Application

Now that you have configured your database and search engine, you can start the application with the following command:

./mvnw

The command above does the following:

  • Builds the frontend and backend projects
  • Starts the backend project while running Liquibase asynchronously. Liquibase prepares the database schema from your entities.
  • Starts the frontend project

If everything goes well, you will see output like the following:
2024-06-19T17:01:11.056+03:00  INFO 65684 --- [  restartedMain] com.huseyinbabal.jdemo.JDemoApp          :
----------------------------------------------------------
Application 'jDemo' is running! Access URLs:
Local: http://localhost:8080/
External: http://192.168.1.150:8080/
Profile(s): [dev, api-docs]
----------------------------------------------------------

Navigate to http://localhost:8080/ to access the application. The home page shows the default credentials for users with admin and user rights; log in with admin:admin to see what the admin UI looks like. The key menu items are listed below:

  • Entities: Entities used in this application. We will come back to this menu shortly when we create our own entities.
  • Administration > Metrics: You can see several metrics like JVM, Cache, HTTP statistics.
  • Administration > Health: You can see the health information of the application like db, disk health.
  • Administration > Logs: You can see the log configuration of the application, where you can set the log level at the root or package level.

Feel free to walk through the menus in the Admin UI to get familiar with them. Meanwhile, let's see how we can add our own entities to the application.

Adding Entities

We will add our own entities to the application. Let's create a new entity called Product and add it to the application with the following command:

jhipster entity product

It will prompt you to add fields for this entity. You can use the following fields:

  • title: String
  • description: String
  • price: Float

Once it is done, it creates the entity in the codebase along with the related controller for the CRUD operations. src/main/java/<package>/domain/Product.java contains the generated entity class, and src/main/java/<package>/repository/ProductRepository.java contains the generated repository class. For the presentation layer, take a look at src/main/java/<package>/web/rest/ProductResource.java. Here is how it creates a product:
ProductResource.java
@PostMapping("")
public ResponseEntity<Product> createProduct(@RequestBody Product product) throws URISyntaxException {
    log.debug("REST request to save Product : {}", product);
    if (product.getId() != null) {
        throw new BadRequestAlertException("A new product cannot already have an ID", ENTITY_NAME, "idexists");
    }
    product = productRepository.save(product);
    productSearchRepository.index(product);
    return ResponseEntity.created(new URI("/api/products/" + product.getId()))
        .headers(HeaderUtil.createEntityCreationAlert(applicationName, false, ENTITY_NAME, product.getId().toString()))
        .body(product);
}

Line 8: productSearchRepository.index(product) indexes the product in Elasticsearch. Notice how little it takes to store product data in Elasticsearch: we haven't written any code for it, but because we selected Elasticsearch during generation, JHipster becomes an Elasticsearch-aware system and generates the required functions for you.

In the same way, let's see how it searches for products:

ProductResource.java
@GetMapping("/_search")
public ResponseEntity<List<Product>> searchProducts(
    @RequestParam("query") String query,
    @org.springdoc.core.annotations.ParameterObject Pageable pageable
) {
    log.debug("REST request to search for a page of Products for query {}", query);
    try {
        Page<Product> page = productSearchRepository.search(query, pageable);
        HttpHeaders headers = PaginationUtil.generatePaginationHttpHeaders(ServletUriComponentsBuilder.fromCurrentRequest(), page);
        return ResponseEntity.ok().headers(headers).body(page.getContent());
    } catch (RuntimeException e) {
        throw ElasticsearchExceptionMapper.mapException(e);
    }
}

Line 8: productSearchRepository.search(query, pageable) searches for products in Elasticsearch.
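The two generated methods follow a save-then-index pattern: the record is persisted in the primary database first and then copied into the search index, and searches are answered from the index. The sketch below imitates that flow with in-memory maps so it can run without PostgreSQL or Elasticsearch; every name in it is hypothetical, and the word-level index is a simplified stand-in for what Elasticsearch actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: in-memory stand-ins for the database and the search index.
final class SaveThenIndexSketch {

    record Product(long id, String title) {}

    private final Map<Long, Product> database = new HashMap<>();        // stands in for PostgreSQL
    private final Map<String, Set<Long>> searchIndex = new HashMap<>(); // stands in for Elasticsearch

    // Mirrors createProduct: persist first, then index the saved record.
    Product create(String title) {
        Product saved = new Product(database.size() + 1L, title);
        database.put(saved.id(), saved);
        index(saved);
        return saved;
    }

    // Maps every lower-cased word of the title to the ids of the products containing it.
    private void index(Product product) {
        for (String token : product.title().toLowerCase(Locale.ROOT).split("\\s+")) {
            searchIndex.computeIfAbsent(token, t -> new TreeSet<>()).add(product.id());
        }
    }

    // Mirrors searchProducts: look up ids in the index, then return the matching records.
    List<Product> search(String term) {
        List<Product> results = new ArrayList<>();
        for (long id : searchIndex.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
            results.add(database.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        SaveThenIndexSketch store = new SaveThenIndexSketch();
        store.create("Red Mug");
        store.create("Blue Mug");
        System.out.println(store.search("mug").size());         // prints 2
        System.out.println(store.search("red").get(0).title()); // prints Red Mug
    }
}
```

One consequence of this pattern is that the database and the index are written in two separate steps, so a failure between them can leave the index out of date until the record is indexed again.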

Product Entity in Admin UI

The Product entity we created now appears in the Admin UI under the Entities menu. Once you navigate to the Product module, you can create a new product, list products, view the details of a product, or delete any of them. There is also a search bar where you can search products with the help of Elasticsearch on the backend side.

Conclusion

We have seen that you can leverage JHipster's rapid development capabilities along with the robust data management of PostgreSQL and powerful search functionality of Elasticsearch. JHipster simplifies and accelerates application creation with its comprehensive toolset and best practices, while PostgreSQL ensures reliable and efficient data handling. Elasticsearch adds advanced search capabilities, making your application both scalable and responsive. Utilizing Rapidapp's PostgreSQL as a service further streamlines database management, allowing you to focus on developing high-quality applications quickly and effectively.

tip

You can find the complete source code for this project on GitHub.