

Vector Dimensions

The dimension of a vector refers to how many numbers it contains. Understanding dimensions is crucial for working with vectors effectively!

Common Dimensions in AI

  • Low-dimensional: 2D-100D (simple features, basic embeddings)
  • Medium-dimensional: 100D-1000D (word embeddings like Word2Vec)
  • High-dimensional: 1000D+ (modern language models like OpenAI's embeddings are 1536D)

Example 💡: A 3D vector [0.5, 0.8, 0.2] has 3 dimensions, while GPT-3 embeddings have 1536 dimensions!

The Curse of Dimensionality

As dimensions increase, vectors become sparse and similarity calculations become more challenging. However, high dimensions allow capturing more nuanced relationships!
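A small experiment makes the sparsity concrete (a sketch assuming NumPy; the point count and seed are arbitrary choices, not from the original): as the dimension grows, pairwise distances between random points bunch around the same value, so "near" and "far" neighbors become harder to tell apart.

```python
import numpy as np

rng = np.random.default_rng(0)
spread = {}

# Relative spread (std / mean) of pairwise distances among random points.
for dim in (2, 1000):
    points = rng.random((50, dim))                   # 50 random vectors
    diffs = points[:, None, :] - points[None, :, :]  # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(50, k=1)]          # keep each unique pair once
    spread[dim] = dists.std() / dists.mean()
    print(f"{dim}D: relative spread of distances = {spread[dim]:.3f}")
```

In 2D the distances vary widely; in 1000D the relative spread collapses toward zero, which is one reason high-dimensional similarity search needs specialized index structures.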

How Vectors Capture Meaning (Embeddings)

Embeddings are vectors that represent semantic meaning. The key insight: similar concepts have similar vectors!

Word Embeddings Example

"king" - "man" + "woman" โ‰ˆ "queen"

This famous example shows how vector math can capture relationships between concepts!
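The analogy can be reproduced with toy numbers (the 3D vectors below are invented for illustration; real embeddings are learned and have hundreds of dimensions):

```python
import numpy as np

# Toy 3-D "embeddings" (values made up for illustration).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.9]),
}

# king - man + woman ...
result = vectors["king"] - vectors["man"] + vectors["woman"]

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ... is closest to which word?
best = max(vectors, key=lambda w: cosine(result, vectors[w]))
print(best)  # queen
```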

Creating Embeddings

  1. Training: AI models learn to create vectors where similar items are close together
  2. Context: Models like BERT consider surrounding words to create better embeddings
  3. Fine-tuning: Embeddings can be specialized for specific domains (medical, legal, etc.)

Measuring Vector Similarity

To find similar vectors, we need mathematical ways to measure distance or similarity!

1. Cosine Similarity

  • Range: -1 to 1 (1 = identical, 0 = orthogonal, -1 = opposite)
  • Best for: Text and semantic similarity
  • Why: Focuses on direction, not magnitude

2. Euclidean Distance

  • Range: 0 to ∞ (0 = identical, larger = more different)
  • Best for: Spatial data, image features
  • Why: Measures straight-line distance

3. Dot Product

  • Range: -∞ to ∞
  • Best for: Quick similarity checks
  • Why: Fast computation, captures both direction and magnitude
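The difference between the three measures shows up clearly when one vector is a scaled copy of another (a small NumPy sketch; the numbers are illustrative):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)
dot = np.dot(a, b)

print(f"cosine:    {cosine:.2f}")     # 1.00  - direction is identical
print(f"euclidean: {euclidean:.2f}")  # 3.74  - magnitudes differ
print(f"dot:       {dot:.2f}")        # 28.00 - grows with magnitude
```

Cosine similarity calls these vectors identical because it ignores magnitude, while Euclidean distance and the dot product are both sensitive to it.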

Practical Example

```python
# Two document vectors
doc1 = [0.8, 0.2, 0.6]  # "AI and machine learning"
doc2 = [0.9, 0.1, 0.7]  # "Artificial intelligence and ML"

# These vectors point in nearly the same direction (both about AI/ML),
# so their cosine similarity would be close to 1.
```

Vector Operations

Vectors support mathematical operations that enable powerful AI applications!

Key Operations

  • Addition: [1,2] + [3,4] = [4,6] - Combines concepts
  • Subtraction: [5,6] - [1,2] = [4,4] - Finds differences
  • Scalar Multiplication: 2 × [1,2] = [2,4] - Amplifies magnitude
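The three operations above can be checked directly (a quick sketch using NumPy, which applies each operation element-wise; plain Python lists would need an explicit loop):

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

combined = a + b                                   # [4 6] - combines concepts
difference = np.array([5, 6]) - np.array([1, 2])   # [4 4] - finds differences
amplified = 2 * a                                  # [2 4] - amplifies magnitude

print(combined, difference, amplified)
```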

Real Applications

  • Analogy: king - man + woman = queen
  • Clustering: Group similar vectors together
  • Interpolation: Find vectors "between" two concepts
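Interpolation in particular is just a weighted average of two vectors. A minimal sketch (the 2D "concept" vectors are invented for illustration):

```python
import numpy as np

# Toy 2-D "concept" vectors (values invented for illustration).
cat = np.array([0.9, 0.1])
dog = np.array([0.1, 0.9])

# Linear interpolation: t = 0 gives cat, t = 1 gives dog,
# and t = 0.5 lands halfway "between" the two concepts.
midpoints = {t: (1 - t) * cat + t * dog for t in (0.0, 0.5, 1.0)}
for t, v in midpoints.items():
    print(f"t={t}: {v}")
```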

Getting Started

Quick Implementation

```python
# Example: simple vector similarity
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare two text embeddings
text1_vector = [0.1, 0.8, 0.3]  # "I love AI"
text2_vector = [0.2, 0.7, 0.4]  # "AI is amazing"

similarity = cosine_similarity(text1_vector, text2_vector)
print(f"Similarity: {similarity:.2f}")  # Higher = more similar!
```

Next Steps
  1. Experiment with OpenAI's embedding API to create your own vectors
  2. Try a simple vector database like pgvector
  3. Build a basic semantic search application
  4. Explore advanced topics like fine-tuning embeddings

Key Takeaways

  • Vectors convert any data type into numbers that computers can process
  • Dimensions matter - more dimensions = more nuanced relationships
  • Similarity measures help find related content automatically
  • Vector databases make similarity search fast and scalable
  • Applications span from search to recommendations to AI assistants

Understanding vectors unlocks the ability to build intelligent applications that can understand meaning, not just exact matches.