Closed
Summary
Feast's Milvus online store integration has a critical dimension mismatch bug that affects both the push API and materialization approaches. When storing embeddings with correct dimensions (384), Feast internally transforms the data incorrectly, causing Milvus to reject the data with dimension errors.
Environment
- Feast version: 0.51.0
- Python version: 3.12.11
- pymilvus version: 2.3.0+
- OS: macOS (Darwin 24.5.0)
- Milvus: milvus-lite (via `path: data/online_store.db`)
Bug Description
Error Message
```
ERROR:pymilvus.decorators:RPC error: [upsert_rows], <MilvusException: (code=65535, message=the length(7695) of float data should divide the dim(384): )>
```
Expected Behavior
- Input: 5 embeddings × 384 dimensions = 1920 total elements
- Feast should store these embeddings correctly in Milvus
- Expected elements sent to Milvus: 1920
Actual Behavior
- Input: 5 embeddings × 384 dimensions = 1920 total elements
- Feast transforms this to 7695 elements (factor of ~4x)
- Milvus rejects the data because 7695 ÷ 384 = 20.04... (not integer)
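The arithmetic can be checked directly from the numbers in the error message (384 is the `embedding_dim` from the config below; 7695 is what Milvus reports receiving):

```python
# Quick check of the counts from the Milvus error message.
dim = 384
expected = 5 * dim   # 5 embeddings x 384 floats each
received = 7695      # length Milvus reports receiving

print(expected)                       # 1920
print(received % dim)                 # 15 -> not a multiple of 384, so Milvus rejects the batch
print(round(received / expected, 3))  # 4.008 -> the "~4x" inflation factor
```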
Steps to Reproduce
1. Feature Store Configuration

```yaml
# feast_feature_repo/feature_store.yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "L2"
offline_store:
  type: file
entity_key_serialization_version: 3
auth:
  type: no_auth
```

2. Feature Definitions
```python
from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Array, Float32, String, Int64
from feast.value_type import ValueType
from datetime import timedelta

document = Entity(
    name="document_id",
    value_type=ValueType.STRING,
    description="Unique identifier for document chunks",
)

document_embeddings_source = FileSource(
    name="document_embeddings_source",
    path="data/document_embeddings.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

document_embeddings_push_source = PushSource(
    name="document_embeddings_push_source",
    batch_source=document_embeddings_source,
)

document_embeddings = FeatureView(
    name="document_embeddings",
    entities=[document],
    ttl=timedelta(days=365),
    schema=[
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
        Field(name="chunk_text", dtype=String),
        Field(name="document_title", dtype=String),
        Field(name="chunk_index", dtype=Int64),
        Field(name="file_path", dtype=String),
        Field(name="chunk_length", dtype=Int64),
    ],
    online=True,
    source=document_embeddings_push_source,
    tags={"team": "rag", "version": "v3"},
)
```

3. Reproduce with Push API
```python
import pandas as pd
import numpy as np
from datetime import datetime
from sentence_transformers import SentenceTransformer
from feast import FeatureStore
from feast.data_format import PushMode

# Generate test embeddings (384 dimensions)
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    'Test document 1',
    'Test document 2',
    'Test document 3',
    'Test document 4',
    'Test document 5',
]
embeddings = model.encode(texts)  # Shape: (5, 384)

# Create DataFrame
feature_data = []
for i, (text, embedding) in enumerate(zip(texts, embeddings)):
    feature_data.append({
        "document_id": f"test_doc_{i}",
        "embedding": embedding.tolist(),  # Convert to list as per docs
        "chunk_text": text,
        "document_title": "test_document.md",
        "chunk_index": i,
        "file_path": "test_path",
        "chunk_length": len(text),
        "event_timestamp": pd.Timestamp.now(tz='UTC'),
        "created_timestamp": pd.Timestamp.now(tz='UTC'),
    })
df = pd.DataFrame(feature_data)
print(f"Input data: {len(df)} rows, {len(df) * 384} total elements")

# Initialize Feast store
fs = FeatureStore(repo_path="feast_feature_repo")

# This will fail with dimension mismatch
fs.push(
    push_source_name="document_embeddings_push_source",
    df=df,
    to=PushMode.ONLINE_AND_OFFLINE,
)
```

4. Reproduce with Materialization
```python
# Save to parquet file
df.to_parquet('feast_feature_repo/data/document_embeddings.parquet', index=False)

# Try materialization
from datetime import timedelta
end_time = datetime.now()
start_time = end_time - timedelta(hours=1)

# This will also fail with the same dimension mismatch
fs.materialize(
    start_date=start_time,
    end_date=end_time,
    feature_views=["document_embeddings"],
)
```

Investigation Results
Data Validation
Our debugging confirmed:
- ✅ Input embeddings are exactly 384 dimensions each
- ✅ DataFrame contains 5 rows × 384 = 1920 total elements
- ✅ Embeddings converted to Python lists correctly
- ✅ Data types are correct (`Array(Float32)`)
- ❌ Feast somehow transforms 1920 → 7695 elements internally
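The validation above can be scripted as a pre-push check. A minimal sketch (pure Python; `validate_embeddings` is an illustrative helper, not a Feast API, and it works on the `feature_data` list of dicts from the repro or on `df.to_dict('records')`):

```python
def validate_embeddings(rows, key="embedding", dim=384):
    """Assert every row holds a flat list of `dim` floats; return the total element count."""
    total = 0
    for row in rows:
        vec = row[key]
        assert isinstance(vec, list), f"expected list, got {type(vec)}"
        assert len(vec) == dim, f"expected {dim} elements, got {len(vec)}"
        total += len(vec)
    assert total % dim == 0, "total element count is not a multiple of dim"
    return total

# Same shape as the repro data: 5 rows x 384 floats
rows = [{"embedding": [0.1] * 384} for _ in range(5)]
print(validate_embeddings(rows))  # 1920
```

Passing this check before `fs.push()` and still hitting the Milvus error is what isolates the transformation to Feast's internals.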
Affected Methods
- Push API: `store.push()` with `PushMode.ONLINE_AND_OFFLINE`
- Materialization: `store.materialize()` from parquet files
- Both fail with identical dimension mismatch errors
Expected Fix
Feast should correctly handle Array(Float32) fields when:
- Pushing data via push API
- Materializing data from parquet files
- The dimension transformation logic needs debugging/fixing
Potential Root Cause
The issue appears to be in Feast's internal serialization/transformation of Array(Float32) fields when interfacing with Milvus. The ~4x multiplication factor (1920 → 7695) suggests there might be:
- Incorrect flattening of nested arrays
- Multiple serialization passes
- Data type conversion issues in the Milvus online store adapter
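None of these hypotheses is confirmed. As an illustration of the first one (this is not Feast's actual code): any pass that injects extra elements per row while flattening the nested arrays breaks Milvus's divisibility requirement, and even a few stray values per row produce a remainder like the one in the error:

```python
# Hypothetical illustration of "incorrect flattening of nested arrays".
dim = 384
rows = [[0.0] * dim for _ in range(5)]

# Correct flattening: 5 * 384 = 1920, divisible by 384 -> accepted.
correct = [x for row in rows for x in row]
print(len(correct), len(correct) % dim)  # 1920 0

# Buggy flattening that prepends 3 stray values per row:
# 5 * 387 = 1935, remainder 15 -> rejected, same remainder as 7695 % 384.
buggy = [x for row in rows for x in ([0.0, 0.0, 0.0] + row)]
print(len(buggy), len(buggy) % dim)      # 1935 15
```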
Workaround
Currently using a direct `pymilvus.MilvusClient` integration, which works correctly with the same data, confirming the issue is within Feast's Milvus adapter rather than in the input.
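A minimal sketch of that workaround, assuming the `MilvusClient` quick-setup API with milvus-lite (the collection name and row layout here are illustrative, not what Feast creates); the import is guarded so the data-preparation part runs even without pymilvus installed:

```python
# Hedged sketch: write the same 5 x 384 payload to Milvus directly,
# bypassing Feast's Milvus adapter.
try:
    from pymilvus import MilvusClient
except ImportError:
    MilvusClient = None  # pymilvus/milvus-lite not installed

dim = 384
rows = [
    {"id": i, "vector": [0.1] * dim, "chunk_text": f"Test document {i}"}
    for i in range(5)
]
# Exactly the invariant Feast appears to violate internally:
assert all(len(r["vector"]) == dim for r in rows)

if MilvusClient is not None:
    client = MilvusClient("data/online_store.db")  # same milvus-lite file
    client.create_collection(
        collection_name="document_embeddings_direct",
        dimension=dim,
    )
    res = client.insert(collection_name="document_embeddings_direct", data=rows)
    print(res)  # insert accepts the same data that Feast's adapter mangles
```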