
DocArray v2 third alpha release note #1153

@samsja


DocArray v2 alpha release

DocArray v2 has reached its third alpha release, as planned in the roadmap.

What's new?

TensorFlow support (#1064) (#1098)

In this version we added TensorFlow support to v2.
Most importantly, this includes a TensorFlowTensor class and a corresponding TensorFlowCompBackend.

import tensorflow as tf

from docarray import BaseDocument, DocumentArray
from docarray.typing import TensorFlowTensor


class MyDoc(BaseDocument):
    title: str
    tensor: TensorFlowTensor


da = DocumentArray[MyDoc](
    MyDoc(title=f'hello {i}', tensor=tf.zeros((224, 224, 3))) for i in range(100)
)
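Behind TensorFlowTensor, DocArray routes tensor operations such as stacking through a computational backend, which lets the same container code work with NumPy, PyTorch or TensorFlow. The sketch below illustrates that pattern without any framework installed. It is a hypothetical simplification: `CompBackend`, `ListCompBackend` and the nested-list "tensors" are our own stand-ins, not DocArray's actual backend interface, and only two operations are shown.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Sequence, Tuple, TypeVar

T = TypeVar('T')


class CompBackend(ABC, Generic[T]):
    """Hypothetical interface hiding framework-specific tensor operations."""

    @staticmethod
    @abstractmethod
    def stack(tensors: Sequence[T]) -> T:
        """Stack equally shaped tensors along a new leading axis."""

    @staticmethod
    @abstractmethod
    def shape(tensor: T) -> Tuple[int, ...]:
        """Return the shape of a tensor as a tuple of ints."""


class ListCompBackend(CompBackend[list]):
    """Toy backend whose 'tensors' are nested Python lists (stand-ins for tf.Tensor)."""

    @staticmethod
    def stack(tensors: Sequence[list]) -> list:
        # Refuse to stack tensors of different shapes, as a real framework would
        shapes = {ListCompBackend.shape(t) for t in tensors}
        if len(shapes) != 1:
            raise ValueError(f'cannot stack tensors with shapes {sorted(shapes)}')
        return list(tensors)

    @staticmethod
    def shape(tensor: list) -> Tuple[int, ...]:
        # Walk down the first element of each nesting level to collect dimensions
        dims: List[int] = []
        while isinstance(tensor, list):
            dims.append(len(tensor))
            tensor = tensor[0] if tensor else None
        return tuple(dims)


image = [[0.0] * 3 for _ in range(4)]  # a (4, 3) "tensor"
batch = ListCompBackend.stack([image, image, image])
print(ListCompBackend.shape(batch))
```

A TensorFlow-backed version would presumably wrap calls such as `tf.stack` instead of building lists; keeping those calls behind one interface means the document classes never need to know which framework holds the data.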

Pretty printing with rich (#1043)

Added pretty printing and a .summary() method for both Documents and DocumentArrays.
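Conceptually, a Document summary is a tree of its (possibly nested) fields. The following library-free sketch prints such a tree. It uses plain dataclasses in place of BaseDocument, and `summary_lines` is a hypothetical helper, not DocArray's implementation, which renders its output with rich.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional


def summary_lines(obj: Any, name: Optional[str] = None, depth: int = 0) -> List[str]:
    """Render a (nested) dataclass instance as an indented tree, one line per field."""
    pad = '    ' * depth
    if not is_dataclass(obj):
        # Leaf field: show its name and value
        return [f'{pad}{name}: {obj!r}']
    header = type(obj).__name__ if name is None else f'{name}: {type(obj).__name__}'
    lines = [pad + header]
    for f in fields(obj):
        # Nested documents recurse one level deeper
        lines.extend(summary_lines(getattr(obj, f.name), f.name, depth + 1))
    return lines


@dataclass
class Thumbnail:
    url: str


@dataclass
class MyDoc:
    title: str
    thumbnail: Thumbnail


doc = MyDoc(title='hello 0', thumbnail=Thumbnail(url='thumb.png'))
print('\n'.join(summary_lines(doc)))
```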

Display of different multimodal data types (#1113) (#1136)

You can now display your multimodal data in a notebook using our predefined documents and types! This applies to audio, image, video and 3D data.
Simply call .display() on a Document's url or on its tensor(s):

For PointCloud3D:

from docarray.documents import PointCloud3D

doc = PointCloud3D(url='tests/toydata/tetrahedron.obj')
doc.tensors = doc.url.load(samples=10000)  # load a point cloud of 10,000 points
doc.tensors.display()
# or via url
doc.url.display()

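Here `load(samples=10000)` turns a mesh file into a point cloud. We assume this means drawing points uniformly over the mesh surface, presumably delegated to a mesh library; the sketch below shows that technique in plain Python (area-weighted face selection, then a uniform point inside the chosen triangle). It is not DocArray's code, and the tetrahedron is a hypothetical stand-in for tests/toydata/tetrahedron.obj.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def triangle_area(a: Vec, b: Vec, c: Vec) -> float:
    # Half the length of the cross product of two edge vectors
    u, v = sub(b, a), sub(c, a)
    cross = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_surface(vertices: Sequence[Vec], faces: Sequence[Tuple[int, int, int]],
                   samples: int, seed: int = 0) -> List[Vec]:
    """Draw `samples` points uniformly from the surface of a triangle mesh."""
    rng = random.Random(seed)
    tris = [tuple(vertices[i] for i in face) for face in faces]
    areas = [triangle_area(*t) for t in tris]
    points: List[Vec] = []
    # Larger faces are picked proportionally more often
    for a, b, c in rng.choices(tris, weights=areas, k=samples):
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1:  # reflect points from the far half of the parallelogram
            r1, r2 = 1 - r1, 1 - r2
        points.append(tuple(a[i] + r1 * (b[i] - a[i]) + r2 * (c[i] - a[i]) for i in range(3)))
    return points


tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
cloud = sample_surface(tetra_vertices, tetra_faces, samples=10000)
```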

For Mesh3D:

from docarray.documents import Mesh3D

doc = Mesh3D(url='tests/toydata/tetrahedron.obj')
doc.tensors = doc.url.load()
doc.tensors.display()
# or via url
doc.url.display()

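For Mesh3D, `load()` keeps the mesh structure, which as we understand it means vertices plus the faces that index into them, rather than sampled points. To make that structure concrete, here is a minimal, hypothetical reader for the `v` and `f` records of a Wavefront OBJ file. DocArray itself presumably relies on a dedicated mesh library and supports far more of the format; the inline tetrahedron is an illustrative stand-in for the toy data file.

```python
from typing import List, Tuple


def parse_obj(text: str) -> Tuple[List[Tuple[float, float, float]], List[Tuple[int, ...]]]:
    """Minimal Wavefront OBJ reader: keeps only `v` (vertex) and `f` (face) records."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith('#'):
            continue  # skip blank lines and comments
        if parts[0] == 'v':
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
        elif parts[0] == 'f':
            # OBJ indices are 1-based and may carry "/texture/normal" suffixes;
            # negative (relative) indices are not handled in this sketch
            faces.append(tuple(int(p.split('/')[0]) - 1 for p in parts[1:]))
    return vertices, faces


TETRAHEDRON_OBJ = """\
# hypothetical tetrahedron
v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
f 1 2 3
f 1 2 4
f 1//1 3//1 4//1
f 2 3 4
"""

vertices, faces = parse_obj(TETRAHEDRON_OBJ)
```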

PyTorch MultiModalDataset

You can now easily use DocumentArrays in PyTorch training scripts via MultiModalDataset.
All you need is a DocumentArray and a dictionary of preprocessing functions, and you’re up and running.

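import torch
from torch.utils.data import DataLoader
from docarray import DocumentArray, BaseDocument
from docarray.data import MultiModalDataset
from docarray.documents import Text

class Thesis(BaseDocument):
    title: Text

class Student(BaseDocument):
    thesis: Thesis

# Hypothetical helpers for illustration: replace them with your own data loading and models
def get_students() -> DocumentArray[Student]:
    return DocumentArray[Student](Student(thesis=Thesis(title=Text(text=f'thesis {i}'))) for i in range(16))

def embed_title(title: Text) -> None:
    title.embedding = torch.ones(4)  # placeholder for a real text encoder

def normalize_embedding(thesis: Thesis) -> None:
    # Expects embed_title to have set the title embedding already
    thesis.title.embedding = thesis.title.embedding / thesis.title.embedding.norm()

da: DocumentArray[Student] = get_students()
ds: MultiModalDataset[Student] = MultiModalDataset[Student](da, preprocessing={"thesis.title": embed_title, "thesis": normalize_embedding})
loader: DataLoader = DataLoader(ds, batch_size=4, collate_fn=MultiModalDataset[Student].collate_fn)

# Use your loader just like any other dataloader for awesome DL training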
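The keys of the preprocessing dictionary are field paths: "thesis.title" targets the title inside each student's thesis, while "thesis" targets the whole thesis. The sketch below is a simplified, hypothetical re-creation of that idea with plain dataclasses, so it runs without DocArray or PyTorch. `apply_preprocessing` and the dataclass stand-ins are our own names, and the sketch assumes functions run in dictionary order on a copy of each item and may either mutate their argument or return a replacement.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


def apply_preprocessing(doc: Any, preprocessing: Dict[str, Callable[[Any], Any]]) -> Any:
    """Apply each function to the field named by its dotted path, on a copy of `doc`."""
    doc = copy.deepcopy(doc)  # leave the source dataset untouched
    for path, fn in preprocessing.items():  # dict order decides execution order
        *parents, attr = path.split('.')
        owner = doc
        for name in parents:
            owner = getattr(owner, name)
        value = getattr(owner, attr)
        result = fn(value)
        # A function may return a new value, or mutate in place and return None
        setattr(owner, attr, value if result is None else result)
    return doc


@dataclass
class Text:
    text: str
    embedding: Optional[List[float]] = None


@dataclass
class Thesis:
    title: Text


@dataclass
class Student:
    thesis: Thesis


def embed_title(title: Text) -> None:
    title.embedding = [1.0, 1.0, 1.0, 1.0]  # placeholder embedding


def normalize_embedding(thesis: Thesis) -> None:
    norm = sum(x * x for x in thesis.title.embedding) ** 0.5
    thesis.title.embedding = [x / norm for x in thesis.title.embedding]


raw = Student(thesis=Thesis(title=Text(text='thesis 0')))
item = apply_preprocessing(raw, {'thesis.title': embed_title, 'thesis': normalize_embedding})
```

In the real dataset, the collate function then batches such preprocessed documents; this sketch stops at a single item.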

More serialization options
