12

I am trying to find a way to bring back only the items in blob storage whose metadata matches a particular piece of data. Every item will have a key called 'FlightNo'.

What I really want is a way to find all files (listBlobs) whose metadata contains a match, so one level up, then iterate through that set of data and find further matches, as each file has 5 items of metadata.

Here is my very unfriendly code to date.

        foreach (IListBlobItem item in container.ListBlobs(null, false))
        {
            if (item.GetType() == typeof(CloudBlockBlob))
            {

                CloudBlockBlob blob = (CloudBlockBlob)item;

                blob.FetchAttributes();

                foreach (var metaDataItem in blob.Metadata)
                {
                    dictionary.Add(metaDataItem.Key, metaDataItem.Value);
                }

                if (dictionary.Where(r=>r.Key == "FlightNo" && r.Value == FlightNo).Any())
                {
                    if (dictionary.Where(r => r.Key == "FlightDate" && r.Value == FlightDate).Any())
                    {
                        if (dictionary.Where(r => r.Key == "FromAirport" && r.Value == FromAirport).Any())
                        {
                            if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                            {
                                if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                                {
                                    retList.Add(new BlobStorage()
                                    {
                                        Filename = blob.Name,
                                        BlobType = blob.BlobType.ToString(),
                                        LastModified = (DateTimeOffset)blob.Properties.LastModified,
                                        ContentType = blob.Properties.ContentType,
                                        Length = blob.Properties.Length,
                                        uri = RemoveSecondary(blob.StorageUri.ToString()),
                                        FlightNo = dictionary.Where(r => r.Key == "FlightNo").Select(r => r.Value).SingleOrDefault(),
                                        Fixture = dictionary.Where(r => r.Key == "FixtureNo").Select(r => r.Value).SingleOrDefault(),
                                        FlightDate = dictionary.Where(r => r.Key == "FlightDate").Select(r => r.Value).SingleOrDefault(),
                                        FromAirport = dictionary.Where(r => r.Key == "FromAirport").Select(r => r.Value).SingleOrDefault(),
                                        ToAirport = dictionary.Where(r => r.Key == "ToAirport").Select(r => r.Value).SingleOrDefault()
                                    });

                                }
                            }
                        }
                    }
                }

                dictionary.Clear();
            }
        }

Thanks. Scott

2 Comments

  • Not exactly sure what your question is. But... searching blob metadata is not an efficient operation, since there's no indexing. You might consider using some type of database to hold your metadata, to facilitate querying. Commented Sep 29, 2017 at 2:47
  • Indexing Blob metadata and using Azure Search now makes searching blob metadata a perfectly efficient operation. Commented Jun 6, 2018 at 8:16

4 Answers

35

The accepted answer is highly inefficient: looping through every single blob and loading its metadata to check for values won't perform well with any reasonable volume of data.

It is possible to search blob metadata using Azure Search. A search index can be created that includes the blobs' custom metadata.

The following comprehensive articles explain it all:

Indexing Documents in Azure Blob Storage with Azure Search
Searching Blob storage with Azure Search

3

Although still in preview, Blob Index now lets you run a query against blob index tags, which serve much the same purpose as metadata.

You won't need to loop through all of your blobs until you find what you're looking for.

Here's a snippet from the full article:

Blob Index—a managed secondary index, allowing you to store multi-dimensional object attributes to describe your data objects for Azure Blob storage—is now available in preview. Built on top of blob storage, Blob Index offers consistent reliability, availability, and performance for all your workloads. Blob Index provides native object management and filtering capabilities, which allows you to categorize and find data based on attribute tags set on the data.

1 Comment

This probably should be the answer in 2020. However, Blob Index does cost extra.
0

If I understand correctly, you want to find the blobs whose metadata contains all 5 of the items you mentioned. You could use the following code to do that. I tested it on my side and it works correctly.

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var connectionString = "storage connection string";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("container");
var blobs = container.ListBlobs();
var blobList = new List<CloudBlockBlob>();
foreach (var item in blobs)
{
    // The listing can also return directories and other blob types,
    // so skip those instead of casting unconditionally.
    if (!(item is CloudBlockBlob blob))
    {
        continue;
    }

    // The default listing does not populate metadata; fetch it per blob.
    blob.FetchAttributes();
    if (blob.Metadata.Contains(new KeyValuePair<string, string>("FlightNo", "FlightNoValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FlightDate", "FlightDateValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FromAirport", "FromAirportValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("ToAirport", "ToAirportValue")) &&
        blob.Metadata.Contains(new KeyValuePair<string, string>("FixtureNo", "FixtureNoValue")))
    {
        blobList.Add(blob);
    }
}
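
The chain of `&&` checks could also be driven by data rather than written out by hand. This is only a minimal sketch, assuming the required key/value pairs are collected in a dictionary; `MetadataFilter` and `MatchesAll` are hypothetical names for illustration and are not part of the storage SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any Azure SDK.
public static class MetadataFilter
{
    // Returns true when every required key is present in the metadata
    // with exactly the required value (ordinal string comparison).
    public static bool MatchesAll(
        IDictionary<string, string> metadata,
        IDictionary<string, string> required)
    {
        return required.All(pair =>
            metadata.TryGetValue(pair.Key, out var value) &&
            string.Equals(value, pair.Value, StringComparison.Ordinal));
    }
}
```

Inside the loop this would be called as `MetadataFilter.MatchesAll(blob.Metadata, required)`. It may also be worth trying `container.ListBlobs(null, true, BlobListingDetails.Metadata)`, which as far as I know returns the metadata with the listing itself and so should avoid one `FetchAttributes()` round trip per blob. Either way, every blob in the container is still enumerated.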

3 Comments

Thank you. Will test it out and come back to you. Sorry for the delay. Appreciated. Scot
If you don't need to check the value of Metadata, we could use blob.Metadata.ContainsKey("KeyName")
works perfect. Sorry for the delay. Many thanks Scott
0

You cannot search metadata directly, but you can use tags, which are much the same as metadata from a practical point of view. Tags are indexed by the storage service, and the code to search for matching blobs is very straightforward:

    var query = "@container = 'invoices' AND brand = 'volvo'";
    
    await foreach (var blob in blobServiceClient.FindBlobsByTagsAsync(query))
    {
        Console.WriteLine($"Container: {blob.BlobContainerName}, Blob: {blob.BlobName}");
    }
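
For the flight data in the question, the filter string could be assembled from the key/value pairs instead of being typed by hand. This is a rough sketch under several assumptions: the five items are stored as blob index tags rather than metadata, the values contain no single quotes (no escaping is done), and `TagQuery.Build` and the container name `flights` are made-up names for illustration. Tag keys are wrapped in double quotes here, which is the form I have seen in the documented filter syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Azure.Storage.Blobs.
public static class TagQuery
{
    // Builds a filter such as:
    //   @container = 'flights' AND "FlightNo" = 'BA123'
    // Assumes container, keys and values contain no quote characters.
    public static string Build(
        string container,
        IEnumerable<KeyValuePair<string, string>> tags)
    {
        var clauses = new List<string> { $"@container = '{container}'" };
        clauses.AddRange(tags.Select(t => $"\"{t.Key}\" = '{t.Value}'"));
        return string.Join(" AND ", clauses);
    }
}
```

The resulting string would then be passed to `blobServiceClient.FindBlobsByTagsAsync(query)` in the same way as the hard-coded query above.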
