Large Scale Machine Learning and Other Animals: GraphLab-Workshop

I am absolutely excited to report that we are organizing the first GraphLab workshop. The workshop has two goals:

Expose distributed/multicore GraphLab v2 to a wider ML community using demos and tutorials. Meet other GraphLab users and hear what they are working on!
Provide a meeting place for technology companies working on large-scale machine learning. Companies are invited to present their related research and discuss future challenges.

Here is the current program committee:

Additional companies who confirmed their participation include:

Universities and research labs participating in the workshop:

Carnegie Mellon University
Johns Hopkins University
University of California, Berkeley
University of California, Santa Cruz
University of Pennsylvania
University of San Francisco
Lawrence Berkeley National Lab
Georgia Tech
Stanford University

Date
The workshop will take place on July 9th in San Francisco.

Registration
Online registration is now open.

$50 for students
$100 for early-bird registration
$150 for regular registration

If you are interested in getting additional information about the workshop, please email me.

Sunday, March 11, 2012

Open Connectome Project

A couple of days ago I sent out an initial announcement about our planned GraphLab workshop and immediately I started getting a lot of interesting feedback from my blog readers.

Joshua Vogelstein, a researcher at the Dept. of Applied Mathematics & Statistics, Johns Hopkins University, just sent me a note saying he would very much like to participate in our workshop. Joshua is part of the Open Connectome project, a very interesting project in the area of neuroscience. The project's mission is to provide open access to neuroscience data for researchers worldwide. Here are some examples of the data they are hosting:

I have no clue what the above pictures mean (although I must admit they look pretty cool!), so I asked Joshua to describe in a little more detail the problems he is working on. This is what I got from him:

We have two very different kinds of data:

1) EM Connectomes - each dataset is a volumetric image of part of some animal brain, ranging in size from 1GB to 10TB. you can look at the data in 2D here
the first project is 10TB. we also designed a RESTful interface to facilitate anybody downloading and processing the data. the instructions for using it are here.
another thing that we have, but haven't yet provided the documentation for, is an annotation database. the idea is that anybody should be able to download some volume, annotate it, and upload it back to the server. we collect and store all the annotations, and can combine them to obtain meta-annotations. i expect that we'll release details for the annotation database in a week or so.

2) MR Connectomes - these are essentially multimodal images of human brains, including both time-varying and non-time-varying. the "multi" part of multimodal means that for each subject we have a number of different kinds of images. our plan for what to do with this stuff is here. Currently, we are organizing the data and pre-processing it. the output of the preprocessing will be, for each subject (there are a few thousand of them), an O(10,000) vertex and O(100,000) edge graph. our vertices are attributed. in particular, each vertex has a 3d position as well as a whole time-series associated with it. we will implement a kind of spectral clustering on each graph (see this manuscript for the theoretical results of our algorithm).

Another interesting aspect of Joshua's work is that he is part of the Institute for Data Intensive Engineering and Science, which has a 5PB "Data-Scope".

Joshua is interested in exploring GraphLab as a first step towards spectral clustering of brain image graphs. I promised to help him use our SVD and K-means solvers and try them out on some of his data. I am looking forward to meeting Joshua at our workshop. I also think it would be very interesting if he could give a quick talk describing some of the challenges he is facing and what he needs from GraphLab to help solve them.
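To give a feel for what "SVD followed by K-means" means on a graph, here is a minimal sketch in plain NumPy. It is not GraphLab code and not necessarily the algorithm from Joshua's manuscript; it is one common textbook variant that embeds each vertex using the top-k singular vectors of the adjacency matrix and then clusters the embedded points. The function name `spectral_clusters`, the farthest-first initialization, and the toy graph are all my own assumptions for illustration.

```python
import numpy as np

def spectral_clusters(adjacency, k, n_iter=20):
    """Illustrative only: SVD embedding of the vertices, then plain k-means."""
    a = np.asarray(adjacency, dtype=float)
    # Embed each vertex using the top-k left singular vectors, scaled by
    # their singular values.
    u, s, _ = np.linalg.svd(a)
    embedding = u[:, :k] * s[:k]

    # Farthest-first initialization: start from vertex 0, then repeatedly
    # pick the point farthest from all centers chosen so far.
    centers = [embedding[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(embedding - c, axis=1) for c in centers], axis=0)
        centers.append(embedding[d.argmax()])
    centers = np.array(centers)

    # Lloyd's iterations: assign points to the nearest center, then move
    # each center to the mean of its assigned points.
    for _ in range(n_iter):
        dists = np.linalg.norm(embedding[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = embedding[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Toy graph (made up): two 4-vertex cliques joined by one bridge edge (3-4).
adjacency = np.zeros((8, 8))
adjacency[:4, :4] = 1
adjacency[4:, 4:] = 1
np.fill_diagonal(adjacency, 0)
adjacency[3, 4] = adjacency[4, 3] = 1

labels = spectral_clusters(adjacency, k=2)
print(labels)
```

On a graph with O(10,000) vertices a dense SVD like this is still feasible on one machine, but with a few thousand such graphs (one per subject) a sparse, distributed solver is the more natural fit, which is where GraphLab would come in.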


Saturday, April 7, 2012

The first GraphLab workshop is coming up!
