[Booby-Funkybeat-hugo]'s scratch notebook

Here is funkybeat's new writing medium, where he can store the addresses of the pages he has read, should read, or pretends to have read...

Monday, August 30, 2010

Strategies for online inference of model-based clustering in large and growing networks

In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm, and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connection structure of the political websphere during the U.S. political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off between precision and speed when estimating parameters for mixture distributions in the context of random graphs.

The paper is published in AOAS, and the PDF is available here.

Tuesday, March 16, 2010

Clustering based on random graph model embedding vertex features

Large datasets with interactions between objects are common to numerous scientific fields including the social sciences and biology, as well as being a feature of specific phenomena such as the internet. The interactions naturally define a graph, and a common way of exploring and summarizing such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, while ignoring information about the vertices’ features. In this paper we provide a clustering algorithm that harnesses both types of data, based on a statistical model with a latent structure characterizing each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method using real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.

The future of Constellations? Read the paper here or be crafty ;-)

Saturday, June 06, 2009

Oh my God, it's full of stars!!

To get a better idea of how the MixNet algorithm behaves on hypertext documents, just take a look at the Exalead application "Constellations".



Enjoy!

Monday, September 15, 2008

MixNet software package

Estimates the parameters and hidden class variables of a mixture of Erdős–Rényi random graphs. The estimation is performed for a Bernoulli mixture. The package also implements the method from the previous post!
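To make the model a bit more concrete, here is a minimal Python sketch of what a Bernoulli mixture of Erdős–Rényi graphs generates. This is not MixNet code: the function names `sample_er_mixture` and `estimate_pi`, and the parameter names `alpha` (class proportions) and `pi` (connection probabilities between classes), are assumptions made for illustration. The second function only shows the easy case where the classes are known; the real difficulty, which the package addresses, is that they are hidden.

```python
import random


def sample_er_mixture(n, alpha, pi, seed=0):
    """Sample an undirected graph from a mixture of Erdős–Rényi graphs.

    alpha[q]  : probability that a vertex belongs to class q (illustrative name)
    pi[q][l]  : probability of an edge between a class-q and a class-l vertex
    """
    rng = random.Random(seed)
    # Hidden class of each vertex, drawn independently from alpha.
    z = rng.choices(range(len(alpha)), weights=alpha, k=n)
    # Each pair i < j gets a Bernoulli edge with probability pi[z_i][z_j].
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < pi[z[i]][z[j]]:
                adj[i][j] = adj[j][i] = 1
    return z, adj


def estimate_pi(z, adj, n_classes):
    """Edge frequency per pair of classes, assuming the classes z are known."""
    edges = [[0] * n_classes for _ in range(n_classes)]
    pairs = [[0] * n_classes for _ in range(n_classes)]
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            # The set removes the duplicate when both vertices share a class,
            # and keeps both tables symmetric otherwise.
            for a, b in {(z[i], z[j]), (z[j], z[i])}:
                pairs[a][b] += 1
                edges[a][b] += adj[i][j]
    return [[edges[q][l] / pairs[q][l] if pairs[q][l] else 0.0
             for l in range(n_classes)] for q in range(n_classes)]
```

With two classes and a `pi` that is large on the diagonal and small off it, the sampled graph shows two dense communities with few links between them.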

Code: native or R wrapper.

Friday, August 22, 2008

Fast online graph clustering via Erdős–Rényi mixture

In the context of graph clustering, we consider the problem of simultaneously estimating both the partition of the graph nodes and the parameters of an underlying mixture of affiliation networks. In numerous applications the rapid increase of data size over time makes classical clustering algorithms too slow because of the high computational cost. In such situations online clustering algorithms are an efficient alternative to classical batch algorithms. We present an original online algorithm for graph clustering based on an Erdős–Rényi graph mixture. The relevance of the algorithm is illustrated using both simulated and real data sets. The real data set is a network extracted from the French political blogosphere and presents an interesting community organization.
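To give a feel for the online idea (classify each new node as it arrives and update the estimates incrementally instead of refitting on the whole graph), here is a deliberately simplified Python sketch. It is not the algorithm from the paper: the class name `OnlineAffiliationClusterer`, the starting values, the greedy hard assignment, and the clipping constants are all assumptions chosen for illustration. It uses an affiliation model with one within-class connection probability `lam` and one between-class probability `eps`.

```python
import math


class OnlineAffiliationClusterer:
    """Greedy online classification under an affiliation model (illustrative).

    Vertices of the same class connect with probability `lam`,
    vertices of different classes with probability `eps`.
    """

    def __init__(self, n_classes, lam=0.5, eps=0.1):
        self.n_classes = n_classes
        self.lam, self.eps = lam, eps   # assumed starting values
        self.labels = []                # class of each vertex seen so far
        self.within = [0, 0]            # [edges, pairs] inside classes
        self.between = [0, 0]           # [edges, pairs] across classes

    def _score(self, q, neighbours):
        # Log-likelihood of the new vertex's edges if it joined class q.
        score = 0.0
        for j, label in enumerate(self.labels):
            p = self.lam if label == q else self.eps
            score += math.log(p) if j in neighbours else math.log(1.0 - p)
        return score

    def add_vertex(self, neighbours):
        """Classify a new vertex given the indices of its earlier neighbours."""
        neighbours = set(neighbours)
        q = max(range(self.n_classes),
                key=lambda c: self._score(c, neighbours))
        # Incremental update: only the pairs involving the new vertex.
        for j, label in enumerate(self.labels):
            stat = self.within if label == q else self.between
            stat[0] += int(j in neighbours)
            stat[1] += 1
        self.labels.append(q)
        # Re-estimate, clipped away from 0 and 1 so the logs stay finite.
        if self.within[1]:
            self.lam = min(max(self.within[0] / self.within[1], 0.01), 0.99)
        if self.between[1]:
            self.eps = min(max(self.between[0] / self.between[1], 0.01), 0.99)
        return q
```

Each call to `add_vertex` costs time proportional to the number of vertices already seen, because earlier labels are never revisited. That is the speed side of the trade-off; the price of this greedy version is that an early misclassification is never corrected.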

Read the Pattern Recognition paper.

Monday, December 03, 2007

Mathematics for Biological Networks

The "Mathematics for Biological Networks" conference will be held on December 17-18, 2007 at the Institut Henri Poincaré, Paris. It is a free-access interdisciplinary conference in the field of network analysis, focusing on applications in molecular biology. Students are encouraged to come and take part in the discussions. No registration is required.

Tuesday, May 23, 2006

Automated Metadata Hierarchy Derivation

The paper presents an automatic method for building a metadata hierarchy for a set of websites without relying on predefined external hierarchies.

In our experiments, the approach broadly confirms the findings of the mainly topological analyses (analysis of the connectivity between sites) carried out by the rtgi group: sites belonging to the same topological clusters generally share the same hierarchy of semantic concepts.

Some optimization and tuning still need to be studied before this method can be used on larger corpora. So we will have to wait a little before seeing a "flashy" version on the observatoire présidentielle...

Congratulations to Amjad for his work and his presentation in Damascus.