We use cookies on this site. By browsing our site you agree to our use of cookies. Close this message Find out more

Home > Computer Science home > Events > Recent seminars 2012
More in this section Events

# Recent seminars 2012

#### 20 November 2012

Steffen Zschaler, King's College London

#### Towards Verifiable Modular Model Transformations

Model-driven engineering (MDE) is based on the idea that we should be designing software systems at a level of abstraction higher than code---models---and that we should then generate the code from such models. To enable this idea, we need to be able to write model transformations---programmes that translate models into code. Current research in this area focuses on systematic ways of developing, maintaining, and validating these transformations.

In this talk, we will first give a very brief introduction to MDE and metamodelling. After that, we will discuss a number of approaches for modularising transformation specifications, with the goal of simplifying transformation maintenance and reuse. We will end our talk by presenting a particular modularisation technique which also allows some form of verification of the transformations produced.

#### 13 November 2012

Dr Martin Churchill, Swansea University

#### Modular Bisimulation Theory for Computations and Values

For structural operational semantics (SOS) of process algebras, various notions of bisimulation have been studied, together with rule formats ensuring that bisimilarity is a congruence. For programming languages, however, SOS generally involves auxiliary entities (e.g.\ stores) and computed values, and the standard bisimulation and rule formats are not directly applicable.

Here, we first introduce a notion of bisimulation based on the distinction between computations and values, with a corresponding liberal congruence format.  We then provide metatheory for a modular variant of SOS (MSOS) which provides a systematic treatment of auxiliary entities. This is based on a higher order form of bisimulation, and we formulate an appropriate congruence format. Finally, we show how algebraic laws can be proved sound for bisimulation with reference only
to the (M)SOS rules defining the programming constructs involved in them.  Such laws remain sound for languages that involve further constructs.

#### 31 October, 1, 2 November 2012

Professor Marcello Pelillo, University of Venice

#### 1: Introduction to the basic concepts of game theory 2: Evolutionary games and data clustering 3: Contextual pattern recognition and graph transduction

The development of game theory in the early 1940 s by John von Neumann was a reaction against the then dominant view that problems in economic theory can be formulated using standard methods from optimization theory. Indeed, most real- world economic problems typically involve conflicting interactions among decision-making agents that cannot be adequately captured by a single (global) objective function, thereby requiring a different, more sophisticated treatment. Accordingly, the main point made by game theorists is to shift the emphasis from optimality criteria to equilibrium conditions. As it provides an abstract theoretically-founded framework to elegantly model complex scenarios, game theory has found a variety of applications not only in economics and, more generally, social sciences but also in different fields of engineering and information technologies. In particular, in the past there have been various attempts aimed at formulating problems in computer vision, pattern recognition and machine learning from a game-theoretic perspective and, with the recent development of algorithmic game theory, the interest in these communities around game-theoretic models and algorithms is growing at a fast pace.

The goal of these three lectures is to offer an introduction to the basic concepts of game theory and to provide an overview of the work we're currently doing in my group on the use of game-theoretic models in pattern recognition, computer vision, and machine learning. I shall assume no pre-existing knowledge of game theory by the audience, thereby making the lectures self-contained and understandable by a non-expert.

The lectures are based on two (broader) tutorials I gave at ICPR 2010 and CVPR 2011 (with A. Torsello).

#### 23 October 2012

Professor Rob Hierons, School of Information Systems, Computing and Mathematics, Brunel University

#### Distributed Testing

Some systems interact with their environment at several physically distributed interfaces, called ports, and when testing such a system it is normal to place a local tester at each port. If the local testers cannot interact with one another during testing and there is no global clock, then each local tester observes only the sequence of inputs and outputs at its interfaces (a local trace). This can make it impossible to reconstruct the global trace that occurred and can also lead to controllability and observability problems in testing.

While there has been interest in test generation algorithms that overcome controllability and observability problems, such algorithms lack generality since this is not always possible. In addition, previous work has typically only considered the testing of deterministic systems based on deterministic models despite distributed systems often being non-deterministic. Crucially, if users are also affected by the restriction (only local traces are observed) then we should recognise this in the implementation relations used. This talk will describe recent work that has explored implementation relations for such distributed systems in the contexts of testing from a possibly non-deterministic finite state machine and testing from an input output transition system. This work has the potential to lead to more general test generation algorithms for distributed systems.

#### 6 July 2012

Javier Pereira, Informatics and Telecommunications Engineering School. Universidad Diego Portales, Santiago, Chile

#### Multicriteria Decision Making Methods: introduction and weights of criteria

Many MCDM  methods have been defined in last decades. Main differences among them concern axiomatics, definition of parameters and the aggregation process. In particular, definition of criteria weights is a critical issue since they intrinsically represent the rationale behind the applied method. In this talk we introduce two general approaches in MCDM and their respective aggregation processes in order to establish the importance among criteria.

#### 28 July 2012

Eun Jung Kim, CNRS, LAMSADE, Paris, France

#### A single-exponential FPT algorithm for the K4-minor cover problem

Given an input graph $G$ and an integer $k$, the parameterized K4-minor cover problem asks whether there is a set $S$ of at most $k$ vertices whose deletion results in a $K_4$-minor-free graph, or equivalently in a graph of treewidth at most $2$. This problem is inspired by two well-studied parameterized vertex deletion   problems, Vertex Cover and Feedback Vertex  Set, which can also be expressed as Treewidth-t Vertex Deletion problem: $t=0$ for  Vertex Cover and $t=1$ for Feedback Vertex Set.

It follows from the celebrated Courcelle's theorem or from the Graph Minor Theorem that, for every constant $t$, Treewidth-t Vertex Deletion problem is FPT. While a   single-exponential FPT algorithm has been known for a long time   for Vertex Cover, such an algorithm for Feedback Vertex Set was devised comparatively   recently. While it is known to be unlikely that Treewidth-t Vertex Deletion can be solved in time c^{o(k)}\cdot n^{O(1)}, it was open whether the K4-minor cover could be solved in single-exponential FPT time. This paper answers this question in the affirmative.
Joint work with Christophe Paul and Geevarghese Philip

#### 26 June 2012

Prof Alexander Shen, CNRS, Marseille

#### Randomness deficiency and the quantitative version of van Lambalgen theorem

The definition of algorithmic randomness classifies all infinite 0-1-sequence into two classes: ''random'' and ''non-random''. This classification is robust: changing one bit, one cannot get a non-random sequence from a random one. Consequences: there exist a random sequence that starts with 10000000 zeros. It would be natural to have a more quantitative classification that would say that this sequence is random, but has high (finite) ''randomness deficiency'' - while non-random sequences have infinite randomness deficiency. Such a definition was suggested long ago by Levin and Gacs, but (unfortunately) did not attract enough attention. Using it, one can get a quantitative versions of theorems about random sequences. For example, van Lambalgen theorem says that the pair (a,b) is random (with respect to the product measure of two uniform measures) if (1) a is random and (2) b is random with oracle a. The quantitative version makes this more precise and establishes relation between randomness deficiencies: d(a,b) is close to d(a)+d(b|a,d(a)).

#### 12 June 2012

Mark Herbster, Department of Computer Science, University College London

#### p-Resistive networks for graph-based transduction

A p-resistive network is an undirected graph with an edge-resistance (a positive scalar) associated with each edge. This may be viewed as a generalization of both an electrical network (p = 2) and of an undirected flow ''pipe'') network (p = 1). Such networks (graphs) are commonly used in graph-based transduction. For transduction, we are given a fixed set of objects, some of which are labeled and some of which are unlabeled, and we wish to predict the unlabeled objects. A graph is then defined where an edge between objects indicates similarity between objects. If the graph is weighted then the weights indicate the degree of similarity. These include the min-cut method  (p = 1) and the harmonic energy minimization (p = 2) procedure. We interpret these methods as specific instances of the minimization of a p-power. When p = 2 the analogy is that the graph is an electrical network; the edges are now resistors whose edge-resistance is the reciprocal of the similarity. The fixed labels from {-1, 1} now correspond to voltage constraints and the algorithm for labeling the graph is then to find the set of consistent voltages which minimize the power and then to predict with the ? sign of the voltages. In the case p = 1 this is equivalent to finding the label-consistent min-cut.  We will show how by choosing a p in (1, 2) this leads to an algorithm that obtains advantageous performance guarantees that are unobtainable for the special cases of p = 1, 2.  Finally, time-permitting we will identify the graph with a network of spins via the well-known Ising model.  We will then show that efficient prediction is obtainable through a "linear" embedding of the graphs.  For both $p$-resistive networks and Ising models we will give relative mistake bounds.

#### 8 June 2012

Anders Yeo, (formerly) Department of Computer Science, Royal Holloway, University of London

#### Different parameterizations of the Test Cover problem

In the Test Cover problem we are given a set  {1,...,n} of items together with a collection, T, of distinct subsets of these items called tests. We assume that T is a test cover, i.e., for each pair of items there is a test in T containing exactly one of the items. The objective is to find a minimum size subcollection of T which is still a test cover.

This problem is NP-hard, so we consider the following parameterizations of the problem, where k is the parameter

1. Is there a solution with at most k tests?
2. Is there a solution with at most n-k tests?
3. Is there a solution with at most m-k tests, where m is the size of T?
4. Is there a solution with at most log(n)+k tests?

The above is of interest as n and m are upper bounds for the size of an optimal solution and log(n) is a lower bounds. We show that the first two parameterizations above lead to the problem being FPT, i.e., there exists algorithms with runtime O(f(k) n^c) to solve them, where f is any function and c is any constant. The last two parameterizations are unlikely to have such algorithms unless there is an unlikely collapse of complexity classes.

#### 22 May 2012

Lorenzo Cavallaro, Information Security Group, Royal Holloway, University of London

#### Mining the Network Behavior of Bots

Bots are malware typically used to steal sensitive information, send SPAM, perform DDoS attacks and other illegal activities.  Research in bot and botnet detection has been prolific, producing detection mechanisms that mostly focus on specific C&C structures or on the correlation between the activities and the communication patterns exhibited the bots.  We present an approach to detect bot-infected hosts that (1) is independent from the botnet structure, (2) detects individually infected hosts, (3) deals with encrypted communications, (4) does not rely on the presence of noisy malicious activities, and (5) produces low false positives. Our technique records the network trace T_b of a bot b.  Subsequently, T_b is summarized into a set of network flows.  Similar flows are grouped and the resulting clusters are analyzed to highlights those traits that best describe b's behavior. Our analysis automatically produces a detection model for b, along with any inferred time-related properties, which is translated into a policy script for a network intrusion detection system (NIDS) to allow for a real-time detection of infected hosts.  We implemented and tested our approach on a large corpus of real-world traffic. The results show that our tool is able to generate effective bot detection profiles reporting very few false positives.

#### 17 May 2012

Alexandre Tsybakov, CREST-ENSAE

#### Aggregation by exponetial weighting and sparsity

Given a collection of estimators, the problem of linear, convex or model selection type aggregation consists in constructing a new estimator, called the aggregate, which is nearly as good as the best among them (or nearly as good as their best linear or convex combination), with respect to a given risk criterion. This problem can be also considered with a given dictionary of functions instead of estimators. When the underlying model is sparse, which means that it is well approximated by a linear combination of a small number of functions in the dictionary, the aggregation techniques turn out to be very useful in taking advantage of sparsity. Aggregates are usually constructed by mixing the initial estimators or functions of the dictionary with data-dependent weights that can be computed is several possible ways. In this talk we consider aggregates with exponential weights. We show that they have some remarkable properties. In particular, they satisfy sharp oracle inequalities that allow one to treat in a unified way the problems of aggregation and of sparse estimation.

#### 15 May 2012

Dijun Chen, Hangzhou University

#### The Chromatin Interactome of miRNAome

MicroRNAs (miRNAs), one class of endogenous ~22 nucleotide (nt) non-coding RNAs, have emerged as critical gene regulators in diverse biological pathways and various diseases. While currently numerous studies focus on defining the regulatory functions of miRNAs, our knowledge about transcriptional regulation of miRNAs, especially in higher-order chromatin context, is far from perfect. In this study, we systematically investigated transcriptional regulation of known miRNAs at three-dimensional (3D) context by integrated analysis of various data sets, including chromatin modifications, expression profiling patterns and chromatin interactions, using deep sequencing. We show that various active histone modifications as well as RNA Polymerase II (RNAPII) binding sites were enriched in the proximal promoters of expressed miRNAs, confirming that miRNAs and protein-coding genes share similar mechanisms of transcriptional regulation. We found that miRNA genes were cell-specifically co-expressed with host (for intragenic miRNAs) or nearby (for intergenic miRNAs) protein-coding genes. Furthermore, chromatin interactions associated with RNAPII showed that many groups of miRNAs and protein-coding genes are co-transcribed from the same transcription factories, shedding light on transcriptional networks of miRNA genes. The interaction networks were found to have both conserved and cell-specific sub-network modules according to the higher-order chromatin organization from different cell types. Interestingly, interacted miRNAs tend to show similar expression pattern and involve in the same disease category. Overall, our study provides insights into transcriptional regulation of miRNAs by three-dimensional chromatin interactions in human cells.

#### 26 April 2012

Katarzyna Wasielewska, Systems Research Institute, Polish Academy of Sciences

#### Software Agents as Resource Brokers in Grid

The aim of the presentation is to introduce the project Agent in Grid (AiG) developed at the Systems Research Institute Polish Academy of Sciences. In particular, to discuss issues concerning application of ontologies and semantic data processing in management of resources in the Grid.
The goal of the AiG project is to utilize software agents as a high-level middleware to facilitate intelligent resource management in the Grid. In the AiG project, we proposed an approach which is based on application of semantic data processing in all aspects of the system i.e. ontologies provide the metadata, that can be used to describe resources, reason about them, and negotiate their usage.
In the presentation, first, assumptions regarding the system design and the two main scenarios will be presented. Second, problems involved in the development of an ontology for Grid resources and the description of the Service Level Agreement (SLA) will be considered. Third, the role of autonomous negotiations in establishing the SLA, as well as, the role of ontologies in the SLA negotiations will be outlined.  Additionally, new and main technical issues met during implementation of the AiG system will be discussed i.e. how an ontology-driven user  interface can be developed to facilitate human-computer communication and a solution to the problem of ontology-based agent-agent communication.

#### 24 April 2012

Dr Wouter Koolen, Department of Computer Science, Royal Holloway, University of London

#### A Free Matrix Lunch

There has been a recent interest in matrix-valued prediction problems, e.g. PCA, subspace learning and matrix regression. Proposed algorithms typically generalise classical vector-valued prediction algorithms, e.g. Hedge, Winnow and Exponentiated Gradient. Surprisingly, for each of these algorithms the matrix regret bound coincides with the classical bound. But intuitively a matrix has more parameters than a vector, and should hence be harder to infer. So the question is: what is going on? Are the classical bounds loose? Or is there a free matrix lunch?

We answer this question affirmatively. To this end, we generalise the simple setting of prediction with multinomial distributions to the matrix case. We then show that not only the bounds, but the actual worst-case regrets are identical. We conclude that overhead is incurred for learning the vector of eigenvalues, but that the eigenvectors are learned for free.

No prior knowledge of matrix mathematics will be assumed. We will not use complex numbers. And "quantum" will only be mentioned once (to impress).

#### 20 March 2012

Prof Massimiliano Pontil, Department of Computer Science, University College Londong>

#### Multi-task Learning: Theory and Practice

We discuss the problem of estimating a structured matrix with a large number of elements. A key motivation for this problem occurs in multi-task learning. In this case, the columns of the matrix correspond to the parameters of different regression or classification tasks, and there is structure due to relations between the tasks. We present a general method to learn the tasks' parameters as well as their structure. Our approach is based on solving a convex optimization problem, involving a data term and a penalty term. We highlight different types of penalty terms which are of practical and theoretical importance. They implement structural relations between the tasks and achieve a sparse representations of parameters. We address computational issues as well as the predictive performance of the method. Finally we discuss how these ideas can be extended to learn non-linear task functions by means of reproducing kernels.

#### 23 February 2012

Prof M.Grazia Speranza, Dipartimento Metodi Quantitativi, Università degli Studi di Brescia, Italy

#### New classes of vehicle routing problems

In this talk I will present and discuss new classes of vehicle routing problems (VRP). The Travelling Salesman Problem is the basic problem of the VRP class. In most of the traditional VRP, all customers are assumed to be known in advance, all require service, all have to be served the same day and each customer must be visited by exactly one vehicle. The goal is to minimize the distance travelled. I will present some less traditional classes of problems, on which the literature available is still limited and on which it is interesting to advance research:

• the split delivery vehicle routing problem, where a customer may be visited by more than one vehicle:
• vehicle routing problems with profits, where a subset of the potential customers has to be identified to maximize the profit collected;
• inventory routing problems, where the service day has to be decided for each customer and inventory costs or inventory capacity are taken into account.

I will introduce the problems and discuss mathematical programming formulations, complexity results, properties, exact and heuristic solution approaches.

#### 21 February 2012

Dr Tony Bellotti, Department of Mathematics, Imperial College London

#### Forecasting and Stress Testing Credit Card Default using Discrete Survival Models

Typically models of credit card default are built on static data, often collected at time of application. We consider alternative models that also include behavioural data about credit card holders and macroeconomic conditions across the credit card lifetime, using discrete survival analysis. We find that discrete survival models that include these behavioural and macroeconomic variables give statistically significant improvements in model fit which translates into better forecasts of default at both account and portfolio level when applied to an out-of-sample data set. By simulating extreme economic conditions, we show how these models can be used to stress test credit card portfolios. We conclude by considering the challenges outstanding and the application of support vector machines to this problem.

### 7 February 2012

Professor Mark van den Brand, Department of Mathematics and Computer Science, Eindhoven University of Technology

Domain Specific Languages are getting more and more attention.The High Tech Industry has developed towards a very softwareintensive industry. The amount of software to develop and maintain is growing fast, maybe even in some case too fast. This industry is investigating alternative ways to develop their software. Next to outsourcing to suppliers DSLs are also explored. In this talk two topics are addressed. First, ongoing research is presented with respect to the development of a DSL for the generation of a virtual proto in order to reduce testing effort.  In the second part a home-made DSL (SLCO) is presented and the way we formalize the semantics of this DSL. The reason for formalizing the semantics is to be able to proof the correctness of model transformations developed for mapping SLCO to various platforms.

### 31 January 2012 Control Operators in Dependently Typed Languages

Dr Robin Adams, Department of Computer Science, Royal Holloway, University of London

We're familiar with types - int, float, bool and all that.  We can give ourselves ways of building more complex types out of simpler ones: given types A and B, we can form the type AxB of pairs, or the type A->B of functions that take input A and give output B.  Type theory studies languages with more complex types still - dependent types. A dream of type theory for some time has been a dependently typed programming language.  With such a language, we could write down "The type of all functions that take a list l, and return an ordered list l' and a bijection between l and l'".  A program that has this type would automatically be a correct sorting algorithm.  The types are a full specification language.  Typechecking is software verification. There are still some fundamental problems to be solved first.  One is how to handle operations that break control flow, such as exceptions and side effects.  There is a way to do this in the lambda calculus, known as control operators.  These work well in simply typed language, too. But make the types a bit more complex, and they break (you can prove 0=1).

I will explain my initial ideas on how to construct a language which incorporates both control operators and dependent types consistently.  I will do this using logic-enriched type theories (LTTs) - languages that separate the dual role that types perform in a dependently typed language.

### 27 January 2012 Tropical Linear Algebra and Game Theory

Prof Alexander E Guterman, Moscow State University

This is a joint work with Marianne Akian and Stephane Gaubert.

Tropical algebra (sometimes called max algebra) is a set of real numbers equipped with the maximum operation instead of usual addition and addition instead of usual multiplication. Under these operations this is an algebraic structure called a semiring. Such structures naturally appear in modern scheduling theory, control theory, optimization, dynamical systems and networks. Tropical arithmetic allows to reduce difficult non-linear problems to the linear problems but over tropical semiring. Therefore, to investigate these problems it is necessary to develop linear algebra in the tropical case. This subject is very actual nowadays. We plan to introduce and investigate tropical linear algebra and to connect it with the game theory. In particular, we will present our recent results on the relationships between the emptiness of tropical polyhedrons and the existence of the winning strategy in two player mean pay off game and other related results.

#### 24 January 2012 Normalized Affymetrix expression data are biased by G-quadruplex formation

Dr Hugh Shanahan, Department of Computer Science, Royal Holloway, University of London

The talk will cover the analysis of a particular type of data that are commonly used by Biologists. Microarrays are employed to determine what types of a particular molecule (RNA) are found in a particular set of cells. Microarrays are small glass chips with large numbers of single stranded DNA attached to them. Individual sequences are called probes which bind (hybridise) to the RNA. This talk will give a pedantic introduction to the topic followed by a more detailed analysis of the following specific issue.

Probes in microarrays with runs of 4 or more guanines (G-stacks) in their sequences can exhibit a level of hybridisation that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming.

We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalisation there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected.  The analysis was repeated for a number of other normalisation pipelines and two, FARMS and PLIER, minimised the bias to some extent. We estimate that approximately 15% of the data sets  deposited in the GEO database are susceptible to the effect.

The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal.