
Tuesday, June 05, 2012

Scaling higher education

In the fall of last year, over 100,000 students signed up for Andrew Ng's Machine Learning class and more than 12,000 of them completed the course. Sebastian Thrun and Peter Norvig taught Artificial Intelligence with similarly impressive numbers.

I was one of the thousands in the Machine Learning class. I had so much fun with that, I also took Daphne Koller's Probabilistic Graphical Models. That one was quite a bit harder, covering some fairly advanced stuff, at least for my few remaining brain cells. But, I finished! For the PGM class, 6702 students took the first quiz and 1441 took the final - pretty good retention for such a ball-buster of a class.

This spring, at least two new companies offering online courses were founded. Andrew Ng and Daphne Koller founded Coursera. Partnering with professors from Princeton, Penn, University of Michigan, and Berkeley, they've broadened their course catalog from a base in computer science to include classes in history, mathematics and even poetry. Sebastian Thrun, founder of Udacity, calls his approach University 2.0 and speaks of "democratizing higher education" and "empowering students", especially in the developing world where access to higher education is more limited. Almost three quarters of Coursera's students are outside the US, in countries like Brazil, Britain, India and Russia.

The classes are surprisingly fun. The formula boils down to two key elements:

  • short segments
  • interaction

In the mold of Khan Academy, the lectures are broken into short segments of 10 to 15 minutes, which fit nicely into busy schedules. Short quizzes test the student's understanding. The courses have a social aspect as well. Online forums provide a place for questions and a sense of camaraderie while struggling through difficult concepts. Meetups and study groups have sprung up in several cities across the world.

The programming exercises are where the real fun begins. Students write code that implements the crux of an operation, filling in the blanks in provided boilerplate code. Grading works a bit like unit testing. Progressing through the assignment by getting tests to pass gives gratifyingly immediate feedback. Completing an assignment results in working code for handwriting recognition, spam classification, image processing, or recognizing an action from Kinect position-sensing data.

Thomas Friedman says, Let the revolution come:

Welcome to the college education revolution. Big breakthroughs happen when what is suddenly possible meets what is desperately necessary.

With the cost of tuition rising, and public funding falling, the timing might be right for some disruptive innovation in higher education. And, the skills on offer are in high demand. One proposed business model is to offer classes for free and charge employers for access to the data.

Refactoring the university classroom to function at internet scale meshes with the building momentum behind open-access journals, a movement some are calling an Academic Spring, as well as with citizen science projects like Galaxy Zoo.

Increasing openness in academia, in both teaching and research, can only reduce friction in transferring technology from the lab to production, and may help engage the public with science. Look for lots of interesting developments in the next few years as technology knocks a new door into the ivory tower.

Updates

Coursera continues to generate lots of news, signing up 12 new universities, including the University of Washington (yay, UW!), attracting the attention of Bill Gates, and being described as "The Single Most Important Experiment in Higher Education" by The Atlantic, "The Beginning of the End for Traditional Higher Education" by Fortune, and "Reshaping Education on the Web" by the NYT.

Wednesday, May 02, 2012

Phenotypic constraints drive the architecture of biological networks

Biological networks are often compared to random networks in terms of properties like degree distribution, clustering, robustness and over-representation of network motifs. In a talk this morning, Areejit Samal, a Postdoctoral Fellow in the Price lab at ISB, proposed a new null model based on Markov Chain Monte Carlo (MCMC) sampling to generate realistic benchmark ensembles for metabolic networks and gene regulatory networks. Based on this improved background model, the conclusion is that “phenotypic constraints drive the architecture of biological networks”, which is also the title of the talk.

Applied to the metabolic model of E. coli, the sampling technique involves two steps. In the swap step, a reaction is removed from the network and a new reaction is drawn randomly from the KEGG database to replace it. The proposed new network is then tested by flux-balance analysis. If the new network is viable, it is accepted. If not, it is rejected. So, only networks that are biochemically plausible are sampled.
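Here's a minimal Octave sketch of that loop, just to make the procedure concrete. The network representation (a cell array of reactions) and the helpers kegg_random_reaction and is_viable_by_fba are hypothetical stand-ins for the KEGG draw and the flux-balance analysis check, not code from the talk.

function samples = sample_networks(network, n_samples)
  % MCMC swap-and-test sampling of viable metabolic networks.
  samples = cell(1, n_samples);
  for i = 1:n_samples
    % Swap step: remove a randomly chosen reaction and replace it
    % with a random reaction drawn from KEGG (hypothetical helper).
    proposal = network;
    j = randi(numel(proposal.reactions));
    proposal.reactions{j} = kegg_random_reaction();
    % Test step: accept only if flux-balance analysis says the
    % proposed network is still viable (hypothetical helper). To
    % model the emergence of modularity, this check could require
    % viability in a second environment as well.
    if is_viable_by_fba(proposal)
      network = proposal;
    end
    samples{i} = network;
  end
end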

The technique has implications for systems and synthetic biology as well as evolutionary theory. By approximating the space of all possible networks that might lead to a viable functioning organism, we can better understand what properties these networks have, and maybe better design new networks. By changing the acceptance criteria to enforce viability in a second environment, the algorithm nicely models the evolutionary emergence of modularity.

Samal also showed the same general algorithm applied to the gene regulatory network controlling flowering in Arabidopsis.

I especially appreciated seeing a really cool application of Markov Chain Monte Carlo sampling, a topic covered only a couple weeks back in the Probabilistic Graphical Models class.

Maximum expected utility decision rules

Week 6 of Daphne Koller's Probabilistic Graphical Models class looks at decision theory, which integrates the concept of utility functions from economics into our models. I'm getting flashbacks of Dr. Welsh's Econ 101 in Schwab Auditorium.

In the influence diagram below, we're making a decision about whether to found a company. Our success is determined by market conditions, which we can't observe directly. But we can observe the results of a survey, which gives us some information on which to base our decision. So, how would you find the optimal decision rule?

The factor fm represents the probability that market conditions (var 1) are good (3), fair (2) or poor (1).

fm.var =  1
fm.card =  3
fm.val = [0.5 0.3 0.2]
PrintFactor(fm)
1 
1 0.500000
2 0.300000
3 0.200000

The factor fsm represents the results of a survey (var 2) given the underlying market conditions (var 1).

fsm.var = [2 1]
fsm.card = [3 3]
fsm.val = [0.6 0.3 0.1 0.3 0.4 0.3 0.1 0.4 0.5]
PrintFactor(fsm)
2 1 
1 1 0.600000
2 1 0.300000
3 1 0.100000
1 2 0.300000
2 2 0.400000
3 2 0.300000
1 3 0.100000
2 3 0.400000
3 3 0.500000

The factor fufm represents the utility function, given the market conditions (var 1) and the decision (var 3) to found a company, yes (2) or no (1).

fufm.var = [3 1]
fufm.card = [2 3]
fufm.val = [0 -7 0 5 0 20]
PrintFactor(fufm)
3 1 
1 1 0.000000
2 1 -7.000000
1 2 0.000000
2 2 5.000000
1 3 0.000000
2 3 20.000000

Compute μ(F, S) by taking the factor product of the factors representing the market, survey and utility function, then summing out over all possible market conditions (var 1).
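In factor terms, this computes μ(f, s) = Σ_m P(m) · P(s | m) · U(f, m) for each survey outcome s and decision f; the entries for f = 1 (don't found) come out zero because the utility of not founding is always zero.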

PrintFactor(FactorMarginalization(FactorProduct(FactorProduct(fm, fsm), fufm), [1]))
2 3 
1 1 0.000000
2 1 0.000000
3 1 0.000000
1 2 -1.250000
2 2 1.150000
3 2 2.100000

We can build our decision rule by walking through all possible survey results (var 2) and, for each, selecting whichever decision - found or not - maximizes expected utility. See the red circles at the bottom of the diagram.
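Reading the maxima off μ(F, S) above: for survey 1, max(0, -1.25) = 0, so don't found; for survey 2, max(0, 1.15) = 1.15, so found; for survey 3, max(0, 2.1) = 2.1, so found. Summing the maxima gives the expected utility of this decision rule: 0 + 1.15 + 2.1 = 3.25.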

We can make a decision factor, which depends on the survey (var 2). We'll fill in dummy values, for now.

fd.var = [3 2]
fd.card = [2 3]
fd.val = ones(1,prod(fd.card))

Now, we can use the code from the programming assignment to compute the optimal decision rule and the expected utility it yields.

marketI.RandomFactors = [fm fsm]
marketI.UtilityFactors = [fufm]
marketI.DecisionFactors = [fd]
[meu optdr] = OptimizeMEU(marketI)
meu =  3.2500
optdr =
  scalar structure containing the fields:
    var  =       3   2
    card =       2   3
    val  =       1   0   0   1   0   1
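Reading optdr with its [3 2] variable ordering, the indicator is 1 for (don't found, survey 1), (found, survey 2) and (found, survey 3) - that is, found the company exactly when the survey comes back 2 or 3. The maximum expected utility of 3.25 matches the hand calculation above.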

Thursday, March 29, 2012

Probabilistic graphical models class

I'm taking Daphne Koller's class on Probabilistic Graphical Models. Wish me luck - it looks tough. So, first off, why graphical models?

The first chapter of the book lays out the rationale. PGMs are a general framework that can be used to allow a computer to take the available information about an uncertain situation and reach conclusions, both about what might be true in the world and about how to act. Uncertainty arises because of limitations in our ability to observe the world, limitations in our ability to model it, and possibly even because of innate nondeterminism. We can only rarely (if ever) provide a deterministic specification of a complex system. Probabilistic models make this fact explicit, and therefore often provide a model which is more faithful to reality.

More concretely, our knowledge about the system is encoded in the graphical model, which helps us exploit efficiencies arising from the structure of the system.

Distributions over many variables can be expensive to represent naively. For example, a table of joint probabilities of n binary variables requires storing O(2^n) floating-point numbers. The insight of the graphical modeling perspective is that a distribution over very many variables can often be represented as a product of local functions that each depend on a much smaller subset of variables. This factorization turns out to have a close connection to certain conditional independence relationships among the variables - both types of information being easily summarized by a graph. Indeed, this relationship between factorization, conditional independence, and graph structure comprises much of the power of the graphical modeling framework: the conditional independence viewpoint is most useful for designing models, and the factorization viewpoint is most useful for designing inference algorithms.

An Introduction to Conditional Random Fields by Charles Sutton and Andrew McCallum
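To make that concrete: a chain X1 → X2 → ... → Xn of binary variables factorizes as P(x1) · P(x2 | x1) · ... · P(xn | xn-1), which takes only O(n) numbers to store (one prior plus n - 1 small conditional tables) instead of the O(2^n) entries of the full joint table.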

Bayesian networks and Markov networks (aka Markov random fields) are the two basic models used in the class; the key difference is that edges are directed in a Bayesian network and undirected in a Markov network.
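For example, a naive Bayes classifier is a simple Bayesian network, with directed edges pointing from the class variable to each feature, while the Ising model from statistical physics is a classic Markov network, with undirected edges between neighboring spins.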

How are these different kinds of graphical models related? Let's hope we'll find out.

There's a study group meetup here at the Institute for Systems Biology (and maybe other locations) on Thursday night. Come join us, if you're in Seattle and you're doing the class.
