In the January 2008 issue of Communications of the ACM, Jeannette Wing of Carnegie Mellon University poses these questions:
- P=NP?
- What is computable?
- What is intelligence?
- What is information?
- (How) can we build complex systems simply?
Mad science in silico...
I've harbored a secret desire to learn Haskell for a few years now. Simon Peyton-Jones is one of the key people behind Haskell. His web site at MSR has tons of papers, a tutorial on concurrent programming in Haskell, and a video lecture of A taste of Haskell. There's also a Simon Peyton-Jones podcast at SE-Radio.
What is Haskell?
Haskell is a programming language that is
- purely functional
- lazy
- higher order
- strongly typed
- general purpose
Why should I care?
Functional programming will make you think differently about programming
- Mainstream languages are all about state
- Functional programming is all about values
Whether or not you drink the Haskell Kool-Aid, you'll be a better programmer in whatever language you regularly use
I should read a Haskell book or two, and, in related functional goodness, I keep reading how great Practical Common Lisp is. I also need to fulfill my quest to finish SICP. I've read the first three chapters twice, doing the examples once in Scheme and again in OCAML. I've read chapter 4 on interpreters. I need to work through the examples in that chapter and take in the final fifth chapter.
Let's start off by saying I'm not anti-Ruby. I like Ruby. Ruby is cool. Matz is cool. But, a while back I was wondering, What is a Ruby code block? My feeble curiosity has been revealed for the half-assed dilettantery it is by Paul Cantrell. Mr. Cantrell chases this question down, grabs it by the scruff of the neck, and wrings it out like a bulldog with a new toy. He also rocks on the piano, by the way.
So in fact, there are no less than seven -- count 'em, SEVEN -- different closure-like constructs in Ruby:
- block (implicitly passed, called with yield)
- block (&b => f(&b) => yield)
- block (&b => b.call)
- Proc.new
- proc
- lambda
- method
This is quite a dizzying array of syntactic options, with subtle semantic differences that are not at all obvious, and riddled with minor special cases. It's like a big bear trap for programmers who expect the language to just work. Why are things this way? Because Ruby is:
- designed by implementation, and
- defined by implementation.
Again, neither I nor Mr. P.C. is bashing Ruby. He shows how to pull off some tasty functional goodness like transparent lazy lists later in the article. Thanks to railspikes for the link.
If you're into stats, both of these are highly regarded, but miles over my head.
Papers
Dr. Larry Ruzzo at UW teaches a Computational Biology course. Some of the links above are from his reading list, particularly the Sean Eddy Primer articles from Nature Biotechnology.
In winter of 2008, some UW CS grad students held a seminar course on data management issues in life sciences. In case that link doesn't stay up forever, here's some of the reading list:
- Intro to Biology
- Overview on biological data integration
- Specific tools and techniques
Also on the subject of data: Dynamic Fusion of Web Data.
Books I wanna read
Finally, here are some books that I haven't read, will probably never get the time to read, but I wish I would read.
Slashdot linked to Bjarne Stroustrup on Educating Software Developers which follows up on an earlier article, The 'Anti-Java' Professor and the Jobless Programmers. The Anti-Java professor is Robert Dewar at NYU, who coauthored a short paper, Computer Science Education: Where Are the Software Engineers of Tomorrow? They contend that computer science curricula have been dumbed down to counter falling enrollment post-dot-com-crash and partially blame Java, which fosters reliance on libraries and garbage collection. But, not all of their critique can be written off as language bigotry. The result?
We are training easily replaceable professionals.
Dewar advocates:
Those sound like solid points to me. One thing the field of medicine really gets right is an emphasis on mentoring. Mentoring is the heart of residency, which depending on specialty can last from 3 to 7 years. By the time a physician graduates from residency, they will have performed hundreds of procedures and seen thousands of patients under the guidance of an attending physician. I've often wished there was more of this in the computing field.
Over the years, I've accumulated a list of topics I wish I'd been exposed to as a CS undergrad.
Of course, then my undergraduate degree would have taken 7 years... On second thought, only my Dad would have complained.
What do you wish you'd learned in college? Post a comment!
I wondered who else might be working on bioinformatics related extensions for Firefox besides Firegoose. One interesting project is iHOPerator, which builds on Greasemonkey. And, there's a hint of something to come here.
It seems like there was a flurry of interest around 2005, in the early days of AJAX and mash-ups, which produced biobar along with two now-dead projects - bioFox and NCBI Search Toolbar. Back in those days, Jon Udell asked, How do you design a remixable Web application? Nifty developments like the REST API in EMBL's STRING 8.0 are starting to provide answers.
Pygr is a hypergraph database in Python with applications in bioinformatics, written by Christopher Lee, a faculty member at UCLA. There's a 30-minute video of a talk about Pygr and a bunch of other resources on the Lee Lab website and Lee's thinking bioinformatics blog.
Thesis: hypergraphs are a general model for bioinformatics, and Python's core models are already a good model of bioinformatics data. Pygr aims to show that these Pythonic patterns are a general and scalable solution for bioinformatics:
- Sequence: protein and nucleic acid sequences
- Mapping / Graphs: alignment, annotation
- Attributes: schema, i.e. relations between data
- Namespace (import): the ontology of all bioinformatics data
The general idea is not entirely different from the data types behind Gaggle, especially in the emphasis on basic data structures without a heavy semantic component.
Dr. Lee is also writing a textbook on probabilistic inference.
I happened across a very cool project on web data integration at the University of Leipzig. Their paper Dynamic Fusion of Web Data is worth a look. They're working towards a theory of on-the-fly data integration for mashup applications that they refer to as dynamic data fusion. Data integration in mashups is dynamic in that it occurs at runtime. This allows a pay-as-you-go model, rather than the large up-front semantic mapping task that limits the scalability of traditional data integration methods like data warehouses.
They describe mashups as workflow-like. Do they mean mashups are programmatic as opposed to declarative? In place of SQL, this group's iFuice system uses a scripting language with "set operations (e.g., union, intersection, and difference) and data transformation (e.g., fuse, aggregate) which can be used to post-process query results". Other key features are instance-level mapping and accommodation of structured and unstructured data.
This definitely gets at what Firegoose is good for - using the web as a channel for structured data - an approach that does for data integration what loose coupling does for software. Firegoose, part of the Gaggle framework, is a toolbar for Firefox that allows data to be exchanged between desktop software and the web. Firegoose can read microformats, call web services, query databases, or even perform nasty dirty screen scraping. Unlike a mashup, data integration in Firegoose and Gaggle requires user participation, although the user never deals with schemas, only instances of the Gaggle data types - mainly lists of identifiers, matrices of numeric data, networks, and tuples. The identifiers serve in a role somewhat analogous to primary keys.
More papers in a similar vein
I may as well come clean and admit that I'm developing a genome browser. What? Another genome browser? Why? You may well ask these questions. Well, it's a long story. But here is a completely non-exhaustive list of existing genome browsers.
Note: updated in Sept. 2009 to reflect the fact that everyone and their uncle built a genome browser this past couple of years. See Brother, can you spare a genome browser?
Note: updated again in May of 2010 and again in Feb 2011 to add Savant.
I came across a color chooser that kicks the Swing JColorChooser's booty. It looks great and supports transparency. It's available at colorchooser.dev.java.net and on the developer's blog. It will soon be used in my genome browser. Thanks, Jeremy!
A full-featured browser, an XUL front end, and the wealth of libraries available in Java make for a powerful and flexible combination. The browser extension capability of Firefox, along with LiveConnect, has been used by at least three extensions:
Most of what I've figured out about using Java from an extension came from Simile's David Huynh. Sadly, development of Piggy Bank has now "quiesced".
I don't know about the others, but Firegoose is hosed by the latest Java 6 update 10. Apparently, Java 6.10 introduces some significant changes into LiveConnect and the Java browser plugin. It's certainly good that Java in the browser is getting some attention, but I wish Java in a Firefox extension was a supported and regression tested use case (see whining here). The fact that it's such an arcane, unsupported and brittle hack is holding back what could otherwise be a nice technique.
Interest in Java in Firefox extensions appears to exist according to these posts in the MozillaZine Extension Development forum:
First Problem: The error I get appears to happen when reflectively instantiating a Java array and looks like this:
Error calling method on NPObject! [plugin exception: java.lang.IllegalArgumentException: No method found matching name newInstance and arguments [com.sun.java.browser.plugin2.liveconnect.v1.JavaNameSpace, java.lang.Integer]].
Instantiating the array through reflection was, itself, a work-around for another LiveConnect issue with type conversion between JavaScript arrays and Java arrays. It's barfing on line 3 below:
// from http://simile.mit.edu/repository/java-firefox-extension/firefox/chrome/content/scripts/browser-overlay.js
_toJavaUrlArray: function(a) {
    var urlArray = java.lang.reflect.Array.newInstance(java.net.URL, a.length);
    for (var i = 0; i < a.length; i++) {
        var url = a[i];
        java.lang.reflect.Array.set(
            urlArray, i,
            (typeof url == "string") ? new java.net.URL(url) : url);
    }
    return urlArray;
}
Update 1: First problem solved easily enough.
var dummyUrl = new java.net.URL("http://gaggle.systemsbiology.org");
var urlArray = java.lang.reflect.Array.newInstance(dummyUrl.getClass(), a.length);
Now, on to more hideousness:
Error calling method on NPObject! [plugin exception: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)].
...caused by trying to instantiate a URLClassLoader. The next-generation Java Plug-in, included in update 10, changes the security policy such that calls from JavaScript to Java are uniformly treated as untrusted.
Update 2: A Work-around!
A post on the java.net forum has a work-around. You can disable the next-generation plug-in through the Java control panel: under the Advanced tab, open Java Plug-in, deselect Enable the next-generation Java Plug-in, then restart Firefox. There is a bug filed whose comments seem to suggest that it will be addressed in a future release of the Java Plug-in.

Update 3: According to this thread on java.net, a fix is on the way in Java SE 6 update 12. Thanks, Sun!
More references:
Clay Shirky appeared on the excellent podcast Econ-Talk, to which I've become hopelessly addicted. (My geekosity knows few bounds.) He was hawking his book Here Comes Everybody.
Shirky is a great thinker about what makes the social aspects of technology tick. His current theory goes something like this: there is a cognitive surplus brought about by the increasing amount of leisure time in modern industrialized societies. For a generation, that surplus has been dissipated through the artifice of television. Now, that surplus is being put to use. Open source software, tagging, Wikipedia, and other social web phenomena are the early results.
The question now is, what happens when someone figures out how to move this phenomenon out to the physical world? Someone posted this quote in the comments section to the podcast:
The effect of a newspaper is not only to suggest the same purpose to a great number of persons, but to furnish means for executing in common the designs which they may have singly conceived.
-Alexis de Tocqueville
Java has a long-standing limitation against loading native code from a jar, stemming from the unfortunate belief that native code is somehow impure. In the practical world, there was occasional need to do this, so people came up with the work-around of copying the native library out of the jar and into the local filesystem, where it can then be loaded with System.load().
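Here's a minimal sketch of that copy-and-load trick. The class name, resource path, and suffix here are made up for illustration; real code needs per-platform library names and better error handling:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class NativeLoader {

    // Copy a native library out of the jar (where System.load can't see it)
    // into a temp file on the real filesystem, then load it from there.
    public static void loadFromJar(String resource, String suffix) throws IOException {
        InputStream in = NativeLoader.class.getResourceAsStream(resource);
        if (in == null)
            throw new IOException("resource not found: " + resource);
        File tmp = File.createTempFile("nativelib", suffix);
        tmp.deleteOnExit();
        OutputStream out = new FileOutputStream(tmp);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) > 0)
                out.write(buffer, 0, n);
        } finally {
            out.close();
            in.close();
        }
        // Unlike loadLibrary(), load() takes an absolute path and doesn't
        // munge the library name.
        System.load(tmp.getAbsolutePath());
    }
}

// usage, e.g.: NativeLoader.loadFromJar("/native/libfoo.so", ".so");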
Another way to do this, within Web Start, is to add platform-specific resource elements like these to your jnlp:
<resources os="Windows" arch="x86"> <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/> <nativelib href="sqlitejdbc-v052-native-win.jar"/> </resources> <resources os="Mac OS"> <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/> <nativelib href="sqlitejdbc-v052-native-mac.jar"/> </resources> <resources os="Linux"> <j2se href="http://java.sun.com/products/autodl/j2se" version="1.5+"/> <nativelib href="sqlitejdbc-v052-native-lin.jar"/> </resources>
This way, we get the native library for the right platform, if all goes according to plan. One wacky detail is that System.loadLibrary("foo") munges the library name in some platform-specific and poorly documented way. So, you need libfoo.jnilib for OS X, libfoo.so for Linux, and foo.dll for windoze. It's kind of a hassle to set up all these separate jars. System.load(...), on the other hand, just tries to do what it's told without munging the name, so people still frequently use the trick described above.
SqliteJDBC does just that to load the native Sqlite library. SqliteJDBC also includes "pure Java" drivers, which are considerably slower for some operations, but act as a nice fallback when the proper native libraries aren't handy.
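For reference, using the driver is the easy part either way; whether you end up on the native or pure-Java code path is decided behind this standard JDBC boilerplate:

import java.sql.Connection;
import java.sql.DriverManager;

public class SqliteSmokeTest {
    public static void main(String[] args) throws Exception {
        // Same JDBC boilerplate either way; the driver picks the native or
        // pure-Java implementation internally.
        Class.forName("org.sqlite.JDBC");
        Connection conn = DriverManager.getConnection("jdbc:sqlite:test.db");
        conn.close();
    }
}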
This all works well enough. But, on OS X 10.5, Java 6, it ends up falling back on the non-native drivers. After digging a bit, I found that the call to System.load(..) was failing with this exception:
/private/tmp/libsqlitejdbc-19214.lib:
java.lang.UnsatisfiedLinkError: /private/tmp/libsqlitejdbc-19214.lib:
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1822)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1702)
    at java.lang.Runtime.load0(Runtime.java:770)
    at java.lang.System.load(System.java:1005)
    ...
The same thing works fine using Java 5 on the same machine and on Java 6 on Windoze, so I'm guessing the native library needs to be recompiled to work correctly with OS X's 64-bit Java 6.
A little spelunking into SqliteJDBC's code reveals that the author has kindly provided hooks where you can insert the path and name of your own driver using these system properties:
The driver will then attempt this:
System.load(new File(libpath, libname).getAbsolutePath());
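So a work-around like the following should be possible. The property names here are my guesses from the variable names above; check the SqliteJDBC source for the real ones:

// Hypothetical property names -- verify against the SqliteJDBC source.
System.setProperty("org.sqlite.lib.path", "/usr/local/lib");
System.setProperty("org.sqlite.lib.name", "libsqlitejdbc.jnilib");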
More Resources:
A previous post, Hacking NCBI Entrez, dealt with how to retrieve sequence information from NCBI's databases. That method seems to work for prokaryotes and for yeast, but fails for most other eukaryotes.
For mammals, efetch for genome XML gives back crap like this (for rat):
<gbseq_contig>join(NW_001084776.1:1..691014,gap(182895),NW_001084777.1:1..1914699,gap(182895),NW_001084778.1:1..26673,gap(182895),NW_001084779.1:1..2730,gap(182895),NW_001084780.1:1..61755,gap(182895),NW_001084781.1:1..20466,gap(182895),NW_001084782.1:1..657670,gap(182895),NW_001084783.1:1..55883,gap(182895),NW_001084784.1:1..9292,gap(182895),NW_001084785.1:1..10599,gap(182895),NW_001084786.1:1..14198,gap(182895),NW_001084787.1:1..3561,gap(182895),NW_001084788.1:1..106511,gap(182895),NW_001084789.1:1..21205827,gap(182895),NW_001084790.1:1..11152534,gap(182895),NW_001084791.1:1..6015389,gap(182895),NW_001084792.1:1..686425,gap(182895),NW_001084793.1:1..9344793)</gbseq_contig>
Or this XML for the human Y chromosome. Totally useless. I take it I'm supposed to request each of the referenced sequences and parse out the regions for each? What a colossal pain in the ass!
By the way, BioJava has a parser called GenbankXmlFormat, but its docs say, "Deprecated. Use org.biojavax.bio.seq.io.INSDseqFormat". What INSDseqFormat is or how it's supposed to replace GenbankXML is totally unclear.
I attended a talk by Stuart Kauffman about systems biology, complexity, information theory, and related topics. It inspired the following crackpot theory / misunderstanding of Kauffman's ideas about the nature of the relationship between theory and reality.
Let's take physics.
Now, let's imagine an ill-defined concept that I'll call resolution (as in screen resolution). R(N) is a measure of the explanatory power or detail of the theory N (Newtonian physics).
R(N) < R(E) < R(Q)

(E and Q being Einsteinian relativity and quantum mechanics.)
Right? So is there such a thing as R(reality)? Or is the resolution of reality infinite? It's certainly true that theories are only useful when R(theory) << R(system being described). What use is a theory of Universe-sized complexity?
I'm willing to guess that there is a finite (although large) R(reality), and that there's some sort of Heisenberg/Goedel-like limit to how close any theory can come to accurately describing reality, regardless of how high the R of the theory is. More and more advanced theories asymptotically approach this limit.
I was encouraged by this guy to make better use of Eclipse's code templates. I've used templates for a while for boilerplate code like adding a log4j logger to a class, like this:
class Foo {
    private static final Logger log = Logger.getLogger(Foo.class);
    ...
}
Using templates, I just have to type 'l4j' and ctrl-space and there I go. One thing that had been plaguing me is that I still had to ctrl-space again to add the imports, and, thanks to Sun's needless duplication of effort, I had to select the log4j Logger over the lame java.util.Logger.
Little did I know about the ${import} and ${importStatic} variables. The following template gives me the required imports for free.
${imp:import(org.apache.log4j.Logger)} private static final Logger log = Logger.getLogger(${enclosing_type}.class);
And how about this for JUnit 4 imports:
${imp:import(org.junit.Test)} ${impst:importStatic('org.junit.Assert.*')}
With enough Eclipse templates, Java might not drive me completely batshit after all. I love it when I file a bug and the feature already exists. Thanks, Eclipse.
As previously mentioned, generics hurt my head. Sun's Java generics tutorial calls Collections.max() one of the subtlest examples of the use of generics.
public static <T extends Object & Comparable<? super T>> T max(Collection<? extends T> coll)
Subtle is one word for it. Anyway, to my walnut-sized brain, generics are confusing; maybe too confusing to be worth it. Was ClassCastException really a major problem for anybody?
Generics can be used in two places: generic classes, like List<T>, and generic methods, like Collections.max().
As I learned in Programming Languages 505, return types are covariant while method parameters are contravariant (see the wikipedia entry). Generics, because they can be used to parameterize both return types and method parameters, are invariant. Keeping this in mind explains a lot. Thus, List<String> is not a subtype of List<Object>. This leads to a lot of confusion, particularly because arrays in Java took the opposite choice: String[] is a subtype of Object[]. Aaarg!
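A tiny demo of the difference. The generic version is rejected at compile time, while the array version compiles and only blows up at runtime:

import java.util.ArrayList;
import java.util.List;

public class VarianceDemo {
    public static void main(String[] args) {
        // Invariance: a List<String> is not a List<Object>.
        // List<Object> objects = new ArrayList<String>();  // compile error

        // Covariance: a String[] is an Object[]...
        Object[] objects = new String[1];
        // ...so this compiles, then throws ArrayStoreException at runtime.
        objects[0] = Integer.valueOf(42);
    }
}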
Wildcards attempt to mitigate this problem, with extends and super.
List<? extends Flapdoodle> TreeSet(Comparator<? super E> c)
I pronounce List<?> as "list of WTF". You're really supposed to say it's an unbounded wildcard type. Seems about as annoying as @SuppressWarnings("unchecked").
For more confusing information, see the Generics chapter out of Effective Java, 2nd Ed. (Thanks to Joe Bowbeer for the pointer!):
In other words, if a parameterized type represents a T producer, use <? extends T>; if it represents a T consumer, use <? super T>.
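The classic illustration is a copy method with the same shape as java.util.Collections.copy: the source produces Ts, the destination consumes them.

import java.util.List;

public class Pecs {
    // src is a T producer (? extends T); dst is a T consumer (? super T).
    // So a List<SpecialFeature> can be copied into a List<Feature>.
    public static <T> void copy(List<? super T> dst, List<? extends T> src) {
        for (T t : src) {
            dst.add(t);
        }
    }
}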
All the cool kids are on github. ...and I still thought SVN was new and hip. Guess I'm behind the times.
I must be getting old. I voluntarily went to a non-technical talk by Jared Richardson about building teams for software development. Now I'm blogging about it. And using the term "blogging". Evidently, I've finally gone stark raving batshit.
Previously, generics were hurting my head. The generics tutorial has this to say about the problem that is at issue here:
Wildcards are designed to support flexible subtyping [...]. Generic methods allow type parameters to be used to express dependencies among the types of one or more arguments to a method and/or its return type.
The solution here (thanks to Eric Jain) uses a generic method to express the dependency between the type of the Features in the Container passed into rendererMap's getRendererFor(...) method and the type of Renderer returned. The renderer must be able to render that type of feature.
package generics.solution;

// We have a hierarchy of specialized subtypes of feature.
class Feature { }
class SpecialFeature extends Feature { }

// Features of a given kind are accessed through containers
interface Container<F extends Feature> {
    public Iterable<F> features();
}

// We want to draw features in various specialized ways.
interface Renderer<F extends Feature> {
    public void render(Iterable<? extends F> features);
}

class SpecialRenderer implements Renderer<SpecialFeature> {
    public void render(Iterable<? extends SpecialFeature> features) {
        // render special features
    }
}

// Which renderer we use for a given container of features gets
// configured at runtime. This holds the mapping between renderers
// and containers.
interface RendererMap {
    public <F extends Feature> Renderer<? super F> getRendererFor(Container<F> container);
}

class RenderingScheduler {
    RendererMap rendererMap;

    public void render(Iterable<Container<? extends Feature>> containers) {
        for (Container<? extends Feature> container : containers) {
            render(container);
        }
    }

    public <F extends Feature> void render(Container<F> container) {
        Renderer<? super F> renderer = rendererMap.getRendererFor(container);
        renderer.render(container.features());
    }
}
Update: Generics hurt my head some more.
Here's a little generics problem, in the form of some code:
package generics.question;

// We have a hierarchy of specialized subtypes of feature.
class Feature { }
class SpecialFeature extends Feature { }

// Features of a given kind are accessed through containers
interface Container<T extends Feature> {
    public Iterable<T> foos();
}

class SpecialContainer implements Container<SpecialFeature> {
    public Iterable<SpecialFeature> foos() {
        return null;
    }
}

// We want to draw features in various specialized ways.
interface Renderer<V extends Feature> {
    public void render(Iterable<V> foos);
}

class SpecialRenderer implements Renderer<SpecialFeature> {
    public void render(Iterable<SpecialFeature> foos) {
        // render SpecialFeature objects
    }
}

// Which renderer we use for a given container of features gets
// configured at runtime. This holds the mapping between renderers
// and containers.
interface RendererMap {
    Renderer<? extends Feature> getRendererFor(Container<? extends Feature> container);
}

class RenderingScheduler {
    RendererMap rendererMap;

    public void render(Iterable<Container<? extends Feature>> containers) {
        for (Container<? extends Feature> container : containers) {
            Renderer<? extends Feature> renderer = rendererMap.getRendererFor(container);

            // The following line produces a compile error:
            // The method render(Iterable<capture#3-of ? extends Feature>) in
            // the type Renderer<capture#3-of ? extends Feature> is not
            // applicable for the arguments (Iterable<capture#4-of ? extends
            // Feature>)
            renderer.render(container.foos());

            // this works, but... ug.
            // How are these type parameters helping me again?
            Renderer<Feature> r = (Renderer<Feature>) renderer;
            Iterable<Feature> foos = (Iterable<Feature>) container.foos();
            r.render(foos);
        }
    }
}

Update: Solution! ...and generics hurt my head some more.
From Pathway Databases: A Case Study in Computational Symbolic Theories by Peter Karp in Science magazine in 2001:
One lesson for computer scientists provided by pathway DBs (and by other bioinformatics applications) concerns the importance of DB content to solving computational problems. Most computer scientists focus their attention on algorithms, thinking that the best way to solve a hard computational problem is through a better algorithm. However, for problems such as predicting the pathway complement of an organism from its genome, or predicting metabolic products that an organism can produce from a given growth medium, I know of no algorithms that can solve these problems without being coupled with an accurate and well-designed pathway DB.
The docs for Sqlite, under the heading "Threads are evil", pointed me to a paper by Edward A. Lee of UC Berkeley called The Problem with Threads.
Concurrency in software is difficult. However, much of this difficulty is a consequence of the abstractions for concurrency that we have chosen to use. The dominant one in use today for general purpose computing is threads. But non-trivial multi-threaded programs are incomprehensible to humans.
Lee argues that threads are wildly nondeterministic by default, while "nondeterminism should be judiciously and carefully introduced where needed, and should be explicit in programs."
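The standard two-thread counter bug makes his point in a few lines. Run it a few times and you'll likely get a different wrong answer almost every time:

public class Race {
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable bump = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++)
                    counter++;  // read-modify-write: not atomic
            }
        };
        Thread a = new Thread(bump);
        Thread b = new Thread(bump);
        a.start();
        b.start();
        a.join();
        b.join();
        // The interleaving of the two threads decides what gets printed;
        // it's rarely the "correct" 200000.
        System.out.println(counter);
    }
}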
And, who in their right mind would bother to write code to deal with Roman numerals, since it's already been done about 5 million times? Well, me for one. OK, yes, I am a turbodork. I can no longer deny it. Here's the code:
import java.util.regex.Pattern;

/**
 * Represent Roman numerals between 1 and 4999 inclusive.
 */
public class Roman implements Comparable<Roman> {
    private int value;
    private String roman;

    private final static String[][] digits = {
        {"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"},
        {"", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"},
        {"", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"},
        {"", "M", "MM", "MMM", "MMMM"}
    };

    private final static Pattern regex =
        Pattern.compile("M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})");

    public Roman(int n) {
        value = n;
        roman = intToRoman(n);
    }

    public Roman(String string) {
        if (isRoman(string)) {
            value = romanToInt(string);
            roman = string;
        }
        else throw new RuntimeException("\"" + string + "\" is not a valid roman numeral.");
    }

    public String toString() {
        return roman;
    }

    public int toInt() {
        return value;
    }

    /**
     * Test whether the input string is a valid roman numeral (between 1 and 4999).
     */
    public static boolean isRoman(String s) {
        return s != null && !"".equals(s) && regex.matcher(s).matches();
    }

    /**
     * Convert an integer between 1 and 4999 (inclusive) to a Roman numeral.
     */
    public static String intToRoman(int n) {
        if (n >= 5000 || n <= 0)
            throw new RuntimeException("Roman equivalents exist only for numbers between 1 and 4999.");
        StringBuilder sb = new StringBuilder(16);
        int place = 0;
        while (n > 0) {
            int quotient = n / 10;
            int remainder = n - (quotient * 10);
            sb.insert(0, digits[place][remainder]);
            place++;
            n = quotient;
        }
        return sb.toString();
    }

    /**
     * Convert a roman numeral to an integer. This method isn't strict about
     * the rules of what makes a valid roman numeral. In particular, it will
     * happily convert IIIIIIIIII to 10 or IL to 49 even though neither of
     * these are valid roman numerals. To check whether a String is a valid
     * roman numeral use isRoman().
     */
    public static int romanToInt(String roman) {
        int value = 0;
        int prev = 1000;
        for (int i = 0; i < roman.length(); i++) {
            int v = charToValue(roman.charAt(i));
            // a digit bigger than its predecessor means a subtractive pair
            // (like IV), so back out the previous digit twice
            if (v > prev) {
                value -= (prev << 1);
            }
            value += v;
            prev = v;
        }
        return value;
    }

    /**
     * @return the integer value of a single roman character.
     * @throws RuntimeException if the character isn't a roman numeral.
     */
    private static int charToValue(char c) {
        if (c == 'I') return 1;
        if (c == 'V') return 5;
        if (c == 'X') return 10;
        if (c == 'L') return 50;
        if (c == 'C') return 100;
        if (c == 'D') return 500;
        if (c == 'M') return 1000;
        throw new RuntimeException("Illegal character in roman numeral: " + c);
    }

    public int compareTo(Roman other) {
        return this.value - other.value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        Roman other = (Roman) obj;
        return this.value == other.value;
    }
}
List files in a hierarchy without the svn crap.
find . -name ".svn" -prune -o -print
InternalError: Unable to convert JavaScript value [...blah blah...] to Java value of type java.net.URL[]
// from http://simile.mit.edu/repository/java-firefox-extension/firefox/chrome/content/scripts/browser-overlay.js
_toJavaUrlArray: function(a) {
    var urlArray = java.lang.reflect.Array.newInstance(java.net.URL, a.length);
    for (var i = 0; i < a.length; i++) {
        var url = a[i];
        java.lang.reflect.Array.set(
            urlArray, i,
            (typeof url == "string") ? new java.net.URL(url) : url);
    }
    return urlArray;
}
David L. Parnas's influential 1972 paper On the Criteria to Be Used in Decomposing Systems into Modules (cited in Paul Shannon's Gaggle paper) described modularization by information-hiding, which later developed into the ideas of high cohesion and low coupling.
The paper points out that it takes deliberate attention to get good software structure and that limiting information is the key to good design. Where there are many modules, more structure is needed. It's useful to create a hierarchical modular structure using "part of" and arrange programs in a hierarchical structure based on "uses". (By programs, he seems to mean parts of a module?) He criticizes the idea of "layers of abstraction" as too vague in absence of any measure of what's more abstract than what.
The concept of information-hiding as a software design principle is widely accepted in academic circles. Many successful designs can be seen as successful applications of abstraction or information hiding. On the other hand, most industrial software developers do not apply the idea and many consider it unrealistic.
The principle is now clear. It is still hard to apply.
He also seems to foreshadow design patterns.
Researchers should be publishing proposed standard structures for classes of programs. Researchers should be publishing proposed standard designs.
This kind of design-centered approach to software engineering appeals to me a lot more than fluffy stuff like usability studies and requirements gathering.
The video-card manufacturers are at an interesting crossroads. nVidia is really pushing general purpose computing on graphics hardware.
Cool idea, but it can't really take off unless programs are written to detect the presence of capable hardware and load code specially compiled for that particular chipset. In other words, GPGPU code written for nVidia cards won't run on ATI cards. Bummer. Clearly nVidia understands the advantage to be had from a large body of code compiled for their instruction set. They want to be in the position of owning the GPU equivalent of the x86 instruction set. Nice move on their part.
But wouldn't it be much cooler for the rest of us if things were a little more open? Is a common instruction set for graphics cards at all plausible? Like ARM is for embedded devices? Or maybe a common intermediate layer and JIT for all types of GPU code? That would be cool. And something like that would induce a lot more developers to take the plunge into compiling to the GPU, which has got to be a drastically different piece of code relative to a straightforward implementation.
You can't believe everything you read on the internet. For example, you'll read that, in Ruby, use of the return keyword is optional. Like this:
def foo(a)
  return a*a
end

foo(9)
>> 81

def bar(a)
  a*a
end

bar(8)
>> 64
So far, so good. You'll also read that a code block in Ruby is an anonymous function. And what could be more natural than to return from a function? But try this:
[1,2,3].map {|a| a*a}
>> [1,4,9]
[4,5,6].map {|a| return a*a}
>> LocalJumpError: unexpected return
>>     from (irb):14
>>     from (irb):14:in `map'
>>     from (irb):14
>>     from :0
Well then, apparently one or both of our premises is wrong. Anyway, what is this LocalJump thing that is apparently in error?
My guess as to what's really going on is this: The code block is an attempt at creating a function-like artifact that matches the behavior of a code block in a non-functional language. That is, the code in the block doesn't execute in its own environment. Calling a code block doesn't cause a stack frame to be created. The reason the return causes problems is that there's no separate environment to return from.
It bothers me that I can't seem to google up an explanation of exactly what a Ruby code block (or Proc or lambda or method for that matter) is in terms of what does or doesn't happen to stack frames or environments. There's plenty of tutorials on how to use these constructs, but, little information on what they are, how they work, or why there are so many flavors of function-like things in Ruby.
Here's what I got from comp.lang.ruby.
State-of-the-art DNA microarrays contain between 1 million and 6 million features (different probes) on a single slide. Assuming 32-bit floats, we need 4 bytes per feature. If we include start and stop coordinates on the genome for each feature, we're up to 12 bytes per feature.
| features    | MB | MB w/ coords |
|-------------|----|--------------|
| 1 million   | 4  | 11           |
| 6.5 million | 24 | 75           |
In tiling arrays, probes target regularly spaced segments of the genome so that expression can be measured at every point along the genome. Our group has done some work with arrays containing 60bp probes tiled every 20 base-pairs along the genome of Halobacterium salinarum. The genome of H. salinarum is 2.6 million bases, so we are able to cover most of the genome with a little less than a quarter of a million probes. To cover the entire human genome at similar resolution would take ~307 million probes.
As a side note, AFFY sells a set of 14 arrays with a total of almost 90 million probes that covers the whole genome at 35 bp resolution. (90 million * 35bp = 3150 Mbp. Does their idea of a probe include both forward and reverse strand?)
These arrays are all custom made for an individual genome. I wonder if an array with all 4 million possible 11-mers would be useful? Since there's only 4 million of them, you could expect a fair number of collisions - where a probe is non-unique on the genome. How do you do that calculation?
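Here's one back-of-the-envelope way to do that calculation, treating the genome as a random string and ignoring the reverse strand and base composition. The number of occurrences of a given 11-mer in a genome of length G is roughly Poisson:

\lambda = \frac{G}{4^{11}}, \qquad
P(\text{probe is unique on the genome}) \approx \lambda e^{-\lambda}

\text{For } G = 2.6 \times 10^6: \quad \lambda \approx 0.62, \quad P \approx 0.33

\text{For } G = 3.1 \times 10^9: \quad \lambda \approx 740, \quad P \approx 0

So on a genome the size of Halobacterium's, roughly a third of the 4 million possible 11-mers would map uniquely; on the human genome, essentially none would. This is crude, of course: real genomes are repetitive and anything but random.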
As far as I can tell, the capability to handle much bigger heap sizes is the main benefit from 64-bit Java. Primitive types don't change in size. An int is still 32 bits.
Java arrays are indexed by 32-bit integers (actually, signed integers, so 31 bits). Apparently this doesn't change in 64-bit Java. From the Java Language Spec:
Arrays must be indexed by int values; short, byte, or char values may also be used as index values because they are subjected to unary numeric promotion (§5.6.1) and become int values. An attempt to access an array component with a long index value results in a compile-time error.
This is a serious limitation, if you ask me. The main advantage of 64-bits is a simple programming model for large data objects. Apparently, Sun disagrees. Integer array indexes are baked fairly deeply into the language and into existing application code. Think of everywhere you refer to the length or an offset into an array as an int. Imagine how often that's done in the library code. Dunno if they'll ever change it.
source: 64-bit versus 32-bit Virtual Machines for Java
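Until then, the usual dodge is to fake a long-indexed array with int-indexed chunks. A minimal sketch (names made up, no bounds checking, chunk size chosen arbitrarily):

public class BigDoubleArray {
    private static final int CHUNK = 1 << 27;  // 2^27 doubles = 1 GiB per chunk
    private final double[][] chunks;
    private final long length;

    public BigDoubleArray(long length) {
        this.length = length;
        int n = (int) ((length + CHUNK - 1) / CHUNK);
        chunks = new double[n][];
        for (int i = 0; i < n; i++) {
            long remaining = length - (long) i * CHUNK;
            chunks[i] = new double[(int) Math.min((long) CHUNK, remaining)];
        }
    }

    public double get(long index) {
        return chunks[(int) (index / CHUNK)][(int) (index % CHUNK)];
    }

    public void set(long index, double value) {
        chunks[(int) (index / CHUNK)][(int) (index % CHUNK)] = value;
    }

    public long length() {
        return length;
    }
}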
Finally, here's an intriguing tidbit: sun.misc.Unsafe
I maintain a Firefox extension called Firegoose that (among other things) works with data embedded in HTML pages. We do this using microformats, embedded XML, and even, if we have to, nasty screen scraping. I'm not too religious about the format of the data - just give me something I can parse.
So, it occurred to me to wonder why JSON wasn't more popular in this niche. Easily parsable, legal in a browser, scriptable from within the page - why not? Of course people are, as usual, miles ahead of me here:
Strangely, activity around this topic seems to have surfaced briefly and disappeared sometime in mid 2007.
- Ben Fry's Visualizing Data
- An article about Processing
- A cool book: Flash Math Creativity: Second Edition
- The blog "LivingCode", with some Processing-related stuff
- IBM's Visual Communication Lab
- John Resig's processing.js
- Nodebox: a graphics framework for Python on OS X
I think this is cool. It's definitely bizarre.
There are funny tutorials for Haskell and Lisp.
I went to a cool talk (video should eventually be available) at UW by Jeffrey Heer of Berkeley's Visualization Lab, developer of the prefuse visualization toolkit.
His Flare toolkit, a flash-based follow-on to prefuse, also seems to validate my desire to ditch Java for this sort of thing and move to Adobe's platform (flash/flex/AIR).
Mentioned during the talk were: GapMinder, ManyEyes
A related class at Berkeley on infoviz. I suppose, if I get interested in this stuff, I ought to read those Tufte books.
You can become a scholar these days just by sitting in front of your computer watching video lectures.
A blog called FreeScienceOnline has a better course catalog than most universities.
A couple discussions about computer science lecture videos on stack exchange.
Life-hacker has a post on how to Get a free college education online.
Java's drag-and-drop functionality is painful. What I want is a list that the user can reorder by dragging. Here are a couple of nice efforts, with the caveat that they work only for single selections.
Sun's tutorial is here. There's a chapter in O'Reilly's Swing Hacks about it, but their code seems buggy to me.
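For my purposes, the DnD API may be overkill anyway. Here's a minimal sketch that fakes drag-to-reorder with plain mouse listeners shuffling a DefaultListModel; single selection only, and no visual feedback:

import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.DefaultListModel;
import javax.swing.JFrame;
import javax.swing.JList;
import javax.swing.JScrollPane;

public class ReorderableList {
    public static void main(String[] args) {
        final DefaultListModel<String> model = new DefaultListModel<String>();
        for (String s : new String[] {"alpha", "beta", "gamma", "delta"}) {
            model.addElement(s);
        }
        final JList<String> list = new JList<String>(model);

        MouseAdapter handler = new MouseAdapter() {
            private int from = -1;

            public void mousePressed(MouseEvent e) {
                // remember which item the drag started on
                from = list.locationToIndex(e.getPoint());
            }

            public void mouseDragged(MouseEvent e) {
                int to = list.locationToIndex(e.getPoint());
                if (from >= 0 && to >= 0 && to != from) {
                    // move the dragged item to the cell under the cursor
                    model.add(to, model.remove(from));
                    list.setSelectedIndex(to);
                    from = to;
                }
            }
        };
        list.addMouseListener(handler);
        list.addMouseMotionListener(handler);

        JFrame frame = new JFrame("Drag to reorder");
        frame.add(new JScrollPane(list));
        frame.setSize(200, 300);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}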
Classycle is a tool for analyzing dependencies among Java classes and packages. It can really help when trying to make sure you haven't introduced any unintentional dependencies. It's similar to JDepend, but a little more up-to-date. Its output is a nicely formatted web page (actually an xml + xsl -> html deal).
There's a lot going on in the world of programming languages these days - from functional/OO crossbreeding to new language constructs for concurrency. Here's a few random interesting tidbits:
Oh yeah, and concurrency is where it's at too:
On a cold gray Sunday afternoon, I went through the tutorial for the Scala programming language (from a group in Switzerland called LAMP). There was an exercise straight out of SICP about evaluating simple algebraic expressions. Here's what I came up with. It may be far from optimal, but it seems to work for at least the one test case.
abstract class Tree
case class Sum(l: Tree, r: Tree) extends Tree
case class Var(n: String) extends Tree
case class Const(v: int) extends Tree

object Expressions {
  type Environment = String => int

  def eval(t: Tree, env: Environment): int = t match {
    case Sum(l, r) => eval(l, env) + eval(r, env)
    case Var(n) => env(n)
    case Const(v) => v
  }

  def derive(t: Tree, v: String): Tree = t match {
    case Sum(l, r) => Sum(derive(l, v), derive(r, v))
    case Var(n) if (v == n) => Const(1)
    case _ => Const(0)
  }

  // there's probably a more functional way to do this
  def simplify(t: Tree): Tree = t match {
    case Sum(l, r) =>
      val sl: Tree = simplify(l)
      val sr: Tree = simplify(r)
      sl match {
        case Const(lv) => sr match {
          case Const(rv) => Const(lv + rv)
          case _ => if (lv == 0) sr else Sum(sl, sr)
        }
        case _ => sr match {
          case Const(rv) if (rv == 0) => sl
          case _ => Sum(sl, sr)
        }
      }
    case Var(n) => Var(n)
    case Const(v) => Const(v)
  }

  def toString(t: Tree): String = t match {
    case Sum(l, r) => "(" + toString(l) + " + " + toString(r) + ")"
    case Var(n) => n
    case Const(v) => v.toString()
  }

  def main(args: Array[String]) {
    val exp: Tree = Sum(Sum(Var("x"), Var("x")), Sum(Const(7), Var("y")))
    val env: Environment = {
      case "x" => 5
      case "y" => 7
    }
    println("Expression: " + toString(exp))
    println("Evaluation with x=5, y=7: " + eval(exp, env))
    println("Derivative relative to x: " + toString(derive(exp, "x")) + " = " + toString(simplify(derive(exp, "x"))))
    println("Derivative relative to y: " + toString(derive(exp, "y")) + " = " + toString(simplify(derive(exp, "y"))))
  }
}
The subclasses of Tree define the nodes of a parse tree for simple expressions like:
((x + x) + (7 + y))
The program demonstrates Scala's pattern matching capabilities, defining methods that perform operations on trees: evaluate, take derivatives, and print. Pattern matching is a feature borrowed from the ML family of languages. You might think of them as case statements on steroids. Sometimes they can concisely replace polymorphism. Here, they play the role of the visitor pattern.
The visitor pattern makes it easy to add new operations on all members of an inheritance hierarchy in a modular way. (Modular meaning the operation is encapsulated). The cost is that adding a new member to the inheritance hierarchy requires modification of all existing operations. The cost and benefit of pattern matching statements is the same.
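For comparison, here's a bare-bones visitor over the same Tree shapes in Java (my own sketch, not from the Scala tutorial). Adding a new operation is one new visitor class, but adding a new node type means touching every visitor - exactly the trade-off pattern matching shares:

interface Visitor<R> {
    R visitSum(Sum s);
    R visitVar(Var v);
    R visitConst(Const c);
}

abstract class Tree {
    abstract <R> R accept(Visitor<R> v);
}

class Sum extends Tree {
    final Tree left, right;
    Sum(Tree left, Tree right) { this.left = left; this.right = right; }
    <R> R accept(Visitor<R> v) { return v.visitSum(this); }
}

class Var extends Tree {
    final String name;
    Var(String name) { this.name = name; }
    <R> R accept(Visitor<R> v) { return v.visitVar(this); }
}

class Const extends Tree {
    final int value;
    Const(int value) { this.value = value; }
    <R> R accept(Visitor<R> v) { return v.visitConst(this); }
}

// One operation = one visitor; compare with Scala's toString above.
class Printer implements Visitor<String> {
    public String visitSum(Sum s) {
        return "(" + s.left.accept(this) + " + " + s.right.accept(this) + ")";
    }
    public String visitVar(Var v) { return v.name; }
    public String visitConst(Const c) { return Integer.toString(c.value); }
}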
More Scala Resources:
Digithead's lab notebook is a place for tutorials, cheat-sheets and stray thoughts on topics related to programming, bioinformatics, machine-learning and visualization.