Wednesday, March 25, 2009

User directory structures

Some OSes call them home directories; others call them user home or documents directories. Here's a quick guide to how different operating systems typically organize user directories.

Directory type, followed by the directory path returned for each operating system:

User Home
  Windows XP: C:\Documents and Settings\<user>
  Windows Vista: C:\Users\<user>
  Mac OS X: /Users/<user>
  Linux: /home/<user>

User Desktop
  Windows XP: C:\Documents and Settings\<user>\Desktop
  Windows Vista: C:\Users\<user>\Desktop
  Mac OS X: /Users/<user>/Desktop
  Linux: /home/<user>/Desktop

User Documents
  Windows XP: C:\Documents and Settings\<user>\My Documents
  Windows Vista: C:\Users\<user>\Documents
  Mac OS X: /Users/<user>/Documents
  Linux: /home/<user>

User Application Data
  Windows XP: C:\Documents and Settings\<user>\Local Settings\Application Data
  Windows Vista: C:\Users\<user>\AppData\Local
  Mac OS X: /Users/<user>/Library/Application Support
  Linux: /home/<user>

User Preferences
  Windows XP: C:\Documents and Settings\<user>\Local Settings\Application Data
  Windows Vista: C:\Users\<user>\AppData\Local
  Mac OS X: /Users/<user>/Library/Preferences
  Linux: /home/<user>

Public Documents
  Windows XP: C:\Documents and Settings\All Users\Documents
  Windows Vista: C:\Users\Public\Documents
  Mac OS X: /Library/Application Support
  Linux: /usr/local/

Public Application Data
  Windows XP: C:\Documents and Settings\All Users\Application Data
  Windows Vista: C:\ProgramData
  Mac OS X: /Library/Application Support
  Linux: /usr/local/

Public Preferences
  Windows XP: C:\Documents and Settings\All Users\Application Data
  Windows Vista: C:\ProgramData
  Mac OS X: /Library/Preferences
  Linux: /etc

System Libraries
  Windows XP: C:\Windows\System32
  Windows Vista: C:\Windows\System32
  Mac OS X: /Library/Frameworks
  Linux: /usr/lib

Application Files
  Windows XP: C:\Program Files
  Windows Vista: C:\Program Files
  Mac OS X: /Applications
  Linux: /usr/local/

Volume Root
  Windows XP: C:\
  Windows Vista: C:\
  Mac OS X: /
  Linux: /

Temp
  Windows XP: C:\Documents and Settings\<user>\Local Settings\Temp
  Windows Vista: C:\Users\<user>\AppData\Local\Temp
  Mac OS X: /private/tmp/folders.501/TemporaryItems
  Linux: /tmp

Thanks to National Instruments for the data.

See Wikipedia entries Home directory and My Documents. If you're in Java land, values of the os.name property are documented here and here.
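
For the Java folks, here's a rough sketch of how you might look some of these up at runtime. The system properties os.name, user.home and java.io.tmpdir are standard; the appDataDir helper and its prefix matching are my own assumption, built from the Vista, Mac and Linux rows of the table, and not anything official.

```java
public class UserDirs {
    /**
     * Hypothetical helper: guess the per-user application data directory
     * from an os.name value and a home directory, following the table above.
     */
    static String appDataDir(String osName, String home) {
        String os = osName.toLowerCase();
        if (os.startsWith("mac")) {
            return home + "/Library/Application Support";
        } else if (os.startsWith("windows")) {
            // Vista-style layout; XP uses Local Settings\Application Data instead
            return home + "\\AppData\\Local";
        }
        // The table lists plain /home/<user> for Linux
        return home;
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        String home = System.getProperty("user.home");
        String tmp = System.getProperty("java.io.tmpdir");
        System.out.println("os.name        = " + osName);
        System.out.println("user.home      = " + home);
        System.out.println("java.io.tmpdir = " + tmp);
        System.out.println("app data guess = " + appDataDir(osName, home));
    }
}
```

Treat the guess as a starting point only. The real locations can be moved around by users and administrators.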

Tuesday, March 24, 2009

Great example of what not to do

In case anyone was wondering, one of these web sites got it right. The other got it wrong. First, have a look at the query page from the bug database on java.net:

[screenshot: java.net bug database query page]

Now, consider the front page of alltheweb.com:

[screenshot: alltheweb.com front page]

Some poor shlep put a lot of work into that query page for the bug database. I'm tempted to shake my head and say, "WTF?", but I know I've been guilty of this kind of thing. To be fair, java.net has a much easier search box, hidden off to the side where bozos like me won't find it. So, here's a reminder not to do shit like that.

Friday, March 20, 2009

Whitney artport

The Whitney artport is a museum of digital art containing a lot of cool stuff. There's some great abstract art code in the CODeDOC project and some Processing stuff called Software Structures. My favorite is this bit from a visualization of breakups called the Dumpster.
Boo fucking hoo! My significant other broke up with me so now I'm going to overdose on Senekot and die shitting in my granny knickers. If I ever commited suicide, that's how I'd do it. And dressed like a clown. With ...

More Hacking NCBI

Writing scripts to interface with NCBI's web site has its challenges. Getting data from the UCSC genome browser is simpler.

If you need a list of complete genomes, that can be had from the NCBI Genome database. One form of the list comes from the genlist.cgi script. The type parameter seems to be a flag that limits the list to chromosomes, plasmids, or organelle-specific sequences. The name parameter seems to be there only for looks. So far, I haven't figured out how to make genlist spit out either XML or text.

Two other scripts can produce text output, lproks and leuks.

These two can be scripted using parameters like these: view=1 dump=selected p3=11:|12:Green Algae. The same information is also available by ftp from ftp://ftp.ncbi.nih.gov/genomes/genomeprj/. There are three lproks text files, which look to correspond to the three tabs: Organism info, Complete genomes, and Genomes in progress. lproks_1.txt is the one we want. There's a lot of good information in the ftp directories to plunder.
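
Since that p3 value contains colons, a pipe and a space, it needs URL-encoding before it goes into a query string. Here's a minimal sketch of building such a URL in Java. The BASE_URL below is a placeholder I made up, not the real address of the script, and the buildUrl helper is just for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class LproksQuery {
    // Placeholder only: substitute the actual location of the script
    static final String BASE_URL = "http://example.invalid/lproks.cgi";

    /** Append URL-encoded query parameters to a base URL, in insertion order. */
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                sep = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available, so this should never happen
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    /** The parameters mentioned in the post. */
    static Map<String, String> exampleParams() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("view", "1");
        params.put("dump", "selected");
        params.put("p3", "11:|12:Green Algae");
        return params;
    }

    public static void main(String[] args) {
        // prints http://example.invalid/lproks.cgi?view=1&dump=selected&p3=11%3A%7C12%3AGreen+Algae
        System.out.println(buildUrl(BASE_URL, exampleParams()));
    }
}
```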

There seems to be yet a third script: GenomesGroup.cgi. This one is linked from the Virus genomes page.

If I really wanted to suffer, I'd look into NCBI's source. Does anyone know where the source of lproks.cgi or genlist.cgi is? Is that part of the NCBI C++ Toolkit? (which is on macports here.) Maybe it's buried in NCBI's ftp site? Maybe I should ask the NCBI Information Engineering Branch? Maybe I need to start doing something more productive!

Saturday, March 14, 2009

Bug in GlazedLists' AutoCompleteSupport?

I found what looks to me like a couple bugs in the AutoCompleteSupport of GlazedLists. I'm not going to talk about how cool GlazedLists is - it's very cool. I just want to document the bugs. Either that, or document that I'm a bonehead and I'm doing something bass-ackwards with a perfectly good library.

Note: I am doing something boneheaded. I'm documenting a known bug (bug 458). I spammed GlazedLists' mailing list. The first issue, selecting items from the popup, I reported as bug 469.

Here's my repro. I'm working on OS X 10.5 with Java 1.5.0_16, which I suspect is a factor.

package org.cbare.testglazedlists;

import javax.swing.*;

import ca.odell.glazedlists.BasicEventList;
import ca.odell.glazedlists.EventList;
import ca.odell.glazedlists.swing.AutoCompleteSupport;


public class SpeciesChooser {
 JFrame frame;
 AutoCompleteSupport autocomplete;
 private JComboBox chooser;

 public SpeciesChooser() {
  initGui();
 }

 private void initGui() {
  Box vbox, hbox;
  frame = new JFrame("Species Chooser");
  
  vbox = Box.createVerticalBox();
  vbox.setBorder(BorderFactory.createEmptyBorder(12, 12, 12, 12));
  hbox = Box.createHorizontalBox();
  
  hbox.add(new JLabel("Select Species:"));
  chooser = new JComboBox();
  chooser.setPrototypeDisplayValue("Marmoset");
  hbox.add(chooser);
  SwingUtilities.invokeLater(new Runnable() {
   public void run() {
    autocomplete = AutoCompleteSupport.install(chooser, getSpecies());
   }
  });

  vbox.add(hbox);
  frame.add(vbox);
  frame.pack();
  frame.setVisible(true);
 }

 private EventList getSpecies() {
  EventList result = new BasicEventList();
  result.add("Marmoset");
  result.add("Monkey");
  result.add("Moose");
  result.add("Mouse");
  result.add("Spaaa");
  result.add("Spider");
  result.add("Spidooo");
  return result;
 }

 public static void main(String[] args) {
  SpeciesChooser s = new SpeciesChooser();
 }
}

I used Jing to make a little screencast of my flailing about. Instructions for the repro follow:

First, if I type a prefix which matches several items in the list, and I use the down-arrow to scroll through the matches and pick one, it fails to update the contents of the text box. In my example, I type "mo", which matches Monkey, Moose, and Mouse. I press down-arrow twice to select Moose, then press Enter. "Monkey" is still in the text box. If I repeat that a couple of times, the popup then shows all choices, not just those matching my prefix. Maybe that's intentional.

Second, if I type something that matches nothing in the list, say "moz" and then backspace over the "z" I get an exception. The stack trace is shown here:

Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:546)
    at java.util.ArrayList.get(ArrayList.java:321)
    at ca.odell.glazedlists.impl.gui.ThreadProxyEventList.applyChangeToCache(ThreadProxyEventList.java:175)
    at ca.odell.glazedlists.impl.gui.ThreadProxyEventList.access$600(ThreadProxyEventList.java:68)
    at ca.odell.glazedlists.impl.gui.ThreadProxyEventList$UpdateRunner.listChanged(ThreadProxyEventList.java:237)
    at ca.odell.glazedlists.event.ListEventAssembler$ListEventFormat.fire(ListEventAssembler.java:412)
    at ca.odell.glazedlists.event.ListEventAssembler$ListEventFormat.fire(ListEventAssembler.java:409)
    at ca.odell.glazedlists.event.SequenceDependenciesEventPublisher$SubjectAndListener.firePendingEvent(SequenceDependenciesEventPublisher.java:445)
    at ca.odell.glazedlists.event.SequenceDependenciesEventPublisher.fireEvent(SequenceDependenciesEventPublisher.java:344)
    at ca.odell.glazedlists.event.ListEventAssembler.commitEvent(ListEventAssembler.java:316)
    at ca.odell.glazedlists.impl.gui.ThreadProxyEventList$UpdateRunner.run(ThreadProxyEventList.java:225)
    at ca.odell.glazedlists.impl.swing.SwingThreadProxyEventList.schedule(SwingThreadProxyEventList.java:33)
    at ca.odell.glazedlists.impl.gui.ThreadProxyEventList.listChanged(ThreadProxyEventList.java:118)
    at ca.odell.glazedlists.event.ListEventAssembler$ListEventFormat.fire(ListEventAssembler.java:412)
    at ca.odell.glazedlists.event.ListEventAssembler$ListEventFormat.fire(ListEventAssembler.java:409)
    at ca.odell.glazedlists.event.SequenceDependenciesEventPublisher$SubjectAndListener.firePendingEvent(SequenceDependenciesEventPublisher.java:445)
    at ca.odell.glazedlists.event.SequenceDependenciesEventPublisher.fireEvent(SequenceDependenciesEventPublisher.java:344)
    at ca.odell.glazedlists.event.ListEventAssembler.commitEvent(ListEventAssembler.java:316)
    at ca.odell.glazedlists.FilterList.constrained(FilterList.java:389)
    at ca.odell.glazedlists.FilterList.changeMatcher(FilterList.java:286)
    at ca.odell.glazedlists.FilterList.changeMatcherWithLocks(FilterList.java:269)
    at ca.odell.glazedlists.FilterList.access$100(FilterList.java:51)
    at ca.odell.glazedlists.FilterList$PrivateMatcherEditorListener.changedMatcher(FilterList.java:443)
    at ca.odell.glazedlists.matchers.AbstractMatcherEditor.fireChangedMatcher(AbstractMatcherEditor.java:115)
    at ca.odell.glazedlists.matchers.AbstractMatcherEditor.fireConstrained(AbstractMatcherEditor.java:73)
    at ca.odell.glazedlists.matchers.TextMatcherEditor.setTextMatcher(TextMatcherEditor.java:321)
    at ca.odell.glazedlists.matchers.TextMatcherEditor.setFilterText(TextMatcherEditor.java:292)
    at ca.odell.glazedlists.swing.AutoCompleteSupport.applyFilter(AutoCompleteSupport.java:1271)
    at ca.odell.glazedlists.swing.AutoCompleteSupport.access$2300(AutoCompleteSupport.java:209)
    at ca.odell.glazedlists.swing.AutoCompleteSupport$AutoCompleteFilter.postProcessDocumentChange(AutoCompleteSupport.java:1497)
    at ca.odell.glazedlists.swing.AutoCompleteSupport$AutoCompleteFilter.remove(AutoCompleteSupport.java:1450)
    at javax.swing.text.AbstractDocument.remove(AbstractDocument.java:572)
    at javax.swing.text.DefaultEditorKit$DeletePrevCharAction.actionPerformed(DefaultEditorKit.java:1030)
    at javax.swing.SwingUtilities.notifyAction(SwingUtilities.java:1576)
    at javax.swing.JComponent.processKeyBinding(JComponent.java:2772)
    ...

In spite of my whining, GlazedLists is very well thought out and a nice piece of work. For later reference, I also reported bug 472.

Thursday, March 12, 2009

Split split

Apparently, there's some disagreement about what it means to split a string into substrings. Biological data frequently comes in good old-fashioned tab-delimited text files. That's OK 'cause they're easily parsed in the language and platform of your choice. Most languages with any pretension of string processing offer a split function. So, you read the files line-by-line and split each line on the tab character to get an array of fields.

The disagreement comes about when there are empty fields. Since we're talking text files, there's no saying, "NOT NULL", so it's my presumption that empty fields are possible. Consider the following JUnit test.

import org.apache.log4j.Logger;
import static org.junit.Assert.*;
import org.junit.Test;

public class TestSplit {
  private static final Logger log = Logger.getLogger("unit-test");

  @Test
  public void test1() {
    // "foo", six empty fields, then "bar"
    String[] fields = "foo\t\t\t\t\t\t\tbar".split("\t");
    log.info("fields.length = " + fields.length);
    // JUnit's assertEquals takes the expected value first
    assertEquals(8, fields.length);
  }

  @Test
  public void test2() {
    // 7 tabs and nothing else, so I expect 8 empty fields
    String[] fields = "\t\t\t\t\t\t\t".split("\t");
    log.info("fields.length = " + fields.length);
    assertEquals(8, fields.length);
  }
}

The first test works. You end up with 8 fields, of which the middle 6 are empty. The second test fails. You get an empty array. I expected this to return an array of 8 empty strings. Java's mutant cousin, JavaScript, gets this right, as does Python.

Rhino 1.6 release 5 2006 11 18
js> a = "\t\t\t\t\t\t\t";
js> fields = a.split("\t")
,,,,,,,
js> fields.length
8

Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17)
>>> a = "\t\t\t\t\t\t\t"
>>> fields = a.split("\t")
['', '', '', '', '', '', '', '']
>>> len(fields)
8

Oddly enough, Ruby agrees with Java, as does Perl.

>> str = "\t\t\t\t\t\t\t"
=> "\t\t\t\t\t\t\t"
>> fields = str.split("\t")
=> []
>> fields.length
=> 0

My Perl is way rusty, so sue me, but I think this is more or less it:

$str = "\t\t\t\t\t\t\t";
@fields = split(/\t/, $str);
print("fields = [@fields]\n");
$len = @fields;
print("length = $len\n");

Which yields:

fields = []
length = 0

How totally annoying!
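
For what it's worth, the Java behavior comes from String.split(regex) discarding trailing empty strings, and in a line of nothing but tabs every field is trailing. The two-argument form with a negative limit keeps them. Here's a small sketch; the splitFields helper name is mine, not part of any library. Ruby's split and Perl's split take a negative limit to the same effect.

```java
public class SplitKeepEmpty {
    /** Split a tab-delimited line, keeping trailing empty fields. */
    static String[] splitFields(String line) {
        // A negative limit tells String.split not to discard trailing empty strings
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String sevenTabs = "\t\t\t\t\t\t\t";
        System.out.println(sevenTabs.split("\t").length);          // prints 0
        System.out.println(splitFields(sevenTabs).length);         // prints 8
        System.out.println("foo\t\tbar\t\t".split("\t").length);   // prints 3
        System.out.println(splitFields("foo\t\tbar\t\t").length);  // prints 5
    }
}
```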

Tuesday, March 10, 2009

Schema-less and model-free

Clay Shirky says:
Schema-less db FTW! 40 yrs ago, EF Codd tied semantics to performance in a shotgun wedding. Now a divorce?

A while back, a colleague used the phrase model-free. I didn't really know what he meant, but I'm starting to think I like it. He was describing a system which provides services over a set of resources (in the RESTful sense) that required no assumptions about any file format or data model for the resources. The services were along the lines of storage, retrieval, search, and access control. So, none of these directly depend on the structure of the resources. (Well, search has to get keywords somehow.) Of course, these features are quite useful and end up getting rewritten over and over again for every new application. Wouldn't it be smart to factor out features like this that can be provided independently of the specifics of a given application? The result might be a sort of schema-less data store in which the data format of the resources would be determined by convention between the producers and the consumers of any particular type of resource - more or less how MIME types work on the web.
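
To make that a little more concrete, here's a toy sketch of what such a store might look like. Everything in it (the ResourceStore class, its method names, the keyword list) is invented for illustration and is not my colleague's system. The point is only that put, get and search never look inside the bytes; the content type is a label that producers and consumers agree on.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResourceStore {
    /** An opaque resource: the store never interprets the data. */
    static final class Resource {
        final String contentType;    // a MIME-type-like label, agreed by convention
        final byte[] data;           // format known only to producers and consumers
        final List<String> keywords; // supplied by the producer, since search needs them somehow

        Resource(String contentType, byte[] data, List<String> keywords) {
            this.contentType = contentType;
            this.data = data;
            this.keywords = keywords;
        }
    }

    // In-memory only; a real store would persist resources and add access control
    private final Map<String, Resource> resources = new LinkedHashMap<String, Resource>();

    /** Store a resource under a URI, replacing any previous one. */
    void put(String uri, String contentType, byte[] data, String... keywords) {
        resources.put(uri, new Resource(contentType, data, Arrays.asList(keywords)));
    }

    /** Retrieve a resource, or null if nothing is stored under the URI. */
    Resource get(String uri) {
        return resources.get(uri);
    }

    /** Return the URIs of all resources tagged with the keyword. */
    List<String> search(String keyword) {
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, Resource> e : resources.entrySet()) {
            if (e.getValue().keywords.contains(keyword)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ResourceStore store = new ResourceStore();
        store.put("/notes/1", "text/plain",
                "hello".getBytes(StandardCharsets.UTF_8), "greeting");
        store.put("/tables/1", "text/tab-separated-values",
                "a\tb".getBytes(StandardCharsets.UTF_8), "table", "greeting");
        System.out.println(store.search("greeting"));           // prints [/notes/1, /tables/1]
        System.out.println(store.get("/tables/1").contentType); // prints text/tab-separated-values
    }
}
```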

Data management is getting to be a colossal problem, and use cases poorly served by the relational database are popping up in increasing numbers. A lot of folks are wondering if it's time to junk the RDBMS. Relational DBs have tremendous advantages, but also some well-known ways to bite your butt. They're inflexible in that schemas are hard to change. They can inhibit interoperability. Traditional data integration between separately developed schemas is labor intensive and unscalable. Then there's the whole writhing can of worms labeled object-relational mapping.

I'm not sure I can put my finger right on it, but there's a similarity in spirit between REST and schema-less data management.

See also: