
Monday, June 06, 2011

Primers in Computational Biology

Nature Biotechnology used to regularly feature primers on various topics in computational biology. Here's an incomplete listing based on what looked interesting to me. Some of these are old, but on topics fundamental enough not to go out of style. Lots of these are just mini-tutorials in machine learning.

...just in case you're in need of some bed-time reading or some mad comp-bio skillz. Sorry if some of these are behind a pay-wall, but there's usually a way around, under or over such walls.

Tuesday, March 01, 2011

Learning data science skills

According to Hal Varian and just about everyone these days, the hot skills to have are some combination of programming, statistics, machine learning, and visualization. Here's a pile of resources that'll help you get some mad data science skills.

Programming

There seem to be a few main platforms widely used for data-intensive programming. R is a statistical environment that is to statisticians what MATLAB is to engineers. It's a weird beast, but it's open source and very powerful, plus it has a great community. Python also makes a strong showing, with the help of NumPy, SciPy and matplotlib. An intriguing new entry is the combination of the Lisp dialect Clojure and Incanter. All these tools mix numerical libraries with functional and scripting programming styles in varying proportions. You'll also want to look into Hadoop, to do your big-data analytics map-reduce style in the cloud.
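Hadoop itself is a heavyweight install, but the map-reduce pattern it popularized is easy to sketch in a few lines of plain Python. This is only the idea, nothing Hadoop-specific, and the tiny word-count data is invented for the example:

```python
# Map each record to (key, value) pairs, shuffle by key, then reduce.
from itertools import groupby
from operator import itemgetter

docs = ["big data", "big analytics", "data data"]

# Map step: one (word, 1) pair per word.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle step: group pairs by key (which requires sorting first).
mapped.sort(key=itemgetter(0))

# Reduce step: sum the values for each key.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}

print(counts)  # {'analytics': 1, 'big': 2, 'data': 3}
```

Real map-reduce distributes the map and reduce steps over many machines; the shape of the computation is the same.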

Statistics

  • John Verzani's Using R for Introductory Statistics, which I'm working my way through.

Machine Learning

  • Toby Segaran's Programming Collective Intelligence
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • Bishop's Pattern Recognition and Machine Learning
  • Machine Learning, Tom Mitchell

Visualization

  • Tufte's books, especially The Visual Display of Quantitative Information
  • Processing, along with Ben Fry's book, Visualizing Data.
  • Jeffrey Heer's papers, especially Software Design Patterns for Information Visualization. Heer is one of the creators of several toolkits: Prefuse, Flare and Protovis.
  • 7 Classic Foundational Vis Papers and Seminal information visualization papers

Classes

    Starting on March 5 at the Hacker Dojo in Mountain View (CA), Mike Bowles and Patricia Hoffmann will present a course on Machine Learning where R will be the "lingua franca" for looking at homework problems, discussing them and comparing different solution approaches. The class will begin at the level of elementary probability and statistics and from that background survey a broad array of machine learning techniques including: Unsupervised Learning, Clustering Techniques, and Fault Detection.

    R courses from Statistics.com

    Feb 11:  Modeling in R (Sudha Purohit -- more details after the jump)
    Mar 4:  Introduction to R - Data Handling (Paul Murrell)
    Apr 15:  Programming in R (Hadley Wickham)
    Apr 29:  Graphics in R (Paul Murrell)
    May 20:  Introduction to R – Statistical Analysis (John Verzani)

    Data bootcamp (slides and code) from the Strata Conference. Tutorials covering a handful of example problems using R and Python.

    • Plotting data on maps
    • Classifying emails
    • A classification problem in image analysis

    Cosma Shalizi at CMU teaches a class: Undergraduate Advanced Data Analysis.

    More resources

    Saturday, November 27, 2010

    Git cheat sheet

    I'm trying to wrap my head around Git, Linus Torvalds's complicated but powerful distributed version control system. Here are some quick notes and a wad of links:

    Configure

    git config --global user.name "John Q. Hacker"
    git config --global user.email "jqhacker@somedomain.com"
    

    Start a new empty repository

    mkdir fooalicious
    cd fooalicious
    git init
    touch README
    git add README
    git commit -m 'first commit'
    git remote add origin git@github.com:nastyhacks/foo.git
    git push -u origin master
    

    Create a local copy of a remote repository

    git clone [remote-repository]

    Commit to local repository

    git commit -a -m "my message"

    Review previous commits

    git log --name-only

    See what branches exist

    git branch -v

    Switch to a different branch

    git checkout [branch you want to switch to]

    Create a new branch and switch to it

    git checkout -b [name of new branch]

    Merge

    git merge mybranch

    merge the development in the branch "mybranch" into the current branch.

    Show remote repositories tracked

    git remote -v

    Track a remote repository

    git remote add --track master origin git@github.com:jqhacker/foo.git

    Retrieve from a remote repository

    git fetch

    Git fetch grabs changes from a remote repository and puts them in your repository's object database. It also fetches branches from the remote repository and stores them as remote-tracking branches.
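Here's a throwaway demonstration of that behavior; the repo layout and the remote name "upstream" are made up for the sketch:

```shell
set -e
tmp=$(mktemp -d)

# Build a scratch "remote" repository with a single commit.
git -c init.defaultBranch=main init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first commit'

# An empty local repository that knows about the remote.
git -c init.defaultBranch=main init -q "$tmp/local"
cd "$tmp/local"
git remote add upstream "$tmp/upstream"
git fetch -q upstream

# The fetched branch shows up as a remote-tracking branch, not a local one.
git branch -r    # e.g. upstream/main
```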

    Fetch and merge from a remote repository

    git pull

    Push to a remote repository

    git push

    Pull changes from another fork

    git checkout -b otherguy-master master
    git pull https://github.com/otherguy/foo.git master
    
    git checkout master
    git merge otherguy-master
    git push origin master
    

    Resolve merge conflict in favor of us/them

    git checkout --theirs another.txt
    git checkout --ours some.file.txt
    

    Diff between local working directory and remote tracking branch

    Say you're working with Karen on a project. She adds some nifty features to the source file nifty_files/our_code.py. You'd like to diff your local working copy against hers to see the changes, and prepare to merge them in. First, make sure you have a remote tracking branch for Karen's repo.

    git remote add karen git://github.com/karen/our_project.git
    git remote -v
    

    The results ought to look something like this:

    karen git://github.com/karen/our_project.git (fetch)
    karen git://github.com/karen/our_project.git (push)
    origin git@github.com:cbare/our_project.git (fetch)
    origin git@github.com:cbare/our_project.git (push)
    

    Next, fetch Karen's changes into your local repo. Git can't do a diff across the network, so we have to get a local copy of Karen's commits stored in a remote tracking branch.

    git fetch karen
    

    Now, we can do our diff.

    git diff karen/master -- nifty_files/our_code.py
    

    Fixing a messed up working tree

    git reset --hard HEAD

    return the entire working tree to the last committed state

    Shorthand naming

    Branches, remote-tracking branches, and tags are all references to commits. Git allows shorthand names, so you'll mostly use the shorthand rather than the full names:

    • The branch "test" is short for "refs/heads/test".
    • The tag "v2.6.18" is short for "refs/tags/v2.6.18".
    • "origin/master" is short for "refs/remotes/origin/master".

    Links

    Tuesday, July 20, 2010

    How to design good APIs

    A long time ago, I asked a bunch of programming gurus how to go about designing an API. Several gave answers that boiled down to the unsettling advice, "Try to get it right the first time," to which a super-guru then added, "...but you'll never get it right the first time." With that zen wisdom in mind, here's a pile of resources that may help get it slightly less wrong.

    Joshua Bloch, designer of the Java collection classes and author of Effective Java, gives a Google tech-talk called How to Design a Good API & Why it Matters. Video for another version of the same talk is available on InfoQ. He starts off with the observation that, "Good programming is modular. Module boundaries are APIs."

    Characteristics of a Good API

    • Easy to learn
    • Easy to use, even without documentation
    • Hard to misuse
    • Easy to read and maintain code that uses it
    • Sufficiently powerful to satisfy requirements
    • Easy to extend
    • Appropriate to audience

    Michi Henning, in API Design Matters, Communications of the ACM, May 2009, observes that, "An API is a user interface. APIs should be designed from the perspective of the caller."

    Much of software development is about creating abstractions, and APIs are the visible interfaces to these abstractions. Abstractions reduce complexity because they throw away irrelevant detail and retain only the information that is necessary for a particular job. Abstractions do not exist in isolation; rather, we layer abstractions on top of each other. [...] This hierarchy of abstraction layers is an immensely powerful and useful concept. Without it, software as we know it could not exist because programmers would be completely overwhelmed by complexity.

    Because you'll get it wrong the first time, and just because things change, you'll have to evolve APIs. Breaking clients is unpleasant, but "Backward compatibility erodes APIs over time."

    My own little bit of wisdom is this: Performance characteristics are often part of the API. Unless stated otherwise, the caller will assume that a function will complete quickly. For example, it often seems like a good idea to make remote method calls look just like local method calls. This is a bad idea, because you can't abstract away time.

    Links

    Wednesday, April 21, 2010

    Analytics vs Transaction processing

    Analytics is driving developments in software engineering these days the same way transaction processing did in the 70's and 80's. Machine learning and data mining, often of big data sets with graph topologies, and often done in the cloud, applied to fields as diverse as social networks, business intelligence and scientific computing are giving rise to new software architectures made from ingredients like map-reduce and NoSQL.

    Relational databases are probably the best example of real engineering in software - predictable performance based solidly on theory. But all tools are shaped by the problem they were designed to solve. For relational databases, that problem was transaction processing, as exemplified by the ATM network and the airline reservation system. Typically in these types of apps, small amounts of data are relevant to any given transaction and the transactions are algorithmically uncomplicated. The challenge comes from supporting masses of concurrent updates.

    In contrast, imagine the kinds of questions Walmart might want to ask its data.

    Who are credit-worthy sports fans with high-end buying habits who haven't recently purchased big-screen TVs?
    Find home owners with kids within an age range whose spending patterns are uncorrelated with the business cycle and who have a history of responding to promotions.
    What products should be in the seasonal section? Where in the store should that section be located?

    The shapes of these questions are very different from traditional online transaction processing. Data for analytics is written once, updated only additively, and mined for patterns. And flexibility is a big deal as new types of data are required. As problem and solution become increasingly mismatched, friction increases. So, it's good that people are experimenting with alternatives.

    If OLTP shaped relational databases, it might not be stretching too far to say that object-oriented programming was shaped by graphical user interfaces. At least, the design patterns book is chock-full of GUI examples. So, it's interesting to note the success of the functional-programming inspired map-reduce pattern for distributed computing, as part of Google's search. And search is just a query on unstructured data. These days, map-reduce is being applied to all sorts of problems, particularly in machine learning.

    In scientific computing, the data deluge is turning to data-driven science as more data becomes available for analyzing and more research questions are being addressed by machine learning. Even computer science is changing due to big data.

    If you like programming with linked data, these are interesting times. There's a lot of creativity swirling around the intersection of distributed computing and storage, big data, machine learning and data mining. Someone once said, “Shape your tools or be shaped by them.” So, after years of being shaped by the transaction processing toolkit, it's refreshing to see a new generation of software tools being shaped by analytics.

    Link stew

    Related posts

    Wednesday, April 14, 2010

    HTML CSS and JavaScript References

    Here's a place for web and html related reference material.

    CSS margin and padding

    • top, right, bottom, left

    HTML Entities

    Result  Description         Entity Name  Entity Number
            non-breaking space  &nbsp;       &#160;
    <       less than           &lt;         &#60;
    >       greater than        &gt;         &#62;
    &       ampersand           &amp;        &#38;
    "       quotation mark      &quot;       &#34;
    '       apostrophe          &apos;       &#39;
    “       left double quote   &ldquo;      &#8220;
    ”       right double quote  &rdquo;      &#8221;
    ×       multiplication      &times;      &#215;
    ÷       division            &divide;     &#247;
    ©       copyright           &copy;       &#169;

    HTML template

    <html>
    
    <head>
    
    <title></title>
    
    <link rel="stylesheet" type="text/css" href="style.css" />
    
    </head>
    
    <body>
    
    <h1>Example Page</h1>
    <p></p>
    
    </body>
    </html>
    

    Style

    <link rel="stylesheet" type="text/css" href="style.css" />
    
    <style type="text/css">
    .myborder
    {
    border:1px solid black;
    }
    </style>
    

    Table

    <h2>Table</h2>
    
    <table>
    <tr>
    <td></td>
    <td></td>
    </tr>
    <tr>
    <td></td>
    <td></td>
    </tr>
    </table>
    

    List

    <h2>List</h2>
    
    <ul>
    <li><a href=""></a></li>
    <li><a href=""></a></li>
    </ul>
    

    Tuesday, March 09, 2010

    Protovis: data visualization in the browser

    The JavaScript world keeps getting cooler. What with JITting VMs like Google's V8 and Mozilla's JaegerMonkey, frameworks like Prototype and jQuery, and an effort towards a standard library (CommonJS), JavaScript is looking more and more like a respectable programming language.

    The visualization toolkit Protovis is a taste of things to come. Check out the super-slick examples. One of the framework's creators, Jeffrey Heer, is also the designer of two other visualization toolkits, Prefuse for Java and Flare for Flash, and also wrote a nice paper about Software Design Patterns for Information Visualization. In Protovis: A Graphical Toolkit for Visualization, Heer and coauthor Michael Bostock explain that Protovis is aimed at a niche somewhere between point-and-click chart-making programs like Excel and direct manipulation of vector graphic primitives as in Processing.

    At heart, Protovis is a small domain-specific language for charting. JavaScript works surprisingly well as a host language. Like other DSLs, it uses method chaining, where methods return the object they were called on, allowing the next method call to be tacked right on. If this isn't familiar, think of StringBuilder in Java.

    Example

    (stolen from the Protovis docs.)

    var vis = new pv.Panel()
        .width(150)
        .height(150);
    
    vis.add(pv.Bar)
        .data([1, 1.2, 1.7, 1.5, .7])
        .bottom(0)
        .width(20)
        .height(function(d) d * 80)
        .left(function() this.index * 25);
    
    vis.render();

    The way they've implemented “smart properties” is particularly clever. Property accessors accept either constant values or functions. The assignment of height of bars in the bar-graph example above is done this way. Marks (the protovis term for any graphical element) are associated with arrays of data. A pv.Bar, given a 5 element array, draws 5 bars. We've defined height to be a function of d. So, for each element d in the data array, we get a bar of height d * 80. This is a pleasantly seamless mixture of OO and functional constructs which also brings in something of the flavor of vectorized operations in R.
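The real Protovis internals are more involved, but the trick can be sketched in a few lines. Everything here (Mark, heightFor) is a made-up name for illustration, not Protovis API:

```javascript
function Mark() { this.props = {}; }

// The setter accepts either a constant or a function of the datum.
Mark.prototype.height = function (value) {
  // Wrap constants in a function so every property is evaluated uniformly.
  this.props.height =
    (typeof value === "function") ? value : function () { return value; };
  return this; // return the mark itself, which is what enables chaining
};

// Evaluate the property for one datum d.
Mark.prototype.heightFor = function (d) {
  return this.props.height(d);
};

var bar = new Mark().height(function (d) { return d * 80; });
console.log(bar.heightFor(1.5)); // 120

var fixed = new Mark().height(42);
console.log(fixed.heightFor(999)); // 42
```

The setter normalizes everything to a function, so the rendering loop never has to care whether a property was given as a constant or computed per datum.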

    If you like reading code, the Protovis code is very nicely laid out and makes elegant use of Javascript's quirky set of language features.

    Browsing genomes

    I hacked up a quick test, which is (what else?) a genome browser. Looks pretty good for a dirt-simple hundred or so lines of code. Note that my quick hack loads up about 8MB of data, which will take some time over slow connections.

    One bummer is that Protovis doesn't work in current versions of IE. Still, it works nicely in Firefox and Safari and is especially snappy in Chrome. It sounds as if IE support might happen soon.

    For more on Protovis, check out Robert Kosara's A Protovis Primer.

    Grab bag of Javascript and Visualization links

    Wednesday, December 09, 2009

    Distilling Free-Form Natural Laws from Experimental Data

    Eureqa software implements the symbolic regression technique described in this paper:

    Schmidt M., Lipson H. (2009) "Distilling Free-Form Natural Laws from Experimental Data," Science, Vol. 324, no. 5923, pp. 81 - 85.

    Monday, November 30, 2009

    Design Patterns 15 years later

    Design Patterns 15 Years Later: An Interview with Erich Gamma, Richard Helm, and Ralph Johnson was recently featured on Lambda the Ultimate.

    Some say design patterns are just work-arounds for the defects of C++. The paper Essential Programming Paradigm argues that design patterns occur because the programming paradigm disallows certain run-time composition of dynamic and static code. The GoF authors confirm that their design patterns fit object-oriented languages, and arise specifically from experience with C++ and Smalltalk, so they are tied to the language of implementation. "Design patterns eventually emerge for any language. Design déjà-vu is language neutral." Different design patterns may be emerging for dynamic languages or for functional languages.

    They discuss the development of more design patterns beyond the 23 examples chosen for the Design Patterns book. Erich Gamma suggests some sort of collective-intelligence approach for editing design patterns and rating their importance and applicability. Sounds like a good idea. Some new patterns they mention as candidates for inclusion in a revised set are: Null Object, Type Object, Dependency Injection, and Extension Object/Interface. Their new (draft) categorization of design patterns looks like this:

    They seem to have dropped several, some of which I won't miss. But why axe Composite or Observer? And Bridge, maybe not the most useful in practice, but when I finally understood what they meant, I felt like I had accomplished something.

    Design patterns links

    Saturday, November 28, 2009

    Leroy Hood on a career in science

    Leroy Hood, the founder of the Institute for Systems Biology, where I've worked for 3 years now, wrote up some career advice for scientists last year. It probably applies fairly well to any professional.

    I leave students (and even some of my colleagues) with several pieces of advice. First, I stress the importance of a good cross-disciplinary education. Ideally, I suggest a double major with the two fields being orthogonal, say, biology with computer science or applied physics. Some argue that there is insufficient time to learn two fields deeply at the undergraduate level.

    I argue that this is not true. If we realize that many undergraduate courses now taught are filled with details that are immediately forgotten after the course is finished, we must then learn to teach in an efficiently conceptual manner. As I noted above, as an undergraduate at Caltech I had Feynman for physics and Pauling for chemistry, and both provided striking examples of the power of conceptual teaching.

    Second, I argue that students should grow accustomed to working together in teams: In the future, there will be many hard problems (like P4 medicine) that will require the focused integration of many different types of expertise.

    Third, I suggest that students acquire an excellent background in mathematics and statistics and develop the ability to use various computational tools. Fourth, I argue that a scholar, academic, scientist, or engineer should have four major professional objectives: (a) scholarship, (b) education (teaching), (c) transferring knowledge to society, and (d ) playing a leadership role in the local community to help it become the place in which one would like one’s children and grandchildren to live.

    Fifth, with regard to the scientific careers of many scientists: they can be described as bell-shaped curves of success; they rise gradually to a career maximum and then slowly fall back toward the baseline. To circumvent this fate, I propose a simple solution: a major change in career focus every 10 or so years. By learning a new field and overcoming the attendant insecurities that come from learning new areas, one can reset the career clock. Moreover, with a different point of view and prior experience, one can make fundamental new contributions to the new field by thinking outside the box. Then the new career curve can be a joined series of the upsides of the bell-shaped curve, each reinvigorated by the ten-year changes.

    Finally, science is all about being surrounded by wonderful colleagues and having fun with them, so I recommend choosing one’s science, environment, and colleagues carefully. I end this discussion with what I stressed at the beginning: I am so fortunate to have been surrounded by outstanding colleagues who loved science and engineering. Science for each of us is a journey with no fixed end goal. Rather, our goals are continually being redefined.

    ISB recently topped 300 employees and, as of early 2009, had a budget of $55 million. Dr. Hood turned 70 in 2008.

    Friday, November 06, 2009

    The Stories Networks Tell

    Dr. Carl Bergstrom gave a very cool lecture on “The Stories Networks Tell” at UW a week or two ago. Eigenfactor applies network analysis methods to mapping the citation structure of academic journals. He has a nice method of discovering modularity in networks using random walks. There are some nice flash interactive graphics on his lab's website. It's cool that he's published on economics, evolution, infectious disease, and information theory. On the theme of biology and economics:

    Saturday, September 05, 2009

    Closure

    I've heard nice things about Clojure. Stuart Halloway has a book (Programming Clojure) out through the Pragmatic Programmers. The book grew out of Stuart's Java.next series of articles. There's also a podcast.

    Clojure, like Scala, is functional programming for the JVM. Scala (which I fooled around with a little) descends from the ML/Caml/OCaml family of languages, with an emphasis on pattern matching and static typing with type inference. Clojure is a dynamic, Ruby-influenced dialect of Lisp specifically targeted to the JVM (hence the j). Yet another programming language on the digithead's cool-technology-I-want-to-play-with list.

    Update

    Sunday, June 21, 2009

    Semantic data in life science

    When I first saw RDF I said, "Blech!" Well, that's also what I said when I first saw HTML. But, I'm coming around to it. Sure, it's not pretty, but RDF is a graph and graphs are cool. And, we're starting to see more and more that the relational model of data works less well in some situations than it does in transaction processing, while RDF related tools are moving from vaporware to something more practical.

    With the web, a new data model is growing up which can be generalized as a graph, where nodes and edges have properties, plus indexes to quickly find sets of nodes in the graph. The web and its search engines are an instance of this pattern. As the web becomes a channel for structured data, it gets more natural to model your data like this, too. Biology has a great tradition of open data, and the network is already a workhorse of modern biology. So, why not structure your data that way?
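RDF itself boils down to exactly this pattern: a pile of (subject, predicate, object) triples, which is just a labeled graph, plus indexes to make lookups fast. A toy sketch in Python, with the triples invented for the example:

```python
from collections import defaultdict

# A tiny "triple store": each fact is (subject, predicate, object).
triples = [
    ("BRCA1", "is_a", "gene"),
    ("BRCA1", "located_on", "chr17"),
    ("chr17", "is_a", "chromosome"),
]

# Index by subject so we can quickly find everything known about a node.
by_subject = defaultdict(list)
for s, p, o in triples:
    by_subject[s].append((p, o))

print(by_subject["BRCA1"])
# [('is_a', 'gene'), ('located_on', 'chr17')]
```

Real triple stores keep several such indexes (by subject, predicate, and object) so any pattern of query is fast.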

    Tim Berners-Lee, in a TED-talk on the blooming of Linked Data, points out the huge untapped potential of integrating the separate data silos distributed all over the web. Because biology was an early adopter of open data, some of its key assets are open, but poorly linked and not very programmable. Maybe the Semantic Web of Life Science will change that particularly in Systems Biology, which demands the integration of diverse types of data.

    Clay Shirky criticized the semantic web for its links to AI, and deductive reasoning, asking "What is the Semantic Web good for?" Well, maybe data integration, rather than inference, is the answer.

    Friday, June 05, 2009

    Practical semantic web

    Toby Segaran, author of the super-fun Programming Collective Intelligence, has a nice talk available, titled Why Semantics?, about the practical semantic web. If you immediately think of jumbo shrimp or military intelligence, you're not alone. But his talk isn't pie-in-the-sky. He explains some of the contortions commonly used to shoe-horn freeform data into relational databases, then shows that these issues are being addressed using the graph databases that are part of the semantic web effort.

    His upcoming book on the subject is Programming the Semantic Web.

    He mentions a few good resources:

    • Sesame - an RDF data store
    • Exhibit from MIT's Simile project
    • Linking Open Data
    • Geonames
    • MusicBrainz
    • Freebase

    Wednesday, April 29, 2009

    MySQL cheat-sheet

    Well, now that MySQL is Oracle's SQL, I dunno how long this information will remain useful. But here goes:

    Start

    If you've installed MySQL via Homebrew:

    mysql.server start

    otherwise...

    sudo /usr/local/mysql/bin/mysqld_safe
    ctrl-z, bg

    Set root password

    mysqladmin -u root -pcurrentpassword password 'newpassword'

    Console

    mysql -p -u root

    Create DB

    create database foo;
    use foo;

    Create User

    create user 'bar'@'localhost' identified by 'some_pass';
    grant all privileges on foo.* to 'bar'@'localhost';
    grant all privileges on foo.* to 'bar'@'localhost' with grant option;

    Show Users

    select host, user, password from mysql.user;

    Shutdown

    mysqladmin -p -u root shutdown

    Load data from a table

    LOAD DATA INFILE '/temp/myfile.tsv' INTO TABLE my_table IGNORE 1 LINES;

    You might get ERROR 13 (HY000): Can't get stat of ... caused by permissions. I get around it by giving full permissions to the file and its parent directory. See man stat for more.
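The fix, sketched on a scratch file (the real thing is the same pair of chmods on your own path): mysqld runs as its own user, so it, not you, has to be able to stat the file and traverse its directory.

```shell
set -e
dir=$(mktemp -d)
printf 'id\tname\n1\tfoo\n' > "$dir/myfile.tsv"

# World-readable file, world-traversable directory.
chmod 755 "$dir"
chmod 644 "$dir/myfile.tsv"
ls -l "$dir/myfile.tsv"
```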

    Dump and restore data

    mysqldump -p -u [user] [dbname] | gzip > [filename]
    gunzip < [filename] | mysql -p -u [user] [dbname]
    

    Docs for the mysqladmin tool and other client programs. SQL syntax docs for create table and select, insert, update, and delete.

    BTW, where a server isn't needed, I'm starting to like SQLite a lot.

    Friday, March 20, 2009

    More Hacking NCBI

    Writing scripts to interface with NCBI's web site has its challenges. Getting data from the UCSC genome browser is simpler.

    If you need a list of complete genomes, that can be had from the NCBI Genome database. One form of list is the genlist.cgi script. The type parameter seems to be a flag that limits the list to chromosomes, plasmids, or organelle specific sequences. The name parameter seems to be there only for looks. So far, I haven't figured out how to make genlist spit out either XML or text.

    Two other scripts can produce text output, lproks and leuks.

    These two can be scripted using parameters like these: view=1 dump=selected p3=11:|12:Green Algae. This information is also available by ftp from ftp://ftp.ncbi.nih.gov/genomes/genomeprj/. There are 3 lproks.txt files, which seem to correspond to the three tabs Organism info, Complete genomes, and Genomes in progress; lproks_1.txt is the one we want. There's a lot of good information in the ftp directories to plunder.

    There seems to be yet a third script: GenomesGroup.cgi. This one is linked from the Virus genomes page.

    If I really wanted to suffer, I'd look into NCBI's source. Does anyone know where the source of lproks.cgi or genlist.cgi are? Is that part of the NCBI C++ Toolkit? (which is on macports here.) Maybe it's buried in NCBI's ftp site? Maybe I should ask the NCBI Information Engineering Branch? Maybe I need to start doing something more productive!

    Wednesday, February 25, 2009

    Ruby Docs

    Ruby cheat sheet. Quick links to Ruby documentation on the web.

    ruby-lang.org

    Ruby-doc

    List operations

    Test membership in an Array

    Does the array contain the element?

    my_array.include?('element')

    Ruby on Rails

    Ruby QuickRef

    Pickaxe book

    21 Ruby Tricks You Should Be Using In Your Own Code

    Also see Ruby quirks and What is a Ruby code block?

    Rubular - regex tester

    Monday, February 09, 2009

    Ruby Quirks

    Ruby is a fun language, but here are a few things that tripped up this n00b whilst stumbling along the learning curve.

    Why does string[n] return an integer character code instead of the character? Well, Ruby has no such thing as a character (prior to 1.9, anyway), so then why not a string of length 1?

    >> a = "asdf"
    => "asdf"
    >> a[0]
    => 97
    

    Either of these works, although they look a little funny:

    >> a[0...1]
    => "a"
    >> a[0..0]
    => "a"
    

    Also, a[0].chr works. The integer.chr method is described as follows:

    int.chr => string
    Returns a string containing the ASCII character represented by the receiver's value.
    

    It's non-obvious how to iterate through the characters of a string. string.each_char is listed in the Ruby core 1.8.6 ruby-docs, but, confusingly, you have to require "jcode" for it to work. Maybe I'm just confused about whether core means "loaded by default" or "included in the Ruby distribution".
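For what it's worth, in later Rubies (1.9 and up) each_char works with no extra require:

```ruby
# Collect the characters of a string one by one.
chars = []
"asdf".each_char { |c| chars << c }
puts chars.inspect  # ["a", "s", "d", "f"]
```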

    Two toString methods?

    In place of object.toString(), Ruby has two methods: to_s and inspect. When coercion to a string is required, to_s is called. Docs for inspect say this:

    Returns a string containing a human-readable representation of obj. If not overridden, uses the to_s method to generate the string.
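A little sketch of the difference (the Point class is made up for the example): string interpolation calls to_s, while p and irb use inspect.

```ruby
class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  # Called when the object is coerced to a string, e.g. in "#{...}".
  def to_s
    "(#{@x}, #{@y})"
  end

  # Called by p and irb for a debugging-friendly representation.
  def inspect
    "#<Point x=#{@x} y=#{@y}>"
  end
end

pt = Point.new(1, 2)
puts "point: #{pt}"  # point: (1, 2)
puts pt.inspect      # #<Point x=1 y=2>
```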

    If, then, else, elsif

    If statements are confusing...

    if x==123 then
       puts 'wazoo'
    end
    
    # then is optional, as long as you have the line break
    if x==123
       puts 'wazoo'
    end
    

    For one liners, then is required. Or colon, if you prefer.

    if x==123 then puts 'wazoo' end
    if x==123 : puts 'wazoo' end
    

    Curly braces seem not to work at all for if statements. For more curly brace related philosophy and WTFs see this issue. DON'T DO THIS:

    if x==123 { puts 'qwer' }
    

    Finally, would someone tell all these languages that crawl out of the primordial Bourne-shell ooze that neither elif nor elsif means jack shit in the English language?!?!?! (sputter... rant... fume...)

    if x==123
      puts 'wazoo'
    elsif x==456
      puts 'flapdoodle'
    else
      puts 'foo'
    end
    

    An if statement is an expression and returns a value, but Ruby also offers the good old ternary operator.
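Both forms in one sketch: the value of the if expression is the value of whichever branch ran.

```ruby
x = 123

label = if x > 100 then 'big' else 'small' end
same  = x > 100 ? 'big' : 'small'

puts label  # big
puts same   # big
```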

    Require vs. Load

    There are two ways to import code in Ruby, require and load. See the ruby docs for Kernel (require and load).

    Defined? and nil?

    Ruby has nil instead of null. Ok, and unlike Java's null, nil is a real object. I appreciate the difference between nil and undefined, but I wouldn't have guessed that defined? nil would return "nil". Not to be confused with a truly undefined variable, in which case defined? asdf returns nil. The pickaxe book explains the other strange return values of defined?. Then, there's nil?.

    >> asdf.nil?
    NameError: undefined local variable or method `asdf' for main:Object
     from (irb):47
     from :0
    >> asdf = nil
    => nil
    >> asdf.nil?
    => true
    

    Command line arguments

    Just the array ARGV. Not a quirk. Good.

    Return

    Return behaves oddly; to exit a script you use Kernel::exit(integer). Trying to return 1 instead causes a LocalJumpError whatever that means?? Trying to return from a code block returns from the surrounding context. That hurts my head.
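A sketch of the LocalJumpError: a Proc's return tries to return from the method that created it, so calling the Proc after that method has already returned has nowhere to jump to. Lambdas, by contrast, behave like methods. (make_proc is a made-up helper.)

```ruby
def make_proc
  Proc.new { return :from_method }  # return targets make_proc itself
end

pr = make_proc  # make_proc has now returned...
begin
  pr.call       # ...so this return has nowhere to go
rescue LocalJumpError
  puts "LocalJumpError"
end

# A lambda's return just exits the lambda.
l = lambda { return 2 }
puts l.call  # 2
```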

    Ruby Exception Handling

    Ruby's equivalent of try-catch-finally is begin-rescue-ensure-end.
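Side by side, the correspondence looks like this (the error message is made up). Note that begin-end is an expression too, and its value is the value of the branch that ran; ensure's value is discarded.

```ruby
result = begin               # try
  raise ArgumentError, "boom"
rescue ArgumentError => e    # catch
  "rescued: #{e.message}"
ensure                       # finally -- always runs
  puts "cleaning up"
end

puts result  # rescued: boom
```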

    No Boolean

    There's no Boolean class in Ruby. Instead, there's TrueClass and FalseClass. So, what type of value does the logical expression p ^ q produce? Everything has an implicit boolean value, which is true for everything except false and nil.
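A quick check of both points:

```ruby
# ^ on booleans returns plain true/false, instances of TrueClass/FalseClass.
puts (true ^ false).class  # TrueClass
puts (true ^ true).class   # FalseClass

# Everything is truthy except false and nil -- even 0 and "".
puts "zero is truthy" if 0
puts "nil is falsy" unless nil
```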

    List operations

    (see also Array and Enumerable)

    More