
Monday, June 06, 2011

Primers in Computational Biology

Nature Biotechnology used to regularly feature primers on various topics in computational biology. Here's an incomplete listing based on what looked interesting to me. Some of these are old, but on topics fundamental enough not to go out of style. Lots of these are just mini-tutorials in machine learning.

...just in case you're in need of some bed-time reading or some mad comp-bio skillz. Sorry if some of these are behind a pay-wall, but there's usually a way around, under or over such walls.

Tuesday, March 01, 2011

Learning data science skills

According to Hal Varian and just about everyone these days, the hot skills to have are some combination of programming, statistics, machine learning, and visualization. Here's a pile of resources that'll help you get some mad data science skills.

Programming

There seem to be a few main platforms widely used for data-intensive programming. R is a statistical environment that is to statisticians what MATLAB is to engineers. It's a weird beast, but it's open source and very powerful, plus it has a great community. Python also makes a strong showing, with the help of NumPy, SciPy and matplotlib. An intriguing new entry is the combination of the Lisp dialect Clojure and Incanter. All these tools mix numerical libraries with functional and scripting programming styles in varying proportions. You'll also want to look into Hadoop, to do your big-data analytics map-reduce style in the cloud.
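Hadoop itself is a heavyweight install, but the map-reduce pattern it popularized is easy to sketch in a few lines of plain Python. This is only the idea, nothing Hadoop-specific, and the tiny word-count data is invented for the example:

```python
# Map each record to (key, value) pairs, shuffle by key, then reduce.
from itertools import groupby
from operator import itemgetter

docs = ["big data", "big analytics", "data data"]

# Map step: one (word, 1) pair per word.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle step: group pairs by key (which requires sorting first).
mapped.sort(key=itemgetter(0))

# Reduce step: sum the values for each key.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}

print(counts)  # {'analytics': 1, 'big': 2, 'data': 3}
```

Real map-reduce distributes the map and reduce steps over many machines; the shape of the computation is the same.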

Statistics

  • John Verzani's Using R for Introductory Statistics, which I'm working my way through.

Machine Learning

  • Toby Segaran's Programming Collective Intelligence
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction
  • Bishop's Pattern Recognition and Machine Learning
  • Machine Learning, Tom Mitchell

Visualization

  • Tufte's books, especially The Visual Display of Quantitative Information
  • Processing, along with Ben Fry's book, Visualizing Data.
  • Jeffrey Heer's papers, especially Software Design Patterns for Information Visualization. Heer is one of the creators of several toolkits: Prefuse, Flare and Protovis.
  • 7 Classic Foundational Vis Papers and Seminal information visualization papers

Classes

    Starting on March 5 at the Hacker Dojo in Mountain View (CA), Mike Bowles and Patricia Hoffmann will present a course on Machine Learning where R will be the "lingua franca" for looking at homework problems, discussing them and comparing different solution approaches. The class will begin at the level of elementary probability and statistics and from that background survey a broad array of machine learning techniques including: Unsupervised Learning, Clustering Techniques, and Fault Detection.

    R courses from Statistics.com

    Feb 11:  Modeling in R (Sudha Purohit -- more details after the jump)
    Mar 4:  Introduction to R - Data Handling (Paul Murrell)
    Apr 15:  Programming in R (Hadley Wickham)
    Apr 29:  Graphics in R (Paul Murrell)
    May 20:  Introduction to R – Statistical Analysis (John Verzani)

    Data bootcamp (slides and code) from the Strata Conference. Tutorials covering a handful of example problems using R and Python.

    • Plotting data on maps
    • Classifying emails
    • A classification problem in image analysis

    Cosma Shalizi at CMU teaches a class: Undergraduate Advanced Data Analysis.

    More resources

    Saturday, November 27, 2010

    Git cheat sheet

    I'm trying to wrap my head around Git, Linus Torvalds's complicated but powerful distributed version control system. Here are some quick notes and a wad of links:

    Configure

    git config --global user.name "John Q. Hacker"
    git config --global user.email "jqhacker@somedomain.com"
    

    Start a new empty repository

    mkdir fooalicious
    cd fooalicious
    git init
    touch README
    git add README
    git commit -m 'first commit'
    git remote add origin git@github.com:nastyhacks/foo.git
    git push -u origin master
    

    Create a local copy of a remote repository

    git clone [remote-repository]

    Commit to local repository

    git commit -a -m "my message"

    Review previous commits

    git log --name-only

    See what branches exist

    git branch -v

    Switch to a different branch

    git checkout [branch you want to switch to]

    Create a new branch and switch to it

    git checkout -b [name of new branch]

    Merge

    git merge mybranch

    merge the development in the branch "mybranch" into the current branch.

    Show remote repositories tracked

    git remote -v

    Track a remote repository

    git remote add --track master origin git@github.com:jqhacker/foo.git

    Retrieve from a remote repository

    git fetch

    Git fetch grabs changes from a remote repository and puts them in your repository's object database. It also fetches branches from the remote repository and stores them as remote-tracking branches.
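Here's a throwaway demonstration of that behavior; the repo layout and the remote name "upstream" are made up for the sketch:

```shell
set -e
tmp=$(mktemp -d)

# Build a scratch "remote" repository with a single commit.
git -c init.defaultBranch=main init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first commit'

# An empty local repository that knows about the remote.
git -c init.defaultBranch=main init -q "$tmp/local"
cd "$tmp/local"
git remote add upstream "$tmp/upstream"
git fetch -q upstream

# The fetched branch shows up as a remote-tracking branch, not a local one.
git branch -r    # e.g. upstream/main
```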

    Fetch and merge from a remote repository

    git pull

    Push to a remote repository

    git push

    Pull changes from another fork

    git checkout -b otherguy-master master
    git pull https://github.com/otherguy/foo.git master
    
    git checkout master
    git merge otherguy-master
    git push origin master
    

    Resolve merge conflict in favor of us/them

    git checkout --theirs another.txt
    git checkout --ours some.file.txt
    

    Diff between local working directory and remote tracking branch

    Say you're working with Karen on a project. She adds some nifty features to the source file nifty_files/our_code.py. You'd like to diff your local working copy against hers to see the changes, and prepare to merge them in. First, make sure you have a remote tracking branch for Karen's repo.

    git remote add karen git://github.com/karen/our_project.git
    git remote -v
    

    The results ought to look something like this:

    karen git://github.com/karen/our_project.git (fetch)
    karen git://github.com/karen/our_project.git (push)
    origin git@github.com:cbare/our_project.git (fetch)
    origin git@github.com:cbare/our_project.git (push)
    

    Next, fetch Karen's changes into your local repo. Git can't do a diff across the network, so we have to get a local copy of Karen's commits stored in a remote tracking branch.

    git fetch karen
    

    Now, we can do our diff.

    git diff karen/master -- nifty_files/our_code.py
    

    Fixing a messed up working tree

    git reset --hard HEAD

    return the entire working tree to the last committed state

    Shorthand naming

    Branches, remote-tracking branches, and tags are all references to commits. Git allows shorthand names, so you'll mostly use the shorthand rather than the full names:

    • The branch "test" is short for "refs/heads/test".
    • The tag "v2.6.18" is short for "refs/tags/v2.6.18".
    • "origin/master" is short for "refs/remotes/origin/master".

    Links

    Tuesday, July 20, 2010

    How to design good APIs

    A long time ago, I asked a bunch of programming gurus how to go about designing an API. Several gave answers that boiled down to the unsettling advice, "Try to get it right the first time," to which a super-guru then added, "...but you'll never get it right the first time." With that zen wisdom in mind, here's a pile of resources that may help get it slightly less wrong.

    Joshua Bloch, designer of the Java collection classes and author of Effective Java, gives a Google tech-talk called How to Design a Good API & Why it Matters. Video for another version of the same talk is available on InfoQ. He starts off with the observation that, "Good programming is modular. Module boundaries are APIs."

    Characteristics of a Good API

    • Easy to learn
    • Easy to use, even without documentation
    • Hard to misuse
    • Easy to read and maintain code that uses it
    • Sufficiently powerful to satisfy requirements
    • Easy to extend
    • Appropriate to audience

    Michi Henning, in API Design Matters, Communications of the ACM, May 2009, observes that, "An API is a user interface. APIs should be designed from the perspective of the caller."

    Much of software development is about creating abstractions, and APIs are the visible interfaces to these abstractions. Abstractions reduce complexity because they throw away irrelevant detail and retain only the information that is necessary for a particular job. Abstractions do not exist in isolation; rather, we layer abstractions on top of each other. [...] This hierarchy of abstraction layers is an immensely powerful and useful concept. Without it, software as we know it could not exist because programmers would be completely overwhelmed by complexity.

    Because you'll get it wrong the first time, and just because things change, you'll have to evolve APIs. Breaking clients is unpleasant, but "Backward compatibility erodes APIs over time."

    My own little bit of wisdom is this: Performance characteristics are often part of the API. Unless stated otherwise, the caller will assume that a function will complete quickly. For example, it often seems like a good idea to make remote method calls look just like local method calls. This is a bad idea, because you can't abstract away time.

    Links

    Wednesday, April 21, 2010

    Analytics vs Transaction processing

    Analytics is driving developments in software engineering these days the same way transaction processing did in the 70's and 80's. Machine learning and data mining, often of big data sets with graph topologies, and often done in the cloud, applied to fields as diverse as social networks, business intelligence and scientific computing are giving rise to new software architectures made from ingredients like map-reduce and NoSQL.

    Relational databases are probably the best example of real engineering in software - predictable performance based solidly on theory. But all tools are shaped by the problem they were designed to solve. For relational databases, that problem was transaction processing, as exemplified by the ATM network and the airline reservation system. Typically in these types of apps, small amounts of data are relevant to any given transaction and the transactions are algorithmically uncomplicated. The challenge comes from supporting masses of concurrent updates.

    In contrast, imagine the kinds of questions Walmart might want to ask its data.

    Who are credit-worthy sports fans with high-end buying habits who haven't recently purchased big-screen TVs?
    Find home owners with kids within an age range whose spending patterns are uncorrelated with the business cycle and who have a history of responding to promotions.
    What products should be in the seasonal section? Where in the store should that section be located?

    The shapes of these questions are very different from traditional online transaction processing. Data for analytics is written once, updated only additively, and mined for patterns. And flexibility is a big deal as new types of data are required. As problem and solution become increasingly mismatched, friction increases. So, it's good that people are experimenting with alternatives.

    If OLTP shaped relational databases, it might not be stretching too far to say that object-oriented programming was shaped by graphical user interfaces. At least, the design patterns book is chock-full of GUI examples. So, it's interesting to note the success of the functional-programming inspired map-reduce pattern for distributed computing, as part of Google's search. And search is just a query on unstructured data. These days, map-reduce is being applied to all sorts of problems, particularly in machine learning.

    In scientific computing, the data deluge is turning to data-driven science as more data becomes available for analyzing and more research questions are being addressed by machine learning. Even computer science is changing due to big data.

    If you like programming with linked data, these are interesting times. There's a lot of creativity swirling around the intersection of distributed computing and storage, big data, machine learning and data mining. Someone once said, “Shape your tools or be shaped by them.” So, after years of being shaped by the transaction processing toolkit, it's refreshing to see a new generation of software tools being shaped by analytics.

    Link stew

    Related posts

    Wednesday, April 14, 2010

    HTML CSS and JavaScript References

    Here's a place for web and html related reference material.

    CSS margin and padding

    • top, right, bottom, left

    HTML Entities

    Result  Description         Entity Name  Entity Number
            non-breaking space  &nbsp;       &#160;
    <       less than           &lt;         &#60;
    >       greater than        &gt;         &#62;
    &       ampersand           &amp;        &#38;
    "       quotation mark      &quot;       &#34;
    '       apostrophe          &apos;       &#39;
    “       left double quote   &ldquo;      &#8220;
    ”       right double quote  &rdquo;      &#8221;
    ×       multiplication      &times;      &#215;
    ÷       division            &divide;     &#247;
    ©       copyright           &copy;       &#169;

    HTML template

    <html>
    
    <head>
    
    <title></title>
    
    <link rel="stylesheet" type="text/css" href="style.css" />
    
    </head>
    
    <body>
    
    <h1>Example Page</h1>
    <p></p>
    
    </body>
    </html>
    

    Style

    <link rel="stylesheet" type="text/css" href="style.css" />
    
    <style type="text/css">
    .myborder
    {
    border:1px solid black;
    }
    </style>
    

    Table

    <h2>Table</h2>
    
    <table>
    <tr>
    <td></td>
    <td></td>
    </tr>
    <tr>
    <td></td>
    <td></td>
    </tr>
    </table>
    

    List

    <h2>List</h2>
    
    <ul>
    <li><a href=""></a></li>
    <li><a href=""></a></li>
    </ul>
    

    Tuesday, March 09, 2010

    Protovis: data visualization in the browser

    The JavaScript world keeps getting cooler. What with JITting VMs like Google's V8 and Mozilla's JaegerMonkey, frameworks like Prototype and jQuery, and an effort towards a standard library (CommonJS), JavaScript is looking more and more like a respectable programming language.

    The visualization toolkit Protovis is a taste of things to come. Check out the super-slick examples. One of the framework's creators, Jeffrey Heer, is also the designer of two other visualization toolkits, Prefuse for Java and Flare for Flash, and also wrote a nice paper about Software Design Patterns for Information Visualization. In Protovis: A Graphical Toolkit for Visualization, Heer and coauthor Michael Bostock explain that Protovis is aimed at a niche somewhere between point-and-click chart-making programs like Excel and direct manipulation of vector graphic primitives as in Processing.

    At heart, Protovis is a small domain-specific language for charting. JavaScript works surprisingly well as a host language. Like other DSLs, it uses method chaining, where methods return the object they were called on, allowing the next method call to be tacked right on. If this isn't familiar, think of StringBuilder in Java.

    Example

    (stolen from the Protovis docs.)

    var vis = new pv.Panel()
        .width(150)
        .height(150);
    
    vis.add(pv.Bar)
        .data([1, 1.2, 1.7, 1.5, .7])
        .bottom(0)
        .width(20)
        .height(function(d) d * 80)
        .left(function() this.index * 25);
    
    vis.render();

    The way they've implemented “smart properties” is particularly clever. Property accessors accept either constant values or functions. The assignment of height of bars in the bar-graph example above is done this way. Marks (the protovis term for any graphical element) are associated with arrays of data. A pv.Bar, given a 5 element array, draws 5 bars. We've defined height to be a function of d. So, for each element d in the data array, we get a bar of height d * 80. This is a pleasantly seamless mixture of OO and functional constructs which also brings in something of the flavor of vectorized operations in R.
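The real Protovis internals are more involved, but the trick can be sketched in a few lines. Everything here (Mark, heightFor) is a made-up name for illustration, not Protovis API:

```javascript
function Mark() { this.props = {}; }

// The setter accepts either a constant or a function of the datum.
Mark.prototype.height = function (value) {
  // Wrap constants in a function so every property is evaluated uniformly.
  this.props.height =
    (typeof value === "function") ? value : function () { return value; };
  return this; // return the mark itself, which is what enables chaining
};

// Evaluate the property for one datum d.
Mark.prototype.heightFor = function (d) {
  return this.props.height(d);
};

var bar = new Mark().height(function (d) { return d * 80; });
console.log(bar.heightFor(1.5)); // 120

var fixed = new Mark().height(42);
console.log(fixed.heightFor(999)); // 42
```

The setter normalizes everything to a function, so the rendering loop never has to care whether a property was given as a constant or computed per datum.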

    If you like reading code, the Protovis code is very nicely laid out and makes elegant use of Javascript's quirky set of language features.

    Browsing genomes

    I hacked up a quick test, which is (what else?) a genome browser. Looks pretty good for a dirt-simple hundred or so lines of code. Note that my quick hack loads up about 8MB of data, which will take some time over slow connections.

    One bummer is that Protovis doesn't work in current versions of IE. Still, it works nicely in Firefox and Safari and is especially snappy in Chrome. It sounds as if IE support might happen soon.

    For more on Protovis, check out Robert Kosara's A Protovis Primer.

    Grab bag of Javascript and Visualization links

    Wednesday, December 09, 2009

    Distilling Free-Form Natural Laws from Experimental Data

    Eureqa software implements the symbolic regression technique described in this paper:

    Schmidt M., Lipson H. (2009) "Distilling Free-Form Natural Laws from Experimental Data," Science, Vol. 324, no. 5923, pp. 81 - 85.

    Monday, November 30, 2009

    Design Patterns 15 years later

    Design Patterns 15 Years Later: An Interview with Erich Gamma, Richard Helm, and Ralph Johnson was recently featured on Lambda the Ultimate.

    Some say design patterns are just work-arounds for the defects of C++. The paper Essential Programming Paradigm argues that design patterns occur because the programming paradigm disallows certain run-time composition of dynamic and static code. The GoF authors confirm that their design patterns fit object-oriented languages, and arise specifically from experience with C++ and Smalltalk, so they are tied to the language of implementation. "Design patterns eventually emerge for any language. Design déjà-vu is language neutral." Different design patterns may be emerging for dynamic languages or for functional languages.

    They discuss the development of more design patterns beyond the 23 examples chosen for the Design Patterns book. Erich Gamma suggests some sort of collective-intelligence approach for editing design patterns and rating their importance and applicability. Sounds like a good idea. Some new patterns they mention as candidates for inclusion in a revised set are: Null Object, Type Object, Dependency Injection, and Extension Object/Interface. Their new (draft) categorization of design patterns looks like this:

    They seem to have dropped several, some of which I won't miss. But why axe Composite or Observer? And Bridge, maybe not the most useful in practice, but when I finally understood what they meant, I felt like I had accomplished something.

    Design patterns links

    Saturday, November 28, 2009

    Leroy Hood on a career in science

    Leroy Hood, the founder of the Institute for Systems Biology, where I've worked for 3 years now, wrote up some career advice for scientists last year. It probably applies fairly well to any professional.

    I leave students (and even some of my colleagues) with several pieces of advice. First, I stress the importance of a good cross-disciplinary education. Ideally, I suggest a double major with the two fields being orthogonal, say, biology with computer science or applied physics. Some argue that there is insufficient time to learn two fields deeply at the undergraduate level.

    I argue that this is not true. If we realize that many undergraduate courses now taught are filled with details that are immediately forgotten after the course is finished, we must then learn to teach in an efficiently conceptual manner. As I noted above, as an undergraduate at Caltech I had Feynman for physics and Pauling for chemistry, and both provided striking examples of the power of conceptual teaching.

    Second, I argue that students should grow accustomed to working together in teams: In the future, there will be many hard problems (like P4 medicine) that will require the focused integration of many different types of expertise.

    Third, I suggest that students acquire an excellent background in mathematics and statistics and develop the ability to use various computational tools. Fourth, I argue that a scholar, academic, scientist, or engineer should have four major professional objectives: (a) scholarship, (b) education (teaching), (c) transferring knowledge to society, and (d ) playing a leadership role in the local community to help it become the place in which one would like one’s children and grandchildren to live.

    Fifth, with regard to the scientific careers of many scientists: they can be described as bell-shaped curves of success; they rise gradually to a career maximum and then slowly fall back toward the baseline. To circumvent this fate, I propose a simple solution: a major change in career focus every 10 or so years. By learning a new field and overcoming the attendant insecurities that come from learning new areas, one can reset the career clock. Moreover, with a different point of view and prior experience, one can make fundamental new contributions to the new field by thinking outside the box. Then the new career curve can be a joined series of the upsides of the bell-shaped curve, each reinvigorated by the ten-year changes.

    Finally, science is all about being surrounded by wonderful colleagues and having fun with them, so I recommend choosing one’s science, environment, and colleagues carefully. I end this discussion with what I stressed at the beginning: I am so fortunate to have been surrounded by outstanding colleagues who loved science and engineering. Science for each of us is a journey with no fixed end goal. Rather, our goals are continually being redefined.

    ISB recently topped 300 employees and, as of early 2009, had a budget of $55 million. Dr. Hood turned 70 in 2008.

    Friday, November 06, 2009

    The Stories Networks Tell

    Dr. Carl Bergstrom gave a very cool lecture on “The Stories Networks Tell” at UW a week or two ago. Eigenfactor applies network analysis methods to mapping the citation structure of academic journals. He has a nice method of discovering modularity in networks using random walks. There are some nice flash interactive graphics on his lab's website. It's cool that he's published on economics, evolution, infectious disease, and information theory. On the theme of biology and economics:

    Saturday, September 05, 2009

    Closure

    I've heard nice things about Clojure. Stuart Halloway has a book (Programming Clojure) out through the Pragmatic Programmers. The book grew out of Stuart's Java.next series of articles. There's also a podcast.

    Clojure, like Scala, is functional programming for the JVM. Scala (which I fooled around with a little) descends from the ML/Caml/OCaml family of languages, with an emphasis on pattern matching and static typing with type inference. Clojure is a dynamic, Ruby-influenced dialect of Lisp specifically targeted to the JVM (hence the j). Yet another programming language on the digithead's cool-technology-I-want-to-play-with list.

    Update

    Sunday, June 21, 2009

    Semantic data in life science

    When I first saw RDF I said, "Blech!" Well, that's also what I said when I first saw HTML. But, I'm coming around to it. Sure, it's not pretty, but RDF is a graph and graphs are cool. And, we're starting to see more and more that the relational model of data works less well in some situations than it does in transaction processing, while RDF related tools are moving from vaporware to something more practical.

    With the web, a new data model is growing up which can be generalized as a graph, where nodes and edges have properties, plus indexes to quickly find sets of nodes in the graph. The web and its search engines are an instance of this pattern. As the web becomes a channel for structured data, it gets more natural to model your data like this, too. Biology has a great tradition of open data, and the network is already a workhorse of modern biology. So, why not structure your data that way?
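RDF itself boils down to exactly this pattern: a pile of (subject, predicate, object) triples, which is just a labeled graph, plus indexes to make lookups fast. A toy sketch in Python, with the triples invented for the example:

```python
from collections import defaultdict

# A tiny "triple store": each fact is (subject, predicate, object).
triples = [
    ("BRCA1", "is_a", "gene"),
    ("BRCA1", "located_on", "chr17"),
    ("chr17", "is_a", "chromosome"),
]

# Index by subject so we can quickly find everything known about a node.
by_subject = defaultdict(list)
for s, p, o in triples:
    by_subject[s].append((p, o))

print(by_subject["BRCA1"])
# [('is_a', 'gene'), ('located_on', 'chr17')]
```

Real triple stores keep several such indexes (by subject, predicate, and object) so any pattern of query is fast.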

    Tim Berners-Lee, in a TED-talk on the blooming of Linked Data, points out the huge untapped potential of integrating the separate data silos distributed all over the web. Because biology was an early adopter of open data, some of its key assets are open, but poorly linked and not very programmable. Maybe the Semantic Web of Life Science will change that particularly in Systems Biology, which demands the integration of diverse types of data.

    Clay Shirky criticized the semantic web for its links to AI, and deductive reasoning, asking "What is the Semantic Web good for?" Well, maybe data integration, rather than inference, is the answer.

    Friday, June 05, 2009

    Practical semantic web

    Toby Segaran, author of the super-fun Programming Collective Intelligence, has a nice talk available, titled Why Semantics?, about the practical semantic web. If you immediately think of jumbo shrimp or military intelligence, you're not alone. But his talk isn't pie-in-the-sky. He explains some of the contortions commonly used to shoe-horn freeform data into relational databases, then shows that these issues are being addressed using the graph databases that are part of the semantic web effort.

    His upcoming book on the subject is Programming the Semantic Web.

    He mentions a few good resources:

    • Sesame - an RDF data store
    • Exhibit from MIT's Simile project
    • Linking Open Data
    • Geonames
    • MusicBrainz
    • Freebase

    Wednesday, April 29, 2009

    MySQL cheat-sheet

    Well, now that MySQL is Oracle's SQL, I dunno how long this information will remain useful. But here goes:

    Start

    If you've installed MySQL via Homebrew:

    mysql.server start

    otherwise...

    sudo /usr/local/mysql/bin/mysqld_safe
    ctrl-z, bg

    Set root password

    mysqladmin -u root -pcurrentpassword password 'newpassword'

    Console

    mysql -p -u root

    Create DB

    create database foo;
    use foo;

    Create User

    create user 'bar'@'localhost' identified by 'some_pass';
    grant all privileges on foo.* to 'bar'@'localhost';
    grant all privileges on foo.* to 'bar'@'localhost' with grant option;

    Show Users

    select host, user, password from mysql.user;

    Shutdown

    mysqladmin -p -u root shutdown

    Load data from a table

    LOAD DATA INFILE '/temp/myfile.tsv' INTO TABLE my_table IGNORE 1 LINES;

    You might get ERROR 13 (HY000): Can't get stat of ... caused by permissions. I get around it by giving full permissions to the file and its parent directory. See man stat for more.
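The fix, sketched on a scratch file (the real thing is the same pair of chmods on your own path): mysqld runs as its own user, so it, not you, has to be able to stat the file and traverse its directory.

```shell
set -e
dir=$(mktemp -d)
printf 'id\tname\n1\tfoo\n' > "$dir/myfile.tsv"

# World-readable file, world-traversable directory.
chmod 755 "$dir"
chmod 644 "$dir/myfile.tsv"
ls -l "$dir/myfile.tsv"
```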

    Dump and restore data

    mysqldump -p -u [user] [dbname] | gzip > [filename]
    gunzip < [filename] | mysql -p -u [user] [dbname]
    

    Docs for the mysqladmin tool and other client programs. SQL syntax docs for create table and select, insert, update, and delete.

    BTW, where a server isn't needed, I'm starting to like SQLite a lot.

    Friday, March 20, 2009

    More Hacking NCBI

    Writing scripts to interface with NCBI's web site has its challenges. Getting data from the UCSC genome browser is simpler.

    If you need a list of complete genomes, that can be had from the NCBI Genome database. One form of list is the genlist.cgi script. The type parameter seems to be a flag that limits the list to chromosomes, plasmids, or organelle specific sequences. The name parameter seems to be there only for looks. So far, I haven't figured out how to make genlist spit out either XML or text.

    Two other scripts can produce text output, lproks and leuks.

    These two can be scripted using parameters like these: view=1 dump=selected p3=11:|12:Green Algae. This information is also available by ftp from ftp://ftp.ncbi.nih.gov/genomes/genomeprj/. There are 3 lproks.txt files, which seem to correspond to the three tabs Organism info, Complete genomes, and Genomes in progress; lproks_1.txt is the one we want. There's a lot of good information in the ftp directories to plunder.

    There seems to be yet a third script: GenomesGroup.cgi. This one is linked from the Virus genomes page.

    If I really wanted to suffer, I'd look into NCBI's source. Does anyone know where the source of lproks.cgi or genlist.cgi are? Is that part of the NCBI C++ Toolkit? (which is on macports here.) Maybe it's buried in NCBI's ftp site? Maybe I should ask the NCBI Information Engineering Branch? Maybe I need to start doing something more productive!

    Wednesday, February 25, 2009

    Ruby Docs

    Ruby cheat sheet. Quick links to Ruby documentation on the web.

    ruby-lang.org

    Ruby-doc

    List operations

    Test membership in an Array

    Does the array contain the element?

    my_array.include?('element')

    Ruby on Rails

    Ruby QuickRef

    Pickaxe book

    21 Ruby Tricks You Should Be Using In Your Own Code

    Also see Ruby quirks and What is a Ruby code block?

    Rubular - regex tester

    Monday, February 09, 2009

    Ruby Quirks

    Ruby is a fun language, but here are a few things that tripped up this n00b whilst stumbling along the learning curve.

    Why does string[n] return an integer character code instead of the character? Well, Ruby has no such thing as a character (prior to 1.9, anyway), so then why not a string of length 1?

    >> a = "asdf"
    => "asdf"
    >> a[0]
    => 97
    

    Either of these works, although they look a little funny:

    >> a[0...1]
    => "a"
    >> a[0..0]
    => "a"
    

    Also, a[0].chr works. The integer.chr method is described as follows:

    int.chr => string
    Returns a string containing the ASCII character represented by the receiver's value.
    

    It's non-obvious how to iterate through the characters of a string. string.each_char is listed in the Ruby core 1.8.6 ruby-docs, but, confusingly, you have to require "jcode" for it to work. Maybe I'm just confused about whether core means "loaded by default" or "included in the Ruby distribution".
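For what it's worth, in later Rubies (1.9 and up) each_char works with no extra require:

```ruby
# Collect the characters of a string one by one.
chars = []
"asdf".each_char { |c| chars << c }
puts chars.inspect  # ["a", "s", "d", "f"]
```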

    Two toString methods?

    In place of object.toString(), Ruby has two methods: to_s and inspect. When coercion to a string is required, to_s is called. Docs for inspect say this:

    Returns a string containing a human-readable representation of obj. If not overridden, uses the to_s method to generate the string.
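A little sketch of the difference (the Point class is made up for the example): string interpolation calls to_s, while p and irb use inspect.

```ruby
class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  # Called when the object is coerced to a string, e.g. in "#{...}".
  def to_s
    "(#{@x}, #{@y})"
  end

  # Called by p and irb for a debugging-friendly representation.
  def inspect
    "#<Point x=#{@x} y=#{@y}>"
  end
end

pt = Point.new(1, 2)
puts "point: #{pt}"  # point: (1, 2)
puts pt.inspect      # #<Point x=1 y=2>
```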

    If, then, else, elsif

    If statements are confusing...

    if x==123 then
       puts 'wazoo'
    end
    
    # then is optional, as long as you have the line break
    if x==123
       puts 'wazoo'
    end
    

    For one liners, then is required. Or colon, if you prefer.

    if x==123 then puts 'wazoo' end
    if x==123 : puts 'wazoo' end
    

    Curly braces seem not to work at all for if statements. For more curly brace related philosophy and WTFs see this issue. DON'T DO THIS:

    if x==123 { puts 'qwer' }
    

    Finally, would someone tell all these languages that crawl out of the primordial Bourne-shell ooze that neither elif nor elsif means jack shit in the English language?!?!?! (sputter... rant... fume...)

    if x==123
      puts 'wazoo'
    elsif x==456
      puts 'flapdoodle'
    else
      puts 'foo'
    end
    

    An if statement is an expression and returns a value, but Ruby also offers the good old ternary operator.
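Both forms in one sketch: the value of the if expression is the value of whichever branch ran.

```ruby
x = 123

label = if x > 100 then 'big' else 'small' end
same  = x > 100 ? 'big' : 'small'

puts label  # big
puts same   # big
```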

    Require vs. Load

    There are two ways to import code in Ruby, require and load. See the ruby docs for Kernel (require and load).

    Defined? and nil?

    Ruby has nil instead of null. Ok, and unlike Java's null, nil is a real object. I appreciate the difference between nil and undefined, but I wouldn't have guessed that defined? nil would return "nil". Not to be confused with a truly undefined variable, in which case defined? asdf returns nil. The pickaxe book explains the other strange return values of defined?. Then, there's nil?.

    >> asdf.nil?
    NameError: undefined local variable or method `asdf' for main:Object
     from (irb):47
     from :0
    >> asdf = nil
    => nil
    >> asdf.nil?
    => true
    

    Command line arguments

    Just the array ARGV. Not a quirk. Good.

    Return

    Return behaves oddly; to exit a script you use Kernel::exit(integer). Trying to return 1 instead causes a LocalJumpError whatever that means?? Trying to return from a code block returns from the surrounding context. That hurts my head.
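A sketch of the LocalJumpError: a Proc's return tries to return from the method that created it, so calling the Proc after that method has already returned has nowhere to jump to. Lambdas, by contrast, behave like methods. (make_proc is a made-up helper.)

```ruby
def make_proc
  Proc.new { return :from_method }  # return targets make_proc itself
end

pr = make_proc  # make_proc has now returned...
begin
  pr.call       # ...so this return has nowhere to go
rescue LocalJumpError
  puts "LocalJumpError"
end

# A lambda's return just exits the lambda.
l = lambda { return 2 }
puts l.call  # 2
```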

    Ruby Exception Handling

    Ruby's equivalent of try-catch-finally is begin-rescue-ensure-end.
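Side by side, the correspondence looks like this (the error message is made up). Note that begin-end is an expression too, and its value is the value of the branch that ran; ensure's value is discarded.

```ruby
result = begin               # try
  raise ArgumentError, "boom"
rescue ArgumentError => e    # catch
  "rescued: #{e.message}"
ensure                       # finally -- always runs
  puts "cleaning up"
end

puts result  # rescued: boom
```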

    No Boolean

    There's no Boolean class in Ruby. Instead, there's TrueClass and FalseClass. So, what type of value does the logical expression p ^ q produce? Everything has an implicit boolean value, which is true for everything except false and nil.
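A quick check of both points:

```ruby
# ^ on booleans returns plain true/false, instances of TrueClass/FalseClass.
puts (true ^ false).class  # TrueClass
puts (true ^ true).class   # FalseClass

# Everything is truthy except false and nil -- even 0 and "".
puts "zero is truthy" if 0
puts "nil is falsy" unless nil
```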

    List operations

    (see also Array and Enumerable)

    More