Thursday, July 11, 2013

Generate UUIDs in R

Here's a snippet of R to generate a Version 4 UUID. Dunno why there wouldn't be an official function for that in the standard libraries, but if there is, I couldn't find it.


## Version 4 UUIDs have the form:
##    xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
##    where x is any hexadecimal digit and
##    y is one of 8, 9, A, or B
##    for example: f47ac10b-58cc-4372-a567-0e02b2c3d479
uuid <- function(uppercase=FALSE) {

  hex_digits <- c(as.character(0:9), letters[1:6])
  hex_digits <- if (uppercase) toupper(hex_digits) else hex_digits

  y_digits <- hex_digits[9:12]

  paste(
    paste0(
      sample(hex_digits, 8, replace=TRUE),
      collapse=''),
    paste0(
      sample(hex_digits, 4, replace=TRUE),
      collapse=''),
    paste0(
      '4',
      paste0(sample(hex_digits, 3, replace=TRUE),
             collapse=''),
      collapse=''),
    paste0(
      sample(y_digits,1),
      paste0(sample(hex_digits, 3, replace=TRUE),
             collapse=''),
      collapse=''),
    paste0(
      sample(hex_digits, 12, replace=TRUE),
      collapse=''),
    sep='-')
}

View as a gist: https://gist.github.com/cbare/5979354
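
As a quick sanity check, a regular expression can confirm that a string has the Version 4 shape described in the comments above. This is only a sketch: the helper name is_uuid_v4 is made up for illustration and isn't part of any library.

```r
## Hypothetical helper (not from any package):
## TRUE if x looks like a Version 4 UUID, in either case
is_uuid_v4 <- function(x) {
  pattern <- paste0(
    "^[0-9a-f]{8}-[0-9a-f]{4}-",
    "4[0-9a-f]{3}-",        # version digit is always 4
    "[89ab][0-9a-f]{3}-",   # y is one of 8, 9, a, or b
    "[0-9a-f]{12}$")
  grepl(pattern, x, ignore.case=TRUE)
}

is_uuid_v4("f47ac10b-58cc-4372-a567-0e02b2c3d479")   ## TRUE
is_uuid_v4("f47ac10b-58cc-1372-a567-0e02b2c3d479")   ## FALSE (version digit is 1)
```

Calling is_uuid_v4(uuid()) with the function from the post should return TRUE every time.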

Note: Thanks to Carl Witthoft for pointing out that my first version was totally broken. Turns out calling sample with replace=TRUE greatly expands the possible UUIDs you might generate! Without it, sample draws without replacement, so no hex digit can repeat within a group.
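
To get a feel for how much replace=TRUE matters, here's a rough sketch that looks only at the last 12-digit group. The variable names are just for illustration.

```r
hex_digits <- c(as.character(0:9), letters[1:6])

## Default (replace=FALSE): each digit is used at most once per draw
no_repl <- sample(hex_digits, 12)
anyDuplicated(no_repl)   # 0, i.e. never any repeated digit

## Number of distinct 12-digit groups reachable each way
without_replacement <- prod(16:5)   # 16 * 15 * ... * 5
with_replacement    <- 16^12

## Fraction of the full space reachable without replacement:
## well under 1 percent
without_replacement / with_replacement
```

So for that one group alone, sampling without replacement rules out more than 99 percent of the possible values.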

Carl also says, "In general, as I understand it, the value of UUID codes is directly dependent on the quality of the pseudo-random number generator behind them, so I’d recommend reading some R-related literature to make sure “sample” will be good enough for your purposes."

This sounds wise, but I'm not sure if I'm smart enough to follow up on it. It could be that the randomness of these UUIDs is less than ideal.
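
One way to keep the random source swappable is to build the UUID from 16 raw bytes and set the version and variant bits afterwards. What follows is a minimal sketch under that assumption, not a vetted implementation: uuid_from_bytes is a hypothetical name, and the bytes in the usage line still come from sample, so they are no more random than before.

```r
## Hypothetical helper: format 16 raw bytes as a Version 4 UUID
uuid_from_bytes <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 16)
  b <- as.integer(bytes)

  ## High nibble of byte 7 becomes 4 (the version digit)
  b[7] <- bitwOr(bitwAnd(b[7], 0x0FL), 0x40L)
  ## Top two bits of byte 9 become 10, so y is 8, 9, a, or b
  b[9] <- bitwOr(bitwAnd(b[9], 0x3FL), 0x80L)

  hex <- sprintf("%02x", b)
  paste(
    paste(hex[1:4],   collapse=''),
    paste(hex[5:6],   collapse=''),
    paste(hex[7:8],   collapse=''),
    paste(hex[9:10],  collapse=''),
    paste(hex[11:16], collapse=''),
    sep='-')
}

## Bytes drawn with sample, i.e. R's ordinary pseudo-random generator.
## Assumption: a stronger source, such as openssl::rand_bytes(16) if the
## openssl package is installed, could be substituted here.
uuid_from_bytes(as.raw(sample(0:255, 16, replace=TRUE)))
```

With this split, the formatting stays the same and only the line that produces the bytes needs to change if the generator turns out not to be good enough.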