Comments on Digithead's Lab Notebook: R String processing
by Christopher Bare

Christopher Bare, 2011-10-09:
@Dickoa - Nice one! That looks like a very R-ish solution. I had to try it out to figure out what your code is doing: it pulls one group out of the original string on each pass through the sapply. Being a stickler, I have to point out that pulling out four fields requires four applications of the regex instead of one, but that probably won't matter in practice.

BTW, I recently put together a quick crib sheet on R string functions (http://digitheadslabnotebook.blogspot.com/2011/08/string-functions-in-r.html).

When R 2.14 comes out at the end of October, R's string manipulation capability is slated to get a big boost (http://cbio.ensmp.fr/~thocking/papers/2011-08-16-directlabels-and-regular-expressions-for-useR-2011/2011-useR-named-capture-regexp.pdf).
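R 2.14 also added `regexec()` and `regmatches()`, which together give the "several groups out of a single match" behaviour. As far as I know, both arrived in 2.14.0. `regexec()` matches each string once and records the position of the whole match and of every parenthesized group. `regmatches()` then turns those positions into substrings. The following is a minimal sketch on the coordinates used in this thread. The column names `chrom`, `strand`, `start` and `end` are my own choice and do not come from the post.

```r
# Sample strings from the thread
coords <- c("chromosome+:157470-158370", "chromosome+:158370-158450",
            "chromosome+:158510-159330", "chromosome-:157460-158560")
regex <- "(.*)([+-]):([0-9]+)-([0-9]+)"

# One regex application per string: regexec() records where every group
# matched, and regmatches() extracts those substrings.
m <- regmatches(coords, regexec(regex, coords))

# Each element is c(whole match, group 1, ..., group 4); drop the whole
# match and stack the groups into a 4-column character matrix.
fields <- do.call(rbind, lapply(m, `[`, -1))

# Illustrative column names (not from the original post); positions as integers
results <- data.frame(chrom  = fields[, 1],
                      strand = fields[, 2],
                      start  = as.integer(fields[, 3]),
                      end    = as.integer(fields[, 4]),
                      stringsAsFactors = FALSE)
print(results)
```

A string that does not match comes back from `regmatches()` as `character(0)`. Such a string would silently drop out of the `rbind`, so it is worth checking for non-matches first with real data.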
dickoa, 2011-10-09:
What about this solution?

coords <- c("chromosome+:157470-158370", "chromosome+:158370-158450", "chromosome+:158510-159330", "chromosome-:157460-158560")

regex <- "(.*)([+-]):(\\d+)-(\\d+)"

# Pass i keeps only capture group i by replacing the whole match with the backreference \i
results <- as.data.frame(sapply(1:4, function(i) sub(pattern = regex, replacement = paste("\\", i, sep = ""), x = coords)), stringsAsFactors = FALSE)

results

          V1 V2     V3     V4
1 chromosome  + 157470 158370
2 chromosome  + 158370 158450
3 chromosome  + 158510 159330
4 chromosome  - 157460 158560

I'm pretty sure we can do almost everything Python does in R, though it's sometimes not as straightforward as the Python solution. But this is R.
Anonymous, 2011-08-25:
Looks like this is finally coming in R 2.14!

See: Toby Hocking's "Fast, named capture regular expressions in R 2.14" (http://cbio.ensmp.fr/~thocking/papers/2011-08-16-useR/2011-useR-named-capture-regexp.pdf)

Richard Hertz, 2011-08-25:
The same topic was addressed on Stack Overflow without any resolution:

http://stackoverflow.com/questions/952275/regex-group-capture-in-r
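The named-capture feature covered in Hocking's slides works through `regexpr(..., perl = TRUE)`. When the pattern contains PCRE `(?<name>...)` groups, the result carries `capture.start`, `capture.length` and `capture.names` attributes. The following is a rough sketch of pulling every named group out in one pass. The group names are my own choice and do not come from the thread.

```r
coords <- c("chromosome+:157470-158370", "chromosome+:158370-158450",
            "chromosome+:158510-159330", "chromosome-:157460-158560")

# (?<name>...) is PCRE named-group syntax, so perl = TRUE is required.
# The group names are illustrative.
regex <- "(?<chrom>.*)(?<strand>[+-]):(?<start>[0-9]+)-(?<end>[0-9]+)"
m <- regexpr(regex, coords, perl = TRUE)

# One row per input string, one column per capture group
starts <- attr(m, "capture.start")
lens   <- attr(m, "capture.length")
nms    <- attr(m, "capture.names")

# substring() is vectorised, so each group is extracted for all strings at once
fields <- sapply(seq_along(nms), function(k)
  substring(coords, starts[, k], starts[, k] + lens[, k] - 1))
colnames(fields) <- nms
print(fields)
```

For a string that does not match, the capture start is -1, so `substring()` returns an empty string rather than an error.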
Christopher Bare, 2010-11-23:
Thanks for the tip, Roman.

My issue was that I didn't know how to extract all the captured pieces of my original strings. gsub gives you a way to grab one field at a time, whereas most languages let you grab several captured groups out of a single match (as in the Python example). If that's possible in R, it would be good to know.

Roman Luštrik, 2010-11-23:
What I would do is use one of the apply family functions. The function passed to apply would extract each component (using regular expressions) and put it into a [1, 3] data frame. At the end you can combine the pieces into a single data frame: for instance, if you get a list of [1, 3] data frames, you can stick them together with do.call("rbind", my.list). But there are more than 100 ways to skin a cat...
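Roman's apply-then-rbind suggestion might look roughly like the sketch below. The helper `parse_coord` is hypothetical and so are the column names. There are four columns rather than three because the example strings have four fields. Each component is pulled out with `sub()` and a backreference, which worked before R 2.14. Note that `paste0()` needs R 2.15 or later.

```r
coords <- c("chromosome+:157470-158370", "chromosome+:158370-158450",
            "chromosome+:158510-159330", "chromosome-:157460-158560")
regex <- "(.*)([+-]):([0-9]+)-([0-9]+)"

# Hypothetical helper: turn one string into a one-row data frame,
# extracting each group with sub() and the backreference \1 ... \4.
parse_coord <- function(s) {
  if (!grepl(regex, s)) stop("string does not match: ", s)
  group <- function(i) sub(regex, paste0("\\", i), s)
  data.frame(chrom  = group(1),
             strand = group(2),
             start  = as.integer(group(3)),
             end    = as.integer(group(4)),
             stringsAsFactors = FALSE)
}

# Apply the helper to every string, then stack the one-row data frames
results <- do.call("rbind", lapply(coords, parse_coord))
print(results)
```

This version still runs the regex once per field for each string, which is the same trade-off Christopher points out about the sapply approach. It also stops with an error on a non-matching string instead of returning it unchanged, as plain `sub()` would.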