Ruby Regexp weirdness
/(\w+([-'.]\w+)*)/ matches words, including ones with hyphens and apostrophes (O’Reilly Ruby Cookbook, recipe 1.9).
The pattern makes sense, but if it’s used directly in irb, without the outer parentheses, like this:
[ruby]
irb(main):127:0> string.downcase.scan(/\w+([-]\w+)*/) { |a, b| puts "a#{a} b#{b}" }
a b
a b
a b
a-fleury b
a b
=> "hello there sally de-fleury hello"
[/ruby]
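Nothing useful lands in the first block variable: a only ever holds the optional hyphenated suffix, and b is always empty. However, when it’s done as an extension to the String class, as in the Cookbook, all is fine.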
[ruby]
irb(main):135:0> class String
irb(main):136:1> def word_display
irb(main):137:2> downcase.scan(/(\w+([-'.]\w+)*)/) { |word, ignore| puts "1 #{word} 2 #{ignore}" }
irb(main):138:2> end
irb(main):139:1> end
=> nil
irb(main):140:0> string.word_display
1 hello 2
1 there 2
1 sally 2
1 de-fleury 2 -fleury
1 hello 2
1 he-lo-there 2 -there
=> "hello there sally de-fleury hello he-lo-there"
[/ruby]
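To check whether the String method itself matters, both patterns can be run directly on a string, outside any class. This is only a minimal sketch: the variable names and the sample text (modelled on the irb sessions above) are my own assumptions, not taken from the Cookbook.

```ruby
# Hypothetical sample text, modelled on the irb sessions above.
text = "hello there sally de-fleury hello he-lo-there"

one_group  = /\w+([-'.]\w+)*/     # only the repeated suffix is captured
two_groups = /(\w+([-'.]\w+)*)/   # an outer group captures the whole word too

# When the pattern contains groups, scan returns one array of captures
# per match instead of the matched text itself.
inner_only = text.scan(one_group)
outer_too  = text.scan(two_groups)

p inner_only
# [[nil], [nil], [nil], ["-fleury"], [nil], ["-there"]]
p outer_too
# [["hello", nil], ["there", nil], ["sally", nil], ["de-fleury", "-fleury"], ["hello", nil], ["he-lo-there", "-there"]]
```

Neither call goes through a String extension, so the extra first column comes from the outer parentheses alone.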
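Why the difference? It isn’t the String class: the Cookbook version wraps the whole pattern in an extra pair of parentheses, and when a pattern contains groups, scan yields the captured groups rather than the whole match. Without that outer group the only capture is the inner one, which is spurious and only really there so it can have a star after it to match multiply hyphenated words. I got round it by adding ?:, which suppresses the storing of the match: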
[ruby]
irb(main):145:0> string.downcase.scan(/\w+(?:[-'.]\w+)*/)
=> ["hello", "there", "sally", "de-fleury", "hello", "he-lo-there"]
[/ruby]
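The same fix works with the block form, and if the capturing group has to stay for some reason, the whole match can still be read from the last match data inside the block. A rough sketch, again with a made-up sample string and variable names of my own:

```ruby
# Hypothetical sample text, modelled on the irb sessions above.
text = "hello there sally de-fleury hello he-lo-there"

# Non-capturing group: each match is yielded to the block as a plain string.
words = []
text.scan(/\w+(?:[-'.]\w+)*/) { |word| words << word }

# Capturing group kept: the block argument holds the captures, but
# Regexp.last_match(0) (also available as $&) is still the whole match.
whole = []
text.scan(/\w+([-'.]\w+)*/) { |_captures| whole << Regexp.last_match(0) }

p words
# ["hello", "there", "sally", "de-fleury", "hello", "he-lo-there"]
p whole
# ["hello", "there", "sally", "de-fleury", "hello", "he-lo-there"]
```

The non-capturing version is the tidier of the two, since the block then receives exactly the word and nothing needs to be ignored.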