Posted by noah on 08/18/08


Tagged: curl javascript html DOM diff dhtml wget tidy selenium selenium-rc analysis rwget rendered source



Rendered WGet with Selenium


Published in: Ruby
 

URL: http://ajax.sys-con.com/node/507034

Created in response to a discussion about "ghosting" between Kord Campbell of Splunk and Christian Heilmann of Yahoo! at Ajax World 2008.

IMPORTANT: The Selenium-RC server must be running on port 4444 (the default), and curl and Tidy must be installed on your system. The -d option also shells out to diff.
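The prerequisites above can be checked before the script starts a browser session. This is only a minimal sketch: the helper names (port_open?, on_path?, missing_prerequisites) are made up for illustration and are not part of the snippet, and the PATH lookup assumes a Unix-like system where executables have no file extension.

```ruby
require 'socket'

# Hypothetical helper: true if something accepts TCP connections on host:port.
def port_open?(host, port)
  TCPSocket.new(host, port).close
  true
rescue SystemCallError, SocketError
  false
end

# Hypothetical helper: true if an executable file called name exists in one of
# the directories listed in path (defaults to the PATH environment variable).
def on_path?(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical helper: list whatever is missing, as human-readable strings.
def missing_prerequisites(host = 'localhost', port = 4444, tools = %w[curl tidy diff])
  missing = tools.reject { |tool| on_path?(tool) }
  missing << "Selenium-RC server on #{host}:#{port}" unless port_open?(host, port)
  missing
end
```

A script could call missing_prerequisites at startup and abort with the returned list if it is not empty, rather than failing halfway through a Selenium session.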

NOTE: The -d option diffs the rendered source against the "server" source. It works OK as a learning tool, but I need to do more to normalize the server source and the rendered source. I run both the "server" and innerHTML sources through Tidy, but unfortunately there still seem to be a lot of extraneous differences between them.
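On Ruby 1.9 or later, piping a string through an external filter such as Tidy can be written with Open3.capture2, which handles writing stdin and closing it for you. The sketch below is an illustration only: pipe_through is a hypothetical helper name, and the demonstration uses the running Ruby interpreter as the filter so that it works without Tidy installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: feed input to an external command on stdin and
# return whatever the command writes to stdout.
# command is an array: the program name followed by its arguments.
def pipe_through(command, input)
  output, _status = Open3.capture2(*command, stdin_data: input)
  output
end

# In the script's setting the command would be Tidy, for example:
#   tidy_rendered = pipe_through(['tidy'], rendered_src)

# Self-contained demonstration: use the current Ruby as an upcasing filter.
upcase_filter = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']
puts pipe_through(upcase_filter, "<p>hello</p>\n")
```

Because capture2 leaves stderr attached to the parent process, the filter's warnings go straight to the terminal instead of collecting in a pipe that nobody reads.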

So while this works OK for downloading the rendered source via a Ruby script, I've got a ways to go before it can produce a reliable "rendered diff."
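One way to cut down on the extraneous differences is an extra normalization pass over both sources before diffing. The post does not say which differences matter in practice, so the rules below (line endings, tag-name case, whitespace between tags) are assumptions, and normalize_html is a hypothetical name. It is a regex-based sketch, not an HTML parser, so it will also touch anything inside script or pre blocks that happens to look like markup.

```ruby
# Hypothetical normalizer: removes differences that are purely cosmetic.
def normalize_html(src)
  text = src.gsub(/\r\n?/, "\n")                                      # unify line endings
  text = text.gsub(/<\/?[A-Za-z][A-Za-z0-9]*/) { |tag| tag.downcase } # lowercase tag names
  text = text.gsub(/>\s*</, ">\n<")                                   # put adjacent tags on separate lines
  text = text.gsub(/[ \t]+/, " ")                                     # collapse runs of spaces and tabs
  text.lines.map(&:strip).reject(&:empty?).join("\n")                 # trim lines, drop blank ones
end

server_like   = "<DIV>\r\n  <P>Hi   there</P>\r\n</DIV>"
rendered_like = "<div><p>Hi there</p>\n\n</div>"
puts normalize_html(server_like) == normalize_html(rendered_like)
```

Attribute order and quoting are likely to need similar treatment, but handling those reliably probably calls for a real parser rather than more regular expressions.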

keywords: rwget, rwdiff, ruby, selenium rc, selenium remote control, examples

#!/usr/bin/env ruby
#== Synopsis
# Time-stamp: <[c:/noah/n_s/tools/foo_tool/selenium-rc-tests/rwget.rb] was last modified by Noah Sussman at 13:46:48 on 2008.07.22 on 5M8DLC1-NYO. (Serious Cat) v1.3>
# Based on the demo code packaged with Selenium-RC: 10:24:44 PM EST on Saturday, March 22 2008
# Rendered WGet and Rendered-Versus-Server-Source diff
# Get rendered HTML for a DHTML page and optionally compare it with the HTML stored on the server.
#
#== Examples
# Get the rendered HTML from site.com
# rwget site.com
#
#== Usage
# rwget [options] <http url>
#
#== Options
# -d, --diff diff the rendered source against the server source.
#
#== Author
# Noah Sussman (noah@onemorebug.com)
#
#== Copyright
# Copyright (c) 2008 Noah Sussman under the MIT License:
# http://www.opensource.org/licenses/mit-license.php

require 'open3'
require 'rdoc/usage' # part of the Ruby 1.8 standard library
require 'uri'
#require '~/Documents/n_s/tools/foo_tool/selenium-rc-tests/selenium.rb'
require 'selenium'

#page = ARGV[0]
#click_on_id = ARGV[1]

# Open a page in a Selenium-RC browser session and return its rendered HTML.
# The first element of list is the URL; any remaining elements are assumed
# to be element IDs and get clicked before the source is grabbed.
def rendered_wget(list)
  list = list.dup # work on a copy so the caller's array (e.g. ARGV) stays intact
  page = list.shift
  page = "http://" + page unless page =~ %r{^https?://}
  page_url = URI.parse(page)
  remote_host = page_url.scheme + "://" + page_url.host
  @selenium = Selenium::SeleniumDriver.new("localhost", 4444, "*firefox", remote_host, 10000)
  # @selenium = Selenium::SeleniumDriver.new("localhost", 4444, "*iexplore", remote_host, 10000)
  @selenium.start
  @selenium.open(page)
  @selenium.wait_for_page_to_load(5000)
  list.each { |id| @selenium.click(id) }
  src = @selenium.get_html_source
  @selenium.stop
  src
end

# Pipe HTML through Tidy and return the cleaned-up markup.
def tidy(src)
  Open3.popen3('tidy') do |stdin, stdout, stderr|
    stdin.puts src
    stdin.close_write # without this the script will hang
    stdout.read
  end
end

if ARGV.length == 0
  RDoc::usage('usage')
elsif ARGV[0] =~ /\A(-d|--diff)\z/
  # diff rendered vs. server source
  ARGV.shift # no more need for the -d option now that we know it was passed
  # the URL is quoted so that characters like & are not interpreted by the shell
  server_src = `curl -s '#{ARGV[0]}'`
  rendered_src = rendered_wget(ARGV)

  File.open("rwget_rendered.tmp", "w") { |f| f.puts tidy(rendered_src) }
  File.open("rwget_server.tmp", "w") { |f| f.puts tidy(server_src) }

  puts `diff -u rwget_server.tmp rwget_rendered.tmp`

  File.delete("rwget_rendered.tmp", "rwget_server.tmp")

  # How do I diff 2 buffers without dumping to tmp files?
else
  # print rendered source
  puts rendered_wget(ARGV)
end
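
The script's closing question, how to diff two buffers without dumping them to temp files, has more than one answer. The sketch below is one possibility, not what the script does: a line-by-line diff computed in memory with a longest-common-subsequence table. line_diff is a hypothetical helper, its output is a simple "+"/"-" listing rather than the unified format of diff -u, and the table needs memory proportional to the product of the two line counts, so it suits pages of modest size.

```ruby
# Hypothetical helper: line-by-line diff of two strings, computed entirely
# in memory. Returns an array of lines prefixed with "  " (unchanged),
# "- " (only in old_text) or "+ " (only in new_text).
def line_diff(old_text, new_text)
  a = old_text.lines.map(&:chomp)
  b = new_text.lines.map(&:chomp)

  # lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = if a[i] == b[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk the table from the top-left corner, emitting one line per step.
  out = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      out << "  #{a[i]}"
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      out << "- #{a[i]}"
      i += 1
    else
      out << "+ #{b[j]}"
      j += 1
    end
  end
  # Whatever is left on either side is a pure deletion or insertion.
  out.concat(a[i..-1].map { |line| "- #{line}" })
  out.concat(b[j..-1].map { |line| "+ #{line}" })
  out
end

puts line_diff("<p>one</p>\n<p>two</p>\n", "<p>one</p>\n<p>2</p>\n")
```

In the script this would take the two Tidy outputs as arguments, for example line_diff(tidy_server, tidy_rendered), and the temp files would no longer be needed.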
