JavaScript minification bakeoff dataset

Steve Gribble -- gribble (at) cs (dot) washington (dot) edu

Figure: the distribution of script sizes within the dataset. Blue bars represent embedded scripts, olive bars represent libraries, and the combined stacked bars represent all scripts.
This page lets you browse or download the set of scripts I used to do the JavaScript minification bakeoff study. With this data, you can reproduce my results or generate new analyses of your own.

The dataset consists of 179 pieces of JavaScript that were manually classified as being not minified. These JavaScript pieces were taken from a large-scale, broad Web crawl; I seeded my crawl with 1,000 pages from the DMOZ open directory and gathered over 25,000,000 HTML pages, CSS files, and JavaScript files. From these, I gathered 179 pieces of unique JavaScript, where uniqueness was tested by SHA1 hash. The selected JavaScript consists of a mixture of JavaScript libraries (i.e., HTTP objects consisting entirely of JavaScript) and embedded JavaScript scripts extracted from within HTML files.
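
As a rough illustration of the uniqueness test described above, here is a minimal Python sketch that drops byte-identical duplicates by comparing SHA1 digests. The function name and the choice to hash the UTF-8 encoding of each script are assumptions made for this example; they are not taken from the crawler used in the study.

```python
import hashlib


def unique_by_sha1(scripts):
    """Return the scripts with exact duplicates removed, keeping first occurrences.

    Two scripts count as duplicates when the SHA1 digests of their UTF-8
    bytes match (the encoding is an assumption of this sketch).
    """
    seen = set()
    unique = []
    for source in scripts:
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(source)
    return unique
```

Note that hashing only catches exact copies: two scripts that differ by a single whitespace character would both be kept.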

Some statistics about the corpus:

Browsing or downloading the dataset

Here's the link to the full dataset:

scripts_dataset.tar.gz
Within that tar.gz file, you'll find a separate directory for each kind of minification I measured, and both minified and gzipped-minified JavaScript files within them.
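
If you want a quick overview of the archive before unpacking it, something like the following Python sketch may help. It groups the files in a tar.gz by their parent directory and reports a file count and total byte size for each. The function name is hypothetical, and the sketch makes no assumption about the actual directory or file names inside scripts_dataset.tar.gz.

```python
import posixpath
import tarfile
from collections import defaultdict


def summarize_archive(tar_path):
    """Map each directory in a .tar.gz to (file count, total bytes)."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    with tarfile.open(tar_path, "r:gz") as archive:
        for member in archive:
            # Skip directory entries, links, and other non-regular members.
            if member.isfile():
                directory = posixpath.dirname(member.name)
                counts[directory] += 1
                sizes[directory] += member.size
    return {d: (counts[d], sizes[d]) for d in sorted(counts)}


if __name__ == "__main__":
    # Assumes the archive has been downloaded to the current directory.
    for directory, (count, total) in summarize_archive("scripts_dataset.tar.gz").items():
        print(f"{directory}: {count} files, {total} bytes")
```

The sizes reported are the sizes of the stored files, so comparing directories gives a rough sense of how much each kind of minification shrinks the corpus.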

Alternatively, you can browse the JavaScript using the following links. Each link will show you one of the 179 scripts, and it will allow you to switch between different minified versions of the scripts.

If the script is a library, its URL is displayed as-is. If the script is embedded within an HTML file, the URL is mangled to show both the URL of the HTML file and the zero-based position of the script among that page's embedded scripts. For example, consider the following two mangled URLs:

embed:0/http://www.seaportboston.com/
embed:6/http://www.imdb.com/name/nm0000576/videogallery
The first indicates that the script is the first embedded script within the HTML page retrieved from http://www.seaportboston.com/. The second indicates that the script is the seventh embedded script within the HTML page retrieved from http://www.imdb.com/name/nm0000576/videogallery.
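
To make the mangling convention concrete, here is a small Python sketch that splits an entry from the list into the page or library URL and, for embedded scripts, the zero-based script index. The names ScriptRef and parse_script_ref are made up for this example and are not part of the dataset tooling.

```python
import re
from typing import NamedTuple, Optional


class ScriptRef(NamedTuple):
    url: str                     # library URL, or URL of the enclosing HTML page
    embed_index: Optional[int]   # zero-based script position, None for libraries


# "embed:", a decimal index, a slash, then the URL of the HTML page.
_EMBED = re.compile(r"^embed:(\d+)/(.+)$")


def parse_script_ref(entry):
    """Parse one line of the URL list into a ScriptRef."""
    entry = entry.strip()
    match = _EMBED.match(entry)
    if match:
        return ScriptRef(url=match.group(2), embed_index=int(match.group(1)))
    return ScriptRef(url=entry, embed_index=None)
```

For instance, parse_script_ref("embed:6/http://www.imdb.com/name/nm0000576/videogallery") yields index 6, which is the seventh embedded script on that page, while a plain library URL yields an index of None.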

Here's the full list of URLs:

embed:2/http://adventisthealth.staywellsolutionsonline.com/
http://12702.hittail.com/mlt.js
http://www.supthemag.com/wp-content/plugins/flexible-lightbox/js/lightbox_call.js
http://www.intelecare.com/scripts/scriptaculous/effects.js
http://www.joomlatune.com/forum/Themes/default/fader.js
embed:20/http://mac.ign.com/objects/081/081994.html
embed:10/http://www.seaofshoes.typepad.com/
embed:2/http://www.sherry-lehmann.com/
http://www.indylaw.indiana.edu/_Assets/js/nav.js
http://digitadiko.com/modules/productscategory/productscategory.js
embed:6/http://www.seaofshoes.typepad.com/
http://adventisthealth.staywellsolutionsonline.com/printpage.js
embed:1/http://commons.apache.org/dbutils/apidocs/org/apache/commons/dbutils/handlers/package-summary.html
embed:6/http://bossip.com/317811/who-looked-more-bangin-nicki-minaj-cassie-or-amber-rose/cassie-nicki-amber-3/
embed:9/http://7d.blogs.com/blurt/2012/01/usdas-new-plant-hardiness-zone-confirms-that-vermont-is-getting-warmer.html
http://chapters.americanalpineclub.org/southernappalachian/wp-content/themes/Inclined/js/jquery.http.js
http://media.ticketmaster.com/en-us/js/d7e56591808f05f670e846a62de9e4f7/omniture_tracker.js
http://www.sleepproducts.org/_navigation/ICMenuBar.js
http://www.susanrodgersdesigns.com/js/thumbnail_scroller/jquery.thumbnailScroller.js
embed:0/http://www.seaportboston.com/
http://www.the311store.com/js/mage/translate.js
http://www.intc.com/common/scripts/sh_scripts.js
embed:6/http://dart.fine-art.com/NewGenPosts.asp
http://cn.dancesportinfo.net/JavaScript/AdapterUtils.js
http://forum.lupa.cz/reklama/sklik/sklik.js
http://www.jordansportraits.com/wp-content/plugins/powerpress/player.js
http://hometracks.nascar.com/files/js/js_7cf00579d7a8e199e4e4e345394fa274.js
http://baonshop.ru/public/js/shop/order.js
http://www.seoegghead.com/cms_js/AC_RunActiveContent.js
http://www.searo.who.int/en/lof/jquery.easing.js
http://www.sei.cmu.edu/commonspot/ns-resize.js
embed:9/http://forums.androidcentral.com/introductions/156792-hey-yall.html
embed:6/http://ma.tt/2003/01/bday/
http://www.sfaf.org/components/print/printer-friendly.js
embed:13/http://www.shirogami520.deviantart.com/
http://diku.dk/scripts_js/functions.js
http://media.charlotteobserver.com/static/scripts/mi/mi_script_scheduler.js
embed:3/http://meta.wikimedia.org/wiki/English_Wikipedia_anti-SOPA_blackout
embed:6/http://lasvegas.citysearch.com/bestof/categories
http://avro.apache.org/docs/current/skin/getBlank.js
http://www.servicist.com.sg/wp-content/plugins/google-analytics-for-wordpress/custom_se_async.js
http://blog.mysanantonio.com/spursnation/wp-content/plugins/pluck_comments/js/commentcount.js
http://www.aptonline.org/aptweb.nsf/corner.js
http://www.italianmoda.com/_new_scripts/modalframe/tmt_jquery_modalframe.js
http://www.knobhallwinery.com/wp-content/plugins/event-calendar/xmlhttprequest.js
http://libraries.tigris.org/branding/scripts/alm.js
http://listonart.us1.list-manage.com/js/jquery.form.js
http://www.joyreplicawatch.com/templates/newwatches/js/moo.js
http://www.the311store.com/js/varien/form.js
http://letters.ocregister.com/wp-content/plugins/democracy/democracy.js
http://www.sfcritic.com/wp-content/themes/mimbo/js/dropdowns.js
embed:13/http://lastfmpresents.radio.com/2011/12/20/last-fm-presents-the-wombats/
embed:6/http://www.imdb.com/name/nm0000576/videogallery
http://libraries.tigris.org/branding/scripts/tigris.js
http://www.site5.com/js/hoverIntent.js
http://appexchange.salesforce.com/resource/1328914730000/sharedlayout/js/reportAbuse.js
embed:9/http://www.sortmusic.com/
http://bluepoolroad.com/js/prototype/validation.js
http://www.soe.com/js/cookie.js
http://www.ajalon.net/utilities.js
http://avro.apache.org/docs/current/skin/fontsize.js
embed:8/http://blogs.discovermagazine.com/80beats/2012/02/16/quantum-dots-can-get-nearby-neurons-firing/comment-page-1/
http://www.theanand.com/wp-content/plugins/lifestream/lifestream.js
http://www.aofta.org/wp-content/plugins/dynamic-content-gallery-plugin/js-mootools/scripts/jd.gallery_1_2_4_4.js
http://ads.investingchannel.com/adtags/pragcap/ros/125x125.js
embed:21/http://dictionary.reference.com/browse/Drop+dead%21
embed:0/http://www.the-alist.org/
http://www.shual.com/2008/wp-content/themes/Shual/js/AC_RunActiveContent.js
http://drh.img.digitalriver.com/DRHM/Storefront/Site/turbine/cm/multimedia/OT_files/turbine_HomePage_contentBody.js
http://www.the311store.com/js/mage/cookies.js
http://www.selkehats.com/images/mm_menu.js
http://contribute.ctpost.com/ver1.0/content/direct/scripts/pork.iframe.js
embed:9/http://local.healthcommunities.com/101_Popular_Topics_Alameda_CA-t6776_Alameda+CA.html
http://blog.mysanantonio.com/spursnation/wp-content/plugins/pluck_comments/js/pluckcomments.js
http://www.statista.com/assets/200042fb/prototype-1.6.0.3/prototype.js
http://www.skimuseum.net/javascript/dropdown.js
http://www.bbb.org/canada/wwwroot/js/client.min.js
http://applywifi.com/wp-content/themes/yabloggy/jquery.ifixpng.js
embed:7/http://twitchfilm.com/news/2011/12/ridley-scott-teases-the-prometheus-trailer.php
http://mizgomez.8m.com/fs_img/js/md5.js
http://www.innovibe.co.il/blog/wp-content/themes/journalcrunch/js/twittercb.js
http://austin.ynn.com/Scripts/NonCachedRedirect.js
http://chapters.americanalpineclub.org/alaska/wp-content/themes/Inclined/js/jquery.altrows.js
http://www.shb.umn.edu/templates_v5/lib/js/searchfield.js
embed:4/http://www.silive.com/entertainment/dining/index.ssf/2012/01/butter_on_steroids.html
http://www.italianmoda.com/_new_scripts/getbrowsersize.js
embed:13/http://www.infoplease.com/
http://www.actualtest.ca/design/validator.js
http://guitarsandallthatjazz.homestead.com/~site/javascript/sc_v1/s_code.js
embed:11/http://www.imdb.com/name/nm0000576/videogallery
http://www.slowfood.com/_2010_inc_sito/com/_core/scripts/jquery_plugins/cookie/jquery.cookie.js
http://blog.mozilla.com/security/wp-content/js/webtrends/webtrends-v0.1.js
http://ard.bmj.com/site/js/publisher_custom_tower.js
http://trumplin.com/templates/AlgDown/js/dropdowntabs.js
http://www.theclassicalshop.net/js/animatedcollapse.js
http://www.subhub.com/sites/default/files/js/js_AHyBAYZMC0aoLby7IhzDcekyKo2__10bztcN6PSiv_g.js
http://image.teacup.com/js/ImageReport.js
http://www.sei.cmu.edu/commonspot/pagemode/always-include-ns.js
http://magnetic-separation.msimagnets.com/plp/js/PlpCookie.js
http://bigbrowser.blog.lemonde.fr/wp-content/themes/common/js/miaas.js
http://www.bbb.org/baton-rouge/scripts/cbbb-google-translate.js
http://www.jefflinsky.com/rw_common/themes/hv_xxl/scripts/sliding_effect.js
embed:0/http://laurahoney.rockz.com/
http://www.singaporetgcc.com/sites/all/themes/stgcc2011/mainslide/js/jquery.galleryview-3.0.js
http://www.smashfly.com/Resources/Shared/scripts/initWidgets.js
http://www.sportsbasement.com/searchside/js/iepngfix_tilebg.js
http://cdnsecurity.clbmedia.dgtlpub.com/2008/2008-12-31/window_control.js
http://www.spanishcourthotel.com/hotel/wp-content/themes/spanishcourt/js/jquery.innerfade.js
http://www.sephone.com/themes/main/js/jquery.innerfade.js
http://www.shoptrudeau.com/a/j/soft_add.js
http://www.sliceofscifi.com/wp-content/plugins/democracy/democracy.js
embed:2/http://blogs.cisco.com/author/TomGillis/
http://ceasefiremagazine.co.uk/wp-content/themes/News/scripts/slider.js
http://avro.apache.org/docs/current/skin/breadcrumbs.js
http://www.intel.com/about/sitewide/js/global.js
http://www.jefflinsky.com/contact_files/stacks_page_page4.js
http://www.somarts.org/js-global/FancyZoom.js
http://www.adserver.rfidjournal.com/adx.js
http://www.slate.com/features/xmldump/vpgame/src/prototype.js
http://mizgomez.8m.com/fs_img/js/overlay.js
embed:12/http://blog.kobayashi.eu/
embed:2/http://www.seedlingmusic.com/
embed:2/http://ameblo.jp/1394-1995/
http://www.share.org/js/dnncore.js
http://f.camp8.org/BuiltTheme/nature_bliss/ce329b55/scripts/BonaPrint.js
http://auto-repair.helium.com/spresources/javascripts/adcode.js
http://dnn506yrbagrg.cloudfront.net/pages/scripts/0006/8617.js
embed:6/http://autos.aol.com/car-finder/
http://www.sitepronews.com/wp-content/plugins/obsocialbookmarker/include/ajax/scriptaculous.js
http://listonart.us1.list-manage.com/js/jquery.validate.js
http://www.bampfa.berkeley.edu/code/js/bampfaccordion.js
http://www.smashfly.com/js/dnncore.js
http://www.studenthomes.eu/slimbox.js
http://www.smiledesigndental.com/Scripts/superfish/supersubs.js
embed:16/http://www.amazon.co.uk/
http://www.sportbettingsystemreview.com/wp-content/plugins/tweetmeme/button.js
http://ca2.php.net/userprefs.js
embed:7/http://www.askmen.com/adele/picture-2.html
embed:12/http://leatherbag.bandcamp.com/
http://eu.joliprint.com/joliprint/js/popin/joliprint-popin.js.jspz
http://da.dancesportinfo.net/JavaScript/MenuAdapter.js
http://www.sonatype.com/extension/sonatype/design/sonatype_com/javascript/elqNow/elqImg.js
http://www.kisforkids.org/wp-content/plugins/google-calendar-events/js/gce-script.js
embed:4/http://community.invisionpower.com/blog/1174/entry-5791-ipboard-320-dev-update-calendar-improvements-part-i-seo-improvements/
embed:6/http://boston.cityvoter.com/bbq-town/biz/12727
http://www.jimmybruno.com/forum/styles/prosilver/template/forum_fn.js
embed:4/http://betterthansexnyc.wordpress.com/2008/07/26/day-35-perfecting-the-surprise-kiss-from-the-1930s-is-bts/
http://cache.boston.com/universal/js/bcom_ttd_redframe.js
http://www.searo.who.int/AC_RunActiveContent.js
http://archaeology.about.com/library/games/bl_quiz45.js
http://mac.brothersoft.com/softjs/popwindow.js
embed:4/http://br.wikipedia.org/wiki/1207
embed:1/http://lb.wikipedia.org/wiki/Apichatpong_Weerasethakul
http://www.sei.cmu.edu/commonspot/pagemode/always-include-common.js
http://blog.radioleft.com/_static/js/ajaxian_comments.js
http://www.somarts.org/js-global/FancyZoomHTML.js
embed:20/http://answers.reference.com/Digital/Web/how_to_meet_new_friends
embed:6/http://www.inetgiant.ca/userads/679175
http://www.artinfo.com/sites/default/files/js/js_IDBX5SzkJ9gGNq7x-qOE_2DZsexqguTJQGMKvi4w-Uw.js
http://www.smithsoldebar.com/wp-content/themes/that-music-theme/tabber.js
http://www.septa.org/site/js/ui.core.js
embed:0/http://www.sharesomecandy.com/
http://batonrouge.bbb.org/WWWRoot/js/Cibr3/Cibr3.js
embed:17/http://moneyland.time.com/2007/10/19/dr_watson_is_racism_in_your_dn/comment-page-1/
http://adventisthealth.staywellsolutionsonline.com/jq-actions.js
http://www.slb.com/js/superfish.js
http://www.studiokimvallee.com/analytics/spanalytics.js
http://www.sitesell.com/ssjs/common/util_DEVELOPMENT.js
http://www.sitesell.com/cookie.js
http://fibroaction.healthunlocked.com/assets/snowdrop/js/jquery.cookie.js
http://ard.bmj.com/site/js/publisher_custom.js
embed:2/http://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki
http://animalbehaviorsociety.org/portal_javascripts/Free%20Arch%20Theme/register_function.js
http://mendelson.4t.com/fs_img/js/set_homepage.js
embed:4/http://dictionary.reference.com/browse/-istic
http://www.spaceportamerica.com/media/system/js/caption.js
embed:15/http://baboonsguide.blogspot.com/
embed:4/http://www.searanchlodge.com/
http://amsabuyersguide.com/check_rfi.js