GET /api/v2/video/130
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, PUT, PATCH, HEAD, OPTIONS
{
    "category": "Kiwi PyCon 2009",
    "language": "English",
    "slug": "robert-coup----me-wants-it--scraping-sites-to-get",
    "speakers": [
        "Robert Coup"
    ],
    "tags": [
        "api",
        "html",
        "kiwipycon",
        "kiwipycon2009",
        "rest",
        "scraping",
        "web"
    ],
    "id": 130,
    "state": 1,
    "title": "Robert Coup - /me wants it. Scraping sites to get data.",
    "summary": "",
    "description": "/me wants it. Scraping sites to get data.\n\nPresented by Robert Coup\n\nAbstract\n\nBuilding scrapers for grabbing data from websites. Tools, techniques, and\ntips.\n\nOutline\n\nLife would be so much easier if the data contained in websites was available\nraw via APIs. Alas, until that mythical day comes we either need to deal with\nunhelpful people via email and phone, or just get it ourselves. Python has\nsome great tools available to help with building scrapers and for parsing and\nformatting the data we get. Starting off with the basics - tracking what needs\nto be done, making web requests, parsing HTML, following links, and\nextricating data from Excel and PDF documents. Our scraper needs to be\nresilient against too-clever content management systems, Frontpage-era HTML,\nand plain dodgy data. We may need to pass through logins and other messiness.\nThere are some techniques and tips for approaching the problems and keeping\nyour solution flexible and as simple as possible. We'll discuss some scrapers\nbuilt for New Zealand data, and introduce a new project from the NZ open\ngovernment data group to provide a RESTful interface to scrapers - effectively\ncreating a nice API where there isn't one.\n\nSlides: [\ndata](\n\n[VIDEO HAS ISSUES: Sound and video are poor. Slides are hard to read.]\n\n",
    "quality_notes": "",
    "copyright_text": "Creative Commons Attribution-NonCommercial-ShareAlike 3.0",
    "embed": "",
    "thumbnail_url": "",
    "duration": null,
    "video_ogv_length": null,
    "video_ogv_url": null,
    "video_ogv_download_only": false,
    "video_mp4_length": null,
    "video_mp4_url": null,
    "video_mp4_download_only": false,
    "video_webm_length": null,
    "video_webm_url": null,
    "video_webm_download_only": false,
    "video_flv_length": null,
    "video_flv_url": "",
    "video_flv_download_only": false,
    "source_url": "",
    "whiteboard": "",
    "recorded": null,
    "added": "2012-02-23T04:20:00",
    "updated": "2014-04-08T20:28:25.899"
}