class Pandata::Scraper

Downloads a user’s Pandora.com data. A user’s profile must be public for Pandata to be able to download their data.

Attributes

download_cb[RW]

A Proc that gets called after some data has been downloaded.
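For example, a caller might report progress from the Proc, or end a long download early: in scrape_for (below), returning :stop from the callback halts the scrape. A minimal sketch, assuming 'some_webname' resolves to a valid public profile:

scraper = Pandata::Scraper.get('some_webname')

pages = 0
scraper.download_cb = proc do |new_data|
  pages += 1
  puts "Fetched page #{pages} (#{new_data.size} new items)"
  :stop if pages >= 10   # returning :stop ends the download early
end

liked_tracks = scraper.likes(:tracks)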

webname[R]

The name Pandora uses to identify a user. It remains constant even if the user ties a new email address to their Pandora account.

Public Class Methods

get(user_id)

If possible, get a Scraper instance for the user_id; otherwise return an array of similar webnames.

@param user_id [String] email or webname
@return [Scraper] a scraper object for the supplied user ID
@return [Array] array of similar webnames

# File lib/pandata/scraper.rb, line 23
def self.get(user_id)
  search_url = DATA_FEED_URLS[:user_search] % { searchString: user_id }
  html = Downloader.read_page(search_url)
  webnames = Parser.new.get_webnames_from_search(html)

  if webnames.include?(user_id)
    new(user_id)
  # If user_id looks like an email and still gets a result.
  elsif webnames.size == 1 && /.*@.*\..*/ =~ user_id
    new(webnames.first)
  else
    webnames
  end
end
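
A short usage sketch based on the return types above (the email address is a placeholder):

result = Pandata::Scraper.get('someone@example.com')

if result.kind_of?(Pandata::Scraper)
  scraper = result   # exact webname match, or a lone match for the email
else
  puts 'No exact match found. Similar webnames:'
  puts result        # array of similar webnames to choose from
end
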
new(webname)
# File lib/pandata/scraper.rb, line 39
def initialize(webname)
  @parser = Parser.new
  @webname = webname
end

Public Instance Methods

followers()

Get the user’s public followers.

@return [Array] same format as following

# File lib/pandata/scraper.rb, line 80
def followers
  scrape_for(:followers, :get_followers)
end
following()

Get the public users being followed by this user.

@return [Array] array of hashes with keys:

- :name - profile name
- :webname - unique Pandora ID
- :href - URL to online Pandora profile
# File lib/pandata/scraper.rb, line 74
def following
  scrape_for(:following, :get_following)
end
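
Given a scraper instance, a sketch iterating the documented hash keys (followers returns the same structure):

scraper.following.each do |user|
  puts "#{user[:name]} (#{user[:webname]}): #{user[:href]}"
end
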
likes(like_type = :all)

Get the user’s liked data (the results of giving a ‘thumbs up’).

@param like_type [Symbol]

- :artists - returns an array of artist names
- :albums - returns an array of hashes with :artist and :album keys
- :stations - returns an array of station names
- :tracks - returns an array of hashes with :artist and :track keys
- :all - returns a hash with all liked data
# File lib/pandata/scraper.rb, line 51
def likes(like_type = :all)
  case like_type
  when :tracks
    scrape_for(:liked_tracks, :get_liked_tracks)
  when :artists
    scrape_for(:liked_artists, :get_liked_artists)
  when :stations
    scrape_for(:liked_stations, :get_liked_stations)
  when :albums
    scrape_for(:liked_albums, :get_liked_albums)
  when :all
    { artists: likes(:artists),
      albums: likes(:albums),
      stations: likes(:stations),
      tracks: likes(:tracks) }
  end
end
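
A sketch of the different like_type options; the hash keys follow the list above:

scraper.likes(:artists)   # => e.g. ["Artist Name", ...]
scraper.likes(:tracks)    # => e.g. [{ artist: "Artist Name", track: "Track Name" }, ...]

everything = scraper.likes            # :all is the default
everything[:albums].each do |album|
  puts "#{album[:artist]} - #{album[:album]}"
end
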

Private Instance Methods

download_all_data(url) { |html, next_data_indices| ... }

Downloads all data given a starting URL. Some Pandora feeds only return 5-10 items per page but contain a link to the next set of data. Threads cannot be used because page A must be visited to know how to obtain page B.

@param url [String]

# File lib/pandata/scraper.rb, line 119
def download_all_data(url)
  next_data_indices = {}

  while next_data_indices
    html = Downloader.read_page(url)

    # Sometimes Pandora returns the same next_data_indices as the previous page.
    # If we don't check for this, an infinite loop occurs.
    # This problem occurs with tconrad.
    prev_next_data_indices = next_data_indices
    next_data_indices = @parser.get_next_data_indices(html)
    next_data_indices = false if prev_next_data_indices == next_data_indices

    url = yield(html, next_data_indices)
  end
end
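
The block receives the downloaded HTML and the indices parsed for the next page, and must return the URL for the next request; the loop stops once the parser finds no further indices (or the same indices repeat). scrape_for below shows this contract in use.
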
get_url(data_name, next_data_indices = {})

Grabs a URL from DATA_FEED_URLS and formats it appropriately.

@param data_name [Symbol]
@param next_data_indices [Hash] query parameters to get the next set of data

# File lib/pandata/scraper.rb, line 139
def get_url(data_name, next_data_indices = {})
  if next_data_indices.empty?
    next_data_indices = { nextStartIndex: 0, nextLikeStartIndex: 0, nextThumbStartIndex: 0 }
  else
    next_data_indices = next_data_indices.dup
  end

  next_data_indices[:webname] = @webname
  next_data_indices[:pat] = Downloader.get_pat

  DATA_FEED_URLS[data_name] % next_data_indices
end
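
The final line relies on Ruby’s String#% with named placeholders. A hypothetical illustration (the template below is invented for this example and is not an actual DATA_FEED_URLS entry):

template = 'http://www.pandora.com/feeds/data?webname=%{webname}&pat=%{pat}&thumbStartIndex=%{nextThumbStartIndex}'
template % { webname: 'some_webname', pat: 'auth-token', nextThumbStartIndex: 0, nextStartIndex: 0 }
# => "http://www.pandora.com/feeds/data?webname=some_webname&pat=auth-token&thumbStartIndex=0"
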
scrape_for(data_type, parser_method)

Downloads all data for a given type, calls the supplied Pandata::Parser method and removes any duplicates.

@param data_type [Symbol]
@param parser_method [Symbol] method to be sent to the Parser instance
@return [Array]

# File lib/pandata/scraper.rb, line 91
def scrape_for(data_type, parser_method)
  results = []

  url = get_url(data_type)
  download_all_data(url) do |html, next_data_indices|
    new_data = @parser.public_send(parser_method, html)

    if new_data.kind_of?(Array)
      results.concat(new_data)
    else
      results.push(new_data)
    end

    if @download_cb
      break if @download_cb[new_data] == :stop
    end

    get_url(data_type, next_data_indices) if next_data_indices
  end

  # Pandora data often contains duplicates--get rid of them.
  results.uniq
end