Extract URLs from a remote webpage using PHP
Scraping data from websites is extremely popular nowadays. I have written a simple website parser class to grab all the URLs from a website, and I am sharing it below for everyone to see and have fun with.
We will use the parser class below to extract all image sources and hyperlinks from a website.
Usage:
Create an instance of the WebsiteParser class with a website URL, then call the getHrefLinks() and getImageSources() methods as shown below to extract the hyperlinks and image sources respectively.
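The snippet that follows is a minimal usage sketch. The constructor arguments and the getHrefLinks()/getImageSources() method names come from the description above; the include path and the assumption that both methods return arrays of URLs are mine, not confirmed by the post.

<?php
// Minimal usage sketch. 'WebsiteParser.php' is an assumed file name for the class.
require_once 'WebsiteParser.php';

// Parse an example page; 'all' is the default link type per the constructor shown later.
$parser = new WebsiteParser('http://example.com', 'all');

// Assumed to return arrays of URLs; the class may also offer display options.
$hyper_links   = $parser->getHrefLinks();
$image_sources = $parser->getImageSources();

print_r($hyper_links);
print_r($image_sources);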
I added the proxy values to the curl_options array:
Properties added to the class:
/**
* Proxy Address
* @var int
*/
private $proxy;
/**
* Proxy Port
* @var int
*/
private $proxy_port;
Constructor:
/**
* Class constructor
* @param string $url Target Url to parse
* @param string $link_type Link type to grab
* @param int $proxy proxy address
* @param int $proxy_port proxy port
*/
function __construct($url, $link_type = 'all', $proxy = 0, $proxy_port = 0) {
    $this->target_url = $url;
    $this->setUrls();
    $this->setLinksType($link_type);

    if ($proxy > 0 && $proxy_port > 0) {
        $this->proxy = $proxy;
        $this->proxy_port = $proxy_port;
        $this->curl_options = array_merge(
            $this->curl_options,
            array('CURLOPT_PROXY' => $this->proxy, 'CURLOPT_PROXYPORT' => $this->proxy_port)
        );
    }
} //__construct()
Oops, sorry, the "proxy" property is a string type!
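For context, here is a hedged sketch of how string-keyed cURL options like the ones merged above could be applied to a cURL handle. The class itself may apply its curl_options differently, and the proxy address and port values below are placeholders only.

<?php
// Hedged sketch: resolving string option names such as 'CURLOPT_PROXY' and
// applying them to a cURL handle. The actual class may handle this differently.
function applyCurlOptions($ch, array $curl_options)
{
    foreach ($curl_options as $name => $value) {
        // constant() maps 'CURLOPT_PROXY' to the integer value curl_setopt() expects.
        curl_setopt($ch, constant($name), $value);
    }
}

$ch = curl_init('http://example.com'); // example target only
applyCurlOptions($ch, array(
    'CURLOPT_RETURNTRANSFER' => true,
    'CURLOPT_PROXY'          => '127.0.0.1', // placeholder proxy address (a string)
    'CURLOPT_PROXYPORT'      => 8080,        // placeholder proxy port
));
$html = curl_exec($ch);
curl_close($ch);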
Thank you for your comment and for using the class.
You can update that in https://github.com/morshedalam/url-scraper-php.
You can also check out the class at http://www.phpclasses.org/package/8113-PHP-Parse-and-extract-links-and-images-from-Web-pages.html.