Challenge - Week One: (12/29 to 1/5)

Goal:

Create a web crawler that scrapes and indexes our homepage (https://icodestuff.io) for all anchor tags. You may use the language of your choice, though PHP is preferred. For each anchor, the data must include the href and the anchor text.

 

Process: 

  1. Using PHP/cURL or the language of your choice, get the HTML contents of the page (Guzzle is an alternative).
  2. Using native PHP, loop through the results and create an array with the base URL as the root, like so: [https://icodestuff.io] => [CONTENT]
  3. After looping through the content and creating the array, convert it to JSON.
  4. Once the data is in JSON, write it to a file named content.json.
  5. Once all the prior steps are complete and everything works properly, email your code to challenges@icodestuff.io. If you used a language other than PHP, also send instructions for running your code.
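
Steps 2 through 4 can be sketched roughly as follows. This is only one possible approach, not the reference solution: the helper name extractAnchors, the sample markup, and the 'href'/'text' keys are assumptions made for illustration, since the challenge does not spell out the exact shape of [CONTENT].

```php
<?php
// Hypothetical helper: collect the href and text of every <a> element in an HTML string.
function extractAnchors(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so buffer parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $anchors = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $anchors[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $anchors;
}

// Sample markup standing in for the fetched page (assumption, not the real homepage).
$html = '<html><body><a href="/about">About</a> <a href="https://example.com">Example</a></body></html>';
$baseUrl = 'https://icodestuff.io';

// Step 2: build the array with the base URL as the root key.
$data = [$baseUrl => extractAnchors($html)];

// Step 3: convert the array to JSON.
$json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);

// Step 4: write the JSON to content.json next to this script.
file_put_contents(__DIR__ . '/content.json', $json);
```

The JSON_UNESCAPED_SLASHES flag keeps URLs readable in the output file. Whether the expected content.json uses it is not stated, so treat that choice as a guess.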

 

Tools (PHP):

  • PHP cURL/Guzzle HTTP: Retrieves the contents of our homepage over HTTP
  • DOMDocument: Lets you select specific HTML elements
  • XAMPP: Launches a server on your local machine so you can run PHP files
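
As a rough illustration of the cURL option, the page could be fetched along these lines. The function name fetchHtml and the 10-second timeout are assumptions chosen for the sketch. Guzzle would work equally well.

```php
<?php
// Hypothetical helper: fetch a page body with PHP's cURL extension, or throw on failure.
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, e.g. http -> https
        CURLOPT_TIMEOUT        => 10,   // assumed limit in seconds; adjust as needed
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}
```

A call such as $html = fetchHtml('https://icodestuff.io'); would then give you a string that can be passed to DOMDocument.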

 

Measurements:

  • Content: Your JSON needs to be identical to ours
  • Efficiency: Make sure your code is as efficient as possible
  • Integrity: We don't tolerate cheaters
  • Simplicity: Don't overcomplicate things; keep your solution as simple as possible

 

Prize: 

There will be 2 winners, announced a few days after the deadline. The winners will receive a shoutout on social media and will win a free shirt or mug. Remember, all submissions must be sent to challenges@icodestuff.io for your chance to win. Good luck!

 

Demo: 

In this demo I will be using https://www.youtube.com as our base URI, and I will be running the code both from the command line and in the browser, using PHP as my language.