Project Title: Scraping Amazon.com Product Details

Project Description: We need assistance with a data scraping project.

Project overview:

We need several amazon.com product details scraped (see list below). The scraped data needs to be formatted as Excel spreadsheets (see attached examples) and provided to us so that we can process it and upload it to our website and the other websites/channels that we sell products on. This has been and will be an ongoing project. Our current database of scraped products contains approximately 1.1 million items. We would like the scrape to run every two weeks and the resulting data to be delivered every two weeks (example: every other Monday). The primary reasons for the scrapes are (one) to gather new product data and (two) to maintain our existing data for price changes and product availability changes.

Details:

Website to be scraped: amazon.com

Product details to be scraped (see attached screenshots):

1. Product detail page URL.

2. Product title.

3. Product description.

4. Manufacturer's model number.

5. Product ASIN number.

6. Product selling price.

7. Product brand/manufacturer name.

8. Product dimensions.

9. Product shipping weight. (all weights below 1 pound need to be rounded up to 1 pound)

10. Product category/navigation string.

11. Product image URL.

12. Amazon best sellers rank.

13. Product features.

14. Primary Product Category (example: Appliances)

15. Shipping charge (Most items scraped have free shipping; however, many of the items that we need scraped have a shipping charge. If a scraped product has a shipping charge, then we need to know what that shipping charge is. See the attached screenshot named “ShippingCharge”.)

Please note that the scraping process must be able to gather the information listed above regardless of the variations of the product detail page layouts on amazon.com.
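
As an illustration only, the 15 details listed above could be carried in one record per product. The sketch below is in Python; the field names are hypothetical (the real column layout is defined by the attached example files), and CSV stands in for the Excel output so that the example needs no third-party library.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ProductRecord:
    # Hypothetical field names, one per numbered detail (1-15) in the list above.
    url: str
    title: str
    description: str
    model_number: str
    asin: str
    price: str
    brand: str
    dimensions: str
    shipping_weight: str
    category_path: str
    image_url: str
    best_sellers_rank: str
    features: str
    primary_category: str
    shipping_charge: str


def records_to_csv(records):
    """Serialize records to CSV text with one column per scraped detail."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ProductRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Keeping every detail as text avoids guessing at number formats; any conversion would follow the attached examples.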

Primary Amazon Product Categories Scraped:

Appliances

Arts Crafts & Sewing

Automotive (only the following subcategories)

-Exterior Accessories

-Interior Accessories

Baby

Beauty

Cell Phones & Accessories

Computers

Electronics

Grocery & Gourmet Food

Health & Personal Care

Home & Kitchen

Musical Instruments

Office Products

Patio, Lawn & Garden

Pet Supplies

Sports & Outdoors

Tools & Home Improvement

Toys & Games

Watches

Filters:

Only items between $20.00 and $850.00 USD (filter applied during the scraping process)

Additional Filters:

We do not want:

-Products that show options for Color/Size/Unit/Etc (see attached screenshot named “Options”)

-Automotive products that show the option “Select your vehicle” (see attached screenshot named “SelectYourVehicle”)

-Products from External Websites that do not display shipping price/info (see attached screenshot named “NoShippingInfo”)

-Products with no product image available

-Products outside the $20 to $850 price range

-Products without a selling price displayed

-Products without a title displayed

-Products that are only available as USED
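
The filters above can be sketched as a single check applied to each scraped listing. This is a minimal Python sketch under stated assumptions: the `Listing` fields are hypothetical flags that a scraper would set while parsing a page, and the $20.00 and $850.00 bounds are treated as inclusive, which the list above does not state explicitly.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

MIN_PRICE = Decimal("20.00")
MAX_PRICE = Decimal("850.00")


@dataclass
class Listing:
    # Hypothetical fields; a real scraper would fill these from the detail page.
    title: Optional[str]
    price: Optional[Decimal]
    image_url: Optional[str]
    has_options: bool = False                     # Color/Size/Unit/etc. selectors
    requires_vehicle_selection: bool = False      # "Select your vehicle"
    external_without_shipping_info: bool = False  # external site, no shipping info
    used_only: bool = False


def rejection_reason(item: Listing) -> Optional[str]:
    """Return why a listing is filtered out, or None if it should be kept."""
    if not item.title:
        return "no title"
    if item.price is None:
        return "no selling price"
    if not (MIN_PRICE <= item.price <= MAX_PRICE):  # inclusive bounds (assumption)
        return "price out of range"
    if not item.image_url:
        return "no image"
    if item.has_options:
        return "has options"
    if item.requires_vehicle_selection:
        return "select your vehicle"
    if item.external_without_shipping_info:
        return "no shipping info"
    if item.used_only:
        return "used only"
    return None
```

Returning a reason rather than a plain yes/no makes it easier to report why a product was dropped.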

Issues We Have Encountered:

1. Shipping Weight Not Displayed: Some products do not display a shipping weight (#9 from the list above) on the page. For those items we need the term “No Ship Weight” in the data instead of a number.

2. Category Not Displayed: Some products do not display a category (#10 from the list above) on the page. For those items we need the navigation string as it would appear if you were navigating from the Amazon home page.

Example:

Department

• ‹ Arts, Crafts & Sewing

• ‹ Art Supplies

• ‹ Art Paper

• Easel Pads

In the provided data it would read: Arts, Crafts & Sewing>Art Supplies>Art Paper>Easel Pads

3. Changes on Amazon: We need to work with a company that can stay on top of Amazon’s periodic changes so we don’t receive missing or inaccurate product data.
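
Issues 1 and 2 above, together with the rounding rule for item #9, can be sketched as two small helpers. This is a Python sketch under assumptions: the function names are hypothetical, the weight is assumed to have already been converted to pounds, and the breadcrumb parts are assumed to arrive as a list ordered from the top-level department down.

```python
from typing import Optional, Sequence, Union


def normalize_ship_weight(pounds: Optional[float]) -> Union[float, str]:
    """Apply the shipping-weight rules: placeholder when missing, minimum of 1 pound."""
    if pounds is None:
        return "No Ship Weight"
    # Weights below 1 pound are rounded up to 1 pound; others are left unchanged.
    return max(pounds, 1.0)


def category_path(breadcrumbs: Sequence[str]) -> str:
    """Join navigation parts into the 'A>B>C' string format shown in the example."""
    return ">".join(part.strip() for part in breadcrumbs if part.strip())
```

For the example above, `category_path(["Arts, Crafts & Sewing", "Art Supplies", "Art Paper", "Easel Pads"])` gives `Arts, Crafts & Sewing>Art Supplies>Art Paper>Easel Pads`.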

Additional Tasks:

1. Comparison process: All products scraped every two weeks would need to be compared to our currently uploaded products (two sets of similar products, one for an account that we call MMP and the other for an account that we call EPB). From this comparison process we will need to know the following:

-Which products are new (not in our current database of products). We will then process these new items and upload them to our website and the other websites/channels that we sell products on.

-Which of our currently uploaded products have price changes. We will then upload the revised prices to our website and the other websites/channels that we sell products on.

-Which products should be removed from our currently uploaded products because they are no longer available on amazon.com, they are temporarily out of stock, or they have been revised on amazon.com so that one or more of our filters from the list above would now apply to them (example: a product has been revised on Amazon and is now available with a color or size option).
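
The three comparison outputs above can be sketched as a set difference keyed on ASIN. This is a minimal Python sketch under assumptions: both inputs are hypothetical mappings from ASIN to selling price, and the scraped mapping contains only products that are in stock and passed every filter, so a missing ASIN means the product should be removed. It does not record the reason for removal, which the real results file requires.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple


class Comparison(NamedTuple):
    new: List[str]                                   # ASINs not in the current data
    price_changes: Dict[str, Tuple[Decimal, Decimal]]  # ASIN -> (old price, new price)
    removed: List[str]                               # ASINs missing from the new scrape


def compare(current: Dict[str, Decimal], scraped: Dict[str, Decimal]) -> Comparison:
    """Compare currently uploaded products against the latest scrape."""
    new = sorted(asin for asin in scraped if asin not in current)
    removed = sorted(asin for asin in current if asin not in scraped)
    price_changes = {
        asin: (current[asin], scraped[asin])
        for asin in current
        if asin in scraped and current[asin] != scraped[asin]
    }
    return Comparison(new, price_changes, removed)
```

The same comparison would be run once per account (MMP and EPB), each against its own set of currently uploaded products.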

2. Image file download, rename and upload: The image files for items scraped every two weeks that are determined to be new (not part of our currently uploaded products), as described above in the comparison process, need to be downloaded, renamed (any random number for the name), then uploaded to two of our image servers. As mentioned above, we have two sets of similar products, one for an account that we call MMP and the other for an account that we call EPB. The two accounts, MMP and EPB, have their product images hosted on two different image servers. The new image URLs will need to be included in the new product data that you provide.

3. Data formatting: I have attached sample files to this email so that you can review new product data that we currently receive and the comparison result files that we currently receive.

-Watches_RAW DATA EXAMPLE: This file is an example of the new product data that we currently receive.

-RES Removes_EXAMPLE: This is an example of the comparison results file that we receive after the products from the most recent data scraped are compared to our currently uploaded products. This file shows the products that should be removed and the reason for removal (see red text in column D). We will provide this file prior to each data scrape and expect it to be returned with the proper notations after the data scrape and comparison process.

-RES Price Increases_EXAMPLE: This is an example of the comparison results file that we receive after the products from the most recent data scraped are compared to our currently uploaded products. This file shows the products that are not to be removed and what their current prices are (see red text in column D and column E). It also shows the current shipping charge for those products that have a shipping charge (see red text in column AG). We will provide this file prior to each data scrape and expect it to be returned with the proper notations after the data scrape and comparison process.

Please note that these 3 example files have been reduced in size/number of items so that they may be emailed.
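
The renaming step in task 2 above (a random number for each new image name, with one hosted URL per account) can be sketched as follows. This is a Python sketch under assumptions: the two base URLs are placeholders because the real MMP and EPB image servers are not named here, the 12-digit name length is arbitrary, and the download and upload steps are left out.

```python
import secrets
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Placeholder base URLs (assumption); replace with the real image servers.
IMAGE_SERVERS = {
    "MMP": "https://images-mmp.example.com/",
    "EPB": "https://images-epb.example.com/",
}


def random_image_name(source_url: str, used: set) -> str:
    """Return a random 12-digit file name that keeps the source file extension."""
    ext = PurePosixPath(urlparse(source_url).path).suffix or ".jpg"
    while True:
        name = f"{secrets.randbelow(10**12):012d}{ext}"
        if name not in used:  # retry on the rare collision with an earlier name
            used.add(name)
            return name


def hosted_urls(file_name: str) -> dict:
    """Build the new image URL for each account from the renamed file."""
    return {account: base + file_name for account, base in IMAGE_SERVERS.items()}
```

Tracking the names already used keeps two new products from receiving the same file name within one run.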

Summary:

In this ongoing project we would like to develop a long-term relationship with a company that can provide us with affordable and accurate product data in a timely manner. This company must be able to adjust to any changes that we require as we move forward. This company must not only be a service provider but also be able to recommend improvements to the process. We will also have a variety of side projects that we will need assistance with.

Please email me as soon as possible with your proposal, quote, or any questions that you may have.

For similar work requirements, feel free to email us at info@webscrapingexpert.com