The golden rule for an online retailer is to offer the most competitive prices and to keep promotions under control. E-commerce players must be able to prove that the price reductions advertised in their special offers or sales are genuine. The same applies to event sales specialists. To meet this need, Octopeek has developed a price watch solution based on a Big Data platform.

Online sales sites offer hundreds of thousands of articles. Some are general e-commerce sites covering all types of categories (clothing, shoes, care products, furniture, small household appliances, food, etc.), while others specialize in a single range of products. All of them put several sales online every day. Thousands of products, and therefore prices, must be verified by the pricing and purchasing teams as part of promotional sales*. They must do this in an extremely short time, sometimes less than a week before the sale is launched. Distributors have only a few days to research and check several hundred prices on e-shops and on the brands' own e-commerce sites. These brand sites are the only ones that show the Recommended Retail Price (RRP) of an article when it comes onto the market.
How can you find these reference prices within the time available? How can you mobilize the necessary internal resources in record time while preserving your margin? The workload is colossal and extremely time-consuming for distributors.
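To make the manual check concrete, here is a minimal sketch of the arithmetic involved: comparing a promotional price with the RRP and testing whether an advertised reduction holds up. The function names and the half-point tolerance are assumptions made for illustration. They are not part of the Octopeek solution, and the sketch does not encode any legal rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def discount_vs_rrp(rrp: Decimal, promo_price: Decimal) -> Decimal:
    """Return the reduction of promo_price relative to rrp, in percent (one decimal)."""
    if rrp <= 0:
        raise ValueError("RRP must be positive")
    pct = (rrp - promo_price) / rrp * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def claim_is_supported(
    rrp: Decimal,
    promo_price: Decimal,
    advertised_pct: Decimal,
    tolerance: Decimal = Decimal("0.5"),  # hypothetical rounding allowance
) -> bool:
    """True if the measured reduction reaches the advertised one, within tolerance."""
    return discount_vs_rrp(rrp, promo_price) + tolerance >= advertised_pct


if __name__ == "__main__":
    rrp = Decimal("100.00")
    promo = Decimal("70.00")
    print(discount_vs_rrp(rrp, promo))                   # measured reduction in percent
    print(claim_is_supported(rrp, promo, Decimal("30")))
    print(claim_is_supported(rrp, promo, Decimal("50")))
```

Repeating this comparison for several hundred articles per sale, each requiring a lookup on a brand site, is what makes the task so time-consuming by hand.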

Construction of the Big Data platform and price watch software

To meet this need, Octopeek has developed price watch software on a Big Data platform. The Octopeek solution queries the e-shops of product suppliers and manufacturers to gather the necessary information. The data is stored in our Big Data infrastructure, where it is tagged with its source and structured so that the customer can explore and query it in self-service mode.

The architecture of the chosen solution is divided into 3 parts:

  1. Data collection
  2. Data ingestion
  3. Data presentation

The dataflows are implemented with Apache NiFi. Python scripts collect the data and deposit it in directories monitored by MiNiFi (lightweight NiFi agents). As soon as a new file is dropped into the target directory, the MiNiFi agent picks it up and sends it to the central NiFi cluster. The NiFi cluster implements the various dataflows that feed the databases used by the price watch application. The data is first loaded into an Apache Hive database, which provides both archiving (on HDFS) and one-off reporting.
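As an illustration of the collection step, here is a minimal sketch of how a Python script might hand a record over to a monitored directory. The record fields, the file naming and the function name `drop_record` are hypothetical; the article does not describe the actual scripts. The sketch writes to a hidden temporary file first and then renames it, on the assumption that the watcher is configured to skip hidden files, so that a half-written file is never picked up.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def drop_record(record: dict, drop_dir: Path) -> Path:
    """Write one collected price record as JSON into drop_dir and return its path."""
    drop_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: write to a hidden temporary file inside the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=drop_dir, prefix=".tmp-", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(record, handle, ensure_ascii=False)

    # Step 2: rename to the final, visible name in a single operation.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    final_path = drop_dir / f"{record['sku']}-{stamp}.json"
    os.replace(tmp_name, final_path)
    return final_path


if __name__ == "__main__":
    # Hypothetical record; a temporary directory stands in for the monitored one.
    example = {"sku": "TV-55-0001", "source": "brand-site", "rrp": "499.00", "currency": "EUR"}
    with tempfile.TemporaryDirectory() as inbox:
        print(drop_record(example, Path(inbox)).name)
```

Keeping the collector this simple means it needs no knowledge of NiFi at all: its only contract with the rest of the pipeline is the directory and the file format.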

Using ElasticSearch as a business database

Another NiFi workflow is responsible for creating the ElasticSearch indexes that serve the web application through an API. Using ElasticSearch as a business database lets the application benefit from its low query latency and from its underlying search engine (Lucene). This gives the database search-engine features such as auto-completion, automatic correction of misspelled queries and synonym handling.
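The three search features mentioned above map onto standard ElasticSearch building blocks: a `completion` field with a suggester, a `match` query with `fuzziness`, and a `synonym` token filter. The sketch below only builds the JSON request bodies as Python dictionaries and does not contact a cluster. The field names (`name`, `name_suggest`), the analyzer and filter names and the synonym list are assumptions for illustration, since the article does not describe the actual index design.

```python
import json


def index_definition() -> dict:
    """Index settings and mappings: synonym-aware text field plus a completion field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "product_synonyms": {
                        "type": "synonym",
                        "synonyms": ["tv, television", "sneakers, trainers"],
                    }
                },
                "analyzer": {
                    "product_text": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "product_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "name": {"type": "text", "analyzer": "product_text"},
                "name_suggest": {"type": "completion"},
            }
        },
    }


def search_body(text: str) -> dict:
    """Full-text search that tolerates small typos in the user's query."""
    return {"query": {"match": {"name": {"query": text, "fuzziness": "AUTO"}}}}


def autocomplete_body(prefix: str, size: int = 5) -> dict:
    """Completion suggester request for search-as-you-type."""
    return {
        "suggest": {
            "product_suggest": {
                "prefix": prefix,
                "completion": {"field": "name_suggest", "size": size},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(search_body("televsion samsung"), indent=2))
    print(json.dumps(autocomplete_body("tel"), indent=2))
```

These bodies would be sent to the index creation and `_search` endpoints respectively, by whichever client the application's API layer uses.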

Finally, an important part of the solution implemented by Octopeek is the presentation of the data through an intuitive, easy-to-use web application, validated by the customer's business teams. The aim is for a simple tutorial to be enough for a user to get to grips with the tool quickly.


*In 2017, 19 e-commerce companies were fined a total of 2.4 million euros by the Direction Générale de la Concurrence, de la Consommation et de la Répression des Fraudes (DGCCRF), the French Directorate General for Competition, Consumer Affairs and Fraud Control. 116,000 establishments and 11,000 websites were audited. This was a first in France, and the audits were accompanied by requests for supporting documents. 15,000 reports were drawn up, and not only against distributors: brands running promotions on their own e-shops also had to demonstrate a genuine price reduction compared with their initial RRP (Recommended Retail Price). The airline sector received the same treatment, as advertised promotions on airline tickets were not consistent with the RRP.


Contact us about our price watch solution: 09 53 73 74 74