Big Data assists e-commerce sites


The explosion of e-commerce and the growing number of products to manage each month make it difficult for merchant sites to maintain an accurate view of their competitiveness.
How can a buyer instantly get the information needed to evaluate and renegotiate supplier offers within a very short time?

Profitability not necessarily guaranteed

Today, the consumer is just one click away from comparing offers with the competition. The e-commerce sector is very competitive, but not necessarily profitable: 87% of e-commerce sales are made by only 5% of the 182,000 merchant sites listed in 2018, and only 68% of sites are declared profitable (source: Fevad).

An instant view of their positioning

Each month, these merchant sites must manage hundreds of thousands of product references (additions, deletions, updates, promotions…).

Having a precise view of one's own offer compared to that of competitors becomes a real challenge, both because of the complexity of the monitoring to be performed and because of how quickly the offer must be adjusted.

Two main challenges thus emerge: the volume of data, and the recognition of identical offers regardless of the merchant site and therefore of how the online offer is structured.

To address this, Octopeek has developed software on a Big Data platform for an international pureplayer specializing in retail pricing. This solution queries the online shops of the referenced competitors (including international ones) through collection tools. The data is then stored in a Big Data infrastructure hosted in France and managed by Octopeek. The data is sourced and structured, and the competing offers for the same products are matched. The results are made available to the customer via an intuitive, self-service interface.

A true Big Data project

To qualify as a Big Data project, a project must address the 3V rule (Volume, Velocity, Variety). At Octopeek, our experience in this field leads us to consider six Vs (Volume, Velocity, Variety, Veracity, Value, Visualization).

This e-commerce project addresses all six Vs:

  • Volume: Processing hundreds of thousands of new references every month.
  • Velocity: Ability to process information very quickly.
  • Variety: Lack of a standardized data format between different sources.
  • Veracity: Need for control of prices and different declarative fields.
  • Value: Give value to data and improve performance.
  • Visualization: Allow a visualization of the data at a time t.
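
As an illustration of the Veracity point, here is a minimal sketch of the kind of control that could be applied to prices and declarative fields before an offer is accepted. The article does not describe the actual checks, so the required fields, the rules and the function name `validate_offer` are all assumptions made for this example.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical list of declarative fields every collected offer must carry.
REQUIRED_FIELDS = ("title", "price", "currency")


def validate_offer(offer: dict) -> list:
    """Return the list of problems found in a collected offer (empty if none)."""
    problems = []

    # Control of declarative fields: each one must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = offer.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    # Control of the price: it must parse as a number and be strictly positive.
    raw_price = offer.get("price")
    if raw_price is not None and str(raw_price).strip():
        try:
            # Accept both "19.99" and "19,99" as decimal notations.
            price = Decimal(str(raw_price).strip().replace(",", "."))
            if price <= 0:
                problems.append("non-positive price")
        except InvalidOperation:
            problems.append("unparseable price")

    return problems


ok = validate_offer({"title": "Kettle", "price": "19,99", "currency": "EUR"})
bad = validate_offer({"title": "", "price": "-5", "currency": "EUR"})
```

An offer with a non-empty problem list could be routed to a review queue rather than silently dropped, so that collection errors remain visible.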

A Big Data infrastructure in 4 steps

The architecture of the solution adopted is separated into four parts:

  1. Collection of distributed data and ingestion into Octopeek databases.
  2. Reconciliation of data and provision of qualified data in our databases.
  3. Machine Learning / Analytics module to explore the data and integrate new use cases.
  4. Restitution module via HMIs (Human Machine Interfaces) for querying this data.
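
To make the reconciliation step more concrete, the sketch below maps records from two sources with different field names onto one common schema. It is only an illustration: the source names, field names and the `to_common_schema` helper are hypothetical and are not taken from the Octopeek platform.

```python
# Hypothetical per-source mappings: source field name -> common field name.
FIELD_MAPS = {
    "shop_a": {"name": "title", "prix": "price", "ean13": "ean"},
    "shop_b": {"product_title": "title", "amount": "price", "gtin": "ean"},
}


def to_common_schema(source: str, record: dict) -> dict:
    """Rename the fields of a raw record so every source shares one schema."""
    mapping = FIELD_MAPS[source]
    common = {
        target: record[src] for src, target in mapping.items() if src in record
    }
    # Keep track of where the record came from for later comparison.
    common["source"] = source
    return common


raw_a = {"name": "Kettle 1.7L", "prix": "29.90", "ean13": "0000000000017"}
raw_b = {"product_title": "Kettle 1.7 L", "amount": "27.50", "gtin": "0000000000017"}

unified = [to_common_schema("shop_a", raw_a), to_common_schema("shop_b", raw_b)]
```

Once every record exposes the same fields, offers from different sites can be compared and matched without source-specific logic downstream.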

Dataflows are implemented with Apache NiFi and feed the databases used by the price watch application. New data is first loaded into an Apache Hive database, which allows archiving on HDFS. This database is also used for the reconciliation and standardization of data from different sources (via a landing zone). Spark ML (Machine Learning) modules are applied to this data to address specific use cases of the end customer, for example matching products distributed across multiple sites using NLP (natural language processing) and matching algorithms.
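
The idea behind product matching can be sketched in a few lines of plain Python. This is not the Spark ML implementation mentioned above, whose details the article does not give: it swaps in a simple token-overlap (Jaccard) similarity on normalized titles, and the sample offers, the `match_offers` function and the 0.6 threshold are assumptions chosen for illustration.

```python
import re
import unicodedata


def normalize(title: str) -> set:
    """Lowercase, strip accents and punctuation, and return the set of tokens."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Share of tokens the two titles have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_offers(site_a, site_b, threshold=0.6):
    """Pair each offer of site_a with its most similar offer on site_b."""
    matches = []
    for offer_a in site_a:
        tokens_a = normalize(offer_a["title"])
        best, best_score = None, 0.0
        for offer_b in site_b:
            score = jaccard(tokens_a, normalize(offer_b["title"]))
            if score > best_score:
                best, best_score = offer_b, score
        # Keep the pair only if the best candidate is similar enough.
        if best is not None and best_score >= threshold:
            matches.append((offer_a["id"], best["id"], round(best_score, 2)))
    return matches


# Hypothetical offers collected from two merchant sites.
site_a = [
    {"id": "A1", "title": "Café Arabica Ground Coffee 250g - Brand X"},
    {"id": "A2", "title": "Wireless Bluetooth Headphones Black"},
]
site_b = [
    {"id": "B1", "title": "brand x cafe arabica ground coffee (250g)"},
    {"id": "B2", "title": "Portable Bluetooth Speaker"},
]

matches = match_offers(site_a, site_b)
```

Here only the coffee offers are paired, since the headphones and the speaker share a single token. At the volumes described in this article, comparing every pair of offers would not scale, which is one reason a distributed engine and richer NLP features are used in practice.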

The “gold” data is injected and made available in datamarts dedicated to the customer.

Another NiFi workflow is responsible for creating the ElasticSearch indexes that serve the web application via an API. Using ElasticSearch as the business database makes it possible to benefit from its query speed (low latency) and from its underlying search engine (Lucene). It includes features such as auto-completion, automatic correction and synonym management.
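
As a rough illustration of these three features, the sketch below builds ElasticSearch request bodies as Python dictionaries: a synonym token filter for synonym management, a `completion` field for auto-completion, and a `match` query with `fuzziness` as one way to tolerate typos. The article does not say how the indexes are actually configured, so the field names, the synonym list and the analyzer name are hypothetical.

```python
import json

# Index definition: a custom analyzer with synonyms, plus a completion field.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "product_synonyms": {
                    "type": "synonym",
                    "synonyms": ["tv, television", "laptop, notebook"],
                }
            },
            "analyzer": {
                "product_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "product_text"},
            "title_suggest": {"type": "completion"},
            "price": {"type": "float"},
        }
    },
}

# Typo-tolerant search: "televsion" can still match "television".
search_body = {
    "query": {"match": {"title": {"query": "televsion", "fuzziness": "AUTO"}}}
}

# Auto-completion: suggest titles starting with what the user has typed.
suggest_body = {
    "suggest": {
        "title_suggest": {
            "prefix": "tele",
            "completion": {"field": "title_suggest"},
        }
    }
}

for body in (index_body, search_body, suggest_body):
    print(json.dumps(body, indent=2))
```

The first body would be sent when creating the index and the other two to its search endpoint; the dictionaries are printed as JSON here so the sketch runs without a cluster.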

An intuitive Big Data tool for users

A key part of the solution implemented by Octopeek is the restitution of the data via an intuitive, easy-to-use web application validated by the customer’s business teams. The aim is for a simple tutorial to be enough for a user to get up to speed with the tool quickly. In addition, the ElasticSearch datamarts allow Kibana to be used to visualize the data. The customer’s technical teams therefore have data visualization and configurable, customizable dashboards at their disposal.

The tool, used in self-service mode, allows Big Data “non-experts” to harness its power and improve their productivity and performance through artificial intelligence.