JSON Lines Files to Cloud Storage | eRetail Audit Marketplace

This guide describes how to send your sales, browsing, and other data to eRetail Audit Marketplace as JSON Lines files uploaded to a cloud storage bucket. This method is performant, scalable, and simple to operate, and the bucket can be hosted on your cloud infrastructure or ours.

Data Structure and Preparation

  1. Data Format:

    • The data should be in JSON Lines format, which is well supported by most data warehouses.

    • Each line in a JSON Lines file is a single, complete JSON object, so records must not be pretty-printed across multiple lines.

    • The data should follow our corresponding API integration specifications, including at least the mandatory attributes for each data type.

  2. File Size and Compression:

    • For better performance and lower storage costs, split each day's data into files of approximately 100 MB (uncompressed) and compress each file with gzip; each compressed file will typically be around 10-20 MB.

  3. Directory Structure:

    • Organize the files in the following directory structure based on the date:

      eram/sales/YYYYMMDD/
      eram/pageviews/YYYYMMDD/
      eram/search_terms/YYYYMMDD/

    • Within each directory, name the files in the format:

      part_00.json.gz
      part_01.json.gz

    • Note: The date in each directory name is the date of the events it contains (sales, pageviews, etc.), not the upload date. For example, eram/sales/20240613/ holds sales data for 2024-06-13, eram/sales/20240614/ holds sales data for 2024-06-14, and so on.
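The splitting and compression guidance above can be sketched in Python as follows. This is a minimal example, assuming the records for one data type and one event date are already available as Python dicts; the function name `write_parts` and the `TARGET_UNCOMPRESSED_BYTES` constant are hypothetical, not part of any SDK.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable

# Roughly 100 MB of raw JSON Lines per part, before gzip compression.
TARGET_UNCOMPRESSED_BYTES = 100 * 1024 * 1024


def write_parts(records: Iterable[dict], out_dir: Path,
                max_bytes: int = TARGET_UNCOMPRESSED_BYTES) -> list[Path]:
    """Write records to part_00.json.gz, part_01.json.gz, ... in out_dir.

    A new part is started once the current part holds at least max_bytes
    of uncompressed data, so every part except the last ends up at or
    just above max_bytes.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    handle = None
    written = 0
    try:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line.
            line = (json.dumps(record, ensure_ascii=False) + "\n").encode("utf-8")
            if handle is None or written >= max_bytes:
                if handle is not None:
                    handle.close()
                path = out_dir / f"part_{len(paths):02d}.json.gz"
                handle = gzip.open(path, "wb")
                paths.append(path)
                written = 0
            handle.write(line)
            written += len(line)
    finally:
        if handle is not None:
            handle.close()
    return paths
```

For example, calling `write_parts(sales_records, Path("export/eram/sales/20240613"))` (where `sales_records` and the `export` folder are placeholders) would produce `part_00.json.gz`, `part_01.json.gz`, and so on in that folder. Measuring the uncompressed size keeps each part near 100 MB of raw data regardless of how well a given day's data compresses.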

Example File Tree

eram/
├── sales/
│   ├── 20240613/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   ├── 20240614/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   └── 20240615/
│       ├── part_00.json.gz
│       └── part_01.json.gz
├── pageviews/
│   ├── 20240613/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   ├── 20240614/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   └── 20240615/
│       ├── part_00.json.gz
│       └── part_01.json.gz
└── search_terms/
    ├── 20240613/
    │   ├── part_00.json.gz
    │   └── part_01.json.gz
    ├── 20240614/
    │   ├── part_00.json.gz
    │   └── part_01.json.gz
    └── 20240615/
        ├── part_00.json.gz
        └── part_01.json.gz
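To place each record in the right folder, the object key can be derived from the event timestamp. The sketch below assumes `created_at` is an ISO 8601 timestamp that includes a time zone (as in the sales example further down) and that events are partitioned by their UTC date; please confirm with us which time zone applies to your data. The helper names `event_date` and `object_key` are hypothetical.

```python
from datetime import datetime, timezone

# Data types used in the directory structure above.
DATA_TYPES = ("sales", "pageviews", "search_terms")


def event_date(created_at: str) -> str:
    """Return the YYYYMMDD folder name for a timestamp like 2024-06-13T12:34:56Z.

    Assumption: events are partitioned by their UTC date.
    """
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError(f"timestamp has no time zone: {created_at}")
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def object_key(data_type: str, yyyymmdd: str, part: int) -> str:
    """Build a key such as eram/sales/20240613/part_00.json.gz."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return f"eram/{data_type}/{yyyymmdd}/part_{part:02d}.json.gz"


# prints eram/sales/20240613/part_00.json.gz
print(object_key("sales", event_date("2024-06-13T12:34:56Z"), 0))
```

Rejecting timestamps without a time zone avoids silently filing events under the wrong day, since a naive timestamp would otherwise be interpreted in the local time of the machine running the export.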

Integration Steps

  1. Create the Cloud Storage Bucket:

    • Set up a cloud storage bucket (e.g., AWS S3 or Google Cloud Storage) where the JSON Lines files will be uploaded.

    • Organize the bucket contents using the directory structure and file naming conventions described above.

    • Provide us with the name and region of the bucket.

  2. Upload Data:

    • Upload the JSON Lines files to the designated cloud storage bucket on a daily basis.

    • Ensure that the files are uploaded in the specified directory structure.

    • Note: We ingest data once a day, so the previous day's data must be in the bucket by an agreed time each day (preferably in the morning).

    • Keep files in the bucket for a reasonable retention period so that we can re-fetch them if needed.

  3. Access Configuration:

    • To grant us access to your cloud storage bucket, set up access policies to allow read and list permissions.

    • Follow the instructions for your specific cloud provider:

      • AWS S3 Access Setup (permissions required: s3:GetObject, s3:ListBucket): we will create an AWS user for accessing your S3 bucket and send you its ARN.

      • Google Cloud Storage Access Setup (permissions required: storage.objects.get, storage.objects.list): we will send you the GCP service account to grant these permissions to.
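On AWS, the read-only access described above is usually expressed as a bucket policy. The sketch below builds one in Python; `your-bucket` and the example ARN are placeholders, so substitute your bucket name and the ARN we send you. s3:ListBucket is granted on the bucket ARN and s3:GetObject on the objects (`/*`), because the two actions apply to different resource types.

```python
import json


def s3_read_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy that lets one AWS principal list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # ListBucket applies to the bucket itself
            },
            {
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # GetObject applies to the objects in it
            },
        ],
    }


if __name__ == "__main__":
    # Placeholders: use your bucket name and the ARN we send you.
    policy = s3_read_policy("your-bucket", "arn:aws:iam::123456789012:user/example")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied with `aws s3api put-bucket-policy --bucket your-bucket --policy file://policy.json`. On Google Cloud Storage, the equivalent is granting the service account we provide a role on the bucket that includes storage.objects.get and storage.objects.list, such as Storage Object Viewer (roles/storage.objectViewer).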

Example JSON Lines File

{"order_id": "12345", "created_at": "2024-06-13T12:34:56Z", "products": [{"name": "Product1", "sku": "SKU1", "quantity": 1, "revenue": 100, "brand": "BrandA", "category": "Category1"}]}
{"order_id": "12346", "created_at": "2024-06-13T12:35:56Z", "products": [{"name": "Product2", "sku": "SKU2", "quantity": 2, "revenue": 200, "brand": "BrandB", "category": "Category2"}]}
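Before uploading, each line can be checked on its own. This minimal sketch treats `order_id`, `created_at`, and `products` as the mandatory sales attributes only because they appear in the example above; the authoritative list is in our API integration specifications. It also checks, assuming directory dates are UTC dates, that `created_at` falls on the date of the directory the file will be uploaded to. The function `check_line` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical mandatory sales attributes, taken from the example above;
# replace with the list in the API integration specifications.
REQUIRED_SALES_FIELDS = ("order_id", "created_at", "products")


def check_line(line: str, folder_date: str,
               required: tuple[str, ...] = REQUIRED_SALES_FIELDS) -> list[str]:
    """Return the problems found in one JSON Lines row (an empty list means OK).

    folder_date is the YYYYMMDD directory the file will be uploaded to.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    # Each line must be a single JSON object, not an array or a scalar.
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    problems = [f"missing field: {name}" for name in required if name not in record]
    created_at = record.get("created_at")
    if isinstance(created_at, str):
        try:
            ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
        except ValueError:
            return problems + [f"unparseable created_at: {created_at}"]
        # Assumption: directory dates are UTC dates; naive timestamps are flagged.
        if ts.tzinfo is None:
            problems.append("created_at has no time zone")
        elif ts.astimezone(timezone.utc).strftime("%Y%m%d") != folder_date:
            problems.append(f"created_at {created_at} is not on {folder_date}")
    return problems
```

Run over each decompressed line of eram/sales/20240613/part_00.json.gz, both example lines above should return an empty list.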

Implementation Steps

  1. Bucket Policy Setup:

    • Configure your bucket with a policy that grants read access to our AWS user or GCP service account; we will provide the required identifier for the cloud provider you choose.

    • Grant only get and list permissions (s3:GetObject and s3:ListBucket on AWS; storage.objects.get and storage.objects.list on Google Cloud Storage).

  2. Data Upload:

    • Ensure that your daily data is uploaded to the cloud storage bucket following the directory structure and file naming conventions.

    • Automate the data upload process to ensure consistency and timeliness.

  3. Testing and Validation:

    • Once the integration is set up, we will verify that we can access the bucket and that the files are correctly formatted.

    • A test run then confirms that the data is ingested into our platform correctly.

  4. Go Live:

    • After successful testing and validation, the integration will go live.

    • Monitor the data upload and integration process to ensure smooth and continuous operation.
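The daily upload can be automated with a small job that runs each morning, after the previous day's files have been written locally. In this sketch the local export folder is assumed to mirror the bucket layout, and the actual transfer is passed in as a function so the same code works for S3 or Google Cloud Storage; `upload_previous_day` and the folder layout are hypothetical. It also reports data types with no files for that day, which is a simple signal to monitor.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Callable

DATA_TYPES = ("sales", "pageviews", "search_terms")


def upload_previous_day(local_root: Path, today: date,
                        upload: Callable[[str, str], None]) -> tuple[list[str], list[str]]:
    """Upload yesterday's part files; return (uploaded keys, data types with no files).

    local_root is assumed to mirror the bucket, e.g.
    local_root/eram/sales/20240613/part_00.json.gz.
    upload(local_path, key) performs the actual transfer.
    """
    day = (today - timedelta(days=1)).strftime("%Y%m%d")
    uploaded: list[str] = []
    missing: list[str] = []
    for data_type in DATA_TYPES:
        folder = local_root / "eram" / data_type / day
        parts = sorted(folder.glob("part_*.json.gz")) if folder.is_dir() else []
        if not parts:
            missing.append(data_type)  # nothing exported for this type and day
        for path in parts:
            key = path.relative_to(local_root).as_posix()  # bucket key mirrors the local path
            upload(str(path), key)
            uploaded.append(key)
    return uploaded, missing
```

With boto3, for example, the job could be called as `upload_previous_day(Path("export"), date.today(), lambda path, key: s3.upload_file(path, "your-bucket", key))` after `s3 = boto3.client("s3")`, where `export` and `your-bucket` are placeholders. Scheduling it with cron or your orchestrator, and alerting when the second return value is non-empty (if you send all three data types), covers the automation and monitoring steps above.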


For any questions or assistance during the integration process, please contact our Partner Success team at partnersuccess@convertgroup.com.