JSON Lines Files to Cloud Storage | eRetail Audit Marketplace

This guide describes how to send your sales, browsing, and other data to eRetail Audit Marketplace as JSON Lines files uploaded to a cloud storage bucket. This method scales well to large daily volumes and is simple to set up. The bucket can be hosted on your cloud infrastructure or ours.

Data Structure and Preparation

  1. Data Format:

    • The data should be in JSON Lines format, which is well-supported by most data warehouses.

    • Each line in a JSON Lines file represents a single JSON object.

    • The data should follow our corresponding API integration specs, including at least the mandatory attributes for each data type.

  2. File Size and Compression:

    • For better performance and lower storage costs, split each day's data into files of approximately 100 MB (uncompressed) and compress each file with gzip. The resulting gzip files are typically around 10-20 MB each.

  3. Directory Structure:

    • Organize the files in the following directory structure based on the date:

      eram/sales/YYYYMMDD/
      eram/pageviews/YYYYMMDD/
      eram/search_terms/YYYYMMDD/

    • Within each directory, name the files in the format:

      part_00.json.gz
      part_01.json.gz
      (and so on)

    • Note: Dates in the directory refer to the dates of the events (sales, pageviews, etc.). For example, eram/sales/20240613/ will have sales data for 2024-06-13, eram/sales/20240614/ will have sales data for 2024-06-14, and so on.
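The preparation rules above (one JSON object per line, roughly 100 MB of uncompressed data per part, gzip compression, and date-based directories with part_NN.json.gz file names) can be sketched in Python as follows. This is a minimal sketch, not a required tool: the function name write_parts and its parameters are hypothetical, it writes to a local directory whose contents you would then upload, and it assumes the records are already shaped according to the API integration specs.

```python
import gzip
import json
from datetime import date
from pathlib import Path
from typing import Iterable, List

# Target uncompressed size per part (~100 MB), following the guideline above.
TARGET_PART_BYTES = 100 * 1024 * 1024


def write_parts(records: Iterable[dict], root: Path, data_type: str,
                event_date: date, max_bytes: int = TARGET_PART_BYTES) -> List[Path]:
    """Write records as gzip-compressed JSON Lines parts under
    root/eram/<data_type>/YYYYMMDD/part_NN.json.gz and return the part paths.

    Hypothetical helper: the layout follows this guide, the name does not.
    """
    out_dir = root / "eram" / data_type / event_date.strftime("%Y%m%d")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    fh = None
    written = 0  # uncompressed bytes written to the current part
    try:
        for record in records:
            # One JSON object per line; compact separators keep lines small.
            line = (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")
            # Open a new part if none is open yet, or if this line would push
            # a non-empty part past the target size.
            if fh is None or (written > 0 and written + len(line) > max_bytes):
                if fh is not None:
                    fh.close()
                path = out_dir / f"part_{len(paths):02d}.json.gz"
                fh = gzip.open(path, "wb")
                paths.append(path)
                written = 0
            fh.write(line)
            written += len(line)
    finally:
        if fh is not None:
            fh.close()
    return paths
```

Measuring the uncompressed size keeps each part close to the 100 MB guideline regardless of how well a given day's data compresses. A single record larger than the limit is still written, in a part of its own. Part numbers are zero-padded to two digits, so a day with more than 100 parts would continue with part_100 and beyond.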

Example File Tree

eram/
├── sales/
│   ├── 20240613/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   ├── 20240614/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   └── 20240615/
│       ├── part_00.json.gz
│       └── part_01.json.gz
├── pageviews/
│   ├── 20240613/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   ├── 20240614/
│   │   ├── part_00.json.gz
│   │   └── part_01.json.gz
│   └── 20240615/
│       ├── part_00.json.gz
│       └── part_01.json.gz
└── search_terms/
    ├── 20240613/
    │   ├── part_00.json.gz
    │   └── part_01.json.gz
    ├── 20240614/
    │   ├── part_00.json.gz
    │   └── part_01.json.gz
    └── 20240615/
        ├── part_00.json.gz
        └── part_01.json.gz
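Because the layout is fixed, object keys can be built and checked programmatically, which helps catch misplaced files before they are uploaded. The sketch below assumes only the three data types shown in the tree; the names object_key, parse_key, DATA_TYPES, and KEY_PATTERN are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Data types shown in this guide (assumption: extend if you send others).
DATA_TYPES = ("sales", "pageviews", "search_terms")

# eram/<type>/<YYYYMMDD>/part_<NN>.json.gz
KEY_PATTERN = re.compile(
    r"^eram/(?P<type>sales|pageviews|search_terms)/(?P<day>\d{8})/"
    r"part_(?P<part>\d{2,})\.json\.gz$"
)


def object_key(data_type: str, event_date: date, part: int) -> str:
    """Build the object key for one part file of one day of events."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return f"eram/{data_type}/{event_date:%Y%m%d}/part_{part:02d}.json.gz"


def parse_key(key: str) -> Optional[Tuple[str, date, int]]:
    """Return (data_type, event_date, part) for a well-formed key, else None."""
    m = KEY_PATTERN.match(key)
    if m is None:
        return None
    try:
        day = datetime.strptime(m["day"], "%Y%m%d").date()
    except ValueError:  # eight digits but not a real date, e.g. 20240231
        return None
    return m["type"], day, int(m["part"])
```

For example, object_key("sales", date(2024, 6, 13), 0) gives eram/sales/20240613/part_00.json.gz, the first file in the tree above.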

Integration Steps

  1. Create the Cloud Storage Bucket:

    • Set up a cloud storage bucket (e.g., AWS S3 or Google Cloud Storage) where the JSON Lines files will be uploaded.

    • Ensure that the files in the bucket follow the directory structure and file naming convention described above.

    • Provide us with the name and region of the bucket.

  2. Upload Data:

    • Upload the JSON Lines files to the designated cloud storage bucket on a daily basis.

    • Ensure that the files are uploaded in the specified directory structure.

    • Note: We ingest data once a day, so please ensure that the previous day's data is in the bucket by an agreed time each day (preferably in the morning).

    • Keep the data in the bucket for a reasonable retention window after upload so that we can re-fetch it if needed.

  3. Access Configuration:

    • To grant us access to your cloud storage bucket, set up access policies to allow read and list permissions.

    • Follow the instructions for your specific cloud provider:

      • AWS S3 access setup (required permissions: s3:GetObject, s3:ListBucket): we will create the AWS IAM user we use to access your S3 bucket and send you its ARN.

      • Google Cloud Storage access setup (required permissions: storage.objects.get, storage.objects.list): we will send you the GCP service account to grant these permissions to.
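For AWS S3, the read-only access described above could be expressed as a bucket policy like the one generated below. This is a sketch under assumptions: the function name is hypothetical, and the bucket name and principal ARN in the example are placeholders (111122223333 is the sample account ID AWS uses in its own documentation). The real ARN is the one we send you. Note that s3:ListBucket is granted on the bucket ARN, while s3:GetObject is granted on the objects inside it (the bucket ARN followed by /*).

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy that grants one principal list and get access only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # s3:GetObject applies to the objects in the bucket.
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values: substitute your bucket name and the ARN we provide.
    policy = read_only_bucket_policy(
        "example-retailer-eram", "arn:aws:iam::111122223333:user/example-reader"
    )
    print(json.dumps(policy, indent=2))
```

You could save the printed JSON as policy.json and apply it with the AWS CLI command aws s3api put-bucket-policy --bucket YOUR_BUCKET --policy file://policy.json, or paste it into the bucket's Permissions tab in the S3 console. On Google Cloud Storage, the predefined Storage Object Viewer role (roles/storage.objectViewer) includes both storage.objects.get and storage.objects.list.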

Example JSON Lines File

{"order_id": "12345", "created_at": "2024-06-13T12:34:56Z", "products": [{"name": "Product1", "sku": "SKU1", "quantity": 1, "revenue": 100, "brand": "BrandA", "category": "Category1"}]}
{"order_id": "12346", "created_at": "2024-06-13T12:35:56Z", "products": [{"name": "Product2", "sku": "SKU2", "quantity": 2, "revenue": 200, "brand": "BrandB", "category": "Category2"}]}
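Before uploading, you may want to check that every line of a file is a standalone JSON object carrying the mandatory attributes. In this sketch, the required field list (order_id, created_at, products) is an assumption taken from the sales example above, and validate_lines is a hypothetical name; the authoritative list of mandatory attributes for each data type is in the API integration specs.

```python
import json
from typing import Iterable, List, Sequence, Tuple

# Assumption: mandatory sales fields inferred from the example above.
# Replace with the mandatory attributes from the API integration specs.
REQUIRED_SALES_FIELDS = ("order_id", "created_at", "products")


def validate_lines(lines: Iterable[str],
                   required: Sequence[str] = REQUIRED_SALES_FIELDS) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs; an empty list means every line passed."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        # JSON Lines allows no blank lines between records.
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        # Each line must hold a single JSON object, not an array or scalar.
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        missing = [field for field in required if field not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    return problems
```

To check a compressed part, open it in text mode with gzip.open(path, "rt", encoding="utf-8") and pass the file object directly; json.loads ignores the newline at the end of each line.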

Implementation Steps

  1. Bucket Policy Setup:

    • Configure your cloud storage bucket with a policy that allows read access to our AWS user or GCP service account (we will provide the needed information for the cloud provider of your choice).

    • Grant only the get and list permissions listed above; we do not need write or delete access.

  2. Data Upload:

    • Upload your daily data to the cloud storage bucket following the directory structure and file naming convention described above.

    • Automate the upload (for example, with a scheduled daily job) so that data arrives consistently and on time.

  3. Testing and Validation:

    • Once the integration is set up, we will validate the data access and ensure that the files are correctly formatted and accessible.

    • Conduct a test run to verify that the data is being ingested into our platform correctly.

  4. Go Live:

    • After successful testing and validation, the integration will go live.

    • Monitor the data upload and integration process to ensure smooth and continuous operation.
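As part of automating and monitoring the upload, a scheduled job could confirm that the previous day's files are in place before the agreed ingestion time. The hypothetical check_previous_day function below takes a list of object keys (gathered, for example, with boto3's list_objects_v2 for S3 or the google-cloud-storage client's list_blobs for GCS), the current date, and the data types you send. It reports any type with no files for the previous day or with gaps in the part numbering. Which data types are expected, and which time zone defines "today", are assumptions you set for your own integration.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Sequence


def check_previous_day(keys: Iterable[str], today: date,
                       expected_types: Sequence[str]) -> Dict[str, str]:
    """Return {data_type: problem} for the previous day's files; empty means OK."""
    day = (today - timedelta(days=1)).strftime("%Y%m%d")
    parts: Dict[str, List[int]] = {t: [] for t in expected_types}
    for key in keys:
        pieces = key.split("/")
        # Only consider keys shaped like eram/<type>/<YYYYMMDD>/<file> for that day.
        if len(pieces) != 4 or pieces[0] != "eram" or pieces[2] != day:
            continue
        data_type, name = pieces[1], pieces[3]
        if data_type not in parts:
            continue
        if name.startswith("part_") and name.endswith(".json.gz"):
            number = name[len("part_"):-len(".json.gz")]
            if number.isdigit():
                parts[data_type].append(int(number))
    problems: Dict[str, str] = {}
    for data_type, numbers in parts.items():
        if not numbers:
            problems[data_type] = f"no part files under eram/{data_type}/{day}/"
        elif sorted(numbers) != list(range(len(numbers))):
            # Expect part_00, part_01, ... with no gaps.
            problems[data_type] = "part numbers do not run contiguously from part_00"
    return problems
```

A non-empty result is a signal to alert before ingestion runs, rather than discovering missing data afterwards.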

Support

For any questions or assistance during the integration process, please contact our Partner Success team at partnersuccess@convertgroup.com.