available for new roles

ASAD IKRAM

Data Engineer · Chevening Scholar · 6+ yrs experience · 300+ projects
📍 Lahore, Pakistan

I architect adversarial scraping infrastructure that defeats enterprise anti-bot systems, and build cloud-native data pipelines that turn raw chaos into clean, trusted warehouses powering real business decisions.

About

Data at enterprise scale

I'm a Senior Data Engineer with 6+ years turning messy, blocked, or non-existent data sources into clean, scalable pipelines businesses actually trust.

My specialty is adversarial scraping (infrastructure that defeats Akamai, Cloudflare, Datadome, PerimeterX, Kasada and F5 Shape Security) combined with the engineering depth to build properly layered data warehouses on top.

I've built platforms for M+C Saatchi, Dubizzle, Fix.com and 50+ global clients. Currently running ArtemisAI — a social data intelligence platform for brands.

🏆 Chevening Scholar 2024–2025 · Top 2–3% of 70,000+ applicants across 160 countries
Fully funded by the UK Foreign, Commonwealth & Development Office (FCDO)
Scrapy · Playwright · TLS Evasion · PySpark · Redshift · LangChain · ECS Fargate · Airflow · Cloudflare Bypass · RAG Pipelines · Snowflake · Kubernetes · Elasticsearch · dbt · curl_cffi · OpenAI API
Skills

The full stack

🕷️ Scraping · 🛡️ Anti-Bot · ⚙️ ETL / ELT · 🗄️ Warehousing · ☁️ Cloud AWS · 🐳 DevOps · 💾 Databases · 🤖 ML / AI · 📊 BI · 💻 Languages
Web scraping frameworks and extraction tooling — 6+ years, 400+ sites, 50M+ records
Scrapy
Scrapy Cluster
Scrapyd
Playwright
Puppeteer
Selenium
BeautifulSoup
Splash
Apify
curl_cffi
httpx
requests
Scrapy-Playwright
lxml / parsel
mechanize
Crawlee
Fingerprint evasion, CAPTCHA solving, API recon — defeating Akamai v3, Cloudflare Turnstile, Datadome, PerimeterX, Kasada, F5
TLS / JA3 / JA4 Spoofing
curl_cffi
cycle-tls
GoLogin
undetected-chromedriver
playwright-stealth
rebrowser-patches
2Captcha
Puzzle Solving
Proxy Rotation
Akamai v3 Bypass
Cloudflare Turnstile
Datadome Bypass
PerimeterX
Kasada
F5 Shape Security
Bright Data
Oxylabs
Smartproxy
NetNut
Charles Proxy
Burp Suite
HTTP Toolkit
Android Studio MITM
Hidden REST / GraphQL
JS Bundle Deobfuscation
ETL/ELT pipelines, orchestration and streaming — production pipelines processing hundreds of millions of records
PySpark
AWS Glue
dbt
Matillion
Apache Airflow
AWS Step Functions
EventBridge
Apache Spark
EMR Serverless
Apache Kafka
AWS Kinesis
SNS / SQS
Great Expectations
Pandas
Polars
NumPy
Apache Flink
Data warehouse design and modelling — RDL → ODL → ADL layered architecture, star & snowflake schemas
Amazon Redshift
Snowflake
Google BigQuery
Apache Hive
Hadoop HDFS
Star Schema
Snowflake Schema
Dimensional Modeling
RDL → ODL → ADL
Bronze / Silver / Gold
Data Lakehouse
Redshift Spectrum
AWS Athena
Apache Parquet / ORC
Delta Lake
AWS cloud-native infrastructure — full stack from Lambda to EKS, end-to-end pipeline ownership
ECS Fargate
EKS (Kubernetes)
AWS Batch
Lambda
ECR
S3
Athena
AWS Glue
Kinesis
SNS / SQS
CloudWatch
CloudFormation
EventBridge
SageMaker
QuickSight
DynamoDB
EMR Serverless
Step Functions
DevOps, CI/CD and infrastructure automation — containerised, automated, production-grade deployments
Docker
Kubernetes
GitHub Actions
Scrapyd
Scrapy Cluster
CloudFormation
Terraform
CI/CD Pipelines
Git
Linux / Bash
pytest
Datadog / CloudWatch
Relational, NoSQL and search — from OLTP Postgres to Elasticsearch full-text search
PostgreSQL
MySQL
MS SQL Server
MongoDB
DynamoDB
Firebase
Elasticsearch
ELK Stack (Kibana)
Redis
SQLite
Machine learning, NLP and AI pipelines — from RAG and vector search to custom LLM fine-tuning
LangChain
OpenAI API
RAG Pipelines
HuggingFace
Vector Databases
BERT
GPT Fine-tuning
LSTM
XGBoost
LightGBM
AWS SageMaker
VADER (Sentiment)
spaCy
TextBlob / NLTK
Topic Modeling
Sentiment Analysis
Business intelligence and dashboards — from raw pipeline output to C-suite reporting
Tableau
Power BI
Amazon QuickSight
Sisense
Periscope Data
Retool
Kibana
D3.js
Matplotlib
Plotly
Programming languages and core frameworks — Python-first, polyglot where it matters
Python
SQL (advanced)
JavaScript / Node.js
Go (Golang)
Bash / Shell
Django
FastAPI
Flask
React
REST APIs
GraphQL
HTML / CSS
Experience

Where I've built things

M+C Saatchi Fluency
Data Engineer Consultant · UK Remote
Dec 2024 – Present
  • Architected end-to-end AWS pipeline (GitHub Actions → ECR → ECS Fargate → EventBridge → Step Functions → S3 → AWS Glue → Athena) from zero to production in under 30 days
  • Delivered ETL/ELT pipelines + SageMaker NLP sentiment analysis powering marketing campaigns for Amazon, Nike, Ford, Reckitt and the UK Government
  • Eliminated 100% of manual data collection via Scrapy with CAPTCHA bypass, reverse-engineering, proxy rotation and TLS fingerprint evasion
  • Reduced infrastructure costs 35% with auto-scaling ECS Fargate container architecture replacing legacy VMs
  • Built real-time CloudWatch monitoring + Slack alerting with Retool data freshness dashboards; maintained 99%+ pipeline uptime
ECS Fargate · Step Functions · SageMaker · Scrapy · AWS Glue · Athena · NLP
ArtemisAI
Founder & Lead Data Engineer · UK
Feb 2025 – Present
  • Built a Meta social data lakehouse from scratch — GitHub Actions → ECR → ECS Fargate → DynamoDB → S3 — ingesting 30+ brand pages daily at scale
  • Designed layered Redshift DWH (RDL → ODL → ADL) orchestrated by Step Functions; sub-second KPI reporting via QuickSight
  • Deployed ECS-hosted ML sentiment-tagging + topic-labeling microservices; packaged ADL outputs as B2B subscription datasets
  • Trained custom LLMs on engagement data; predictive trend models improved client content monetization by 40%
  • Implemented self-healing infrastructure — EventBridge dead-letter queues, CloudWatch alarms and Twilio WhatsApp alerts — zero manual intervention
  • Multi-tenant Redshift access layer with row-level security; secure B2B data sharing across 10+ client accounts
Redshift · LangChain · Custom LLMs · ECS · DynamoDB · QuickSight · Step Functions
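The self-healing pattern in this role (bounded retries, then a dead-letter queue for whatever still fails) can be sketched in plain Python. This is a minimal illustration, not the ArtemisAI code: it assumes an in-memory list stands in for the real dead-letter queue, and `process_with_dlq`, `handler` and `max_attempts` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def process_with_dlq(
    messages: Iterable[dict],
    handler: Callable[[dict], None],
    max_attempts: int = 3,
) -> tuple[list[dict], list[dict]]:
    """Run `handler` on each message, retrying up to `max_attempts` times.

    Messages that still fail are appended to an in-memory dead-letter list
    (a stand-in for an SQS dead-letter queue) instead of failing the batch.
    """
    succeeded: list[dict] = []
    dead_letters: list[dict] = []
    for msg in messages:
        last_error: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                handler(msg)
            except Exception as exc:  # any failure counts as a failed attempt
                last_error = exc
            else:
                succeeded.append(msg)
                break
        else:
            # for/else: reached only when no attempt succeeded (no `break`)
            dead_letters.append(
                {"message": msg, "error": repr(last_error), "attempts": max_attempts}
            )
    return succeeded, dead_letters
```

On AWS the same shape is usually configured rather than hand-coded: an SQS redrive policy (`maxReceiveCount`) or an EventBridge target's dead-letter queue moves repeatedly failing messages aside, and a CloudWatch alarm on the DLQ depth drives the alerting.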
Dubizzle Labs
Senior Data Engineer (Promoted ×2) — Dubizzle · Bayut · Zameen · OLX · Pakistan
Feb 2022 – Aug 2024
  • Led team owning 500+ regional web scrapers across MENA on EKS — GitHub Actions CI/CD, Matillion ELT / AWS Glue (S3 → Athena → Redshift)
  • Redesigned layered DWH (RDL → ODL → ADL) for Propforce; improved ETL/ELT throughput 60%, directly driving record revenues
  • Built a competitor intelligence platform on the DWH and scraping fleet; applied dimensional modelling with star and snowflake schemas
  • Developed Elasticsearch data pipelines for property search; optimised query performance 50% via custom index mapping and sharding
  • Reverse-engineered competitor mobile apps via Android Studio MITM (HTTP Toolkit + Charles Proxy) to extract hidden REST and GraphQL APIs
  • Introduced unit testing + Scrapy contract validation; cut production scraping incidents 45%
  • Promoted twice in 2 years: Associate SWE → Data Engineer → Senior Data Engineer
EKS · Scrapy · Elasticsearch · Matillion · Redshift · AWS Glue · Mobile MITM
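The RDL → ODL → ADL layering can be illustrated with a small pure-Python sketch, assuming the layers mean raw (data exactly as landed), operational (cleaned, typed, deduplicated) and analytical (business-level aggregates). The field names `listing_id`, `city` and `price` are hypothetical; in production each layer is a set of Redshift tables built by ELT jobs, not Python lists.

```python
from collections import defaultdict

# RDL: raw records exactly as scraped (strings, duplicates, gaps)
raw_layer = [
    {"listing_id": "101", "city": " Lahore ", "price": "25,000"},
    {"listing_id": "101", "city": "Lahore", "price": "25,000"},  # duplicate
    {"listing_id": "102", "city": "karachi", "price": "40,000"},
    {"listing_id": "103", "city": "Lahore", "price": None},      # missing price
]


def to_operational(rows):
    """ODL: drop incomplete rows, normalise text, cast types, dedupe on key (last write wins)."""
    seen = {}
    for row in rows:
        if row["price"] is None:
            continue
        key = int(row["listing_id"])
        seen[key] = {
            "listing_id": key,
            "city": row["city"].strip().title(),
            "price": int(row["price"].replace(",", "")),
        }
    return list(seen.values())


def to_analytical(rows):
    """ADL: business-level aggregate (average price per city)."""
    totals = defaultdict(lambda: [0, 0])  # city -> [sum of prices, row count]
    for row in rows:
        totals[row["city"]][0] += row["price"]
        totals[row["city"]][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


odl = to_operational(raw_layer)
adl = to_analytical(odl)
print(adl)  # {'Lahore': 25000.0, 'Karachi': 40000.0}
```

Keeping the raw layer untouched means any ODL or ADL bug can be fixed and replayed from source without re-scraping.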
VendueTech
Lead Data Engineer · Ireland Remote · Contract
Jun – Jul 2025
  • Led 3-engineer team building real-time auction data ingestion using Scrapy, Playwright, AWS Batch and Step Functions; bypassed CAPTCHA and anti-bot stacks
  • Migrated full infrastructure Azure → AWS; PySpark on EMR Serverless reduced data latency 40% across Bronze, Silver and Gold layers
  • Designed idempotent pipeline with DynamoDB deduplication checkpoints ensuring exactly-once processing at high throughput
PySpark · EMR Serverless · Playwright · AWS Batch · DynamoDB
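Deduplication checkpoints make a pipeline idempotent: each record gets a stable key, and a conditional write records that key only if it has not been seen before. In DynamoDB this is typically a `put_item` call with `ConditionExpression="attribute_not_exists(pk)"`, which raises `ConditionalCheckFailedException` on a repeat. The sketch below assumes an in-memory set in place of the table; `CheckpointStore`, `record_key` and `ingest` are hypothetical names, not the VendueTech implementation.

```python
import hashlib
import json


class CheckpointStore:
    """In-memory stand-in for a DynamoDB dedup table (hypothetical)."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def put_if_absent(self, key: str) -> bool:
        """Return True if the key was newly recorded, False if it already existed.

        Mirrors a conditional write guarded by attribute_not_exists(pk).
        """
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def record_key(record: dict) -> str:
    """Stable content hash: the same record yields the same key regardless of dict order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(records: list[dict], store: CheckpointStore, sink: list[dict]) -> int:
    """Append each unseen record to `sink` once; return how many were written."""
    written = 0
    for record in records:
        if store.put_if_absent(record_key(record)):
            sink.append(record)
            written += 1
    return written
```

Replaying the same batch is then a no-op: the first `ingest(batch, store, sink)` writes every record and a second call writes none. In a real pipeline the checkpoint and the write must be atomic (or the write itself idempotent); otherwise a crash between the two steps can drop or duplicate a record.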
Fix.com
Data Engineer Consultant · Canada Remote
May 2023 – Present
  • Designed Scrapy + Scrapyd pipelines processing 50M+ data points from 200+ e-commerce sites; expanded product catalog 50% in 12 months
  • Optimised distributed Scrapy Cluster on AWS ECS; cut crawl time 40% via async I/O tuning, connection pooling and smart retry logic
  • Implemented residential proxy rotation across Bright Data, Oxylabs and Smartproxy with CloudWatch burn-rate monitoring and auto-failover
  • Introduced Great Expectations data validation; automated schema drift detection, reducing bad data in production by 70%
Scrapy Cluster · Proxy Rotation · AWS ECS · Great Expectations · MS SQL
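Great Expectations handles this through declarative expectation suites; the underlying idea of schema drift detection fits in a few lines of plain Python. This sketch does not use the Great Expectations API, and `EXPECTED_SCHEMA`, its fields and `detect_drift` are hypothetical.

```python
# Hypothetical product schema: field name -> expected Python type
EXPECTED_SCHEMA = {"sku": str, "title": str, "price": float, "in_stock": bool}


def detect_drift(record: dict, schema: dict = EXPECTED_SCHEMA) -> dict:
    """Compare one scraped record with the expected schema and report differences."""
    missing = sorted(set(schema) - set(record))
    unexpected = sorted(set(record) - set(schema))
    wrong_type = sorted(
        field
        for field, expected_type in schema.items()
        if field in record
        and record[field] is not None  # nulls are a completeness issue, not drift
        and not isinstance(record[field], expected_type)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


def has_drift(report: dict) -> bool:
    """True if any category in the report is non-empty."""
    return any(report.values())
```

For example, `detect_drift({"sku": "A1", "title": "Hinge", "price": "9.99", "colour": "red"})` reports `in_stock` as missing, `colour` as unexpected and `price` as the wrong type, which is the typical signature of a site changing its markup.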
Prefe
Data Engineer Consultant · Italy Remote
Sep – Dec 2024
  • Rebuilt legacy scripts into modern Scrapy-based architecture with middleware, item pipelines and retry logic; reduced scraper failure rate 60%
  • Set scraping best practices and code review standards adopted across the full team; mentored 2 junior engineers on anti-bot evasion techniques
Scrapy · Middleware · Anti-Bot · Mentoring
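Retry logic in a scraper usually means exponential backoff with jitter, so transient failures (timeouts, 5xx responses) are retried without hammering the target site. A minimal sketch, assuming a generic zero-argument `fetch`-style callable; `retry_with_backoff`, `TransientError` and the delay parameters are hypothetical. Scrapy's built-in RetryMiddleware is configured through settings such as `RETRY_TIMES` instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Upper bound of the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2**n) for n in range(max_retries)]


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`; on TransientError sleep a random 'full jitter' delay and try again."""
    for delay_cap in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except TransientError:
            sleep(random.uniform(0, delay_cap))
    return fn()  # final attempt: any error now propagates to the caller
```

Full jitter (a random wait between zero and the exponential bound) spreads retries from many workers over time, which avoids synchronised retry bursts after a shared outage.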
CXG
Freelance Data Engineer · Dubai Remote
Dec 2025 – Jan 2026
  • Built and maintained production crawlers for global luxury fashion brands; bypassed advanced anti-bot protections including Akamai v3 and Cloudflare Turnstile
  • Introduced a pytest-based automated QA framework; achieved 99%+ data completeness across all active crawlers
Akamai v3 · Cloudflare Turnstile · pytest · Luxury Brands
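A pytest-based completeness check can be as simple as measuring the share of scraped items whose required fields are populated and failing the run below a threshold. A sketch, assuming hypothetical field names in `REQUIRED_FIELDS` and a 99% threshold matching the target above; a real suite would load `items` from the latest crawl output or a fixture rather than a literal list.

```python
REQUIRED_FIELDS = ("brand", "product_name", "price", "url")  # hypothetical fields
COMPLETENESS_THRESHOLD = 0.99


def completeness(items: list[dict], required=REQUIRED_FIELDS) -> float:
    """Fraction of items where every required field is present and non-empty."""
    if not items:
        return 0.0
    complete = sum(
        all(item.get(field) not in (None, "") for field in required) for item in items
    )
    return complete / len(items)


def test_crawl_output_is_complete():
    # Hypothetical sample; replace with the crawler's actual output.
    items = [
        {"brand": "X", "product_name": "Bag", "price": 1200.0, "url": "https://example.com/1"},
        {"brand": "X", "product_name": "Scarf", "price": 300.0, "url": "https://example.com/2"},
    ]
    assert completeness(items) >= COMPLETENESS_THRESHOLD
```

Running `pytest` against this file collects `test_crawl_output_is_complete` automatically; wiring it into CI turns a silent drop in field coverage into a failed build.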
Fiverr / Independent Consulting
Senior Freelance Data Engineer · Level 2 Seller · 5★ · 300+ Projects
Dec 2018 – Feb 2026
  • Delivered 300+ web scraping and data engineering projects for clients worldwide, covering 400+ websites
  • Bypassed Akamai, Cloudflare, Datadome, PerimeterX, Kasada and F5 Shape Security via TLS/JA3 evasion, GoLogin, proxy rotation and behavioural mimicry
  • Deployed full-stack data products: Scrapy → ECS Fargate → S3 → Redshift DWH → Tableau / QuickSight dashboards for 50+ international clients
  • Shipped AWS-native scrapers, ECS pipelines, Django web tools and browser extensions; performed mobile API recon via Android Studio emulator MITM
TLS Evasion · GoLogin · curl_cffi · Anti-Bot · Mobile MITM · Django
Projects

Featured work

ArtemisAI
Social data intelligence platform for brands — Meta data lakehouse, ML sentiment tagging, B2B subscription datasets.
ECS Fargate · Redshift · LLMs · Step Functions · QuickSight
  • GitHub Actions → ECR → ECS Fargate → DynamoDB → S3 ingesting 30+ brand pages daily
  • Layered Redshift DWH (RDL → ODL → ADL) with sub-second KPI reporting
  • Custom LLMs on engagement data — improved client monetization 40%
  • Self-healing EventBridge + Twilio WhatsApp alerts — zero manual ops
  • → artemisai.co.uk
Adversarial Scraping Toolkit
Production-grade stealth infrastructure defeating Akamai v3, Cloudflare Turnstile, Datadome, PerimeterX, Kasada and F5 Shape Security — <1% block rate at scale.
curl_cffi · Scrapling · Camoufox · GoLogin · TLS/JA3/JA4
  • TLS fingerprint spoofing: curl_cffi and cycle-tls for JA3/JA4 impersonation — browser-identical TLS handshakes without spinning up Chrome
  • Scrapling: next-gen Scrapy alternative with built-in stealth, async-first design and smart element finding resistant to DOM changes
  • Camoufox: hardened Firefox fork for anti-detect automation — randomised canvas, WebGL, AudioContext, font and screen fingerprints per session
  • playwright-stealth + rebrowser-patches: headless Chrome evasion that masks the automation signals exposed through navigator, plugins and runtime APIs
  • GoLogin + undetected-chromedriver: full browser profile management with persistent identity pools across sessions
  • Mobile API recon via Android emulator MITM: Android Studio + HTTP Toolkit + Charles Proxy to intercept encrypted app traffic, extract hidden REST and GraphQL endpoints, and reverse-engineer request signing logic from deobfuscated JS bundles
  • 2Captcha + puzzle solvers: automated CAPTCHA bypass including reCAPTCHA v2/v3, hCaptcha, Turnstile, image puzzles
  • Residential/mobile proxy rotation — Bright Data, Oxylabs, Smartproxy — with burn-rate monitoring and auto-failover
Ford LENS Auto-Healer
AI-powered self-healing scraping platform. CloudWatch flags spider failures; Claude then autonomously diagnoses the root cause, patches the code and triggers a redeploy, with zero human intervention.
Django DRF · FastAPI MCP · Claude Sonnet · ECS Fargate · Aurora Serverless
  • Normal flow: EventBridge (weekly schedule) → ECS Fargate (Scrapy spiders) → Django API (logs + status) → Aurora PostgreSQL Serverless v2
  • Failure path: failure detected → CloudWatch (alert + trigger)
  • AI heal loop: FastAPI MCP (9 tool endpoints) → Claude Sonnet (agentic loop generates fix) → GitHub branch (isolated PR commit) → ECS re-trigger (auto re-run)
  • → Full architecture docs ↗
Amazon CORE ETL Pipeline
End-to-end social media ETL pipeline — ingesting multi-platform data across 8 channels into a structured warehouse powering major brand campaign analytics.
AWS Lambda · S3 · AWS Glue · Athena · Step Functions · SageMaker NLP
  • Sprinklr API (8 platforms) → Lambda 1–3 (ingest + map IDs) → S3 raw master table (CSV) → Lambda 4–6 (filter + tag + bench) → SageMaker NLP (sentiment + topics) → PostgreSQL RDS (campaign analytics)
  • Step Functions orchestration · CloudWatch + Slack alerting · Retool dashboards · GitHub Actions CI/CD
  • 99%+ uptime · 35% cost saving · 30 days to prod
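The Step Functions orchestration in this pipeline is defined in Amazon States Language (ASL), a JSON document. The sketch below builds a reduced two-stage definition in Python, with retries on each task and a catch-all that routes failures to an alerting step. The state names, Lambda ARNs, region and retry values are hypothetical placeholders, not the production definition.

```python
import json

# Hypothetical placeholder ARNs; substitute real Lambda function ARNs.
INGEST_ARN = "arn:aws:lambda:eu-west-2:123456789012:function:ingest"
TAG_ARN = "arn:aws:lambda:eu-west-2:123456789012:function:filter-and-tag"
ALERT_ARN = "arn:aws:lambda:eu-west-2:123456789012:function:notify-slack"

# Retry failed tasks 3 times with exponential backoff (5s, 10s, 20s).
RETRY = [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 5,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]
# Anything still failing goes to the alerting state.
CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}]

definition = {
    "Comment": "Sketch: ingest -> filter/tag, with retries and a failure alert",
    "StartAt": "Ingest",
    "States": {
        "Ingest": {"Type": "Task", "Resource": INGEST_ARN,
                   "Retry": RETRY, "Catch": CATCH, "Next": "FilterAndTag"},
        "FilterAndTag": {"Type": "Task", "Resource": TAG_ARN,
                         "Retry": RETRY, "Catch": CATCH, "End": True},
        "NotifyFailure": {"Type": "Task", "Resource": ALERT_ARN, "Next": "Failed"},
        "Failed": {"Type": "Fail", "Error": "PipelineFailed"},
    },
}

print(json.dumps(definition, indent=2))
```

Generating the definition from code keeps shared `Retry`/`Catch` policies in one place; the printed JSON is what a CloudFormation or Terraform state machine resource would receive.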
Dubizzle Labs Data Platform
Enterprise-scale data warehouse and scraping platform across MENA — 500+ scrapers, layered DWH, Elasticsearch search pipelines and competitor intelligence.
EKS · Matillion ELT · Redshift · Elasticsearch · Scrapy
  • Collection: Scrapy fleet (500+ spiders on EKS) → S3 raw landing zone → Matillion ELT (Glue + Athena)
  • Warehouse: RDL (raw) → ODL (ops) → ADL (business)
  • Search: Elasticsearch full-text search, 50% query performance gain
  • Intelligence: competitor intel via MITM + API recon
  • BI: Tableau · Sisense · Periscope · Retool dashboards for 15+ stakeholders
  • 60% ETL throughput gain · promoted 2× in 2 years
Pakistan Energy Crisis Monitor
Live real-time dashboard tracking Pakistan's oil reserve depletion, fuel prices, load-shedding map and IMF economic impact — second-by-second data, no refresh needed.
AWS CloudFront · Real-time Data · Web Scraping · Data Viz · Geospatial
  • Live strategic reserve depletion counter draining at 4.82 barrels/second in real time
  • Fuel price tracker — petrol, diesel, LPG with full history since the Feb 2026 Hormuz blockade
  • Lahore load-shedding map — area-by-area LESCO outage schedule with live feeder status
  • IMF economic dashboard — GDP growth, inflation, current account deficit all tracked live
  • Scrolling live news ticker with breaking energy crisis updates
  • Deployed on AWS CloudFront for global low-latency delivery
  • → Live dashboard ↗
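The live depletion counter is straightforward arithmetic: at the stated 4.82 barrels per second, the reserve falls by 4.82 × 86,400 = 416,448 barrels per day. A minimal sketch of the computation, assuming a hypothetical starting reserve and start timestamp (the dashboard's actual inputs are not given here); `remaining_barrels` and `days_until_empty` are illustrative names.

```python
from datetime import datetime, timezone

DRAIN_RATE_BPS = 4.82  # barrels per second, as shown on the dashboard


def remaining_barrels(initial: float, start: datetime, now: datetime,
                      rate: float = DRAIN_RATE_BPS) -> float:
    """Reserve left after draining at a constant rate since `start` (never below zero)."""
    elapsed = (now - start).total_seconds()
    return max(0.0, initial - rate * elapsed)


def days_until_empty(remaining: float, rate: float = DRAIN_RATE_BPS) -> float:
    """Days of supply left at the current drain rate (86,400 seconds per day)."""
    return remaining / (rate * 86_400)
```

Calling `remaining_barrels(initial, start, datetime.now(timezone.utc))` on every tick is what lets the counter update each second without a page refresh; timezone-aware datetimes avoid off-by-hours errors between server and browser.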
AI Assistant

Ask anything about me

Powered by Gemini AI — ask about my experience, skills, availability, or anything else. Free to use, no login needed.

Contact

Let's build something real

Open to senior data engineering roles, scraping contracts and pipeline architecture consulting. Based in Lahore, Pakistan. Response within 24 hours.

Asad_Ikram_Resume_2026.pdf · 2 pages · updated Mar 2026