Chandra Kiran Narala

FPGA Engineer & Hardware-Aware Systems Engineer

I design and optimize high-performance digital systems across FPGA, ASIC, RTL, embedded platforms, and low-latency software.

About Me

I am an engineer specializing in FPGA/ASIC design, RTL development, and performance-focused hardware-software systems. My background spans Verilog/SystemVerilog, FPGA prototyping, SRAM design, timing analysis, embedded systems, C++, Python, and low-latency data pipelines.

I am actively seeking full-time roles in FPGA design, ASIC and RTL design, hardware engineering, digital design, embedded systems, low-latency systems, and software engineering, including FPGA compiler software.

Beyond hardware, I care about the intersection of silicon and software: building tools and systems where every nanosecond and every byte matters.

Technical Arsenal

C++ · C# · Verilog / VHDL · PyTorch · NumPy / Pandas · Redis · Streamlit · Cadence Virtuoso · TCP/IP · Distributed Systems · Python · Embedded C · Low-Latency Systems · FIX Protocol · Computer Architecture · RISC-V · DSP · Linux · Git · Cocotb · HSPICE · ModelSim · FPGA · Verilog · SystemVerilog · RTL Design · ASIC Design · SRAM Design · MBIST · Vivado · Vitis HLS · Zynq-7000

Work Experience

New York University — Tandon School of Engineering

Jan 2024 - Nov 2025

SRAM Design Verification Engineer · New York, NY

  • Led a 4-person team optimizing a 256x4-bit SRAM array in 7nm FinFET, reducing power by ~20% vs. a 6T baseline.
  • Developed Cocotb testbenches for Python-based functional and latency verification of the SRAM.
  • Resolved critical timing violations using static timing analysis (STA) and back-annotation, enabling reliable read/write operation at high frequencies.
  • Collaborated with the physical design team on RTL-level fixes during timing closure, gaining hands-on experience with the Cadence ASIC flow.

Cognizant Technology Solutions

Nov 2020 - Dec 2021

Software Engineer · India

  • Built Python-based systems handling 100K+ requests/sec, improving throughput by 15% and reducing latency by 20%.
  • Built distributed pipelines for low-latency data ingestion and transformation to support operational decision-making.
  • Implemented monitoring, logging, and debugging strategies that maintained 99.9% uptime in production.
  • Developed fault-tolerant services with retry logic and circuit breakers for high-load conditions.
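Retry logic and circuit breakers complement each other. Retries absorb transient failures, and the breaker stops traffic to a dependency that keeps failing. Here is a minimal Python sketch of that pattern. It is not the production code: the class and function names, thresholds, and backoff schedule are illustrative assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Minimal closed / open / half-open circuit breaker (illustrative)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the breaker.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the breaker
        self.opened_at = None
        return result


def call_with_retry(breaker: CircuitBreaker, fn: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn through the breaker with exponential backoff (0.1 s, 0.2 s, ...)."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # don't keep hitting a dependency the breaker marked unhealthy
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

Injecting `clock` and `sleep` keeps the sketch deterministic under test. A real service would also need thread safety and one breaker per dependency.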

Education

NYU Tandon School of Engineering

Jan 2022 - May 2024

MS in Electrical Engineering · New York, NY

  • Computer System Architecture
  • Advanced VLSI Design
  • Advanced Hardware Design
  • Digital Signal Processing

GVPCOE

Jul 2016 - Sep 2020

BE in Electronics & Communications Engineering · India

  • Data Structures & Algorithms
  • System Design & Optimization
  • Computer Architecture
  • Digital Signal Processing
  • Operating Systems

Featured Projects

NYC Subway Arrival Telegram Bot

Telegram bot built with async Python (aiohttp + asyncio) and run in production. It fetches MTA GTFS-RT feeds concurrently, with per-stage latency instrumentation, rate limiting, and stale-feed detection for reliability.

Python · asyncio · aiohttp · Telegram Bot API · GTFS-RT
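The concurrent-fetch and stale-feed ideas can be sketched with plain asyncio. This is a hypothetical outline, not the bot's code, and it rests on three assumptions:

- `fetch` stands in for an aiohttp request plus GTFS-RT decoding, and returns the feed header's POSIX `timestamp`.
- The 90-second staleness threshold is arbitrary.
- The semaphore only caps concurrency. A true rate limiter would also bound requests per second.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Sequence

STALE_AFTER_S = 90  # assumed freshness threshold, not taken from the project


@dataclass
class FeedResult:
    name: str
    header_ts: int    # GTFS-RT FeedHeader.timestamp, POSIX seconds
    fetch_ms: float   # per-stage latency: time spent in the fetch stage
    stale: bool


# Hypothetical fetcher: downloads and decodes one feed, returns its header timestamp.
Fetcher = Callable[[str], Awaitable[int]]


async def fetch_one(name: str, fetch: Fetcher, sem: asyncio.Semaphore,
                    now: Callable[[], float]) -> FeedResult:
    async with sem:  # cap the number of in-flight upstream requests
        t0 = time.perf_counter()
        header_ts = await fetch(name)
        fetch_ms = (time.perf_counter() - t0) * 1000
    return FeedResult(name, header_ts, fetch_ms, stale=now() - header_ts > STALE_AFTER_S)


async def fetch_all(names: Sequence[str], fetch: Fetcher, max_concurrency: int = 4,
                    now: Callable[[], float] = time.time) -> List[FeedResult]:
    sem = asyncio.Semaphore(max_concurrency)
    return list(await asyncio.gather(*(fetch_one(n, fetch, sem, now) for n in names)))
```

`asyncio.gather` returns results in input order, so each `FeedResult` lines up with the feed name that produced it.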

Ultra-Low-Latency FPGA FIX Parser

Hardware FIX protocol parser built for HFT-grade latency targets. It combines FSM-based pipelined parsing, FIFO buffering, BRAM/ROM lookups for tag-value validation, and parallel field extraction, and is designed for sub-microsecond message processing in FPGA fabric.

Verilog · FPGA · FIX Protocol · FSM · FIFO · BRAM · Low-Latency
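A byte-serial software reference model is a common way to check an RTL parser in simulation, because it consumes the stream one byte at a time as the FSM does. The sketch below is an assumed Python golden model, not the project's Verilog. It relies on standard FIX framing: SOH (0x01) delimited `tag=value` fields, ending with a three-digit CheckSum field (tag 10). That checksum equals the byte sum of everything before it, modulo 256. The `with_checksum` helper is a hypothetical test utility.

```python
from typing import List, Tuple

SOH = 0x01  # FIX field delimiter


def parse_fix(msg: bytes) -> List[Tuple[int, bytes]]:
    """Byte-serial tag=value parser modeled as a two-state FSM (TAG, VALUE).

    The CheckSum field (tag 10) is validated in the same pass by keeping a
    running byte sum, as a streaming hardware parser would.
    """
    fields: List[Tuple[int, bytes]] = []
    state, tag, value = "TAG", 0, bytearray()
    field_sum = 0  # bytes of the field in progress, including '=' and SOH
    body_sum = 0   # bytes of every completed field before tag 10
    for i, b in enumerate(msg):
        field_sum += b
        if state == "TAG":
            if b == ord("="):
                state = "VALUE"
            elif ord("0") <= b <= ord("9"):
                tag = tag * 10 + (b - ord("0"))
            else:
                raise ValueError(f"non-digit in tag at offset {i}")
        elif b == SOH:  # end of value: emit the field, then reset the FSM
            fields.append((tag, bytes(value)))
            if tag == 10:
                if int(value) != body_sum % 256:
                    raise ValueError("checksum mismatch")
            else:
                body_sum += field_sum
            state, tag, value, field_sum = "TAG", 0, bytearray(), 0
        else:
            value.append(b)
    if field_sum:  # leftover bytes after the last SOH
        raise ValueError("truncated message")
    if not fields or fields[-1][0] != 10:
        raise ValueError("missing CheckSum (tag 10)")
    return fields


def with_checksum(body: bytes) -> bytes:
    """Hypothetical test helper: append a valid 10=NNN trailer."""
    return body + b"10=%03d\x01" % (sum(body) % 256)
```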

C++-Based Huffman Compression System

High-performance lossless file compression tool in C++ based on Huffman coding, with careful memory management and efficient I/O.

C++ · DSA · Huffman Coding / Encoding · MMT
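The core of Huffman coding is repeatedly merging the two least frequent subtrees. The original tool is written in C++. To keep one language across these examples, here is a minimal Python sketch of the algorithm (not the tool's implementation). It builds a code table, encodes data to a bit string, and decodes it back.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_codes(data: bytes) -> Dict[int, str]:
    """Build a prefix-free Huffman code table mapping byte value -> bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # single distinct symbol: give it a 1-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter keeps heap tuples comparable on ties
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: bytes, codes: Dict[int, str]) -> str:
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: Dict[int, str]) -> bytes:
    inverse = {c: s for s, c in codes.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:  # prefix-free, so the first match is a full symbol
            out.append(inverse[cur])
            cur = ""
    return bytes(out)
```

Any valid Huffman tree for `b"abracadabra"` encodes it in 23 bits, versus 88 bits uncompressed. The individual code assignments can still differ depending on tie-breaking.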

Blood Pressure Monitoring System

Embedded system project using an STM32 microcontroller. Interfaced with an MPR pressure sensor to capture pulse waves. Implemented digital signal processing (DSP) filters to extract heart rate and estimate blood pressure. Displayed real-time waveforms on the board's built-in LCD using the LTDC peripheral.

C · STM32 · SPI · DSP
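Heart-rate extraction from a pulse waveform can be illustrated with a smoothing filter followed by peak detection. This is a hedged Python sketch, not the STM32 firmware. The 50 ms moving-average window, the 0.3 s refractory period, and the sample rate `fs` are assumptions, and blood-pressure estimation is not shown.

```python
import math
from typing import List, Sequence


def moving_average(x: Sequence[float], n: int) -> List[float]:
    """Causal n-tap moving-average (boxcar) low-pass filter."""
    out: List[float] = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= n:
            acc -= x[i - n]  # drop the sample that left the window
        out.append(acc / min(i + 1, n))
    return out


def heart_rate_bpm(pulse: Sequence[float], fs: float, min_gap_s: float = 0.3) -> float:
    """Estimate heart rate from local maxima of the smoothed pulse wave."""
    y = moving_average(pulse, max(1, int(0.05 * fs)))  # ~50 ms smoothing window
    mean = sum(y) / len(y)
    peaks: List[int] = []
    last = -math.inf
    for i in range(1, len(y) - 1):
        is_peak = y[i - 1] < y[i] >= y[i + 1] and y[i] > mean
        if is_peak and (i - last) / fs >= min_gap_s:  # refractory period
            peaks.append(i)
            last = i
    if len(peaks) < 2:
        raise ValueError("not enough beats detected")
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))
```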

Object Detection on Zynq-7000 SoC

Real-time object detection on a Zynq-7000, accelerated with Vitis HLS. Convolution layers are offloaded to the FPGA fabric through hardware/software co-design, achieving a significant inference speedup over an ARM-only baseline.

Zynq-7000 · Vitis HLS · Vivado · C++ · Hardware Acceleration
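Offloaded convolution layers are typically checked against a plain software reference. The sketch below is a generic single-channel reference model in Python, with stride 1 and no padding. It is an assumption, not the project's HLS kernel. Like most CNN frameworks, it computes cross-correlation, so the kernel is not flipped. The innermost two loops form the multiply-accumulate window that an HLS tool would usually pipeline or unroll.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution (CNN-style cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):              # output rows
        for c in range(ow):          # output columns
            acc = 0.0
            for i in range(kh):      # kernel window: the MAC loops
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


def max_abs_error(reference: Matrix, candidate: Matrix) -> float:
    """Largest elementwise difference between accelerator output and the reference."""
    return max(abs(x - y) for ra, rb in zip(reference, candidate) for x, y in zip(ra, rb))
```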

RISC-V RV32I Processor

Full 5-stage pipelined RV32I processor in SystemVerilog with hazard detection, forwarding logic, and branch stubs, verified on a Basys 3 FPGA board at 100 MHz.

SystemVerilog · RISC-V · RV32I · Basys 3 · Pipelining
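The forwarding and load-use rules for a classic 5-stage pipeline can be modeled behaviorally, which is handy as a reference when testing the RTL. This is a Python sketch of the textbook rules, not the author's SystemVerilog. The `StageRegs` fields and the `Fwd` encoding are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Fwd(Enum):
    REGFILE = 0   # no hazard: use the value read in ID
    MEM_WB = 1    # forward from the writeback stage
    EX_MEM = 2    # forward the ALU result from one stage ahead


@dataclass
class StageRegs:
    reg_write: bool = False
    mem_read: bool = False
    rd: int = 0


def forward_select(src: int, ex_mem: StageRegs, mem_wb: StageRegs) -> Fwd:
    """Operand mux select for one ALU source register (rs1 or rs2) in EX."""
    if src == 0:  # x0 is hard-wired to zero and is never forwarded
        return Fwd.REGFILE
    if ex_mem.reg_write and ex_mem.rd == src:  # most recent producer wins
        return Fwd.EX_MEM
    if mem_wb.reg_write and mem_wb.rd == src:
        return Fwd.MEM_WB
    return Fwd.REGFILE


def load_use_stall(id_ex: StageRegs, if_id_rs1: int, if_id_rs2: int) -> bool:
    """A load in EX whose rd feeds the instruction in ID needs a one-cycle bubble."""
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (if_id_rs1, if_id_rs2)
```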

MBIST Engine — 256x4b SRAM @ 7nm

Memory Built-In Self-Test (MBIST) engine for a 256x4-bit SRAM in 7nm FinFET. Implements the March-C and MATS+ algorithms with full fault coverage analysis, timing verification, and automated TCL regression flows.

ASIC · SRAM · 7nm FinFET · MBIST · HSPICE · Cadence Virtuoso · Cocotb · TCL
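March algorithms apply fixed read/write sequences in ascending or descending address order. The Python model below is a hedged sketch, not the project's engine, and it narrows the scope in three ways:

- It uses the widely cited March C- variant, which needs 10n operations. The original March C adds one more read element.
- It runs MATS+ and March C- with solid 0000/1111 data backgrounds only.
- It injects only stuck-at faults into a behavioral 256x4 memory.

```python
from typing import Dict, List, Optional, Sequence, Tuple

WORDS, WIDTH = 256, 4
ONES = (1 << WIDTH) - 1  # 0b1111: the all-ones data background

# A March element is (address order, operations); order is "up", "down" or "any".
MarchElement = Tuple[str, Sequence[str]]

MATS_PLUS: List[MarchElement] = [
    ("any", ["w0"]), ("up", ["r0", "w1"]), ("down", ["r1", "w0"]),
]
MARCH_C_MINUS: List[MarchElement] = [
    ("any", ["w0"]), ("up", ["r0", "w1"]), ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]), ("down", ["r1", "w0"]), ("any", ["r0"]),
]


class FaultyMemory:
    """Behavioral 256x4 memory with optional stuck-at faults {(addr, bit): 0 or 1}."""

    def __init__(self, stuck_at: Optional[Dict[Tuple[int, int], int]] = None) -> None:
        self.cells = [0] * WORDS
        self.stuck_at = stuck_at or {}

    def write(self, addr: int, data: int) -> None:
        self.cells[addr] = data & ONES

    def read(self, addr: int) -> int:
        word = self.cells[addr]
        for (a, bit), v in self.stuck_at.items():
            if a == addr:  # a stuck cell always reads back its stuck value
                word = (word | (1 << bit)) if v else (word & ~(1 << bit))
        return word


def run_march(mem: FaultyMemory, algorithm: List[MarchElement]) -> List[int]:
    """Apply a March test with solid backgrounds; return sorted failing addresses."""
    failures = set()
    for order, ops in algorithm:
        addrs = range(WORDS - 1, -1, -1) if order == "down" else range(WORDS)
        for addr in addrs:
            for op in ops:
                expected = ONES if op[1] == "1" else 0
                if op[0] == "w":
                    mem.write(addr, expected)
                elif mem.read(addr) != expected:
                    failures.add(addr)
    return sorted(failures)
```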

Low-Latency Pong on Artix-7

Pong game implemented entirely in Verilog on an Artix-7 FPGA with 60 fps VGA output. It includes a custom hSync/vSync controller, a sprite rendering pipeline, and collision detection, all running purely in FPGA fabric with zero software overhead.

Verilog · Artix-7 · VGA · FSM · Digital Design
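An hSync/vSync controller boils down to two counters compared against porch and sync boundaries. The project does not state its resolution, so this Python sketch assumes the standard 640x480 @ 60 Hz mode, which uses active-low sync pulses. It models the combinational decode, not the Verilog itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timing:
    visible: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.visible + self.front_porch + self.sync + self.back_porch


# Assumed mode: standard 640x480 @ 60 Hz, sync pulses active-low.
H = Timing(640, 16, 96, 48)   # 800 pixel clocks per line
V = Timing(480, 10, 2, 33)    # 525 lines per frame


def sync_signals(h: int, v: int) -> Tuple[bool, bool, bool]:
    """Decode counter values (h, v) into (hsync_n, vsync_n, video_on)."""
    h_start = H.visible + H.front_porch
    v_start = V.visible + V.front_porch
    in_hsync = h_start <= h < h_start + H.sync
    in_vsync = v_start <= v < v_start + V.sync
    return (not in_hsync, not in_vsync, h < H.visible and v < V.visible)


def refresh_hz(pixel_clock_hz: float) -> float:
    """Frame rate produced by a given pixel clock for this mode."""
    return pixel_clock_hz / (H.total * V.total)
```

With the nominal 25.175 MHz pixel clock the frame rate is about 59.94 Hz. A 25 MHz clock, a common FPGA approximation, gives about 59.52 Hz.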

Moving Average Crossover Dashboard

Event-driven trading strategy prototype with live Alpha Vantage data, a Streamlit dashboard, Sharpe ratio and max drawdown metrics, and Redis caching for sub-second responsiveness.

Python · Streamlit · Alpha Vantage API · Redis · Plotly · Event-Driven
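The crossover signal and both risk metrics have compact definitions. The following stdlib-only Python sketch is illustrative, not the dashboard's code. The window sizes are arbitrary, and the Sharpe ratio assumes a zero risk-free rate and 252 periods per year. It also processes a price list in one batch rather than event by event.

```python
import math
from typing import List, Optional, Sequence


def sma(prices: Sequence[float], n: int) -> List[Optional[float]]:
    """Simple moving average; None until n samples are available."""
    out: List[Optional[float]] = []
    acc = 0.0
    for i, p in enumerate(prices):
        acc += p
        if i >= n:
            acc -= prices[i - n]
        out.append(acc / n if i >= n - 1 else None)
    return out


def crossover_positions(prices: Sequence[float], fast: int = 3, slow: int = 5) -> List[int]:
    """1 (long) while the fast SMA is above the slow SMA, else 0 (flat)."""
    f, s = sma(prices, fast), sma(prices, slow)
    return [1 if a is not None and b is not None and a > b else 0 for a, b in zip(f, s)]


def strategy_returns(prices: Sequence[float], positions: Sequence[int]) -> List[float]:
    """Apply the previous bar's position to this bar's return (no look-ahead)."""
    return [positions[i - 1] * (prices[i] / prices[i - 1] - 1) for i in range(1, len(prices))]


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate and sample std dev."""
    if len(returns) < 2:
        return 0.0
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year) if var > 0 else 0.0


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, (peak - equity) / peak)
    return mdd
```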

F1 Performance Analysis Platform

Real-time F1 telemetry dashboard that integrates lap times, tyre degradation, and pit stop analytics from the OpenF1 API, with multi-level caching.

Python · Streamlit · OpenF1 API · Redis · Plotly
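Multi-level caching usually means checking a fast in-process cache, then a shared store, and only then calling the upstream API. This Python sketch is an assumption about the structure, not the platform's code. `l2_get` and `l2_set` are hypothetical hooks that could wrap a Redis client; here they are plain callables so the example runs standalone. The 5-second L1 TTL is arbitrary.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TwoLevelCache:
    """In-process TTL cache (L1) in front of a shared store (L2) and a slow loader."""

    def __init__(self, l2_get: Callable[[str], Optional[Any]],
                 l2_set: Callable[[str, Any], None],
                 loader: Callable[[str], Any], l1_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.l1: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.l2_get, self.l2_set, self.loader = l2_get, l2_set, loader
        self.l1_ttl, self.clock = l1_ttl, clock

    def get(self, key: str) -> Any:
        hit = self.l1.get(key)
        if hit is not None and self.clock() < hit[0]:
            return hit[1]            # L1 hit: no network round trip
        value = self.l2_get(key)
        if value is None:            # L2 miss: call the upstream API
            value = self.loader(key)
            self.l2_set(key, value)
        self.l1[key] = (self.clock() + self.l1_ttl, value)
        return value
```

A loader result of `None` is treated as a miss, so a real implementation would need a sentinel for legitimately empty responses.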

Get In Touch

I'm always open to discussing FPGA and hardware design roles, research collaborations, and engineering opportunities across RTL, ASIC, and low-latency systems.

© 2026 Chandra Kiran Narala. Built with Next.js & Tailwind.