In the zettabyte era, per-flow measurement in the data center becomes more challenging owing to growth in both traffic volume and the number of flows. At the same time, rapid detection of anomalies (e.g., congestion, link failure, DDoS attacks) becomes paramount. For fast and accurate traffic measurement, a key challenge is maintaining an accurate working set of active flows (WSAF) from massive packet influxes at line rate. The WSAF is usually kept in high-speed but expensive memory, such as TCAM or SRAM, so the number of entries it can store is quite limited. To address this scalability issue, we propose a scalable in-DRAM WSAF and place a compact data structure called FlowRegulator in front of it. FlowRegulator compensates for DRAM's slow access time by substantially reducing the influx to the WSAF without compromising measurement accuracy. To verify its practicability, we further build a per-flow measurement system, called InstaMeasure, on an off-the-shelf board with a lightweight Atom processor. We evaluate the proposed system in a large-scale real-world experiment, monitoring our campus main gateway router for 113 hours and capturing 122.3 million flows. We verify that InstaMeasure can detect heavy hitters (HHs) with 99% accuracy within 10 ms (detection is faster for heavier HHs) while maintaining records for one million flows with only tens of MB of DRAM. We further investigate InstaMeasure's performance metrics in a packet-trace-driven experiment using a one-hour CAIDA dataset, where the measurement target was all 78 million L4 flows in that hour.
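The arrangement described above, a compact filter placed in front of a slower flow table, can be illustrated with a minimal sketch. This is not the FlowRegulator design, whose internals are not given here. It is a simple stand-in that assumes a small array of hashed counters which absorbs packets and forwards a batched update to the table only when a counter reaches a threshold. The names `PreFilter`, `measure`, `num_counters`, and `threshold` are hypothetical, and a Python dictionary stands in for the in-DRAM WSAF.

```python
import zlib
from collections import defaultdict


class PreFilter:
    """Hypothetical front-end filter (a stand-in, NOT the paper's FlowRegulator)."""

    def __init__(self, num_counters=1024, threshold=8):
        # Small counter array; assumed to live in fast memory.
        self.counters = [0] * num_counters
        self.threshold = threshold

    def observe(self, flow_key: bytes) -> int:
        """Absorb one packet; return the packet count to credit to the table now."""
        idx = zlib.crc32(flow_key) % len(self.counters)
        self.counters[idx] += 1
        if self.counters[idx] >= self.threshold:
            # Counter saturated: reset it and release one batched update.
            self.counters[idx] = 0
            return self.threshold
        return 0  # packet absorbed, no table access needed


def measure(packets, prefilter):
    """Feed flow keys through the filter; count accesses to the slow table."""
    table = defaultdict(int)  # stands in for the in-DRAM WSAF
    table_updates = 0
    for key in packets:
        credit = prefilter.observe(key)
        if credit:
            table[key] += credit
            table_updates += 1
    return dict(table), table_updates


if __name__ == "__main__":
    # 80 packets of one heavy flow followed by 3 packets of a small flow.
    packets = [b"A"] * 80 + [b"B"] * 3
    table, updates = measure(packets, PreFilter())
    print(table, updates)
```

With these assumed parameters, the 83 packets cause only 10 table updates: the heavy flow is recorded with its full count of 80, while the 3-packet flow never reaches the table. In this simplified sketch, flows whose keys hash to the same counter share it, so a batched update can be credited to the wrong flow. A real design would have to bound that error to preserve measurement accuracy.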