Choke points in the data supply chain

Data has no inherent value. To be useful, data must flow to agents who will process, analyze, and synthesize it to produce information that drives decisions. The recent conversation in DoD has focused on the so-called “big data problem”: since we don’t know what’s important in the data being collected, everything must be saved. But this is much harder than it sounds.

The problem can be summed up best with an example. One ARGUS wide-area EO sensor collects approximately 6 PB of data in a 24-hr period. The following graphic gives you an idea of how much data we are talking about.

How much data is a petabyte? (courtesy of Mozy.com)

Data is ubiquitous, storage is commoditized, comms are precious

As daunting as the data explosion might seem, it has been accompanied by a dramatic increase in the supply of mass data storage solutions that leverage “cloud” architectures. The “big data problem” actually has less to do with data storage than it does with transporting the data, that is, moving data from the edge (where it’s collected) to the core (where it’s stored).

The key constraint in the data storage equation is comms. And comms doesn’t scale. To my knowledge, there are no near-term, brute-force solutions that alleviate this constraint. Wireless data links provide nowhere near the throughput required by today’s military data sources (e.g., ARGUS), and expeditionary operating environments don’t lend themselves to the installation of physical pipes.
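To put a number on the comms constraint, here is a rough back-of-the-envelope calculation using the 6 PB per day ARGUS figure quoted above. It assumes decimal petabytes (10^15 bytes) and a made-up 100 Mbit/s link rate chosen purely for illustration; neither the link rate nor the helper names come from any real system.

```python
# Back-of-the-envelope: what link rate would it take to move one day's collection?
SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_BYTE = 8
PETABYTE = 10**15  # assumption: decimal petabyte


def required_rate_bps(bytes_per_day: float) -> float:
    """Sustained rate (bits/s) needed to ship a day's data within that same day."""
    return bytes_per_day * BITS_PER_BYTE / SECONDS_PER_DAY


def days_to_transfer(total_bytes: float, link_bps: float) -> float:
    """Days needed to push total_bytes through a link running flat out at link_bps."""
    return total_bytes * BITS_PER_BYTE / link_bps / SECONDS_PER_DAY


daily_bytes = 6 * PETABYTE  # figure quoted in the post for one sensor
rate = required_rate_bps(daily_bytes)

HYPOTHETICAL_LINK_BPS = 100e6  # assumption for illustration, not a real data link spec
backlog_days = days_to_transfer(daily_bytes, HYPOTHETICAL_LINK_BPS)

print(f"Required sustained rate: {rate / 1e9:.0f} Gbit/s")
print(f"Days to move one day's data at 100 Mbit/s: {backlog_days:.0f}")
```

Under these assumptions the sensor would need roughly 556 Gbit/s sustained, and a 100 Mbit/s link would take about 5,556 days (around 15 years) to move a single day of collection. The exact link rate matters far less than the gap of several orders of magnitude.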

Save all the data?

The current state of the art allows us to save all of the data (more or less), but it doesn’t allow us to move all of the data from the edge to the cloud. It’s clear that we need to start thinking about this problem in a different way…

Data is not important. It’s the information that can be gleaned from the data that matters. The old data paradigm emphasizes precision: save only what you consider to be relevant at the time the data is collected. This approach works only so long as you are dealing with a more or less static context where “relevance” can be readily established.

In the contemporary threat environment, operational context is constantly changing. Since relevance can’t be established a priori, the natural inclination is to save all of the data based on some indeterminate future value. But “data hoarding” only works so long as the means to aggregate massively distributed data actually exists.

Analytics at the edge

It may be within the realm of the possible to save all of the data, but it’s not possible to move all of that data around. This realization has led the community to consider approaches that aggregate metadata (i.e., data that describes the underlying data sets). Such approaches provide a valuable window into the distributed data inventory but fail to address the problem of leveraging the aggregate data to produce information.

A smarter solution is to process data at the edge to derive feature vectors that describe the information contained in the data. More processing (and not just static data storage) at the edge supports rapid indexing, correlation, and fusion of data to establish the rich contextual relationships between data sets, along with the spatial, temporal, and phenomenological derivatives that capture the underlying dynamics of the data. Rather than “storing everything,” such an approach enables the community to “exploit everything” and store only what is needed.
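As a toy illustration of deriving a feature vector at the edge, the sketch below reduces a pair of grayscale frames to one number per tile: the mean absolute pixel change. This is not how SOHIX works (the post does not describe its internals); the choice of feature, the tile size, the threshold, and the function name are all assumptions made for the example.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows, values 0-255


def motion_features(prev: Frame, curr: Frame, tile: int) -> List[float]:
    """Mean absolute pixel change per tile-by-tile block, in row-major tile order."""
    height, width = len(curr), len(curr[0])
    features = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            total = 0
            count = 0
            for y in range(top, min(top + tile, height)):
                for x in range(left, min(left + tile, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            features.append(total / count)
    return features


# Tiny synthetic example: 8x8 frames where one bright pixel appears
# in the lower-right tile between the two frames.
prev_frame = [[10] * 8 for _ in range(8)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[6][6] = 250

features = motion_features(prev_frame, curr_frame, tile=4)

# Hypothetical threshold: only tiles with noticeable change are worth reporting.
active_tiles = [i for i, f in enumerate(features) if f > 1.0]

print(features)      # one value per tile instead of 64 pixels
print(active_tiles)  # indices of tiles that changed
```

Here 64 pixels collapse to four numbers, and only one tile index needs to leave the platform. A real pipeline would use far richer features, but the trade is the same: spend compute at the point of collection to avoid spending bandwidth.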

Processing at the point of collection is the key idea underwriting Mav6’s Service Oriented Horizontal Information Exchange (SOHIX), which is the computational backbone of the Blue Devil Block II Payload Integration Infrastructure (PII). Leveraging SOHIX and the parallelized SOHIX data processing architecture, we are turning raw sensor data into actionable information that can be disseminated and accessed over conventional air-to-ground data links.

7 Responses to Choke points in the data supply chain

  1. Pingback: Every Day, Army’s Panopticon Drone Will Collect 80 Years’ Worth of HD Video | JLD Express Shopping

  2. Pingback: Every Day, Army’s Panopticon Drone Will Collect 80 Years’ Worth of HD Video - My Story live | My Story live

  3. Information overload equals pattern recognition.
    The geographic background is a relatively static data base.
    That data is your 1st lens of exclusion.
    Since buildings don’t move (except in the case of earthquakes & reaper drones) you need to collect & analyze all hive activity/motion. That’s the valuable data.
    Human activity data is what you are trying to gather.
    Where are the bad guys, where did they come from & where are they going ?

    SOHIX is the right approach. ArgusPanoptes2012 needs some help from the locals. I suggest the DoD freely distribute “Freedom Box ” plug-in servers on the ground to promote “improved tele-communications infrastructure” with a back door big enough to aggregate real-time intel of value. That would give you a 1st lens of inclusion.

    But the real challenge still falls on that GUI on the arm of the edgefighter.
    The real time synthesis of all relevant intel to catch or kill the bad guys will succeed only if the edgefighter’s batteries are charged up, the display screen is readable in sunlight & they have access to plenty of bandwidth. That last item is another justification for “Freedom Box” deployment. A good open source mesh network will invite “black hat” bad guy hackers for sure but it could also help support our guys on the ground. And who knows, it might empower a few locals to drop anonymous tips to ~that’s right~ both sides. Ahh the double edged sword of Open Source Intel.

    Keep up the good work. I am very impressed with the recent updates on BlueDevil’s progress. Disclaimer : I have no affiliation with “Freedom Box”.
    I just see it as an inevitable element in the social matrix that will be utilized in the third world. Think “Arab Spring” and One Laptop per Child.

  4. Correction to my response: DoD should procure the Open Source low power mesh servers aka “Freedom Box” and some NGO like Green Crescent or OLPC should distribute them along with some solar panels & LED lighting packages.

  5. Pingback: Every Day, Army’s Panopticon Drone Will Collect 80 Years’ Worth of HD Video | Today's Defense

  6. Pingback: Every Day, Army’s Panopticon Drone Will Collect 80 Years’ Worth of HD Video | Search The Earth Blog

  7. Pingback: Big Data Takes to the Sky « The Non-State Unlimited
