Analytics is driving developments in software engineering these days the way transaction processing did in the '70s and '80s. Machine learning and data mining, often over big data sets with graph topologies and often done in the cloud, are being applied to fields as diverse as social networks, business intelligence, and scientific computing, giving rise to new software architectures built from ingredients like map-reduce and NoSQL.
Relational databases are probably the best example of real engineering in software - predictable performance based solidly on theory. But all tools are shaped by the problem they were designed to solve. For relational databases, that problem was transaction processing, as exemplified by the ATM network and the airline reservation system. Typically in these types of apps, small amounts of data are relevant to any given transaction and the transactions are algorithmically uncomplicated. The challenge comes from supporting masses of concurrent updates.
In contrast, imagine the kinds of questions Walmart might want to ask its data.
Who are credit-worthy sports fans with high-end buying habits who haven't recently purchased big-screen TVs?
Find homeowners with kids in a given age range whose spending patterns are uncorrelated with the business cycle and who have a history of responding to promotions.
What products should be in the seasonal section? Where in the store should that section be located?
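To make the contrast with transactional workloads concrete, the first question above can be sketched as a batch filter over denormalized customer records. This is purely illustrative — the schema, field names, and thresholds are all hypothetical:

```python
# Hypothetical, denormalized customer records -- illustrative only.
customers = [
    {"name": "Alice", "credit_score": 720, "interests": {"sports", "cooking"},
     "avg_basket": 180.0, "recent_purchases": {"grill", "blender"}},
    {"name": "Bob", "credit_score": 580, "interests": {"sports"},
     "avg_basket": 40.0, "recent_purchases": {"big-screen TV"}},
]

def target_audience(records):
    """Credit-worthy sports fans with high-end buying habits
    who haven't recently purchased a big-screen TV."""
    return [c["name"] for c in records
            if c["credit_score"] >= 700            # credit-worthy (assumed cutoff)
            and "sports" in c["interests"]          # sports fan
            and c["avg_basket"] >= 100              # high-end buying habits
            and "big-screen TV" not in c["recent_purchases"]]

print(target_audience(customers))  # -> ['Alice']
```

Note the shape: a single scan over read-only records, no concurrent updates, no transactions — exactly the workload the relational toolkit was not built around.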
The shapes of these questions are very different from traditional online transaction processing. Data for analytics is written once, updated only additively, and mined for patterns. And flexibility is a big deal as new types of data are required. As problem and solution become increasingly mismatched, friction increases. So, it's good that people are experimenting with alternatives.
If OLTP shaped relational databases, it might not be stretching too far to say that object-oriented programming was shaped by graphical user interfaces. At least, the design patterns book is chock-full of GUI examples. So, it's interesting to note the success of the functional-programming-inspired map-reduce pattern for distributed computing, as part of Google's search. And search is just a query over unstructured data. These days, map-reduce is being applied to all sorts of problems, particularly in machine learning.
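The pattern itself is small enough to sketch in a few lines. The canonical example is word counting; the `shuffle` step below stands in for the grouping a framework like Hadoop performs between the two phases. A minimal single-machine sketch:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (word, 1) pairs for each word -- the "map" step.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group values by key -- what the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all counts for one word -- the "reduce" step.
    return key, sum(values)

docs = ["big data big analytics", "big data mining"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # -> {'big': 3, 'data': 2, 'analytics': 1, 'mining': 1}
```

Because map calls are independent and reduce calls only see one key's values, both phases parallelize trivially across machines — which is the whole appeal for write-once, read-many analytics data.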
In scientific computing, the data deluge is giving rise to data-driven science as more data becomes available for analysis and more research questions are addressed with machine learning. Even computer science itself is changing due to big data.
If you like programming with linked data, these are interesting times. There's a lot of creativity swirling around the intersection of distributed computing and storage, big data, machine learning, and data mining. Someone once said, “Shape your tools or be shaped by them.” So, after years of being shaped by the transaction processing toolkit, it's refreshing to see a new generation of software tools being shaped by analytics.
- Analytics: The Unreasonable Effectiveness of Data
- Map/Reduce in academic papers
- Map-Reduce for Machine Learning on Multicore
- Cloud computing with Hadoop
- The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
- The Data Deluge
- Beyond the Data Deluge