Tag Archive

You are currently browsing the tag archive for the ‘Data Mining’ tag.

Curation – Amplifier or Condenser?

July 19, 2012 in Curation, Information Lifecycle Management, Social Media | Tags: browsing, Content, Curation, Data Mining, Definition, Information Lifecyle Management, Marketing, Networking, Pinterest, poll, Sharing, Signal, Social media, Social Network, Streams, visit | 24 comments

In a very short time, curation has evolved from a minor supporting role to a major, even leading, role in Social Media engagement. It is no longer sufficient just to share items of interest, breaking news and opinion if you want to be regarded as authentic and taken seriously.

Knowledge Condenser

Curation has many definitions, including my own: “Curation is the acquisition, evaluation, augmentation, exhibition, disposition and maintenance of digital information, usually centered around a specific topic or theme.” The Digital Curation Centre (DCC) in the United Kingdom puts it more succinctly:

Digital curation, broadly interpreted, is about maintaining and adding value to a trusted body of digital information for current and future use. (DCC)

Both definitions imply an information lifecycle process that manages digital objects from creation to deletion. Both suggest that capturing and adding value, whether through commentary or related material, is vital to the end product: knowledge or information that can be referenced now and in the future.

Message Amplification

However, the evolution of digital curation is experiencing some fragmentation. This is not necessarily bad, but it does suggest that the differences should be understood, because curation tools will differ in features and capabilities as each tries to satisfy its target customer base. So far I have identified three major distinctions in curation:

Marketing Content: Comes in several forms as marketeers move away from landing pages on Facebook and web sites and seek to amplify brand presence through curated content.
Information (or Knowledge) Content: More focused on collecting and condensing information to support a topic or subject. Most commonly a reference site, usually set up for either internal or external collaboration.
Personal Content: Less dependent on content management features and capabilities. Can be used either for amplification (self-branding) or condensing (information).

The question I would like to pose is: who visits these curated sites, and what are their preferences? The following poll offers choices in the style and content of curated sites. Please let me know which sites you prefer to access for either information or shareable content. I have made a further distinction for sites that are the result of either employee or community collaboration, as they possibly differ from information sites in the degree of social participation (i.e. they are more social).

Big Data: Open to Definition

May 1, 2012 in Big Data, Information Lifecycle Management | Tags: Analytics, Application programming interface, Architecture, Big Data, collaboration, Data Administration, Data Mining, Definition, Global, Information, Networks, Open, Signal, Social media, Standards, Streams, Walled Garden | 3 comments

Image: nuttakit / FreeDigitalPhotos.net

How does one define Big Data, and is “big” the best adjective to describe it? There are many voices trying to come up with answers to this topical question. Gartner and Forrester both agree that a better word would be “extreme”. Between them, the two major consulting firms have determined four characteristics that “extreme” can qualify. They agree on three: volume, velocity and variety. On the fourth they diverge: Forrester postulates variability, while Gartner prefers complexity. These are reasonable contributions and may form the foundation for the definition of big data that the Open Methodology Group is seeking to create within its open architecture, MIKE2.0.

However, the definition still falls short of the mark, as any combination of these characteristics can be found in many of today’s large data warehouses and parallel databases operating in outsourced or in-house data centers. No matter how extreme the data, Moore’s Law and technology will eventually, asymptotically, accommodate and govern it. I could suggest that the missing attribute is volatility, or the rate of change, but that too can be applied to current serviced capabilities. Another important attribute, all too often missed by analysts, is that Big Data is world data: it is data in many formats and many languages, contributed by almost every nationality and culture, together with the noise generated by the systems and devices they employ.

Yet the characteristic that seems to address this definition shortfall best is openness, where openness means accessible (addressable or through an API), shareable and unrestricted. This may be controversial, as it raises some key issues around privacy, property and rights, but these problems for big data still need to be resolved independently of any definition. Why openness? Here are six observations:

Any data that is not open, i.e. that is private, covert or obscured, is by default protected and confined to the private architecture and data model(s) of that closed system. While it may share many of the attributes of “big data”, and possibly the same data sources, at best it can only represent a subset of big data as a whole.
Big data does not and cannot have a single owner, supplier or agent (heed well, ye walled gardens). It is the sum of many parts including, amongst others, social media streams, communication channels and complex signal networks.
There will never be a single Big Data Analytic Application/Engine, but there will be a multitude of them, each working on different or slightly different subsets of the whole.
Big Data analysis will demand multi-pass processing, including some form of abstract notation. Private systems will develop their own notation, but public notation standards will evolve, and open notation standards will improve the speed and consistency of analysis.
Big Data volumes are not just expanding, they are accelerating, especially as visual/graphic data communications become established (currently trending). Cloning and copying of Big Data will expand global storage requirements exponentially. Enterprises will recognize the impractical economics of this model and support industry standards that provide a robust and accessible information environment.
As enterprises cross into crowd-sourcing and collaboration in the public domain, it will be increasingly difficult and expensive to maintain private information and integrate or cross-reference it with public Big Data. The need to go open to survive will be accompanied by the recognition that contributing private data, and potentially intellectual property, is more economical and more supportive of rapid open innovation.

The conclusion remains that one of the intrinsic attributes of Big Data is that it is and must be maintained as “open”.
