Stephen E. Arnold: A Fresh Look at Big Data & Big Data (-) Human Factor (+) Transformation (+) RECAP

Stephen E. Arnold

A Fresh Look at Big Data

May 8, 2013

Next week I am doing an invited talk in London. My subject is search and Big Data. I will be digging into this notion in this month’s Honk newsletter and adding some business intelligence related comments at an Information Today conference in New York later this month. (I have chopped the number of talks I am giving this year because at my age air travel and the number of 20-somethings at certain programs make me jumpy.)

I want to highlight one point in my upcoming London talk; namely, the financial challenge which companies face when they embrace Big Data and then want to search the information in the system and search the Big Data system’s outputs.

Click on Image to Enlarge

Notice that precision and recall have not improved significantly over the last 30 years. I anticipate that many search vendors will tell me that their systems deliver excellent precision and recall. I am not convinced. The data which I have reviewed show that over a period of 10 years most systems hit the 80 to 85 percent precision and recall level for content which is about a topic. Content collections composed of scientific, technical, and medical information where the terminology is reasonably constrained can do better. I have seen scores above 90 percent. However, for general collections, precision and recall have not been improving relative to the advances in other disciplines; for example, converting structured data outputs to fancy graphics.
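For readers who want the two measures pinned down, here is a minimal Python sketch of how precision and recall are usually computed for a single query. The function name and the document IDs are invented for illustration; none of the numbers come from the data discussed above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns ten documents (d1..d10),
# while the on-topic documents in the collection are d3..d12.
retrieved = ["d%d" % i for i in range(1, 11)]
relevant = ["d%d" % i for i in range(3, 13)]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this made-up case eight of the ten returned documents are on topic and eight of the ten on-topic documents are found, so both measures land at the low end of the 80 to 85 percent band mentioned above.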

 

I don’t want to squabble about precision and recall. The main point is that when an organization mashes Big Data with search, two curves must be considered. The first is the complexity curve. The idea is that search is a reasonably difficult system to implement in an effective manner. The addition of a Big Data system adds another complex task. When two complex tasks are undertaken at the same time, the costs go up.

Read the rest of this entry »

May 9

Stephen E. Arnold: The Honk

Categories: Access
Stephen E. Arnold

EXTRACT:

The Times does mention that reputable organizations including the Huffington Post, The Atlantic and Business Insider all use branded content. (In fact, The Atlantic apologized for one instance of sponsored content from the Church of Scientology in January.) The Times suggests the reason for this shift in advertising technique below:

“Publishers are largely being driven to support the use of sponsored content because of fewer people clicking on banner ads, the abundance of advertising space and other factors make it more difficult to make money from traditional online advertising. As advertising technology becomes more sophisticated, ads can be bought and sold at cheaper rates across the Web. Often they are ignored by the very customers advertisers are trying to reach.”

The Honk is available by free subscription; it is not posted online. Visit http://arnoldit.com/wordpress/honk/ and send an email to thehonk@yandex.com.

Apr 22

Neal Rauhauser: Exploring E-International Relations

Neal Rauhauser

Exploring e-International Relations

When I was checking out the Think Tanks & Civil Societies Program I noticed e-International Relations, the world’s leading website for students of international politics. They had an About page similar to that of Wikistrat, listing all of their volunteer editors and some additional information on them.

Last night I entered most of that information into e-IR-base, a Maltego graph. Those who want to follow along can download the graph file, get the free Maltego Community Edition, and do a portion of the things I do with it. The free version has very limited access to Paterva’s transform servers, so I will provide the necessary intermediate files.

Click on Image to Enlarge

This is a top level view of the e-IR graph. What I say next presumes some knowledge of hands on work with Maltego.

The lavender dots are Person entities – a place for a first and last name, and like every entity you can make notes and attach files to it. The blue dots at the upper right are URL entities and they contain links to an editor’s profile on the official site. Not everyone has a profile – this seems to be for people who produce their own content as well as work as editors. The five green dots are Twitter accounts, the five blue dots with an orange dot in the middle are LinkedIn profiles and an entity for the domain itself.

Maltego provides different types of entities, but here at the start we are only using Person, Domain, URL, and Phrase. Maltego provides a way to group different types of entities using colored stars – blue, green, yellow, orange, and red. This is useful for searching and organizing tasks – if you run a transform that starts with the five Twitter accounts shown here, but gets back over a thousand responses, how do you spot your originals?
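The starring idea can be sketched outside Maltego in a few lines of plain Python. This is only an illustration of the bookkeeping, not Maltego's API: the `Entity` class, the `starred` helper, and the account names are all hypothetical, and the list of 1,200 results merely stands in for whatever a real transform would return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    kind: str                    # e.g. "Person", "Domain", "URL", "Phrase"
    value: str
    star: Optional[str] = None   # "blue", "green", "yellow", "orange", "red", or None


def starred(entities, color):
    """Return only the entities carrying the given star color."""
    return [e for e in entities if e.star == color]


# Star the five seed accounts before running anything that expands the graph.
seeds = [Entity("Twitter", "seed_account_%d" % i, star="green") for i in range(5)]

# Stand-in for a transform that comes back with over a thousand unstarred results.
results = [Entity("Twitter", "result_%d" % i) for i in range(1200)]

graph = seeds + results
originals = starred(graph, "green")
print(len(graph), len(originals))
```

Because the seeds were marked before the graph grew, picking them out afterwards is a simple filter rather than a visual hunt.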

Read full post with additional graphics and links.

Apr 17

Michel Bauwens: Open Access & 3-D Printed Car

Michel Bauwens

Open Access: a remedy against bad science

Who has never been in the situation of having a set of data in which some values just didn’t seem to fit? A simple adjustment of the numbers, or the omission of the strange ones, could solve the problem. Or so you would think. I certainly have been in such a situation more than once, and looking back, I am glad that I left the data unchanged. On at least one occasion my “petty” preformed theory proved to be wrong and the ‘strange data’ I had found corresponded very well with another concept that I hadn’t thought of at the time.

Click on Image to Enlarge

3-D Printed Car Is as Strong as Steel, Half the Weight, and Nearing Production | Autopia | Wired.com

Kor and his team built the three-wheel, two-passenger vehicle at RedEye, an on-demand 3-D printing facility. The printers he uses create ABS plastic via Fused Deposition Modeling (FDM). The printer sprays molten polymer to build the chassis layer by microscopic layer until it arrives at the complete object. The machines are so automated that the building process they perform is known as “lights out” construction, meaning Kor uploads the design for a bumper, walks away, shuts off the lights and leaves. A few hundred hours later, he’s got a bumper. The whole car – which is about 10 feet long – takes about 2,500 hours.

Feb 28

SchwartzReport: End of Bulk Cable, Beginning of Selective Wireless Channel Access

Categories: Access,Cloud

Intel Is Reportedly Going To Destroy The Cable Model By Offering People The Ability To Subscribe To Individual Channels

Intel is reportedly on the cusp of delivering something that consumers around the world have been wanting for a long, long time. Kelly Clay at Forbes reports Intel is going to blow up the cable industry with its own set-top box and an unbundled cable service. Clay says Intel is planning to deliver cable content to any device with an Internet connection. And instead of having to pay $80 a month for two hundred channels you don’t want, you’ll be able to subscribe to specific channels of your choosing.

Read full article.

 

Jan 3

SmartPlanet: $41 Tablet for India

The world’s cheapest tablet, improved (and reviewed)

By Betwa Sharma | December 31, 2012

Click on Image to Enlarge

DELHI — In 2012, SmartPlanet reported on a series of inexpensive tablets from India, especially the $41 one called Aakash, which was launched by the Indian government.

Datawind Inc., a Montreal-based tech company, made the tablet in response to the Indian government’s challenge to create the world’s cheapest tablet.

Aakash, which was further subsidized for students to $35, received bad reviews. Critics said it had poor battery life, an unresponsive screen, absence of useful apps, less storage space and a slow processor.

In November, Datawind relaunched its tablet as Aakash 2. The improved tablet is powered by Android 4.0 Ice Cream Sandwich running on a 1 GHz processor and 512 MB RAM, with 4 GB internal storage and 32 GB microSD support. Its basic features include a 7-inch capacitive touch screen, battery life of three hours, a 0.3 megapixel front camera and WiFi connectivity.

The Indian government will buy about 100,000 units from Datawind for Rs. 2263 ($41) and make it available to students for Rs. 1130 ($20). The commercial version of the tablet can be bought online for Rs. 4499 ($81).

This time, it was launched not only in India but also unveiled at the United Nations.

“India is a critical player on security issues … but you are also a leader on development and technology,” U.N. Secretary General Ban Ki-moon said at the unveiling in November. “Indeed, India is a superpower on the information superhighway.”

“We need to do more to help all children and young people make the most of the opportunities provided by information and communications technology – especially all those who are still unconnected from the digital revolution,” he added.

SmartPlanet spoke with tech expert Prasanto Roy, editorial adviser at CyberMedia India, on what’s new with the tablet and whether it will work better.

Read full article with interview.

Phi Beta Iota: Combine with Open Cloud and Open Spectrum, among other opens, and we create a prosperous world at peace, a world that works for all.

See Also:

21st Century Intelligence Core References 2.8

Owl: $20 Table Storms the World — Four Million Back Ordered

Search: openbts [as of 30 Oct 2012]

SmartPlanet: Mobile Phones Lifting Global Economy

 

Dec 31

Talking Frog: Linked Open Data (LOD) 101

Categories: Access
Click on Image to Enlarge

Wikipedia / Linked Data

Tim Berners-Lee, director of the World Wide Web Consortium, coined the term “Linked Data” in a design note discussing issues around the Semantic Web project.

The goal of the W3C Semantic Web Education and Outreach group’s Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links.
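As a rough illustration of what "RDF triples" and "RDF links between data items from different data sources" mean, here is a small Python sketch using plain tuples. Every `example.org` URI and value is made up for the example; only the `owl:sameAs` predicate is a real, widely used linking term. Real projects would use an RDF library and proper serializations rather than tuples.

```python
# Each triple is (subject, predicate, object). All example.org URIs are invented.
dataset_a = [
    ("http://example.org/a/city/1", "http://example.org/vocab/name", "Example City"),
    ("http://example.org/a/city/1", "http://example.org/vocab/population", "1000"),
]
dataset_b = [
    ("http://example.org/b/place/42", "http://example.org/vocab/country", "Exampleland"),
]

# An RDF link is itself a triple whose subject and object live in different datasets.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
links = [("http://example.org/a/city/1", SAME_AS, "http://example.org/b/place/42")]


def describe(uri, triples, links):
    """Collect statements about `uri`, following sameAs links one step in either direction."""
    aliases = {uri}
    for s, p, o in links:
        if p == SAME_AS and s == uri:
            aliases.add(o)
        if p == SAME_AS and o == uri:
            aliases.add(s)
    return [(p, o) for s, p, o in triples if s in aliases]


facts = describe("http://example.org/a/city/1", dataset_a + dataset_b, links)
for predicate, value in facts:
    print(predicate, value)
```

The single link is what lets a question asked of the first dataset pick up the country recorded only in the second, which is the "data commons" effect the project description is after.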

Phi Beta Iota:  Wikipedia has its limitations.  This is not a very old idea as they suggest, but rather an extraordinary new idea, word-level linking.  Doug Engelbart’s Open Hyperdocument system (OHS) and Pierre Levy’s Internet Economy Meta Language (IEML) are related ideas.  What this huge new idea does is go beyond the “thing” to provide its attributes in Resource Description Framework (RDF).  The attributes that are of interest from a public intelligence point of view are those of “true cost” — time, space, energy, water, child labor, tax avoidance, etcetera.  Hence, each datum will be “context aware” and a specific item from a specific company will know where it is in time and space, its costs to date, and its projected costs into the future, all in relation to the specifics of its being.

DuckDuckGo / Linked Open Data (LOD)

Here is the updated image from Wikipedia of the linked datasets as of 2011:

Click on Image to Enlarge

 

Dec 18

Rickard Falkvinge: Four More Reasons Open File Sharing is a Virtual Public Library

Rickard Falkvinge

Four More Reasons The Pirate Bay Is Effectively A Public Library – And A Great One

Posted: 13 Dec 2012 06:57 AM PST

Infopolicy:  File sharing fulfills the exact same need and purpose as public libraries did when they first appeared, and is met with the exact same resistance – even in the same words. This article follows the previous observation that The Pirate Bay is the world’s most efficient public library.

Zacqary Adam Green’s piece comparing The Pirate Bay to the New York Public Library the other day was spot on, and we’ve seen it travel a lot around the world – in excess of 3,000 shares and counting. File sharing (and The Pirate Bay) is the most efficient public library ever invented, and its invention is a quantum leap for civilization as such. Imagine every human being having 24/7 access to humanity’s collective knowledge and culture!

Moreover, it’s not even a pipe dream that needs to be funded with forty gazillion eurodollars. All the technology has already been developed, all the infrastructure has already been rolled out, and the tools already distributed. All we have to do to realize this is, frankly, to remove the ban on using it.

In the book The case for copyright reform (download here), we can read the following:

Read the rest of this entry »

Dec 14

2012 Steele for Branson: The Virgin Truth 2.6

Click on Image to Enlarge

DOCUMENT (1 Page):  Virgin Truth 2.6R

Nov 28

Rickard Falkvinge: Brasil Kills Internet Bill, Loses Way

Rickard Falkvinge

Brazil Squanders Chance At Geopolitical Influence; Kills Internet Rights Bill In Political Fiasco

Infopolicy: Yesterday, the Brazilian parliament effectively killed the much-heralded Internet Bill of Rights, the Marco Civil, that had been praised by entrepreneurs and free-speech activists worldwide. This follows a ridiculous watering-down and dumbing-down of the bill, at the request of obsolete industry lobbies. With the bill permanently shelved, Brazil has practically killed its chance of leapfrogging other nations’ economies – BRICS is now just RICS.

The Internet Rights bill in Brazil, the Marco Civil, was a marvel. It would have enabled Brazil to leapfrog most other economies today, skipping a whole generation of industries.

The Marco Civil would have established that:

  • Internet access is a precondition for exercising citizenship;
  • As such, nobody may be cut off from the Internet for any other reason than failure to pay the connection fees;
  • The messenger immunity was almost absolute – nobody had any kind of accountability for carrying messages for a third party unless explicitly told so by a judge on a case-by-case basis;
  • Net neutrality was written into law;
  • All Internet regulation had to be based on preserving openness, participatory culture, and the open entrepreneurship that the Net brings;
  • Privacy applies online and must not be violated;
  • and much more.

Really, it was that good. Read it for yourself (in English).

Read the rest of this entry »

Nov 22