What you haven’t understood about big data: human power

Author: Liu Shanyun

You have surely heard “big data” talked to death without ever quite understanding what it is, even though you feel some of its benefits all the time.

For example, a week ago you answered a set of interview questions on a social networking site, and only after finishing did you learn that the real purpose of the test was to measure how well your personality fits the team. Or, before the National Day Golden Week, you received a notification from a ticketing application advising you that booking train tickets by telephone 16 days ahead has a higher chance of success than booking same-day tickets online. When you actually called, you found the telephone booking system busy and had to keep redialing for two hours before the booking went through. Even so, that was better than being stuck on the 12306 website at the same time, unable to pay.

Big data is penetrating every industry, and it is already closely tied to very real scenarios such as testing your abilities or estimating your probability of contracting a disease. In the future, big data will be like water and electricity in our lives, improving the quality of information across society and making its flow more efficient.

In this system, data sampling and the later-stage analysis still rely on human effort.

Crowdsourcing makes data sampling more automated

“In the future, less and less manual intervention will be needed, at least for front-end data,” product manager James told Tencent Technology. A great deal of data is now collected from users’ interaction behavior, such as searches and Weibo interactions, or from small design elements inside applications such as “like”, “favorite”, and “throw in the wastebasket”. As long as the user completes the action, data quality can be computed in the background.

The rise and fall of onion prices shapes the trend of India’s inflation rate. A start-up called Premise has more than 700 users who have installed its self-developed application and who upload the retail price of onions in different regions in real time every day.

The company’s co-founder, David Soloff, believes this is an effective channel for tracking global financial dynamics in real time, because local shops generally adjust commodity prices according to changes in the economic environment (including factors such as wholesale prices and consumer confidence).

“Premise has shown that its analysis method, working from data collected on parts of the economic environment, can predict inflation indices 4 to 6 weeks in advance. You no longer have to wait for the monthly ‘economic weather report’,” Soloff stressed.
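The article does not describe how Premise turns uploaded prices into an indicator. As a minimal sketch of the general idea only, assuming each upload is a hypothetical `(day, region, price)` tuple and that a daily median is a reasonable summary, crowdsourced prices could be aggregated like this. The function names and all sample values are made up for illustration and are not Premise’s method or data.

```python
from collections import defaultdict
from statistics import median


def daily_median_prices(reports):
    """Group (day, region, price) reports by day and return {day: median price}."""
    by_day = defaultdict(list)
    for day, _region, price in reports:
        by_day[day].append(price)
    return {day: median(prices) for day, prices in sorted(by_day.items())}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Hypothetical uploads: day number, region label, onion price per kilogram.
reports = [
    (1, "region_a", 40.0),
    (1, "region_b", 44.0),
    (1, "region_c", 42.0),
    (8, "region_a", 46.0),
    (8, "region_b", 50.0),
    (8, "region_c", 48.0),
]

medians = daily_median_prices(reports)
weekly_change = percent_change(medians[1], medians[8])
print(medians, round(weekly_change, 2))
```

A median is used here rather than a mean so that a single mistyped upload does not swing the daily figure; that choice is an assumption of this sketch.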

For retail stores, however, the way a brand is displayed on the shelves directly determines sales volume. Checking whether a brand holds a good display position in the path of customer traffic is a job that is both time-consuming and very tedious.

A company named Quri has developed an application called EasyShift that pays users to contribute their time and effort to data collection. Users accept a task in the application, take photos at the designated location, and upload them to Quri’s servers in return for a small payment.

The concept behind EasyShift is easy to understand: most users now carry a smartphone. Brands want to understand how their goods are displayed at large retailers, assess competitive dynamics, learn about out-of-stock products and pricing, and monitor promotions and product launches. EasyShift pays consumers to collect this information conveniently while they shop.

During the massive earthquake in Japan, the real-time visualization of one car brand’s navigation data was put to unplanned use, and the resulting “green life channel” project connected lifelines into the disaster area.

The project’s lead, Kaoru Kanno, is a senior director at Dentsu’s creative design center. Before the earthquake he had taken on a collaboration project with a Japanese car brand. The project recorded which car was on the road at what time, at which latitude and longitude, and in which direction and at what speed it was driving. About one hundred thousand such dynamic data points were written to the in-vehicle navigation database every minute. Kanno integrated the data into a single program and displayed it in the form of a map of Japan.

When the earthquake struck Japan, this navigation data unexpectedly came in handy.

“During an earthquake, communication signals are not very clear, and people can only confirm over the network whether their relatives and friends are safe. The challenge we faced was how to get rescue teams into the disaster areas,” Kanno says.

Navigation data had originally been used to monitor traffic congestion and to collect vehicle driving data. “Seen from another angle, wherever there is traffic data, the road is passable,” Kanno says. Any road that a vehicle had driven over after the earthquake was marked in green, forming a track of passable routes.

At the same time, the team organized Twitter users across Japan to post real-time information on road conditions and road signs. The two kinds of information were merged, and the green life channel data was published 20 hours after the earthquake for public download online. In addition to the web client, programmers quickly developed a mobile version. In a crisis, information spreads extremely fast: before long, more and more green lines appeared on websites and in mobile applications, giving rescue teams a reference for reaching the area quickly.
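The idea that any road with post-earthquake traffic data must be passable can be sketched in a few lines. This is a minimal illustration, assuming each navigation record has already been matched to a road segment and is available as a hypothetical `(vehicle_id, timestamp, segment_id)` tuple; the record layout, names, and sample values are assumptions, not the project’s actual data format.

```python
from datetime import datetime


def passable_segments(records, quake_time):
    """Return the ids of road segments that any vehicle drove on at or after quake_time."""
    return {segment for _vehicle, timestamp, segment in records if timestamp >= quake_time}


# Hypothetical reference time and navigation records.
quake_time = datetime(2000, 1, 1, 12, 0)
records = [
    ("car-1", datetime(2000, 1, 1, 11, 30), "road-A"),  # before the quake: ignored
    ("car-1", datetime(2000, 1, 1, 13, 5), "road-B"),
    ("car-2", datetime(2000, 1, 1, 14, 40), "road-B"),
    ("car-3", datetime(2000, 1, 1, 15, 10), "road-C"),
]

green = passable_segments(records, quake_time)
print(sorted(green))
```

Segments in `green` would be drawn in green on the map. A segment missing from the set is not necessarily blocked; it only means no vehicle has reported driving on it yet.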

Human intervention is necessary in the big data era

Machine learning really does play the leading role in big data, but is human intervention truly unnecessary? For example, you have grown used to the flood of online marketing, but do you really accept marketing recommendations produced by a simple mathematical model and large-scale data analysis?

ZestFinance is a platform that uses machine learning to strengthen data analysis for the payday loan industry (payday loans are short-term, high-interest loans similar to usury), providing analysis of customer quality.

Unlike traditional analysis methods, ZestFinance can run multiple models simultaneously, analyzing huge amounts of data to identify possibilities. Drawing on a growing number of data sources and types, it converts the information into tens of thousands of indicators that measure borrower behavior, such as fraud risk, long-term and short-term credit risk, and ability to repay. Finally, the outputs of the individual models are combined into a final result. The platform can deliver its most reliable result to users within a few seconds. Founder Merrill said: “We prefer to combine the machine learning mechanism with manual intervention.”
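How ZestFinance combines its models is not described in the article. As a hedged sketch of the general pattern of merging several model outputs and leaving room for manual intervention, the example below averages hypothetical risk scores in the range 0 to 1 and flags a case for human review when the models disagree strongly. The function name, the averaging rule, and the `review_spread` threshold are all assumptions made for illustration.

```python
def combine_scores(model_scores, review_spread=0.5):
    """Average per-model risk scores and flag strong disagreement for manual review.

    model_scores: dict mapping a model name to a risk score in [0, 1].
    review_spread: hypothetical threshold; if the highest and lowest scores
        differ by more than this, a human analyst should look at the case.
    """
    scores = list(model_scores.values())
    final = sum(scores) / len(scores)
    needs_review = (max(scores) - min(scores)) > review_spread
    return final, needs_review


# Hypothetical outputs from three separate models for two borrowers.
agreeing = {"fraud_risk": 0.2, "credit_risk": 0.4, "repayment_risk": 0.6}
conflicting = {"fraud_risk": 0.1, "credit_risk": 0.9, "repayment_risk": 0.5}

final_a, review_a = combine_scores(agreeing)
final_b, review_b = combine_scores(conflicting)
print(round(final_a, 2), review_a)
print(round(final_b, 2), review_b)
```

A plain average is the simplest possible combination rule; a production system could weight the models or train a further model on their outputs.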

In the medical field, for example, data analysis based on machine learning alone is far from enough. “Machine learning can compute a probability in some proportion, but it cannot be precise and accurate,” Zeng Baiyi, CTO of Chunyu Zhangshang Yisheng (“Spring Rain Palm Doctor”), told Tencent Technology. Take the design of a disease model as an example: from all the questions in the existing database, those with more than 90% similarity are pulled out, their outcomes are analyzed and summarized, and a disease probability model is built. The doctors’ advice on each of those questions is then tallied into the proportions of “fine” and “go to the hospital”, giving patients an intuitive data reference.

“But this is still only a probability for users to check against. Whether it can accurately tell that a patient really has the condition still requires human analysis (by a doctor). Our back-end data analysts also go through the data and verify its accuracy,” the CTO said.
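The tallying step described above can be sketched as follows. This is a minimal illustration, not the company’s implementation: it assumes past questions are stored as hypothetical `(question_text, doctor_advice)` pairs and uses the character-level ratio from Python’s standard `difflib.SequenceMatcher` as a stand-in for whatever similarity measure the real system uses.

```python
from collections import Counter
from difflib import SequenceMatcher


def advice_proportions(new_question, answered, threshold=0.9):
    """Return the share of each doctor's advice among past questions that are
    more than `threshold` similar to new_question (empty dict if none match)."""
    similar_advice = [
        advice
        for text, advice in answered
        if SequenceMatcher(None, new_question.lower(), text.lower()).ratio() > threshold
    ]
    if not similar_advice:
        return {}
    counts = Counter(similar_advice)
    total = len(similar_advice)
    return {advice: count / total for advice, count in counts.items()}


# Hypothetical database of previously answered questions.
answered = [
    ("headache and fever for three days", "go to the hospital"),
    ("headache and fever for three days", "fine"),
    ("Headache and fever for three days", "go to the hospital"),
    ("sprained ankle while running", "fine"),
]

proportions = advice_proportions("headache and fever for three days", answered)
print(proportions)
```

The output is only a proportion over similar past cases. As the quoted CTO stresses, deciding whether a specific patient actually has the condition still needs a doctor.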

Source: Tencent Technology