Category: Internet of things

To work on Spark

If you look around at meetups, talks about adopting Spark as a real-time computation engine and using MLlib for large-scale machine learning are popping up at almost every tech hub, whether in San Francisco, London, or Beijing. The Spark community is thriving so fast this year that everyone seems to agree Spark will replace MapReduce in the Hadoop stack for many time-critical scenarios through Spark Streaming; only now can we move from big data to big real-time data. Some companies have already adopted Spark, including Taobao and Baidu in China.
Several things lure me toward Spark.

1. Real-time performance with Spark Streaming

In smart transportation or smart health, although real-time or near-real-time communication has become available through protocols like MQTT, high latency in the analytics layer remained a headache: not many users are interested in analytics of yesterday's data. Batch processing in MapReduce usually takes tens of minutes to hours on a big dataset. Thanks to Spark Streaming and in-cluster, in-memory computation, latency can be reduced from minutes to seconds. (As for in-memory computation, it is also good to see other DRAM-based solutions like VOLUME.) That is undoubtedly a huge improvement that lets us aggregate big datasets or make predictive analyses in near real time, and ultimately act faster on what is happening in the traffic. In our computer vision projects, image processing, feature extraction, and classification could all run faster, helping us learn the scenes and the activities in them.

Behind the scenes of Spark there are two basic concepts: the RDD and streaming.

1.1 RDD (Resilient Distributed Datasets) 

Basically, you can think of an RDD as a data structure designed to be distributed and fault-tolerant across the cluster. For data scientists, manipulating data with an RDD feels much like working on a single PC; moreover, RDDs offer a set of common operators, transformations (for example, map and filter) and actions (for example, reduce, collect, count), to apply to the distributed data. Of special note, RDDs are lazily evaluated: Spark executes the accumulated transformations only when it encounters an action, a mechanism that makes data processing more efficient. The chain of transformations is called the lineage, a directed acyclic graph storing the transformation metadata, with each child RDD depending on its parents. When the lineage grows too long, a checkpoint is needed to avoid long rollbacks. Finally, intermediate RDDs can be cached in memory, making them faster to operate on and reusable.
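The transformation-versus-action distinction and lazy evaluation can be sketched in a few lines of plain Python. This is a toy, single-machine analogue (the class and method names are illustrative, not the real Spark API): transformations only record a step in the "lineage", and nothing runs until an action replays it.

```python
from functools import reduce

class MiniRDD:
    """Toy single-machine sketch of an RDD: transformations are lazy,
    actions trigger evaluation. Not the real Spark API."""

    def __init__(self, data, pipeline=None):
        self._data = data
        self._pipeline = pipeline or []   # recorded transformations ("lineage")

    # --- transformations: just record the step, nothing runs yet ---
    def map(self, fn):
        return MiniRDD(self._data, self._pipeline + [("map", fn)])

    def filter(self, pred):
        return MiniRDD(self._data, self._pipeline + [("filter", pred)])

    # --- actions: replay the lineage over the data ---
    def _evaluate(self):
        items = iter(self._data)
        for kind, fn in self._pipeline:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        return self._evaluate()

    def count(self):
        return len(self._evaluate())

    def reduce(self, fn):
        return reduce(fn, self._evaluate())

rdd = MiniRDD(range(10))
squares_of_evens = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())   # [0, 4, 16, 36, 64]
print(squares_of_evens.count())     # 5
```

Note that building `squares_of_evens` does no work at all; only `collect` and `count` walk the data, which is exactly the behavior that lets Spark fuse a whole chain of transformations into one pass.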

More advanced RDD features include broadcast variables, joins, accumulators, and controllable partitioning; these deserve real examples of their own.

1.2 Spark streaming

Spark Streaming works by splitting the incoming data into small discretized streams at a chosen batch interval (as small as 0.5 to 2 seconds); each discretized stream is turned into an RDD, which can then be manipulated with the usual Spark operators. On AWS, Spark Streaming is reported to scale to 100 nodes of 4 cores each, processing 6 GB/s at seconds of latency. Spark Streaming is often compared to other streaming systems, for example Twitter's Storm. Storm is said to reach batch sizes down to 100 ms, but Spark reportedly outperforms Storm in scalability and processing capability. Hopefully we can see that in our tests.
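The micro-batch idea itself is simple enough to sketch in plain Python. This is a toy illustration of discretization, not Spark Streaming's API: chop the incoming stream into fixed-size batches and run the same batch computation on each one.

```python
def discretize(stream, batch_size):
    """Group an (in principle unbounded) stream into fixed-size micro-batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch          # in Spark, each batch would become an RDD
            batch = []
    if batch:                    # flush the final partial batch
        yield batch

# Each micro-batch gets the same "job": here, count events per sensor.
def count_per_sensor(batch):
    counts = {}
    for sensor_id, _reading in batch:
        counts[sensor_id] = counts.get(sensor_id, 0) + 1
    return counts

events = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("a", 4.0), ("b", 5.0)]
results = [count_per_sensor(b) for b in discretize(events, batch_size=2)]
print(results)  # [{'a': 1, 'b': 1}, {'a': 2}, {'b': 1}]
```

In real Spark Streaming the batch boundary is a time window rather than a count, but the principle is the same: latency is bounded by the batch interval, which is why the 0.5-to-2-second figure matters.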

2. Integration with the existing Hadoop system

Spark solves the problem of real-time, large-scale computation on a cluster, but it doesn't reinvent every wheel: it can run on cluster managers like Mesos and YARN, or standalone. Integration with Hadoop-based file systems lets it work with storage like HDFS, S3, and SequenceFile.

3. Large-scale machine learning

Weka is an easy machine learning platform in Java that I used during my master's; I don't know how it is doing now. Weka is easy to use and to prototype with, but it cannot solve large-scale machine learning on big data. Mahout is good on Hadoop, but MLlib seems to me a better choice. So far MLlib provides linear regression, logistic regression, k-means clustering, Naive Bayes, SVM, and more. I look forward to MLlib introducing softmax and neural networks.
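To keep the ideas concrete, here is a minimal single-machine logistic regression trained by gradient descent, the kind of model MLlib parallelizes over an RDD of examples. This is a pure-Python illustration with made-up data, not MLlib's API:

```python
import math

def train_logistic(points, lr=0.5, epochs=200):
    """points: list of (features, label) with label in {0, 1}.
    Returns weights and bias fitted by plain stochastic gradient descent."""
    dim = len(points[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in points:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # sigmoid
            err = p - y                          # gradient of log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Tiny linearly separable dataset: label is 1 when the point is "large".
data = [([0.0, 0.0], 0), ([1.0, 1.0], 1), ([0.2, 0.1], 0), ([0.9, 0.8], 1)]
w, b = train_logistic(data)
print([predict(w, b, x) for x, _ in data])  # [0, 1, 0, 1]
```

The point of MLlib is that the inner gradient computation is a map over the examples followed by a reduce over the gradients, so exactly this loop distributes naturally across a cluster.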

4. Support for Python and R

Although Spark is written in Scala, Python and R are the most common languages for data scientists. I am a Python guy, so PySpark is what I look forward to programming with, combined with scikit-learn's preprocessing utilities, perhaps pylearn2's deep learning models with GPU integration, and Python OpenCV.

In fact, Spark also provides Spark SQL and GraphX; I still need to understand better how GraphX works.

In the next two months, the junior team is going to experiment with Spark and hopefully put it on our stack for real-time analytics in the latest smart city project. I am looking forward to their results.



In the near future, ubiquitous computing will be everywhere: enormous numbers of small networked devices, commonly called smart objects, embedded in our daily environments, in our cars, houses, clothes, shoes, or even our bodies. These smart objects reshape our living environment, making it more ubiquitously accessible than ever. What is more, by combining these smart objects with personal mobile and web applications, people can reach their environment with greater ease, whether for remote monitoring or remote control; more advanced smart applications can also bind distributed sensors and actuators to a user's preferences, so that the smart environment responds automatically to contextual changes.

Remote monitoring and control at the device level brings many benefits, such as environmental awareness and automation. Beyond this, there is another great benefit derived from the device level: the enormous, continuous, real-time data streams, which are arguably even more valuable and can be exploited for society as a whole. For example, my health data could be delivered to a third party (my personal doctor or a hospital) who has the capability and knowledge to analyze it, read it as a health pattern, and use it to infer my health problems accurately. This gives people easier and faster analytics on their data, but more importantly it enables collaboration between entities, such as patients and doctors, or households and the electricity company.

All this sounds great, but consider that in the future everyone will have to manage tens or hundreds of smart devices: a headache. And there are more serious issues in managing the bindings between smart devices and various applications. A user must be aware of whether a third-party application is abusing their device data for profit without permission (a privacy issue) or maliciously controlling their devices remotely. This is not only a network security and privacy problem in the virtual world; it stretches much further, to the security and safety of our lives in the physical world. So before this vision truly arrives, we need to make the worst assumptions and put the management of devices, security, and privacy at the top of the list.

The main purpose of webinos is to solve the interoperability and deployment problems of web-based applications across platforms. A web developer can build an application using standard web APIs and deploy it ubiquitously on PC, mobile, web, in-vehicle, and embedded platforms. What is more, the application's data can be synchronized and shared among platforms, and different users can collaborate more explicitly under a secure framework. From the user's perspective, it becomes easier to manage devices and to exercise multi-level control over applications and third-party access. All these features are provided by the implementation of the PZP and the PZH.

As we know, a PZP (personal zone proxy) represents one person's device: a middleware layer between the device, context control, and applications. The PZH (personal zone hub) resides on a server as a web service, managing all of that person's devices and the security and privacy of their data and connections. PZPs are implemented at the device level, PZHs at the cloud server level.

Looking back at the IoT vision, whether for remote monitoring, automation, or collaboration, everything requires devices talking to the web or to each other securely and privately. For security and privacy, the question becomes how to split keys and certificates into different levels of control, and how to distribute them to others while keeping management easy for ourselves.

In our digital world, a person's identity rests on digital signatures and certificates based on PKI, the public key infrastructure. Device association with people, device communication, and more complex interactions all rely on this same PKI; what webinos adds is the maintenance of these interactions: mechanisms and policies, implemented in the PZP and PZH, for distributing keys and certificates to devices and exchanging them with other people. This ensures three things.

1. People have a personal central hub to manage their devices uniformly and at multiple levels, in particular associating a personal identity with each device.

2. Data connections between device and cloud, or device and device, are secure.

3. People within a web of trust can also access other people's devices, with their permission.

So, through certificate exchange between devices, a home monitoring device can be added to the personal zone, in other words the device management hub; the hub helps me authenticate the device, encrypt the data, and route the device data to wherever it is needed, such as my mobile phone. This is the basis for remote monitoring and control.
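The enroll-authenticate-route flow of a personal zone hub can be sketched conceptually in a few lines of Python. This is a toy sketch only: webinos itself uses PKI certificates and policies, whereas here enrollment just issues a shared HMAC secret standing in for the certificate exchange, and all the names are invented for illustration.

```python
import hmac, hashlib, secrets

class PersonalZoneHub:
    """Toy sketch of a personal-zone hub: enrolls devices by issuing a
    shared secret, then authenticates and routes their messages.
    (Conceptual only; webinos uses PKI certificates, not HMAC secrets.)"""

    def __init__(self):
        self._device_keys = {}   # device_id -> shared secret
        self._routes = {}        # device_id -> destination callback

    def enroll(self, device_id, destination):
        key = secrets.token_bytes(32)       # stands in for certificate exchange
        self._device_keys[device_id] = key
        self._routes[device_id] = destination
        return key

    def receive(self, device_id, payload: bytes, tag: bytes) -> bool:
        key = self._device_keys.get(device_id)
        if key is None:
            return False                     # unknown device
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False                     # reject unauthenticated data
        self._routes[device_id](payload)     # route, e.g. to my phone
        return True

# A "phone" that just collects readings.
inbox = []
hub = PersonalZoneHub()
key = hub.enroll("thermostat-1", inbox.append)

reading = b"21.5C"
tag = hmac.new(key, reading, hashlib.sha256).digest()
hub.receive("thermostat-1", reading, tag)        # accepted and routed
hub.receive("thermostat-1", reading, b"forged")  # tampered tag, rejected
print(inbox)  # [b'21.5C']
```

The point of the sketch is the division of labor the text describes: the hub, not the consuming application, is the single place where authentication and routing decisions are made for all of one person's devices.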

More complex interactions happen between users: when one trusted user wants to use another user's device, they must exchange their certificates and use an authentication token to gain access to control that device.

For example, hackers once turned an MIT building into a giant Tetris video game. Picture a large building with hundreds of offices; the person in each office owns the light control and can remotely switch the light on and off. Now imagine an application that wants to display an animation at night across the building's surface of hundreds of windows: it needs permission from each office's occupant to control that person's light. From the webinos perspective, each light is like a PZP, and each office occupant uses a PZH to manage the certificates of their devices and applications. The application driving the building display can be regarded as a PZH as well, so it has to talk to all the office PZHs and obtain signed certificates; with those certificates, it can control all the office lights at night.

Or imagine using a Raspberry Pi to teach children programming. A tutor starts a course, which automatically generates a certificate for it. Each child has a Raspberry Pi with the certificate in memory. If the tutor wants to collaborate with a child remotely, for instance passing code to the child's device and running it there for demonstration (to show how the code drives the device), this also requires the devices to share certificates and communicate trustfully and securely.

So for me, the key to webinos is security and privacy, not the deployment of web apps. Simply connecting devices together is no longer an issue, thanks to IP; the true issue is that people don't trust connecting their devices and sharing their data, because they feel unprotected. I feel the interoperability and deployment of web apps could be solved with Mozilla OS or PhoneGap, while the backbone of webinos provides high-level control over how people work with their devices and data and collaborate with others.

To build a smart city platform, at the simplest level, the platform should have basic middleware control over what data can be accessed, when, how, and by whom (the developer). This functionality is often seen in application scenarios using OAuth2: each developer registers for an application ID and receives a secret key, with which they can access the platform's resources; the platform manager, in turn, can monitor what resources each developer is using.

But a smart city platform in the real sense should concentrate more on the data than on the management of developers. This can be explained from two perspectives.

One is that the data should not only mean raw data but should also carry a semantic layer. In the semantic web, all data are connected and interoperable across data vendors; with this in place, machines themselves can understand the meaning of the data, and applications can extract context from the semantic layer, making them more context-aware. Webinos tries to address context data, but in a limited manner; it does not take the semantic web or ontologies into consideration. After all, semantics is not the problem it tries to solve.

The second perspective is real-time data. The smart city platform sits between data producers and data consumers. Data consumers are commonly the developers who use the data to build applications. Data producers can be people or organizations that own a few, or a great many, devices and manage those devices and their data securely through the platform. (Even when device data are stored on the cloud platform, its operator still cannot see into the data, because the user holds the private key.) In many circumstances the platform's input is real-time device data, and the platform should route that data in real time to the developers acting as data consumers. One big challenge here is device authentication and the web of trust between devices and users, which the webinos security and privacy framework is meant to address. Another big one is real-time streaming of device data, believed to be more efficient with a protocol like MQTT; webinos only supports XMPP, which in practice causes considerable latency.

Ultimately, I don't know whether webinos could beat Mozilla OS or PhoneGap on the interoperability and deployment of web applications. But its security, privacy, and policy work is the core worth reusing to solve trust for devices and data. The webinos security stack could be implemented as a software component under the PhoneGap framework to achieve both security and interoperability.

Management of device and data

Speaking of smart city platforms, most of the time we refer to open data or big data, using cloud computing and data analytics to generate great value out of large volumes of data. Government open data access, now available in many parts of the world, is a true benefit for transparency. But when data analytics really reaches the enterprise or personal level, the concerns shift to how to retain control of data in the cloud while keeping privacy and security protected, particularly since, in most circumstances now, the data comes from a wide variety of devices on all platforms. Naturally, it is easier to manage device data by managing the devices: think of dragging devices into the webinos PZH, the personal zone hub, where you manage your own devices by issuing them your signed certificate. This not only facilitates communication among all your devices over the web; the PZH also encrypts all the data, so that only you, holding the private key, can access it.

Management of access control

Moreover, a smart city platform's role is not merely processing big data; it extends to being a diverse hub or data distributor with big data analytics capability, connecting data publishers with data consumers. Anyone can be a data publisher: you store your health data, then decide who, and which doctor, can access it and turn it into useful information. When the data is already stored in the cloud (e.g. a patient's history record), resource access control is straightforward; but in some circumstances the doctor needs live data coming directly from your health device. Technically this requires the doctor to present his certificate; once approved, the device can authorize the signed certificate and establish communication. This scenario too is ensured by webinos, which keeps access control over your personal resources not just in the cloud but down to the level of devices. From the smart city platform's perspective, that helps data publishers control access to their resources at the device level.

App distribution

Finally, owing to the work webinos has done on secure device communication and data privacy, we get decent, secure communication at the web level. Over those channels we can distribute data, and of course make app distribution a reality. So a smart city platform can also take from webinos the delivery of web-based apps in a browser-based environment.

Helping data consumers manage devices uniformly and keep data secure, and helping collaborators share and use their resources securely: these two things keep webinos an important platform and make it a candidate backbone for building a smart city platform.

What is missing?

When dealing with the data consumers of a smart city, data analytics plays the important role; when dealing with data publishers, data communication matters more, especially since, in the future, millions of data streams will come not from people's hands but from ubiquitous devices. A smart city platform needs to connect with millions of sensors and devices in the internet of things in a secure manner.

Interoperability of wireless sensor network

Yes, webinos has made its way across platforms, in cars, TVs, and mobiles, based on either Windows or Linux. Webinos has also been deployed on the Raspberry Pi, so we can reasonably imagine webinos reaching the internet of things: smart devices and robots running webinos and streaming their data to the data platform. For example, using the Raspberry Pi as a gateway in a smart city infrastructure, with the webinos PZP deployed as high-level middleware, the communication between PZP and PZH lets the operator manage all the gateways uniformly and securely, and grant data access to the other stakeholders who need it.

But gaps remain. Just recently, at Google I/O 2013, Google ran a wireless sensor network in which 500 sensor platforms and thousands of sensors were deployed in the conference building, generating 4,000 real-time data streams about the environment. The network is an example of the "internet of things", where physical objects are digitally interconnected and communicate without human intervention. It reminds me that one of the most important parts of the internet of things is the wireless sensor network, consisting of many resource-constrained platforms with limited memory and battery. Like big data, the wireless sensor network will become essential city infrastructure in every area (home automation, car parking) and cannot be ignored. IPv6 and 6LoWPAN have made great strides in making wireless sensor networks interoperable over IP standards, meaning that in the future every sensor can be reached through IP. In fact, early in the webinos project, porting webinos to the Arduino exposed a challenge: the current version of webinos is implemented in Node.js and deployed in the browser or on Linux, and that requirement keeps webinos out of the wireless sensor network because of the operating system. For wireless sensor networks, TinyOS is an open-source operating system popular in academia and industry. Since webinos is only a specification, porting it to TinyOS would mean that webinos features like security and privacy could be maintained on this next important platform of the smart city. By then, with webinos's built-in secure communication, smart city platforms could become marketplaces for wireless sensor network applications, where you own the sensor platform and deploy any high-level application software on top of it.

COAP and MQTT support on PZH

Also, for wireless sensor networks with low data rates and bandwidth, two protocols are coming into play: CoAP, an HTTP-styled protocol over UDP, and the publish/subscribe-styled MQTT, described as a machine-to-machine (M2M) / internet of things connectivity protocol. Giving the PZH the ability to talk to devices over these protocols would secure its role in the internet of things, letting sensors, control systems, embedded systems, and mobile devices publish and subscribe to low-level, technically oriented data.
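To make the publish/subscribe style concrete, here is a toy in-memory broker in plain Python with MQTT-like single-level topic wildcards. It is illustrative only; a real deployment would use an actual MQTT broker and a client library, and the topic names here are invented for the example.

```python
class ToyBroker:
    """Minimal MQTT-style pub/sub broker: subscribers register a topic
    filter ('+' matches exactly one level), publishers push to exact topics."""

    def __init__(self):
        self._subs = []   # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self._subs.append((topic_filter, callback))

    @staticmethod
    def _matches(topic_filter, topic):
        f, t = topic_filter.split("/"), topic.split("/")
        if len(f) != len(t):
            return False
        return all(fp == "+" or fp == tp for fp, tp in zip(f, t))

    def publish(self, topic, payload):
        for topic_filter, callback in self._subs:
            if self._matches(topic_filter, topic):
                callback(topic, payload)

broker = ToyBroker()
received = []
# A doctor's app subscribes to every patient's heart-rate stream.
broker.subscribe("patients/+/heart_rate", lambda t, p: received.append((t, p)))

broker.publish("patients/42/heart_rate", 71)   # matches the filter, delivered
broker.publish("patients/42/steps", 5300)      # different topic, ignored
print(received)  # [('patients/42/heart_rate', 71)]
```

The decoupling is the point: the device publishing readings never needs to know who consumes them, which is exactly why the style suits constrained sensors better than request/response.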

ZIGBEE standard and web standard

Also regarding wireless sensor networks: ZigBee is an application standard addressing different applications, for example home automation, for which it defines the commands and protocols for interoperable communication, while webinos addresses web standards. When combining webinos with a wireless sensor network, we not only have to address software interoperability via TinyOS; interoperability between the ZigBee standard and web standards must also be addressed. With webinos as the middleware, the web interface for accessing sensors and actuators would still be defined by web standards, while the ZigBee implementation sits underneath.

Hooking into hardware Security

Webinos tries to address web-level security between device and cloud, and between device and device, but that security still depends on keeping the private key safe: if the private key is secure, your personal communication is secure. The problem arises for devices not maintained by people (such as unmanned devices serving as city infrastructure): by physically hacking a device, reverse engineering it, or tampering with its ROM, attackers can extract the key and then manipulate the device's communication and data. Many chip vendors are looking into hardware security at the chip level; for example, ARM is building TrustZone into its chips, including secure storage of private keys. To secure communication as a whole, the web level and the chip level must be combined for defense in depth.

Amsterdam Smart City (video).

1. Hunger for energy

One of the major issues with high-tech indoor farming systems based on hydroponics is that they are extremely hungry for energy, due to the LED lights, the sensors, and actuators like valves and pumps. Psychologically, it is a problem that people don't want to spend too much money on a machine that produces so little green. In fact, I was quite shocked to hear that an ordinary household system uses as much energy as a fridge.

It is a business trade-off between green and energy. We could certainly add solar or wind power as an energy source; however, cost and maintenance for individual or family use remain big problems.

2. A great design

So unlike a conventional large greenhouse or roof farm, which focuses on production at scale, the value of a small-scale indoor farming system may lie not in production but in an elegant design that adds a green, healthy element and lifestyle to the house, or in intelligent apps that autonomously coordinate water, energy, and plants based on the sensors, possibly with adaptive learning algorithms. That is to say, we need a good design and a good app.

In today's long-tail economy, it is getting harder to say what the customer really wants. There is no longer one popular design that dominates and satisfies everyone, except when you control the whole ecosystem chain, as Apple does now.

Again, a modular hardware design allows adaptation to new components and interfaces, e.g. adding a solar system for energy, or a camera for monitoring. Modularity means flexibility for the customer's needs, and it helps the product gain a new life cycle when necessary.

A good app for green farming?

Apps should let people manage inputs and outputs in terms of finance, material, and time: when something was newly planted, how long it takes, and when it will be fully grown, along with other useful information such as the condition and phase of the growth and the energy produced and consumed.

For an open design on the software side, it is even more important to let geeks come in, play their part, and prototype with it in their own intelligent and easy way. A software interface is necessary for this reason.


For the time being, we are seeing many good designs; one good example is from Danielle Trofe, inspired by the green wall.

Trofe’s creation works much like hydroponic systems, which grow plants using nutrients without soil. Electric pumps send water up the stands based on a timing system that can be customized for each pod. Excess water trickles back down the shaft and into the reservoir at the foot of the unit. Each pod has an LED light built into its underside to provide extra ‘sunlight’ for the plant below it.

The Live Screen is a modular design, so it can be adapted to fit any space of a minimum size. Because it is a standalone setup, it doesn't depend on a wall, hence the "screen" in the name. I could see these being used in lieu of cubicle dividers in offices, providing a much-needed dash of green along with some privacy, as long as you keep the plants alive. Plus, there's the added benefit of growing herbs, vegetables, and even fruit at your desk or in your city apartment.
