This summer I made use of some of my commuting time by studying network infrastructure via Enki, a popular CS app – I highly recommend it. I’ve also had a little experience setting up small apps on both Amazon’s AWS and Google’s Cloud Platform – the AWS UX for a newbie left a lot to be desired!

I’m a long way from having real ability in anything ‘sysadmin’ related, but this gave me a good understanding of how things work together and the transactions that really take place. The operation of the invisible network protocols and email systems that support the whole Internet is a modern wonder. Their open nature (not owned by anyone) is also something to be thankful for, and notable given that cloud computing boasts the opposite.

The cloud has enabled some fantastic new applications, and some businesses and platforms simply wouldn’t be possible without its features and benefits for big data. Unfortunately it has also meant that the internet’s big 4 (Msft, AMZ, GGL, APL) have ended up with the majority share of the market, which makes them, you could even say, the gatekeepers of the data or the knowledge. As mentioned in a previous post, it’s quite telling that the NSA uses AWS! (https://www.google.ae/url?sa=t&source=web&rct=j&url=https://aws.amazon.com/federal/us-intelligence-community/ )

HOW DOES THE CLOUD AFFECT DATA SCIENCE?

I supplemented our suggested reading resources by exploring some AWS documentation (https://aws.amazon.com/deep-learning/) after a recent podcast discussed the ‘on the cloud’ proposition for reducing the time and complexity needed to achieve results in artificial intelligence. AWS state:

“By using clusters of GPUs and CPUs to perform complex matrix operations on compute-intensive tasks, users can speed up the training of deep learning models.”
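
To make that quote a little more concrete, here is a toy sketch in plain Python of why this kind of work spreads well across many processors. It assumes a single dense layer whose inputs are multiplied by a weight matrix; the function names and the number of "workers" are made up for illustration, and the workers here run one after another rather than on real GPUs.

```python
# Toy illustration: split a matrix multiplication across several "workers".
# Helper names and worker count are hypothetical, chosen for this sketch only.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def split_rows(matrix, n_workers):
    """Split the rows into n_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(matrix), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(matrix[start:end])
        start = end
    return chunks

def parallel_matmul(a, b, n_workers=2):
    """Each worker multiplies its own chunk of rows; the results are stacked."""
    result = []
    for chunk in split_rows(a, n_workers):
        # On a real cluster these chunk multiplications would run simultaneously.
        result.extend(matmul(chunk, b))
    return result

inputs = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 samples, 2 features each
weights = [[1, 0, 2], [0, 1, 3]]            # maps 2 features to 3 outputs

outputs = parallel_matmul(inputs, weights, n_workers=2)
```

Because each chunk of rows can be multiplied independently, adding more processors means each one handles fewer rows, which is the basic reason clusters can shorten training time.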

As machine learning becomes more ubiquitous and the barriers to entry become increasingly lower, the cloud has an important role to play by providing cutting-edge architecture. I’ve seen it described as a plug-and-play setup, ideal for data scientists looking to run typical experiments.

Another benefit is the opportunity to use substantial data sets at reduced cost. Again, the scalability and portability of the data open up whole new areas of operation for data-intensive businesses.

Cloud architecture also helps guard against data loss, as data never resides solely on one server. The AWS solution here is a feature called snapshots (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html).
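
As I understand the linked documentation, these snapshots are incremental: each one saves only the blocks that changed since the previous snapshot. The sketch below is a toy model of that idea in plain Python. The Volume class and its methods are hypothetical and are not an AWS API; a real service also handles storage, deletion and consistency, which this ignores.

```python
# Toy model of incremental snapshots: each snapshot stores only changed blocks.
# The Volume class is hypothetical and does not mirror any AWS interface.

class Volume:
    def __init__(self):
        self.blocks = {}       # block number -> contents
        self.snapshots = []    # each snapshot holds only the changed blocks
        self._dirty = set()    # blocks written since the last snapshot

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self._dirty.add(block_no)

    def snapshot(self):
        """Save only the blocks changed since the previous snapshot."""
        delta = {n: self.blocks[n] for n in self._dirty}
        self.snapshots.append(delta)
        self._dirty.clear()
        return len(self.snapshots) - 1   # id of the new snapshot

    def restore(self, snapshot_id):
        """Rebuild the volume contents by replaying snapshots in order."""
        restored = {}
        for delta in self.snapshots[: snapshot_id + 1]:
            restored.update(delta)
        return restored

vol = Volume()
vol.write(0, "boot")
vol.write(1, "data-v1")
first = vol.snapshot()

vol.write(1, "data-v2")      # only block 1 changes
second = vol.snapshot()
```

The second snapshot holds just the one changed block, yet restoring from it still gives back the whole volume, which is what keeps repeated backups cheap.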

Automation and cloud services have enabled a whole new type of business model, of which I’ve been a particular beneficiary when involved in chatbots. Providers usually offer three different types of service: Infrastructure as a Service (IaaS); Platform as a Service (PaaS); and Software as a Service (SaaS).

The cloud also mitigates growth problems such as going viral as a startup – I enjoyed reading about Instagram’s interesting start! (https://www.npr.org/templates/transcript/transcript.php?storyId=493923472&t=1571122483944).

Kubernetes is a popular cloud-based solution that allows a pre-specified container to be spun up on demand with all its dependencies in place. From their website:

“Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerised applications. Designed on the same principles that allows Google to run billions of containers a week, Kubernetes can scale without increasing your ops team.” Powerful stuff and something I hope to achieve competence in during my studies.
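
To give a feel for what ‘scaling’ means in practice, the Kubernetes documentation for its Horizontal Pod Autoscaler describes the core rule as: desired replicas = ceil(current replicas × current metric ÷ target metric). Below is a minimal Python sketch of that rule only. The function name and the example numbers are my own, and the real autoscaler adds a tolerance band and stabilisation behaviour that are left out here.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Sketch of the autoscaler's core rule, clamped to a min/max range.

    The min/max defaults are arbitrary values for illustration.
    """
    wanted = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, wanted))

# 4 replicas running at 90% CPU against a 60% target -> scale up
scale_up = desired_replicas(4, 90, 60)

# 4 replicas running at 30% CPU against a 60% target -> scale down
scale_down = desired_replicas(4, 30, 60)
```

So a service that suddenly gets busier is given more copies automatically, and those copies are removed again when the load drops.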


CHALLENGES OF THE CLOUD

CLOUD SECURITY AND PRIVACY

As technology’s big players control the majority of the cloud, the question has to be asked: who really owns the data? Of course there are processes and policies in place, but it’s not like the providers have a perfect record:

AWS:
https://www.sumologic.com/blog/aws-security-breaches-2017/

GOOGLE CLOUD PLATFORM:
https://www.zdnet.com/article/hackers-breach-volusion-and-start-collecting-card-details-from-thousands-of-sites/

You have to wonder how great the benefit of having data in the cloud really is if that data is very sensitive. Clearly the size of your data set is a key consideration. A benefit of the cloud is that your data is unlikely to be lost, as it is held on many servers worldwide at the same time. But this also implies a difficulty in having it deleted permanently. It’s an invisible path that is traced by the data!

Having read more about the NSA via Edward Snowden’s autobiography, and having learned that they are even able to crack access to people’s usage of the Tor browser via a Firefox loophole, I’m very sceptical about the true nature or definition of privacy on the cloud.

