A Whirlwind Tour of Azure AI Services

I’m a massive fan of democratizing AI and machine learning, and of putting genuinely usable technology in the hands of users. I find this ideal embodied in Microsoft Azure’s AI services suite. In this post I am going to give a very high-level overview of what you can do with the services, and I am planning to write deep dives into the ones I think are the most useful!

Azure AI Services: Overview

The services are available through the Azure portal as a resource, as can be seen below:

Upon opening the resource you will be met with the following:

These are all of the AI services available to us. The ones that seem most useful are:

Computer Vision
Speech Service
Language Service
Document Intelligence

Below I will give an overview of what each of these services can do. On a side note, you will need to create an instance of each service before you start. This is a matter of clicking create on the box of the service you want and following the instructions (choosing the free pricing tier where possible).

Computer Vision

You can primarily use this resource via the Vision Studio. In the studio the services are grouped into 4 categories:

Optical Character Recognition
Spatial Analysis
Face
Image analysis

For the sake of this blog I will just go through ‘Optical Character Recognition’, which has a single service: ‘Extract Text From Images’. It can be used to get information from pictures of receipts, from the back of food packaging and so on.

Upon clicking this service, you will be taken to a window where you can test out what this service can do in a GUI style sales pitch, but if we really want to use it we can scroll down and get all the documentation and code we would want to kick off:

The quickstart guide gives you all the information you may want to start.
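To give a flavour of what calling the service from code might look like, here is a minimal sketch using only the Python standard library. The REST path, the API version, the `features=read` parameter and the `readResult` response shape are my assumptions about the Image Analysis API, and the helper names are made up for illustration, so treat the quickstart as the source of truth.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST path and API version; check the quickstart for current values.
ANALYZE_PATH = "/computervision/imageanalysis:analyze"
API_VERSION = "2023-10-01"


def build_read_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for text extraction."""
    query = urllib.parse.urlencode({"api-version": API_VERSION, "features": "read"})
    url = f"{endpoint.rstrip('/')}{ANALYZE_PATH}?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # the key from your resource's 'Keys and Endpoint' page
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_lines(response: dict) -> list[str]:
    """Collect the text of every line from an assumed 'readResult' payload."""
    blocks = response.get("readResult", {}).get("blocks", [])
    return [line["text"] for block in blocks for line in block.get("lines", [])]
```

To actually send the request you would pass it to `urllib.request.urlopen`, load the JSON reply and hand it to `extract_lines`. I have kept building and parsing separate so each half can be checked without a live resource.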

Speech Service

Another service I find extremely useful is the speech service. At a high level, its services are split into speech to text, text to speech and voice assistant. You access them through a studio as well, which follows the same format as the Vision Studio, where you can select the services you wish to use.

I have previously used the text to speech voice gallery for a client, to provide voiceovers for promotional videos for their products. The voice gallery provides a no-code solution for this work, giving a choice of voices, with different emotions that can be selected. You can go one step further and use the ‘Audio Content Creation’ studio, which can be found in the bottom right of the voice gallery, as shown below:

The Audio Content Creation studio lets you adapt how the voice sounds in terms of intonation, pitch, speed and how it uses pauses when interpreting your text.
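The same kind of adjustments can also be written by hand as SSML (Speech Synthesis Markup Language), which the text to speech service accepts as input. Below is a rough sketch of a helper that builds such a document. The function name, the default voice and the rate, pitch and pause values are purely illustrative choices of mine, not settings taken from the studio.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, voice="en-US-JennyNeural", rate="-10%", pitch="+5%", pause_ms=400):
    """Return an SSML string that reads the sentences with a pause between each one."""
    # <break> inserts a silence; <prosody> changes speed and pitch for everything inside it.
    pause = f'<break time="{int(pause_ms)}ms"/>'
    spoken = pause.join(escape(sentence) for sentence in sentences)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{spoken}</prosody>"
        "</voice></speak>"
    )


ssml = build_ssml(["Meet our new product.", "It saves you time & money."])
```

The text is escaped before it goes into the markup, so characters such as `&` do not break the XML.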

Language Service

This service has tools which allow you to do natural language processing, a natural language being any language which has evolved naturally (English and Spanish, for example). The service works essentially exclusively with the written word, and allows users to do things such as extracting key phrases, summarizing texts and translation, all accessible via the Language Studio. You can also do something called sentiment analysis and its spin-off, opinion mining. Both are ways of summarizing the feeling of a text and identifying which words convey that feeling. This is best summed up in the image below, which I have taken from the Microsoft documentation on the topic:

As you can see, sentiment analysis gives you the feel of a whole sentence or block of text, while opinion mining is more granular and tells you the opinions about subjects within the text and the sentiment of those opinions. You will find these services in the ‘Classify Text’ section of the Language Studio, where you can try them out in a limited no-code environment, but in production you would have to connect to the APIs via code.
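The difference in granularity can be sketched with a small, hand-written data model. To be clear, this is not the shape the Language API returns and the labels are ones I assigned by hand. The class and function names are hypothetical and exist only to show one overall sentiment sitting alongside several target-level opinions.

```python
from dataclasses import dataclass, field


@dataclass
class Opinion:
    target: str      # the subject being discussed, e.g. "food"
    assessment: str  # the words expressing the opinion, e.g. "delicious"
    sentiment: str   # "positive", "negative" or "neutral"


@dataclass
class SentenceResult:
    text: str
    sentiment: str   # what sentiment analysis gives you: one label for the whole text
    opinions: list[Opinion] = field(default_factory=list)  # what opinion mining adds


def summarize(result: SentenceResult) -> list[str]:
    """List the overall sentiment first, then one line per mined opinion."""
    lines = [f"overall: {result.sentiment}"]
    lines += [f"{o.target} -> {o.assessment} ({o.sentiment})" for o in result.opinions]
    return lines


# Hand-labelled example, not real service output.
example = SentenceResult(
    text="The food was delicious but the service was slow.",
    sentiment="mixed",
    opinions=[
        Opinion("food", "delicious", "positive"),
        Opinion("service", "slow", "negative"),
    ],
)
```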

Document Intelligence

Once again the services are held within a studio. The services here allow for the extraction of information from documents, with a large selection of pre-built models for common documents like invoices, receipts and credit cards.

If we try out the receipt model, it has example receipts that you can run through the analysis, and it shows you visually what it has picked up.

In addition, it outputs the extracted data against the requested fields, with a confidence value for each, both as a cleaned output (directly below) and as the raw results from the API (second image below).
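Those confidence values are what make the output practical to automate: you can accept the fields the model is sure about and send the rest for a human to check. Here is a minimal sketch of that idea. The field layout (a `content` and a `confidence` per field), the sample values and the 0.8 threshold are all assumptions of mine for illustration, not output copied from the studio.

```python
def split_by_confidence(fields: dict, threshold: float = 0.8):
    """Split extracted fields into those we accept and those needing manual review."""
    accepted, needs_review = {}, {}
    for name, extracted in fields.items():
        # A field with no confidence value is treated as 0.0, so it goes to review.
        confident = extracted.get("confidence", 0.0) >= threshold
        bucket = accepted if confident else needs_review
        bucket[name] = extracted.get("content")
    return accepted, needs_review


# Hypothetical receipt fields in a simplified shape.
receipt_fields = {
    "MerchantName": {"content": "Contoso", "confidence": 0.97},
    "Total": {"content": "14.50", "confidence": 0.62},
}
accepted, needs_review = split_by_confidence(receipt_fields)
```

Here the merchant name would be accepted automatically, while the total would be flagged for review.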

Additionally, it gives you the code that was run to get these results:

Summary

It is clear that Azure AI services has a lot of very useful tools which can be used to streamline, enhance and automate processes. At times there are too many to choose from, which is why I have decided to zero in on the 4 main services highlighted in this blog post. In the coming months I will release a blog for each service highlighted above, where I will take a deep dive into what it can do!