The future of technological law: The machine state
James G.H. Griffin∗
Law School, University of Exeter, Exeter, UK
∗Email: j.g.h.griffin@exeter.ac.uk
International Review of Law, Computers & Technology, 2014, Vol. 28, No. 3, 299–315, http://dx.doi.org/10.1080/13600869.2014.932520
© 2014 Taylor & Francis
Advances in technology will challenge and change the current manner in which legal
regulation occurs. It has always been possible to describe governance and law as a
form of technology in itself, but the growth of digital technologies provides a new
means by which to regulate the population. This article posits the theory that the
inherent characteristics of technology will become inherent within the digitisation of
law. As law becomes an increasingly digital entity, it will become more concerned
with perfect reproduction of law upon the person, and so more encompassing in its
scope. In addition, the increasing use of digital technologies in augmented reality, in
3D and 4D printing both in solid and biological matter, poses a fundamental change
in the regulatory relationship between the State and the individual – a challenge the
State will need to address.
Keywords: Susskind; Heidegger; Giddens; technology; copyright
If you really want to take a step back . . . I think what happens is that people in their industry
become very self-serving . . . instead of thinking what is good for the world . . . I think you
could become a bit myopic when you only deal with your own industry because you don’t
get a larger picture of what’s happening and so maybe if I really was to extend it out and
get a little more philosophical about it, I would say that as humans one of the things we
want to do is to take the ideas that are in my brain and put them in your brain; and so, we
have developed amazing things to do this.1
The development of technology leads inexorably to the development of a ‘machine state.’
Technology is responsible for enabling innovation, the industrial revolution, the Internet,
war, peace and the extermination machinery of Auschwitz. Within these events, the machi-
nations of technology, both unthinking and deliberate, are such that they cannot be ignored.
In this brief thesis, machination refers to the way in which a machine is capable of influen-
cing not just our thought, our conversation, our dialogue, our ‘being’ within society and the
relations between individuals themselves, but the manner in which individuals relate to the
regulating State and how the State relates to them. Technologies have always influenced us
in this way, but as technology develops, as we enter into a period of augmented reality, of
3D and 4D scanning and printing of both non-biological and biological matter, of DNA pro-
gramming, we are entering a period of hybridised reality and hybridised law. This is the
reality of thought through technology, of not just seeing the world through technology,
through Google Glass,2 but also a world where technology will increasingly become a
part of our inner being. It is a bio-synthesis, a machination of technology that will similarly
lead to a hybridised form of technology and law.
The regulation of tomorrow, if not today, is a technical machination. The scope of law
and the scope of technology are ever increasing, the machinations of the technology leading
to a future of regulation beyond anything yet envisaged: A ‘machine State’ both utopian and
dystopian. This short thesis will focus squarely upon the most divisive, pervasive and
important type of regulation, that of copyright law,3 and why the nature of technology
will shape copyright law and in turn how that shaping of copyright law will fundamentally
change the development of the State. To do so, this paper will proceed as follows: (1) a theoretical consideration of the assumptions made about technology in society;
(2) an empirical study into those assumptions; (3) the consequence of this for future calls for
legal reform; (4) the consequence of those calls for the development of society; and, lastly,
(5) a conclusion placing the thesis of this essay within context.
There is much literature available that considers the possible functions of technology, and
the degree to which it does or does not inform our beliefs and our desires.4 This paper
does not concern itself directly with that wider question, for the purpose of this paper is
to consider the machination of technology in the legal regulation of society. First, a
word is required on the meaning and purpose of ‘technology.’ It is used in the ‘historical’
Heideggerian sense, namely, to cover everything from traditional technologies such as
spades and hammers through to the technologies of leadership, i.e. bureaucracy (Heidegger
1954). The technology of regulation is not by any means new, but perhaps we should con-
sider more overtly the means by which the tools of that technology operate.
The tools of which technology is comprised have self-evidently played a critical role in
the development of society. It is tools around which prehistoric society developed – an
example is the technology of sticks, axes and so forth.5 It is a means by which culture is
created and recorded. The tools of technology form an integral part of society, to the
extent that over time the physical function can become forgotten and neglected. Similarly,
Heidegger talks of how a person using a hammer as a tool will become one with a tool
(Heidegger 1927).6 However, we can go further when considering this in relation to the
State. Landowners are not so likely to consider their physical land as a tool, i.e. a tool of bureaucratic or State regulation. Likewise, Schools are a physical tool causing a physical change upon the developing mind, and yet this is not the usual way to consider a School.
The School undeniably is a tool, a machinic assemblage (Deleuze and Guattari 1980)7
that leads to physical changes in the developing mind.8 The divergence shows how the
physical tool element in technocratic regulation has become side-lined. The dialogic and
discursive, intangible elements become the focus: the effect of those elements rather than
the physical nature of the tool themselves.
The crucial element in this discussion is not the well-rehearsed nature of being, but the
importance of the tool. We can strip concepts such as land ownership bare of their semantic and legal connotations. The tool (once identified) is a critical core component. It does not
extend just to land and Schools. Capitalism is a complex physical tool, a representation of
objects. Using something as a unit of exchange may also be described as a technology – the
technology of money. Naturally, the technology of money is something that regulation has
encouraged. It has been encouraged in a way that permits free competition and enables the
individual to easily buy and sell. Most importantly, it is a technology that has allowed the
individual to interact with the law on a daily basis – for instance, through contracts
governed by contract law. Indeed, the technology itself forms a multifaceted matrix, and
interfaces with the many other types of technology. The selling of objects is often dependent
upon the underlying technology involved in their manufacture. Moving beyond capitalism,
the body itself is even a physical tool – part of the ‘bio-power’, we might argue, of which
Foucault wrote (1976, 1978 – 1979). As with the notion of sexuality, which Foucault ident-
ified in The History of Sexuality (1976), we could suggest that there is within society a
desire to move away from the baseness of the idea of the tool. Indeed, to call someone a
‘tool’ is even considered an insult in the West. Yet, this lack of openness and engagement
has led to a situation – in which we now find ourselves – of having made assumptions and
overbearing decisions about the nature of the world in which we live. We make assumptions
of the most basic things, yet those assumptions may have an enormous technological impact
in our daily life.
The historical interplay between the individual and technology is complex. Much has
been written about it. For instance, there are views about how technology has affected
the development of the human mind,9 the way in which technology affects society10, and
affects the way in which we see the world around us.11 However, what we tend not to think about is how the relationship of law and technology affects the nature of legal regulation.12 These assumptions can have a considerable impact upon the involvement of the people within society. Jon Bing, in whose memory and honour this issue is published, was a great advocate of access to justice. Throughout his many articles
and works he was keen to consider the impact of technology upon both the public and legal
professions. His final article, for instance, focused upon the impact of technical regional
databases upon our understanding of global legal concepts. His approach is neatly
summed up in the following quote:
We need to utilise the advantages of global spread of legal information. We need to find pos-
sibilities of exploiting the advantage of other jurisdictions having legal material which may be
of interest. Current principles of using material across frontiers have been forged in a situation
where it has been difficult to exploit case law or legislative reviews from other countries. . . .
We should be guided by the vision of WorldLII, and look for knowledge-based solutions
that seek out and consolidate material upon request of the professional user. If this is realised,
we will see that the dynamics of the legal system itself, where a legal argument takes into con-
sideration prior decisions, may over time work itself into a more harmonised view as courts and
other institutions puzzle together not only the pieces of their national systems, but also try to
make them fit within a bigger, international picture. (Bing 2010)
The technology itself, the regulation of that technology, and most importantly, the gaps in
between, have a direct impact upon the way in which we interface with our law and our
regulators. Copyright regulation is increasingly important in this relationship because of
the way in which technology is itself being characterised as a copyrightable work. Computer software can be covered by copyright because it comprises human-readable code (the source code); works made using that software will also often be considered copyrightable works; and even the hardware can be. With the growth in technologies such as 3D printing
and scanning, objects can begin to be copyrightable under existing law in a way in which
they previously might not have been. The advent of 3D printing with DNA also poses a
challenge in that content containing DNA will, under current law, also be protectable
through copyright law so long as it meets the usual (low) subsistence requirements.13
The more digital technology will enable and regulate uses, the more the reach of copy-
right will spread, protecting the software that is interpreted as an underlying literary work.14
This increasing reach of copyright law, which is occurring through the increasing infiltration of digital technology into everyday life, is an event that could substantially change the relationship of the individual with the State. It changes the dialogue, with the machinations of the technology acting as an intermediary through which the State can extend the scope of regulation, or be a new means for individuals to evade it. This is the new machine State.
So, copyright protection is achieved through forms of technology and laws regulating technologies, but copyright principles have been increasingly utilised in a technological manner that can affect individuals' ability to interact with and influence those laws. The ability of the individual to interface with technology will influence the freedom with which they can be involved within dialogue – the dialogue between the technology and the State regulation of that technology. Hence the statement:
. . . when people try to do really innovative things, with business models around creative
content, it is difficult to automate it because no one understands the value of the use.15
Such value needs to be captured through technology and, symbiotically, through law – in the
manner that books were historically protected by the scarcity of the printing press alongside
legal protection. The ability of certain groups, particularly the creators of content, to be
engaged in the technological copyright dialogue has also come under the spotlight in the
minds of the publishers and distributors themselves:
So they [creators] kind of find themselves on the bottom as well – they have the recording but
don’t really know what the requirements are to be able to properly license the product.16
Quite often if you license from an artist they are not quite sure about whether or not they have
done a cover version of somebody else’s musical work, and then they can’t actually give you per-
mission in that they can give you permission in their recording [but not the original work] – they
don’t understand that really.17
It is not just judicial recognition,18 but also technical recognition, of these groups that affects the manner in which individuals compare and access copyright law. As recog-
nition is affected by the arguments of parties in cases, it is important that the judiciary
and regulators are aware of these differing creative flows.19 For instance, it is clear from
the empirical research that was carried out during this study that there is a tendency of
many firms to favour the collection of information concerning viewing habits. To this
end they will favour and prefer the use of legislation that protects object identifiers and
law that is increasingly technological in nature. This in turn will affect the use of other
types of copyright legislation. Consequently, this is why the judiciary and regulators
need to be aware of different creative flows of different types of companies, because
their competing technical demands in terms of the types of law that they favour will
impact the ability of certain individuals to be able to be involved within the broader
social dialogue.
This paper is, in part, based around empirical research carried out between 2011 and 2013. Twenty qualitative interviews were conducted among book publishers and music distributors. The majority of the interviews lasted around one hour. The legal knowledge of these groups was considerable and reflects the amount of legal interface that takes place on a daily basis – revealing
far more legal knowledge than that likely to be held by the general public. This in itself
represents a challenge when considering the issues about access to the legal system for
individuals creating or reusing copyright works. What the interviews revealed was a
desire to shift towards using legal technology in digital form to regulate copyright
works, with a focus on that aspect to the detriment of consideration of the wider impact
upon society.
The firms that were interviewed varied considerably in size. Some were self-employed individuals; others had several hundred employees. As a generalisation, smaller firms preferred the current operation of copyright law and technology, whereas larger companies were more open to change. The change favoured by large
companies is a shift away from existing copyright protection to focus upon the information
concerning the usage of copyright content – and it is here where legal protection currently
falls short. This is probably not what the reader would expect, and this paper will suggest
that this is due to underlying technological change.
In relation to the point concerning the assumptions made about technologies generally,
it became apparent that some right holders were primarily working within the existing fra-
mework and were not keen to operate outside of that framework. These were companies
that favoured the existing legal and technological structure:
I can’t say that I have had any big cases where things ended up on the internet. I think maybe
ten years ago that happened frequently but quite fast people learned that was not the way to
go.20
It is the interplay between the technology and the regulation that has led to this situation. It
is by no means the first time; specifically within the context of cultural works, we could cite
the historical regulation of the book trade, which was predicated around the technological
characteristics of the printing press. Outside of that context, we can go back to prehistory
and focus upon the technology of those times. The important aspect is how existing tech-
nology influences and informs the perceptions of certain key players as to the scope of
regulation.
I don’t know what to say, other than for me it [property rights in copyright] is a given.21
The existence of the technology provides a horizon (Habermas 1984, 1987) against which
the norm of use is predicated. The above quote was in reply to a question about the central-
ity of property to copyright regulation, and represented the usual response as to whether
copyright required property rights in order to function. Despite the lucidity of replies
when it came to discussing the details of case law, the same clarity was not evident in the analysis of the property right itself. To recall, property is both a technical and
legal concept, and the technology, so to speak, behind the right is something that provides
the basis for the State – individual dialogue. Property as technology matches in key respects
property as law, at least so far as right-holders of copyright works are concerned. Each
mechanical reproduction of tangible or intangible objects can be perceived as being
owned, whether or not reproduced by that right holder.
The technology of property has allowed right holders to be able to exert their rights over
certain property in order to make financial returns. It is that link through to the technology
of money that has allowed the right holders to be able to produce and distribute works.
When the property technology and the money technology begin to fail to interface and
to engage with the public, a lack of returns ensues for the right holder:
The record company is . . . if our bank manager was hearing . . . about this sort of conversation
that we probably are gonna have, he probably would withdraw all his support from my
over-draft because there is no . . . I can’t see any reason why any record company should be
existing anymore. We do because that’s what we do.22
This naturally has had some negative effects for certain right holders:
We never ever gonna see a great album again. You are never gonna hear another . . . a new L.P.
You never gonna hear a new band, no, so there is no justification for doing it [piracy]. So our
record label, we are earning from downloads, a sort of like seven or eight per-cent of what we
have lost on physical sells.23
And again going back to the music industry I’m sure and I can see it happening in published
books, is that people are becoming increasingly reluctant compose music, produce books if
there is no perceived value in it for them. I am not sure how you protect . . . how you
change that by making new laws. I think you have to simply strengthen the laws that there
are and to bring the internet and people, you know Google and so on to heel.24
The future for such right holders is one that reveals a decline in the ability to use the tra-
ditional technologies of legal property for distribution of copyright works due to a split
between that legal regulatory technology, the technology of the physical format, and the
public interfacing with them. In this particular example, the public shifted to other
formats, to other means of distribution, which will include piracy of the copyrighted pro-
prietary works. Importantly, the technology available has enabled this. Despite legal
cases such as MP3.com,25 Napster26 or Grokster,27 file sharing of copyright work continues
apace.28
The question this paper seeks to address is whether this disjunct between the technologies of distribution, the technologies of the law of the State and the technologies available to the individual will remain. One historical parallel exists: the lack of ‘bite’ that the law has had with technology has mirrored the ability of the technology to avoid regulation. Difficulties in making copies were mirrored in law by a lack of
regulation – after all, why would there be a need for regulation if the technology did not
enable the reproduction to take place? The difference initially arose with the Internet in
that reproduction could take place on a large scale – of which, of course, much has been
written (inter alia Benkler 2006; Boyle 1997; Lessig 1999; Vaidhyanathan 2001).
However, the underlying technology is changing and once again the alignment of the
technology with the legal rules is beginning to take place.
The start of the alignment between technology and legal rules concerns State management
and surveillance of the Internet. China was one of the countries that originally focused upon
the issue of surveillance using the Golden Shield system, which predicted the actions of the
populace through IP monitoring (Feir 1997). Of course, since then, the systems used for
surveillance have revealed a much broader approach to the relationship with State regulation.29
With that increasing breadth come three impacts upon the general relationship between the technologies of regulation and the technologies of distribution. First, the technologies con-
cerned are increasingly digital in nature and thus can more easily interface with each other.
Second, the technologies involved in distribution are being more implicated in the everyday
activities of individuals moving away from just the central core of copyright. Third, the
technologies of surveillance enable the technologies of prediction. These factors combined
mean an increasing parallel between the technology of State regulation and the digital tech-
nologies used in the distribution of works. The parallel is in part a consequence between the
surveillance of the State and the desired surveillance of the customer of copyright works,
and the capitalist-compliant underpinning of the Internet. In the same way that the Internet
as an open system is beneficial for companies who wish to observe the actions of users, so
the openness of the Internet is an advantage to States who wish to observe what the users are
doing. Much initial technological development went into the systems of surveillance that
are now likely to feed into the systems of observation for commercial advantage:
They created essentially . . . part of what I was describing they are trying to put all the infor-
mation into a single database, they have already done this. Do you know that company at
all? It’s called Decibel, they are based here. It was started by an American because he was
working with both the City of New York with Mayor Giuliani and the FBI and this was
around the 9/11 period, essentially what they are saying was ‘how does this shit happen,
like the first bombing of the world Trade Centre, how come we didn’t know that because
we should have known this.’ If this guy lives here and he is working with that guy and we
know that guy knows the other one in some country in the middle east, how come that we
can’t put that simple relationship together, if A knows B and B knows C. So he built a very
sophisticated database so that they could start to make the connections in terrorist networks
. . . to put all the people together we know this is a known terrorist, who are his friends and
business associates lets load them in a data base and they are related to another group of
people and all of a certain you see relationships . . . so he built this. He then took this and he
said, because he was a big jazz ‘aficionado’ with a massive collection, he said I should do
this for . . . [his] . . . own collection of music.30
The technology of regulation has therefore been able to keep step with the technological
developments. Of course, in recent history, arguments of right holders have focused on
how the law has not kept pace with technological change (Lessig 1999; Vaidhyanathan
2001) – however, as can be observed, what has instead happened is that regulation of
digital technologies has been achieved through greater use of digital technologies by the
law enforcement agencies. In turn, it is this use of technology by the enforcement agencies
that is leading to similar methods being employed by right holders to monitor users, and consequently to lobby for change which is more technologically centric, and more ontologically correct, for technical regulation.
This marks a step change away from some of the traditional views about the nature of law
and the relationship of it with the broader populace. The judiciary have been perceived as a
key means through which the population perceive the law as current, by the judicial appli-
cation of it. This is the notion of ‘living law’ (Brandeis 1915 – 16; Ehrlich 1936).
However, what is happening today is not the judiciary keeping the law ‘current’ – instead,
it is the action of public and private enforcement agencies. The law has remained steadfastly
analogue in a digital age. The digital creation and application of law has been achieved in the
field of enforcement. This creates a substantive law deficit in the digital realm.
The substantive law deficit is one that occurs on two levels. First, there is the issue that
the vagueness of analogue rules causes uncertainties alien to a digital world. Digital tech-
nologies have traceable content, a detectable stream of information, which is not acknowledged under our analogue laws but is acknowledged through their application in enforcement. This is
leading to some changes – for instance, in the UK private copying is (finally) becoming
legalised31 – but the larger issues of infringement remain, namely, the uncertainties
caused through the current qualitative tests of the taking of a substantial part and derivation.
When enforcement takes place, this is enforcement of a vague law, whose vagueness is the
antithesis of digital technology. For example, it may be unclear whether a character in a
novel has copyright but an enforcement agency may decide the character does and begin
infringement proceedings. This creates a noticeable deficit between the law and the enfor-
cement of law, made more noticeable in the online context where, for instance, fan fiction
might not be infringing and yet copyright enforcement agencies may decide that it is. Such
works being online means that they are more identifiable as targets of prosecution threats –
and there are no legal provisions to deal with this.32
The second, related, deficit relates to the difference of scope between the digital and the analogue. Digital works of any sort are typically accessed rather than owned outright, and this has led to an increased emphasis upon the licensing of works (Efroni 2011; Rifkin 2000). Furthermore, conver-
gence and network effects are at play within digital technologies, which means that those
technologies are constantly multiplying and extending their scope. A method of enforcing
law and order generally is likely to spread to other areas, as has indeed happened with State
surveillance methods spreading to more general Internet usage. Likewise, the notion of
licensing of works would have begun with the need to access data on a computer to be
able to use it, and so licensing arose as the technically most appropriate means of regulation
rather than the traditional proprietary sale. Indeed, this has led to calls for reform based
around the notion of licensing:
Very specifically, yes, for me for copyright law I would do very specific things.
One, I would give unlimited access to citizens and compensate rights-holders with some sort of
a levy or tax scheme and I wouldn’t restrict it to music, it should be for everything, because you
can’t do everything at once although later you would be able to stack things and obviously you
can listen to music and read a book at the same time but essentially it’s 24 hours in a day – it’s
not hard to cover it up we can see what they are doing . . . and instead of looking at it based on
file size or anything else, it’s just about time you spent with something which gives you a pro
rata share of that kind of license fee or whatever.
Second, I think on top of that create a scheme where people then build businesses so that it’s not
the end of the monetisation.
Third, I think copyright should be modified so that the term is actually reduced for the exclu-
sivity period and then there is a longer tail of, OK you can just collect some money but you no
longer have a real say of the destiny of your work . . . or what people can do with it . . . how they
can chop it up you get a fee for it . . . you don’t own it for the next 50 years plus now.33
The calls for these reforms stem directly from the deficits between the State, the technology,
and the public. These deficits have been characteristic of the disjuncture between traditional
law and the recent developments in technology. It has primarily been an issue of enforce-
ment but now the means to enforce the law have caught up. What does this imply about the
current and future direction of the law? If we cast our minds back to the mid-1990s, there
was much debate about the introduction of laws for ‘Digital Rights Management’ mechan-
isms, or to give them another name, technological protection measures.34 Even more debate
ensued as to how these laws would be enforced. In the event, whilst many bottles of ink
were spilled discussing how these laws could (and indeed, did) extend beyond the tra-
ditional boundaries of copyright law (inter alia Boyle 1997; Lessig 1999, 2001, 2004;
Reese 2002 – 03; Vaidhyanathan 2001), it remained the case that the mechanisms were cir-
cumvented, that (illegal) copies were distributed, and that the right holders had to turn to
other means, to secondary liability,35 authorisation rights,36 filtering37 and ISP legislation38
to attempt to make it difficult for the works to then be distributed. These were attempts by analogue laws to regulate what the digital technology could and could not do, and they are best characterised as reasonably futile attempts to make the general public adhere to copyright rules.
Recent State surveillance techniques mark a shift in regulation in that the means of
surveillance are not analogue but exist embedded within the digital realm. It is, in effect,
a non-democratic enforcement of digital law; a non-democratic detailing of the analogue
domestic law. It is a growth of analogue law that, because of network effects, because of the interconnected nature of digital technologies, has become deeply enshrined within the connected technologies. It is an effect that we can also expect to see extend into
other laws. We can start to observe it within the field of copyright law where, as mentioned
above, the surveillance tools are being deployed to predict and guess what users will want to
see, to know what their habits are, to know what they think before they themselves know
what to think – and the calls in turn for this to be further protected by digital law, digital
enforcement.
The current legal rules that may protect such measures were passed in the same statutes
as the DRM laws – notably the US Digital Millennium Copyright Act 199839 and the EU’s
Copyright Directive 200140 – and protect what is known as ‘Copyright Management
Information’ (CMI).41 We need first of all to note that the notion of ‘copyright information’ is a bit of a misnomer in that any work involving something copyright related will likely qualify for protection.42 The names of authors, what can and cannot be
copied, can be stored in this information, as can be more ‘active’ information, for instance
digital watermarks and other forms of tracking technologies.43 However, the legislation concerned remains analogue law, in that it does not directly interface with the original digital content. Indeed, it was many years subsequent to the introduction of the specific pro-
visions that they appeared to have any applicability, and they have been the ugly step-sister
to the main set of DRM provisions with which they were enacted.
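To fix ideas, a minimal sketch of a CMI-style record may help. The field names below are invented for illustration – neither statute prescribes a format – but they capture the kind of information contemplated: identifying details of the work, the permitted uses, and ‘active’ elements such as a watermark reference:

```python
# Hypothetical CMI-style record attached to a digital work. Field names are
# illustrative only; 17 USC §1202 and Art 7 EUCD prescribe no particular format.
cmi_record = {
    "work_title": "Example Song",
    "author": "A. Composer",
    "rights_holder": "Example Records Ltd",
    "permitted_uses": ["stream", "private_copy"],      # what can be copied
    "prohibited_uses": ["commercial_redistribution"],  # what cannot
    "watermark_id": "wm-0000-example",                 # 'active' tracking element
}
```

The ‘analogue gap’ discussed below is visible even in this toy record: it merely describes the work and its permitted uses; nothing in the law itself executes against the content.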
This situation has begun to change, with an increasing emphasis upon the digital nature
that regulation needs to hold if it is to be effective. It needs, in essence, to be digital
regulation – not an analogue law about the subject matter of digital content, but an actual
interfacing digital law, that is to say, a truly digital law. The proposed EU copyright code
(at the time of writing in 2014) is significant in this regard – the European Commission
consultation paper refers to ‘identifiers’44 which may initially sound redolent of the CMI pro-
visions, but there is a distinction – these identifiers are designed to interface directly with
the digital content. With the CMI provisions there was more of an analogue gap between
the law and the implementation through code, but the identifiers mentioned in the European
Commission paper are quite evidently moving towards a direct digital interface – to quote,
‘to create a linked platform, enabling automated licensing across different sectors.’45
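What an identifier-driven ‘linked platform, enabling automated licensing’ might look like can be sketched as a registry keyed by content identifiers, returning machine-readable terms against which a requested use is checked without human intervention. This is a speculative reading of the consultation paper; the identifiers, uses and fees below are invented:

```python
# Speculative sketch of automated licensing keyed to content identifiers.
# All identifiers, uses and fees are invented for illustration.
REGISTRY = {
    "isbn:978-0-00-000000-0": {"use:quote": 0.0, "use:reprint": 150.0},
    "isrc:GB-XXX-14-00001":   {"use:stream": 0.001, "use:sync": 500.0},
}

def license_use(identifier: str, use: str):
    """Return (granted, fee) for a requested use of an identified work."""
    terms = REGISTRY.get(identifier)
    if terms is None or use not in terms:
        return (False, None)       # no automated licence available for this use
    return (True, terms[use])      # licence granted at the listed fee

granted, fee = license_use("isrc:GB-XXX-14-00001", "use:stream")
```

The point of the sketch is the absence of an analogue gap: the identifier travels with the content, and the licence decision executes directly against it.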
Whilst the formal machineries of Government are working towards an understanding of the interface with code, private industry is making greater headway into areas beyond State
surveillance. As noted earlier, the open structure of the Internet is such that it mirrors
commercial interests in surveillance. Much of the State surveillance technology was pro-
duced by private companies and it is natural that this has now been utilised by other
private companies to assess the actions of users in establishing, for instance, what sort of
products a consumer may wish to purchase – which is how Netflix operates:46
Web 3.0 is where technology starts to work for us instead of the other way around . . . so instead
of me logging into Spotify to figure out what I want. It’s already created stuff for me, but again
all that’s underpinned by information and the lack of information in this industry is stifling. We
can’t get to a 3.0 world unless we have basic information.47
It is that future, that of being able to assess what users want before they know it, which will
provide the source of financial revenue rather than the sale of copyright content. It is as
important as the act of licensing itself. The information about the use of content needs to be protected, yet there is as yet no direct means by which all of this information itself can be protected through legal regulation:
Going back to that iTunes audit that we had . . . their number one asset . . . apart from
obviously their eco system of selling i-products to the world . . . the number one asset for
iTunes is the database, so they jealously guard it and they have very draconian rules on
how the metadata should be presented.48
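The commercial value of such a database lies in prediction. As a deliberately naive illustration – not Netflix's, Spotify's or iTunes' actual method, and with all data invented – a toy recommender can suggest to each user the work most often co-accessed with their existing history:

```python
from collections import Counter
from itertools import combinations

# Toy co-access recommender over invented viewing logs.
logs = {
    "user1": {"film_a", "film_b"},
    "user2": {"film_a", "film_b", "film_c"},
    "user3": {"film_b", "film_c"},
}

# Count how often each pair of works is accessed by the same user.
co_access = Counter()
for works in logs.values():
    for pair in combinations(sorted(works), 2):
        co_access[pair] += 1

def recommend(user):
    """Suggest the unseen work most often co-accessed with the user's history."""
    seen = logs[user]
    scores = Counter()
    for (a, b), n in co_access.items():
        if a in seen and b not in seen:
            scores[b] += n
        elif b in seen and a not in seen:
            scores[a] += n
    return scores.most_common(1)[0][0] if scores else None

print(recommend("user1"))  # film_c - co-accessed with film_a and film_b elsewhere
```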
This could lead – and is likely to lead – to calls for direct protection of this sort of information, but care is required, for the network effects of the technology are such that there could be consequences upon future uses. For instance, the existence of tracking technology
in observing how users utilise content49 could extend to cover the use of content in the real
world – especially with the rise in demand for devices such as Google Glass.50 If that sort of
device can recognise real-world objects – which it surely will in time – then it is not such a
jump to realise that licensing of real-world products may be required to view them, that real-
world objects will be altered at the viewpoint of the user – to enhance the object, or to
reduce the usability of the object. The world is likely to become increasingly fluid in its
relationship between the real world and the virtual world,51 and this fluidity is important
if private enforcement of the virtual world is not to trample all over the real world.
Already this overlap has become apparent with the programmes of State surveillance,
but if our everyday activity is affected then, quite clearly, there is a need for direct regulat-
ory intervention to protect the interests of the user.
5. The future – symbiosis
The transposition of State surveillance technologies by private companies further into the
realm of everyday activities reveals an increasing technocratisation not just of our relation-
ship with the State, or even with private companies, but also between ourselves. The tech-
nology of State regulation, of the distributor, and of the individual is moving ever closer
together. The implications of this are far beyond anything we have experienced so far, and probably beyond our current imagination. The digital future is starting here, the future in which the technology around us directly interfaces with us. Bio-power, the brainchild of Foucault (1976, 1978 – 1979), is about to take a radical turn, a turn where
we can begin to identify a powerful symbiosis between the individual and the technology
of the State. Google Glass is one such example, where the glasses are a digital interface
between us and the world, but other examples abound. For instance, 3D printing could
lead to a situation where the same content identifiers are used to influence the manner in
which objects can be 3D ‘scanned’ and then reproduced with a printer (Ernesto 2013; Whitwarn 2012). They could even be used where content is printed biologically, i.e. using what is known as 4D printing, where the printing material also prints itself52 – which could be done with man-made materials but also with biological material using programmed DNA.
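If content identifiers were embedded in scan files and honoured by printer firmware, the gatekeeping step in this scenario might look like the following sketch. It is speculative: no existing printer protocol or licensing registry is implied, and every name is invented:

```python
# Speculative sketch: printer firmware consulting an embedded content
# identifier before reproducing a scanned object.

LICENSED_IDS = {"obj:chair-001": 3}    # identifier -> remaining permitted prints

def can_print(scan_metadata: dict) -> bool:
    """Refuse jobs whose embedded identifier has no remaining print licence."""
    obj_id = scan_metadata.get("content_id")
    if obj_id is None:
        return True                     # unidentified object: no known restriction
    remaining = LICENSED_IDS.get(obj_id, 0)
    if remaining <= 0:
        return False                    # licence exhausted: job refused
    LICENSED_IDS[obj_id] = remaining - 1
    return True

print(can_print({"content_id": "obj:chair-001"}))  # True, and decrements licence
```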
This is the future symbiosis of which we need to be aware, if we are to make informed
choices for the future technology of regulation and the future of biosynthesis.53 If regulation
is likely to become symbiotic with the human body, then we should also plan for that as we
plan for the near future.
We can observe the basis for the ever-increasing technological symbiosis in the prehis-
tory of man. The tools, the use of them, the development of the State and the regulation of the State – all come down to the use of technologies, just as Heidegger identified in his hammer
example where the tool and the man enter a form of symbiosis (Heidegger 1927).54
Technology is the basis and the means by which individuals have come to interact with
one another, from the technology of speech to the technology of the Internet; from the
technology of peace to the technology of war. War machines even bring together human
minds,55 eliminating difference – the unending, ceaseless, sometimes unedifying and
ugly truth of technology. The unavoidable future is an ever greater union between man
and machine, between the technology machines of the State and of the people. Our archi-
tectures, our ways of being, will become increasingly virtual and biological, and thus it is
necessary for an informed future of regulation to be ontologically correct, for the law to be
able to engage with this way of being: to engage with the technology of and within the body,
and to be aware of its thinking and unthinking machinations.
The technology affects the way in which individuals communicate and rationalise with each other, and the way in which the State itself has developed. State regulation has
often been invasive in some form or other, be that through principles relating to censorship,
copyright, privacy – the technical State of digital technocracy. The rise first of all of CMI, then of Identifiers, and then of more invasive forms involving direct digital regulation, indicates a phase shift towards the observation of the individual and the actions of that individual.
Furthermore, the technological feed of the future, web 3.0, will provide users with an indi-
vidualised Internet, but in all probability through surveillance algorithms, which will lead
the user in particular directions. The debate as to the rights and wrongs of driving individuals towards ‘thinking’ what they want has been well covered by Adorno and Horkheimer (1947) and their discussants. Future regulation should be aware of the contin-
ued convergence of technologies in addition to the subject matter. The architecture of
control will become increasingly important.
Assumptions have been common in the development of the State – from the assumptions
made about the importance of the technologies, through to the assumptions made about
the necessity of the technologies of capitalism and economy. These assumptions are insi-
dious, in that they can lead to failings in dialogue within society between various groups
(Derrida 1988). Unger (1976), a proponent of group pluralism, identified the importance
of groups to the development of a society. Technology has played a critical role in the
formation of these groups, in the ways in which they interact. Likewise, Foucault, in The
Order of Things (1970, Chapter One esp.), is also indirectly referencing back to the impor-
tance of technologies within the individual’s perception of the world and of history – even
if the perception may not be realised or side-lined by more traditional interpretations. The
assumptions that are made about technology have a direct impact in the initial and
ongoing formation of groups. Bing discussed how the gathering of information could
distort our understanding of our immediate world view. It can also influence us as to the for-
mation of future groups that are the cause and reason for the technology. The structure of the
society, as created by the technology, is the consequence of earlier groups and so future struc-
tures are a combination of these groups. There is an analogy with Giddens’ view of structuration, with the groups’ own dynamics and social life causing change to the structure of society
and vice versa (Giddens 1984). The machination of the technology provides another process
to consider, perhaps even the most central for it is upon technologies that all societies depend.
So, whilst technology has played a critical function in the development of group plurality, it may be expected to play an equally critical role in future. It is likely to form the basis
of the development of further group pluralities, because law will form new pluralities
between groups if it takes a directly interventionist form. The closer the integration, the
greater we can assume the influence of the law – particularly in relation to digital
factors, where particular flows of digital technologies and particular flows of law will influence the forming and evolution of pluralist groups. For example, if the law
becomes increasingly ‘coded’ in form and meaning, then the language of that code, the
architecture of the code,56 the dialogue of the code, will form and influence future
debate and future groups. If a dialogue evolves in a language and form technically alien
or opposed to another language or form, then the technology itself, with its network
effects, will lead to clashes and collisions. We see this in the history of clashes between civi-
lisations in different stages of technical development; those tribes who were obliterated by
Western civilisations. We also see it in the current development of coding groups, not just
computer coding groups but groups that define their own coda, their own values and judge-
ments, arising phoenix-like from the ashes of the failed technologies of others: competing clans of computer users (Amiga v Atari, Apple v PC), competing re-users of content, users v creators, creators v publishers – so the codas of groups will clash. The complex technological
interrelations are key. An example of how subtle dialogue change can take place is with
regard to the ways in which re-users alter existing content. Re-users may wish to edit or
make changes to software, but if that software is protected with a DRM or TPM mechanism,
then in many circumstances the re-user will need to know how to either circumvent the
technical mechanism or use other software to do so, which still requires some technical
skill. A technical meritocracy emerges – perhaps the default position of any technical
order. The structure of DRM and TPM is likewise influenced by legal provisions. Taking
this further, consider how certain groups in society may be unable to interact with
technology on an equal footing, due to economic or other social issues (e.g. denying
access to hardware necessary to perform certain actions); likewise, consider the situation
of Google Glass, where it is possible that certain levels of hardware will need to be used
by individuals in certain physical locations in order to be able to fully interact with a hybri-
dised reality.
In essence, the interflow of social groupings and technology could be characterised as
losing its subtlety. In early human history the technologies would act as an initiator, an
enabler, of societal groupings, e.g. the stone axes and other such technical tools. If bureauc-
racy is being characterised as technology (as it was by Heidegger 1954), bureaucracy marks
a gradual shift in that it begins to classify certain acts, certain aspects, groups and layers in
society as less desirable, shifting the development of society in certain nuanced ways.
Limits to regulation exist, for instance in the extent to which it can interface with the
human mind, to its enforceability in general. However, technology that utilises the forms
of enforcement mechanisms of the sort possible with widespread surveillance can be far
more invasive: in terms of the precision and detail of its regulation using that technology, in terms of its invasiveness in the actual perception of the world by individuals,57 and
in terms of the creation of new technical zones (Dyer 2012) of the physical world.58
Bronowski (1973) had suggested in his work looking at science and society that complete
control in the scientific way over life was only possible in a ‘push button order,’59 namely to
destroy life completely. However, what we can see is that a form of push-button order is
possible through the nuanced interventions of digital regulation – a complex world of 0s
and 1s, or a complex push-button order. This is not to say that digital technology is inher-
ently nefarious, but that it is a characteristic of the technology. Technology itself is invari-
ably about control, internal and external, and so if a technology develops that inherently
enables complex communications with individuals, then it can inherently lead to more
control within that field.
The current regulatory system, with its emphasis upon surveillance, has thus far one significant issue: the surveillance system and its direct interface with the technologies of the populace, although they will influence the actions of the populace, have not stemmed from the traditional sources of law, namely the democratic institutions. As argued earlier, the democratic bodies responsible for passing laws have passed
laws that are analogue in nature, in that the technologies of such laws themselves do not
directly interface with the actions of individuals. The surveillance system can directly
enter into dialogue, instruction, with code used by individuals – and because of the increas-
ing symbiosis between this technology and the individual, there exists the risk of non-demo-
cratic control of individual thought. Hence, a starting point for any future regulation would be
some consideration of whether such a situation of direct interface with code is desirable,
whether it is something that should be legislated against, or whether there should be a
direct attempt by legislators to directly regulate through code. With the growth of technol-
ogies such as Google Glass, 3D and 4D printing technologies, and biotech printing, it is argu-
able that ontologically speaking there is a need for the State to engage if it is to remain
relevant, to appear rational, in the way in which it regulates its people.
If that is to be so, then the manner in which the State interfaces with digital technologies
will need to be considered. Already we can see the gradual establishment of such an approach
within fields such as Internet Governance, Pornography and Copyright Licensing. Within
each field bodies either exist or will exist – e.g. inter alia ICANN,60 the Internet Watch Foun-
dation,61 and the proposed Copyright Hub.62 Direct regulation therefore exists in the form of
needing to register new domain names, of the possibility of that registration being revoked
due to the content contained on the computer hardware, or the deletion of content or licensing
links for, e.g., the failure to pay licensing fees. However, the exact nature of the intervention,
the form of that physical technological intervention, is not the sort of issue that tends to
receive public debate but it should be – for it is a form of dialogue that can influence
other group interactions. Ideally, then, there should be a means by which to consider the
digital language of legislative code – that is, not just the legal code, but also the directly
interfacing code, be that software or hardware based.
The technological spider web of regulation thus has multiple and multifaceted impacts
upon the development of society and of the State itself. Technology today, let alone tomor-
row, is already in essence within one large techno-biosphere, with increasing convergence
and homogeneity among the various component elements. The inherent convergence, and the inherent network effects, of technologies are such that the digital geography of the space
of the State will become of ever-increasing importance. Much has been written of digital
architectures, and the architectures of the real world, as expressed in code (e.g. Lessig
1999). However, we need to step beyond this, to consider how the State will directly inter-
face with code and how that code will interface with each and every human being. The
techno-biosphere is all encompassing, and an all-embracing representation of the dialogue
of human society.
However, the issue of network effects poses a significant challenge to the rationality of
the State through the eyes of the populace. The machination of the technology is simple – it
is a means by which reproduction becomes possible, not just in the production of things but
in the technology itself. It is a perfect replicator, which stands in direct contrast to the human
being, existing as we do as a result of accidental mutations, of imperfect creation and imper-
fect death. Technology exists akin to a hive colony, its beauty is in perfect reproduction,
perfect harmonisation. In contrast, the beauty of the human mind is in its difference, in
its ability to create and make from a harmonious blank canvas. Technology may gradually
undermine these differences, these mutations, for they do not represent the perfect core of
technology, of infinite perfect reproduction. If we think of the future, of the possibility of
printing in DNA strands, of rDNA, and the ability to print life from technology in the
same way as we procreate through our imperfect acts of human reproduction, what will
this pose for the future of the human race? Technology and its perfectedness does not
need to value our creative values, for just the purity of reproduction is valuable, and there-
fore we are introducing a system of regulation into the human life which, if left unchecked,
will provoke a clash of values, between that of perfect and imperfect reproduction. This is
the ultimate machination of technology, of the machine State. It is the ultimate realisation of
the science – society debate, but one which realises that technology does imply a degree of
determinism, not through its use or even its structure, but because of its existence as a means
of enabling perfect reproduction. Google Glass, all these technologies through which we
will interface with the world, through which we may be ultimately reproduced in the
virtual world, will become a means by which the virtual world will ultimately affect our
own real world. The perfection of reproduction, the innate requirement of sameness, will
come to inflict itself upon us if there is no realisation by the analogue State of the conse-
quences of digital control. In 1973 Bronowski, when he discussed the ‘push-button order’,63 described a world of blunt technologies, where 1 and 0 related to the nature
of the technology vis-à-vis the existence of a human life, but today that technology is
more nuanced, more able to interface with the thoughts of a human and the inputs to the
human brains. Furthermore, what of the future, of 3D printing of biological matter with
DNA, of computers built with DNA rather than binary code, which may or may not of
themselves value change due to their DNA makeup? It is these challenges, both current
and potential, and the machinations of the technology, that the State needs to address. To
quote a music distributor on the relationship of technology and the person:
Starting with the lyrics several hundred thousand years ago, it . . . [was not realised by earlier]
. . . humans that what makes homo-sapiens specialist is our ability to do complex language
through voice, [being] able to get my ideas or more complex ideas from my brain to your
brain . . . as we moved from writing, to the printing press, into things like recording devices
. . . creating new forms of getting things from my brain to yours . . . More creative ways of
saying what’s in my brain instead of merely facts . . . we moved to higher levels.64
Acknowledgements
This research is in part based around qualitative empirical interviews that were funded by BILETA,
and a paper given at the 2013 BILETA conference held at the University of Liverpool. The BILETA
funding was for the project ‘Property in Copyright’. My thanks to all those who contributed comments
and thoughts on the paper and its underlying thesis, in particular the two anonymous reviewers and the
editor of this journal.
Notes
1. Quote from research interview. The interviews were conducted on an anonymous basis, as
explained within the body text under Section 3.
2. See Google Glass documentation at http://www.google.co.uk/glass/start/ and see http://googleglassforum.net/. For an example of this technology, see Matthews, http://www.geek.com/news/google-glass-becomes-your-personal-translator-with-word-lens-acquisition-1594120/
3. For reasons explained in Section 2, it is the argument of this paper that copyright law will
become the most important law of the future, pre-empting most other laws.
4. For further discussion see below, this section, paragraph 4.
5. See inter alia Watson (2005), Borstein (1992), Shumaker, Walkup and Beck (2011).
6. Heidegger (1927) Note SUNY edition trans. J Stambaugh (2010) at 100 – 101.
7. Deleuze and Guattari (1980) at 4 and at 504 – 508. The Rhizome analysis in Chapter 1 could
apply here.
8. This analysis is similar to the notion of the visible and the articulable in Foucault (1966).
9. Meant primarily in the Heideggerian sense that technology is an outcome of our technological
view of the world – Heidegger (1954) but one could also focus upon the use of tools more gen-
erally within animal species, e.g. inter alia Shumaker, Walkup and Beck, Animal Tool Behav-
ior, supra n.5.
10. Stiegler, Technics and Time 1: The Fault of Epimetheus (1998); B. Latour, Reassembling the
Social: An Introduction to Actor-Network-Theory (2005).
11. Ihde (1990), i.e. embodiment relations (quite literally too – he discusses glasses at 73 and 94).
12. Most works cited above skirt around the issue of legal regulation. An exception would be Sus-
skind (1996).
13. For a comprehensive overview see Caddick, Davies, and Harbottle (2010), Chapter 3.
14. Caddick, Davies, and Harbottle (eds), ‘Copinger and Skone James on Copyright’ ibid., at §3 – 30.
15. Empirical interview, see note 1.
16. Ibid.
17. Ibid.
18. A reference to the rule of recognition – Hart, Concept of Law (1961).
19. See a parallel discussion on information flows: Elkin-Koren (1996), Elkin-Koren (2002).
20. Empirical interview, see note 1.
21. Ibid.
22. Ibid.
23. Ibid.
24. Ibid.
25. UMG Recordings, Inc. v. MP3.com, Inc., 92 F. Supp. 2d 349 (S.D.N.Y. 2000).
26. A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004 (9th Cir. 2001).
27. MGM Studios, Inc. v. Grokster, Ltd., 545 U.S. 913 (2005).
28. See inter alia Lunney (2014), Lessig (2004, 67).
29. Consider the revelations of Snowden (Guardian 2013) and Manning (Wikileaks 2010).
30. Empirical interview, see note 1.
31. See http://www.ipo.gov.uk/types/hargreaves/hargreaves-copyright/hargreaves-copyrighttech
review.htm for recent developments.
32. See Griffin and Nair (2013).
33. Empirical interview, see note 1.
34. In particular see Litman (2001), Lessig (1999), Vaidhyanathan (2001).
35. In the UK, under s.24-s.27 CDPA 1988, in the US, under the vicarious and contributory liability
doctrines.
36. In the UK, s.16 CDPA 1988, in the US, under 17 USC §106.
37. Consider in the EU, C-70/10 Scarlet v SABAM [2011] ECR I-11959 and in the US, MGM v
Grokster, 518 F. Supp. 2d 1197 (C.D. Cal. 2007).
38. In the UK, the Digital Economy Act 2010, and in the US, there is the Copyright Alert System of
the Center for Copyright Information – see http://www.copyrightinformation.org/the-
copyright-alert-system/
39. ‘The Digital Millennium Copyright Act’, Pub. L. 105 – 304, 28 October 1998, 112 Stat. 2860.
40. Directive 2001/29 of the European Parliament and of the Council of 22 May 2001 on the har-
monisation of certain aspects of copyright and related rights in the information society, OJ L
167/10.
41. See 17 USC §1202, Art 7 EUCD.
42. The CMI provisions refer to ‘copyright works’, which in practice will be the whole copyright
work as distributed rather than broken down to each copyright element. There is, however, no
concrete authority on the point.
43. For a discussion of those types of watermark see Ferrill and Moyer (2002), Jones, (1999) and
Page (1998).
44. European Commission (2013) at 15.
45. Ibid.
46. A Madrigal, How Netflix reverse engineered Hollywood, The Atlantic, available at http://www.
theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/;
Z Bulygo, ‘How Netflix Uses Analytics To Select Movies, Create Content, and Make
Multimillion Dollar Decisions’ KISSmetrics blog, available at http://blog.kissmetrics.com/how-
netflix-uses-analytics/, A Leonard, How Netflix is turning viewers into puppets, Salon, available
at http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/
47. Empirical interview, see note 1.
48. Ibid.
49. For example, see http://www.fastcodesign.com/3025318/asides/eyetracking-study-reveals-
what-people-actually-look-at-when-shopping-online
50. See Google Glass documentation at http://www.google.co.uk/glass/start/ and see http://
googleglassforum.net/. For an example of this technology, see Matthews, http://www.geek.
com/news/google-glass-becomes-your-personal-translator-with-word-lens-acquisition-1594120/
51. Consider virtual fluid architecture – Novak (1992).
52. See for instance Skylar Tibbits’ TED lecture on how this works – TED Lectures (2013),
https://www.youtube.com/watch?v=0gMCZFHv9v8, esp. at 4′10″ for a demonstration.
53. Cf Fukuyama, 2002, focusing on changes to the human body.
54. Supra n. 6.
55. Chiefly a reference to Deleuze and Guattari (1980) supra n 7 Chapter 12. Consider also
Nietzsche (1882) at §109, §110.
56. Recall Lessig (1999) in particular the appendix chapter concerning architecture.
57. Both in terms of augmented technologies and the underlying physical changes caused in the
body (Fukuyama, 2002).
58. Consider, for example, Google Glass being the only means by which to access augmented
reality screens placed around city centres, access dependent on monthly subscription fee.
Note: the reference to Dyer in the main text is a work about the Tarkovsky film Stalker (1979).
59. Bronowski (1973) at 374.
60. The Internet Corporation for Assigned Names and Numbers – see www.icann.org
61. The Internet Watch Foundation – see https://www.iwf.org.uk/
62. See the proposals here: http://www.ipo.gov.uk/hargreaves-copyright-dce
63. Bronowski, The Ascent of Man supra n.59.
64. Empirical interview, see note 1.
Adorno, Theodor and Horkheimer, Max. 1947. Dialectic of Enlightenment, trans. J. Cumming
(1972).
Benkler, Yochai. 2006. Wealth of Networks.
Bing, Jon. 2010. Let There Be Lite: A Brief History of Legal Information Retrieval EJLT 1 (1).
Boorstin, Daniel. 1992. The Creators.
Boyle, James. 1997. Shamans, Software and Spleens.
Brandeis, Louis. 1915-16. “The Living Law.” Illinois Law Review 10: 461.
Bronowski, Jacob. 1973. The Ascent of Man.
Bulygo, Zach. “How Netflix Uses Analytics To Select Movies, Create Content, and Make
Multimillion Dollar Decisions.” KISSmetrics blog, Available at http://blog.kissmetrics.com/
how-netflix-uses-analytics/
Caddick, Nicholas, Davies, Gillian and Harbottle, Gwilym (eds). (2010). Copinger and Skone James
on Copyright. 16th edition.
Deleuze, Gilles and Guattari, Félix. 1980. A Thousand Plateaus, trans by Massumi, Brian (1987).
Derrida, Jacques. 1988. Limited Inc.
Dyer, Geoff. 2012. Zona.
Ehrlich, Eugen. 1936. Fundamental Principles of the Sociology of Law.
Efroni, Zohar. 2011. Access-Right.
Elkin-Koren, Nina. 1996. “Cyberlaw and Social Change: A Democratic Approach to Copyright Law
in Cyberspace.” Cardozo Arts and Entertainment Law Journal 14: 215.
Elkin-Koren, Nina. 2002. “The Rule of the Law and the Rule of the Code.” In The Commodification of
Information, edited by N. Elkin-Koren and N. Netanel. The Hague: Kluwer.
Ernesto. 2013. “3D Printing DRM Aims to Stop Next-gen Pirates.” TorrentFreak, at https://torrentfreak.com/3d-printing-drm-aims-to-stop-next-gen-pirates-130827/
European Commission. 2013. Public Consultation on the Review of the EU Copyright Rules. http://ec.europa.eu/internal_market/consultations/2013/copyright-rules/docs/consultation-document_en
Feir, Scott. 1997. “Regulations Restricting Internet Access: Attempted Repair of Rupture in China’s
Great Wall Restraining the Free Exchange of Ideas.” Pacific Rim Law and Policy Journal 6: 361.
Ferrill, Elizabeth and Moyer, Matthew. 2002. “A Survey of Digital Watermarking”, at Elizabeth.ferill.com/
papers/watermarking last accessed in 2002 – no longer available.
Foucault, Michel. 1966. The Order of Things, trans. Routledge (1970).
Foucault, Michel. 1976. The History of Sexuality, trans Random House (1978).
Foucault, Michel. 1978-9. The Birth of Biopolitics: Lectures at the Collège de France, ed. Senellart,
trans. Burchell (2004/2008).
Fukuyama, Francis. 2002. Our Posthuman Future.
Giddens, Anthony. 1984. The Constitution of Society.
Habermas, Jürgen. 1984 & 1987. Theory of Communicative Action. Vols I & II.
Hart, Herbert. 1961. Concept of Law.
Heidegger, Martin. 1954. The Question Concerning Technology and Other Essays, trans. 1977.
Heidegger, Martin. Being and Time. 1927. SUNY edition trans. J Stambaugh (2010).
Ihde, Don. 1990. Technology and the Lifeworld: From garden to earth.
Jones, Richard. 1999. “Wet Footprints? Digital Watermarks: A Trail to the Copyright Infringer on the
Internet.” Pepperdine Law Review 26: 559.
Latour, Bruno. 2005. Reassembling the Social: An Introduction to Actor-Network-Theory.
Leonard, Andrew. How Netflix is Turning Viewers into Puppets, Salon, available at http://www.salon.
com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/
Lessig, Lawrence. 1999. Code.
Lessig, Lawrence. 2001. The Future of Ideas.
Lessig, Lawrence. 2004. Free Culture.
Litman, Jessica. 2001. Digital Copyright.
Lunney, Glynn. 2014. Empirical Copyright: A Case Study of File Sharing and Music Copyright,
Tulane Public Law Research Paper No. 14-2.
Madrigal, Alexis. How Netflix Reverse Engineered Hollywood, The Atlantic, available at http://www.
theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/
Matthews, Lee. Google Glass Becomes Your Personal Translator, Geek.com, available at http://
www.geek.com/news/google-glass-becomes-your-personal-translator-with-word-lens-acquisition-
1594120/
Nietzsche, Friedrich. 1974. The Gay Science (1882) trans. W Kaufmann.
Novak, Marcos. 1992. Liquid Architectures in Cyberspace, in Benedikt, Cyberspace.
Page, Thomas. 1998. “Digital Watermarking as a Form of Copyright Protection.” Computer Law and
Security Report 14 (6): 390.
Reese, Anthony. 2002 – 2003. “The First Sale Doctrine in the Era of Digital Networks.” Boston
College Law Review 44: 577.
Rifkin, Jeremy. 2000. The Age of Access.
Shumaker, Robert W., Walkup, Kristina R., and Beck, Benjamin B. (2011). Animal Tool Behaviour:
The Use and Manufacture of Tools by Animals.
Stiegler, Bernard. 1998. Technics and Time 1: The Fault of Epimetheus.
Susskind, Richard. 1996. The Future of the Law.
Tarkovsky, Andrei. 1979. Stalker [film].
TED Lectures. 2013. How 4d printing works, see https://www.youtube.com/watch?v=
0gMCZFHv9v8
Unger, Roberto. 1976. Law in Modern Society.
Vaidhyanathan, Siva. 2001. Copyrights and Copywrongs.
Watson, Peter. 2005. Ideas.
Whitwam, Ryan. 2012. How DRM Will Infest the 3D Printing Revolution, at http://www.extremetech.com/extreme/137955-how-drm-will-infest-the-3d-printing-revolution
Information Governance and Assurance;
Reducing Risk, Promoting Policy
Publication info: Records Management Journal; Bradford, Vol. 24, Iss. 3 (2014): 253-255.
FULL TEXT
This is one of the few books that brings together the concepts of records and information management and
information security and is a really solid introduction to the way in which the various information disciplines,
whether concerned with security and protection or reuse and optimisation, need to come together to ensure that
information remains useful, yet is appropriately secured to minimise risk.
Early chapters of the book introduce the concepts of governance and assurance and the UK law and
regulations that drive these requirements. There is a definite bias toward those laws that specifically concern
information, in particular the Data Protection Act. I would have liked to see more about the laws that affect
how information is managed in the broader context, e.g. employment law, as well as a reference to the
implications and issues of working in a global or international context, which can present quite significant
challenges when implementing an Information Governance Framework.
I really like that this book references information in all of its forms, including data, which is all too often
treated as an entirely separate entity yet remains a challenge when attempting to implement policy or to
demonstrate or assure compliance. Data are the focus of a whole chapter, and it is a great introduction to the
concepts of data management for those who have worked more around information policy than the
operational delivery of data and information services, systems and solutions.
The chapter that focusses on the identification and assessment of threats is really useful, and it is followed
by a chapter on the security and protective measures that can be implemented to mitigate the threats and
any associated risk to the information. Again, this is a useful introduction to the concepts of information
risk management and information security.
While there are a couple of case studies, I would have liked this book to include some practical examples or
potential methodologies that bring together and integrate these information disciplines. Chapter 6, which
focusses on frameworks and “how it all fits together”, identifies all of the various components referenced in
the broad spectrum of “information governance and assurance” and suggests an approach, but does not
sufficiently demonstrate its effectiveness. There are many real challenges to overcome if a truly integrated
approach to the management, governance and assurance of information and data is to be achieved within an
enterprise environment, and it would have been useful to give the reader some tools and techniques that
have been proven elsewhere.
The challenge the author faces is that this is such a broad subject that going into any great degree of detail
is not really practical, which means that much of the content introduces a concept rather than exploring it in
depth. I do not think that this is a bad thing, though; instead, I feel that this book demonstrates the necessary
integration of functions that have previously been seen (and treated) as distinctly different in an
organisation. It highlights that it is no longer practical to produce information policies, to develop and
implement security controls and to operate and support key information management services in isolation
from each other. Instead, it is absolutely necessary to develop a framework approach that highlights the
importance of each of these roles and functions and establishes the way in which they can work together to
manage information as an asset in an enterprise context.
Overall, I think that this is a useful addition to the books currently available on this subject area. Having
worked across the spectrum of information management, records management, information assurance,
information governance, risk and compliance and information security, I was already familiar with much of
the content. Through necessity, I had come by this knowledge the hard way, and I feel that this book is a
really solid introductory resource for those currently working in a specific discipline or those starting their
careers. It introduces integrated concepts and highlights the various information management principles
that need to be considered if we are truly to manage information at an enterprise level and as a key business
asset, something that has long been a (significant) challenge for many information professionals,
irrespective of their specific discipline.
DETAILS
Subject: Integrated approach; Books; Information professionals; Information management
Publication title: Records Management Journal; Bradford
Volume: 24
Issue: 3
Pages: 253-255
Number of pages: 3
Publication year: 2014
Publication date: 2014
Publisher: Emerald Group Publishing Limited
Place of publication: Bradford
Country of publication: United Kingdom
Publication subject: Business And Economics–Management
ISSN: 09565698
e-ISSN: 17587689
Source type: Scholarly Journals
Language of publication: English
Document type: Journal Article
DOI: http://dx.doi.org/10.1108/RMJ-08-2014-0034
ProQuest document ID: 2439157141
Document URL: https://search.proquest.com/scholarly-journals/information-governance-assurance-
reducing-risk/docview/2439157141/se-2?accountid=10378
Copyright: © Emerald Group Publishing Limited 2014
Last updated: 2020-09-02
Database: ABI/INFORM Global
Image Data Sharing for Biomedical Research—Meeting HIPAA Requirements for De-identification
John B. Freymann, Justin S. Kirby, John H. Perry, David A. Clunie and C. Carl Jaffe
Published online: 29 October 2011
© Society for Imaging Informatics in Medicine 2011
Abstract Data sharing is increasingly recognized as critical
to cross-disciplinary research and to assuring scientific
validity. Despite National Institutes of Health and National
Science Foundation policies encouraging data sharing by
grantees, little data sharing of clinical data has in fact
occurred. A principal reason often given is the potential of
inadvertent violation of the Health Insurance Portability and
Accountability Act privacy regulations. While regulations
specify the components of private health information that
should be protected, there are no commonly accepted
methods to de-identify clinical data objects such as images.
This leads institutions to take conservative risk-averse
positions on data sharing. In imaging trials, where images
are coded according to the Digital Imaging and Communi-
cations in Medicine (DICOM) standard, the complexity of
the data objects and the flexibility of the DICOM standard
have made it especially difficult to meet privacy protection
objectives. The recent release of
DICOM Supplement 142
on image de-identification has removed much of this
impediment. This article describes the development of an
open-source software suite that implements DICOM Sup-
plement 142 as part of the National Biomedical Imaging
Archive (NBIA). It also describes the lessons learned by the
authors as NBIA has acquired more than 20 image
collections encompassing over 30 million images.
Keywords: Data sharing; De-identification; Anonymization; Cross-disciplinary research; Open access; Open source; DICOM; Supplement 142; Image archive; HIPAA; PHI; Common rule
This project has been funded in whole or in part with federal funds
from the National Cancer Institute, National Institutes of Health, under
Contract No. HHSN261200800001E. The content of this publication
does not necessarily reflect the views or policies of the Department of
Health and Human Services, nor does mention of trade names,
commercial products, or organizations imply endorsement by the U.S.
Government.
J. B. Freymann (corresponding author), SAIC-Frederick, Inc., EPN, Room 3006, 6130 Executive Blvd, Bethesda, MD 20892-7412, USA; e-mail: freymannj@mail.nih.gov
J. S. Kirby, SAIC-Frederick, Inc., EPN, Suite 317, 6130 Executive Blvd, Rockville, MD 20892, USA; e-mail: kirbyju@mail.nih.gov
J. H. Perry, Radiological Society of North America, 820 Jorie Blvd, Oak Brook, IL 60523, USA; e-mail: johnperry@dls.net
D. A. Clunie, CoreLab Partners, Inc., 100 Overlook Center, Princeton, NJ 08540, USA; e-mail: dclunie@dclunie.com
C. C. Jaffe, Boston University School of Medicine, FGH Building 3rd Floor, 820 Harrison Ave., Boston, MA 02118, USA; e-mail: carl.jaffe@bmc.org
J Digit Imaging (2012) 25:14-24; DOI 10.1007/s10278-011-9422-x
Background
Advancing imaging research to serve as a critical element
in clinical therapeutic trials requires that imaging methods
be developed, optimized, and validated using commercial
clinical imaging instruments. This applies particularly to
quantitative imaging as a bio-marker for drug development
or measurement of drug response. For example, there is a
critical need to harmonize data collection and analysis across
the different commercial platforms used in clinical practice to
ensure robust correlation of image-derived parameters with
clinical outcome. In addition, data integration with other
laboratory-based molecular bio-markers requires a fundamen-
tal understanding of the physical and biological measurement
uncertainty in order to convert data to knowledge or support a
medical intervention. The National Cancer Institute (NCI)
Cancer Imaging Program has supported research initiatives to
improve the performance and reproducibility of imaging
methods, including development of imaging technology,
software tools for clinical decision making, and development
of molecular probes to incorporate the molecular basis for
clinical decision making. Central to these efforts is a
fundamental need for a widely adoptable, image-focused
informatics infrastructure along with data archives that
provide a common framework for data exchange and
shareable methods to validate current and emerging imaging
agents and methods.
Public funding agencies have long recognized the
importance of data sharing in cross-disciplinary research.
National Institutes of Health (NIH), for example, has had a
final statement for grantees on sharing research data since
2003 and a published guidance for grant recipients since
2006 [1]. Nevertheless, little data sharing has occurred
outside the framework of prearranged links between
research groups. One reason for the unwillingness of
institutions to share clinical research data is the variety of
local interpretations of Health Insurance Portability and
Accountability Act (HIPAA) regulations, enforced by the HHS
Office for Civil Rights. In this environment, the most
comfortable stance for institutional IT departments has
been to adopt risk-averse postures [2].
In the science community, mainstream stakeholders like
NIH, FDA researchers, PhRMA, and the device industry
continue to emphasize the importance of data and image
sharing in policy statements. New societal attitudes toward
funding science have focused renewed attention to data
sharing as a way to break down silos, accelerate progress,
and reduce research redundancy [3]. Besides access to a
greater universe of data available for research purposes and
assuring the validity of scientific claims, data sharing
provides other advantages to individual researchers by
producing more citations [4, 5]. Biomedical research
containing clinical data in particular motivates new justifica-
tion for encouraging data sharing since the bedrock of
disease-based clinical genetics and cellular discovery rests on
data derived from human subjects. Moreover, genetic
research must rely on large population sample sizes, making
conclusions derived from such data too costly to replicate by
other investigators. The data from each individual is obtained
at great cost and effort. If such data are sequestered in
small, isolated collections and cannot be cross-queried, the
research community suffers. Investments in large-scale
national and international bio-specimen genetic projects are
underway by the NIH, including The Cancer Genome Atlas
[6] and the Cancer Human Biobank [7]. To be adequately
studied and analyzed, such tissue-specimen genetic data
must be accompanied by the individual’s clinical data, a key
component of which could include non-invasive imaging
obtained for diagnostic purposes. Sharing such images
requires informed consent by the patient and robust removal
of protected health information (PHI) from the images.
At a technical level, the field of diagnostic imaging has
benefited from a long historical investment in the Digital
Imaging and Communications in Medicine (DICOM) standard
by equipment manufacturers and devoted personnel in the
professional radiological societies [8]. In the context of image
sharing, DICOM Working Group 18 has recently developed
Supplement 142 (ftp://medical.nema.org/medical/dicom/final/
sup142_ft , accessed 28 February 2011) that provides
important guidance for de-identification of images and related
data objects.
This manuscript describes the challenges faced and
lessons learned during development and production imple-
mentation of an open-source suite of software that imple-
ments Supplement 142 for de-identification in the context
of an NCI-sponsored public biomedical image archive,
National Biomedical Imaging Archive (NBIA). These tools
have matured through extensive field use over the past
several years and offer a method sufficiently tested to
assure de-identification, transfer, management, and distri-
bution of DICOM images and XML objects. While this
software suite is freely available for download and use [9],
the focus of this paper is not to advocate for these specific
implementations but rather to provide guidance for evalu-
ating tools appropriate to a given context.
Technical Issues in Multi-center Data Sharing
Clinical trials and other research-driven image collection
activities often produce a combination of image and non-
image data objects. Preserving the interrelationships between
these objects while de-identifying their PHI is challenging.
Images are typically encapsulated in DICOM datasets that
contain identifiers for a trial, a patient, a study, a series (of
images), etc. Increasingly, non-image data objects are encap-
sulated in XML files. All data objects in a given research set
must share common identifiers if the correspondences among
them are to be preserved. Since the original identifiers inserted
into the data objects when they were created can be PHI, they
are almost always replaced by pseudonymous values (PHI
encrypted by an appropriate authority) that maintain the
relationships among the data objects but break the connection
to the specific human trial participant [10]. When multiple
data object types are present in a trial, the de-identification
mechanism must support all the data types such that the
identifying links between them are maintained.
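To make the pseudonymisation step concrete, the following is a minimal sketch, in Python, of one common way to derive stable pseudonyms with a keyed hash so that the correspondences between data objects survive de-identification. The key value and prefix scheme are illustrative assumptions, not the mechanism CTP actually uses.

```python
# Minimal sketch: derive stable pseudonyms from original identifiers so
# that links between data objects survive de-identification. The secret
# key stays with the originating institution; without it the pseudonym
# cannot be reversed. Illustrative only -- not the CTP implementation.
import hashlib
import hmac

SITE_KEY = b"keep-this-secret-at-the-acquisition-site"  # hypothetical key

def pseudonym(original_id: str, prefix: str) -> str:
    digest = hmac.new(SITE_KEY, original_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"

# The same patient ID always maps to the same pseudonym, so an image and
# an XML report for one subject remain linked after de-identification.
assert pseudonym("MRN-0012345", "PAT") == pseudonym("MRN-0012345", "PAT")
```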
It is possible to discern subtle differences in the mean-
ings of the words “de-identification” and “anonymization,”
but in this paper, they will be used as synonyms, with the
former being preferred. In a multi-center clinical image
collection project, images are generally received by a data
system via the DICOM protocol, usually from a PACS
workstation or modality. Non-image data objects are
generally transferred to the clinical trial system via HTTP.
Once the data objects have been received, they are de-
identified and then transmitted to a principal investigator
site, contract research organization, or a centralized archive,
usually in another location, via the Internet.
Data Transmission
Although clinical image data are de-identified at the
originating institution before transmission, many trials
require that the data be transmitted using Secure Sockets
Layer to provide encryption. Some trials use Transport
Layer Security (TLS) to provide both data encryption and
client/server authentication.
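As an illustration of that transport requirement, here is a minimal sketch of an HTTPS upload with both server and client authentication, using Python's requests library; the endpoint URL and certificate file names are hypothetical.

```python
# Minimal sketch of a TLS upload with mutual authentication, as some
# trials require. Endpoint and certificate paths are hypothetical.
import requests

with open("series_0001.dcm", "rb") as f:       # already de-identified locally
    resp = requests.post(
        "https://pi-site.example.org/import",  # hypothetical endpoint
        data=f,
        cert=("client.crt", "client.key"),     # client authentication
        verify="trial-ca.pem",                 # server authentication
        headers={"Content-Type": "application/dicom"},
    )
resp.raise_for_status()
```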
Most clinical image data transfer on the Internet requires the
penetration of at least one firewall. Most projects employ
software that makes outbound connections from the secure
network at the image acquisition sites to the principal
investigator site. This relieves the image acquisition sites from
having to open a port to the Internet, but it requires one port to
be open to the Internet at the principal investigator site—a
requirement that some IT departments are unwilling to support
(see Fig. 1). Some clinical trial transfer packages allow two
programs to run together at the principal investigator site to
pull data into the secure network from the DMZ without
having to open a port to the secure network. A DMZ
(demilitarized zone) is an interface sub-network that exposes
an organization’s external services to a larger untrusted
network. It provides an additional layer of security to an
organization’s local network. Others use virtual private
network technology to allow image acquisition sites to access
the secure network at the principal investigator site. Most
clinical trial data transfer packages support all those options.
Once in the secure network at the principal investigator
site, data objects must be validated (checked that they belong
to a specific trial), curated (assure that data file structure
allows it to be viewable as an image), organized, and stored.
This process, which varies from project to project, requires
software that is flexible enough to allow human intervention
in the process. In all projects, access to the stored data must be
controlled. In large image archive acquisition projects,
multiple layers of storage in staging servers may be involved
prior to data being made available more generally.
De-identification
The objective of de-identification is to ensure that data
objects cannot be connected to a specific human subject
[11]. The HIPAA Privacy Rule [12] defines two approaches
to removal of PHI: one that leaves the decision as to what
constitutes PHI to a nominal expert and the other that pre-
defines 18 categories of identifiers to specifically remove or
conceal, i.e.,
The following identifiers of the individual or of relatives,
employers, or household members of the individual must
be removed: (1) Names; (2) all geographic subdivisions
smaller than a state, except for the initial three digits of
the ZIP code if the geographic unit formed by combining
all ZIP codes with the same three initial digits contains
more than 20,000 people; (3) all elements of dates except
year, and all ages over 89 or elements indicative of such
age; (4) telephone numbers; (5) fax numbers; (6) email
addresses; (7) social security numbers; (8) medical record
numbers; (9) health plan beneficiary numbers; (10)
account numbers; (11) certificate or license numbers;
(12) vehicle identifiers and license plate numbers; (13)
device identifiers and serial numbers; (14) URLs; (15) IP
addresses; (16) biometric identifiers; (17) full-face photo-
graphs and any comparable images; (18) any other
unique, identifying characteristic or code, except as
permitted for re-identification in the Privacy Rule.
Note the ambiguity of item 18. The Federal Register in
2006 presents the rule [13], and NIH guidance is provided
under the title “Research Repositories, Databases, and the
HIPAA Privacy Rule” [14]. In research data, such informa-
tion is typically replaced with pseudonymous values that
allow trial subjects, studies, and data objects to be related to
one another but not connected to a specific human being.
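To show how mechanical some of the Safe Harbor rules are, the sketch below implements three of them in Python: ZIP-code truncation with the 20,000-population exception, reduction of dates to the year, and capping of ages over 89. The restricted ZIP3 set is left empty here because it must be populated from current census figures.

```python
# Minimal sketch of three Safe Harbor rules. RESTRICTED_ZIP3 must be
# filled from current census data (three-digit areas covering <= 20,000
# people); it is left empty in this illustration.
import datetime

RESTRICTED_ZIP3: set[str] = set()      # populate from census figures

def safe_zip(zip_code: str) -> str:
    z3 = zip_code[:3]                  # keep only the initial three digits
    return "000" if z3 in RESTRICTED_ZIP3 else z3

def safe_date(d: datetime.date) -> str:
    return str(d.year)                 # all elements of dates except year

def safe_age(age_years: int) -> str:
    return "90+" if age_years > 89 else str(age_years)

print(safe_zip("20892"), safe_date(datetime.date(2011, 2, 28)), safe_age(93))
```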
To fully de-identify a DICOM image, PHI must be
removed from both the metadata elements and the pixels of
the image itself. De-identifying metadata is complicated by
the fact that manufacturers and even end users of medical
imaging equipment often use DICOM elements in a way that
legitimately extends or does not conform to the standard,
resulting in PHI sometimes being found where not normally
expected. In addition, manufacturers sometimes place PHI in
private elements, the contents of which are unspecified in the
DICOM standard, and not reliably clarified in conformance
statements. These complications require a de-identification
system to be flexible enough to be configured to handle
special circumstances as they arise [15].
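A minimal illustration of element-level de-identification using the open-source pydicom library (which is not part of NBIA or CTP): private tags are dropped and a handful of identifying attributes are cleared. A real Supplement 142 implementation covers a far longer attribute list than shown here.

```python
# Minimal sketch with pydicom: drop private tags (where vendors sometimes
# hide PHI) and clear a few identifying attributes. A real Supplement 142
# implementation handles many more elements.
import pydicom

ds = pydicom.dcmread("input.dcm")
ds.remove_private_tags()

for keyword in ("PatientName", "PatientID", "PatientBirthDate",
                "InstitutionName", "ReferringPhysicianName"):
    if keyword in ds:
        setattr(ds, keyword, "")       # or insert a pseudonym instead

ds.save_as("deidentified.dcm")
```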
The removal of PHI burned into the pixels of diagnostic
images is even more difficult. This can be performed
completely manually (http://www.dclunie.com/pixelmed/
software/webstart/DicomCleanerUsage.html—blackout
accessed 28 February 2011), but several groups have developed
approaches for discovering text information burned into the
pixels of an image. In most of these efforts, image processors
use optical character recognition to flag possible PHI. As yet,
none seems provably robust enough to be acceptable for
automatic processing without a human observer in the loop.
The DICOM standard provides an element used to indicate that
an image contains PHI, but the element is not universally
supported, and in any case, it does not indicate where in the
image the PHI is located. The best approach appears to be using
the DICOM metadata elements to identify those images
particularly at risk of containing burned-in PHI, such as specific
modalities including ultrasound, or those images with elements
suggesting that they are screen captures (e.g., of 3D recon-
structions or other post-processed images). In some cases,
specific templates for the locations of burned in text can be
applied based on the device manufacturer and model. Care
needs to be taken to address PHI present in the high (unused)
bits of the pixel data that may be used as overlays.
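The template-based approach described above can be sketched as follows, assuming uncompressed single-frame grayscale images; the modality set, manufacturer/model key and blanking regions are hypothetical examples, not a vetted template library.

```python
# Minimal sketch: flag images at risk of burned-in PHI from their
# metadata, then blank manufacturer/model-specific regions. Assumes
# uncompressed single-frame grayscale pixel data; regions hypothetical.
import pydicom

AT_RISK_MODALITIES = {"US", "OT"}                         # e.g. ultrasound
TEMPLATES = {("ACME", "EchoStar 9"): [(0, 0, 1024, 64)]}  # (x, y, w, h)

def blank_burned_in(path: str, out: str) -> None:
    ds = pydicom.dcmread(path)
    risky = (ds.get("BurnedInAnnotation", "") == "YES"
             or ds.get("Modality") in AT_RISK_MODALITIES)
    if risky:
        key = (str(ds.get("Manufacturer", "")),
               str(ds.get("ManufacturerModelName", "")))
        arr = ds.pixel_array
        for x, y, w, h in TEMPLATES.get(key, []):
            arr[y:y + h, x:x + w] = 0                     # blank the region
        ds.PixelData = arr.tobytes()
    ds.save_as(out)
```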
DICOM Supplement 142
The DICOM standard provides important guidance for de-
identification. In DICOM PS 3.15, Annex E, “Attribute
Confidentiality Profiles,” the standard defines the Basic
Application Level Confidentiality Profile, which specifies
requirements for applications that de-identify and/or re-
identify dataset attributes and (in Table E.1-1) lists a set of
attributes that are subject to the profile (ftp://medical.nema.org/medical/dicom/2009/09_15pu,
accessed 28 February 2011). This profile was added in Supplement 55 in 2002
(ftp://medical.nema.org/medical/dicom/final/sup55_ft,
accessed 28 February 2011), but it has proven to be insufficient
for robust de-identification. During the development of the IHE
Teaching File and Clinical Trial Export (TCE) profile (http://
www.ihe.net/Technical_Framework/index.cfm#radiology,
accessed 28 February 2011), additional standard material was
added to elaborate on the issues of de-identification and
pseudonymization, but it too does not define a comprehensive
and detailed approach.
Accordingly, Supplement 142 (ftp://medical.nema.org/
medical/dicom/supps/sup142_pc , accessed 28 February
2011) was developed, to provide more detailed guidance for
de-identification of data objects for various purposes. The
supplement is built on a Basic Profile that takes a very
conservative approach to removing or replacing any
information about the identity of the patient, their family
members, any personnel involved in the procedure, the
organizations involved in ordering or performing the
procedure, and additional information that might be
combined to associate the object with the patient.
[Fig. 1: CTP software performs custom scriptable de-identification behind the institution’s firewall. The files are then securely transferred through the Internet to the host NBIA, where they are re-inspected for DICOM validity and thorough de-identification before being made publicly accessible.]
Supplement 142 also provides several options appropriate
to special situations. Two classes of options are defined, those
that require significant and burdensome effort to remove
additional information (and which may not be justified in low
risk scenarios) and those that define retention of information
that would otherwise be removed, but without which a
particular type of research would be impossible. Common
examples of the latter include the need to retain date
information in therapeutic oncology trials, without which
dates of progression or response cannot be determined, the
need to retain patient characteristics related to body size for
whole body PET studies, without which standardized uptake
values cannot be computed, and the need to retain image and
device (but not patient) unique identifiers that may be required
for the audit trail. In such cases, the additional information that
is needed for the conduct of the trial may not be permitted by
regulation, and therefore, additional permission is required
either from the subject or from the institutional review board
(IRB) or ethics committee. The options defined in Supplement
142 are intended to provide a small and tractable set of
standard definitions with accompanying justification, such
that each IRB and consent form can reference the standard
categories, rather than debating the merits of individual
DICOM data elements.
The options defined in the supplement are:
- Clean Pixel Data Option: removal or distortion of the actual pixel data where there is identification information burned in as annotation text
- Clean Recognizable Visual Features Option: removal or distortion of the actual pixel data where there is possibility of visually identifying the individual in the images
- Clean Graphics Option: removal of identification information encoded as graphics, text annotations, or overlays (excluding Structured Report SOP classes)
- Clean Structured Content Option: removal of identification information in Structured Report SOP classes
- Clean Descriptors Option: removal of identification information from descriptive tags which contain unstructured plain text values over which an operator has control
- Retain Longitudinal Temporal Information Options: retention or modification of tags that contain dates or times
- Retain Patient Characteristics Option: retention of physical characteristics of the patient that are descriptive rather than identifying information (e.g., metabolic measures, body weight, etc.)
- Retain Device Identity Option: retention of information about the identity of the device used to perform the acquisition
- Retain UIDs Option: retention of the unique identifiers for studies, series, instances, and other entities in the DICOM model
- Retain Safe Private Option: retention of private attributes known to be safe
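One way to picture how such an option set is used in practice is as a policy lookup: the Basic Profile removes everything, and each approved option adds specific elements back. The option names and element mapping below are an illustrative sketch, not the normative Supplement 142 tables.

```python
# Illustrative sketch: an IRB-approved policy as Basic Profile plus named
# options; each option re-admits certain elements. Mapping is abbreviated
# and not the normative Supplement 142 attribute tables.
RETAINED_BY_OPTION = {
    "RetainLongitudinalTemporalInformation": {"StudyDate", "StudyTime"},
    "RetainPatientCharacteristics": {"PatientWeight", "PatientSize"},
    "RetainDeviceIdentity": {"DeviceSerialNumber", "StationName"},
}

def retained_elements(options: set[str]) -> set[str]:
    keep: set[str] = set()             # Basic Profile alone keeps nothing
    for opt in options:
        keep |= RETAINED_BY_OPTION.get(opt, set())
    return keep

# e.g. a therapeutic oncology PET trial needing dates and body size:
policy = retained_elements({"RetainLongitudinalTemporalInformation",
                            "RetainPatientCharacteristics"})
print("StudyDate" in policy)           # True under this policy
```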
Supplement 142 was drafted by leading industry experts in
DICOM Working Group 18. In particular, those involved in
international pharmaceutical clinical trials for regulatory sub-
missions were broadly consulted, and indeed, the work effort
was initiated as a consequence of discussion during a Drug
Information Association Medical Imaging Stakeholders Call
for Action in 2007. Global regulations were considered,
including not just the HIPAA Privacy Rule but also the
European Privacy Directive. Supplement 142 provides a
platform for consistent de-identification that meets global
regulatory requirements and is thus a substantial contribution
to medical research.
Methods
The NBIA [16] is an open-source software suite developed
under the aegis of the caBIG program of the NCI’s Center for
Bioinformatics [17] and Information Technology [18]. The
software has been installed at numerous institutions for use in
sharing image collections. This section introduces the
software and describes its use in the acquisition, management,
and distribution of image collections by the NCI’s Cancer
Imaging Program and other institutions running the software.
National Biomedical Imaging Archive Project
NBIA [19] is a highly scalable, DICOM-based image archive
that provides full submission-to-retrieval functionality opti-
mized for the requirements of the in vivo medical imaging
clinical and research communities. It combines image
acquisition and processing capabilities with submission
reporting and quality control tools to facilitate inter-
institution data sharing. NBIA provides query access to more
than 90 DICOM tag elements. These can be queried through
three levels of search interfaces as well as an API. It integrates
cine-view, thumbnails, and full DICOM element previews. A
saved-query feature provides a unique reference keyword for
direct linkage to data sets from publications, etc. Data
download is supported through a Java download manager
for larger collections. Non-DICOM metadata can be contained
in XML or Zip files and linked at the image series level when
appropriate. Images can be grouped within collections for
specific research purposes, and the NBIA supports pop-up
menus that can provide short summaries of these collections
or link to external information sites such as Wikis or other
web sites.
The NBIA web application allows users to search for,
manage, and retrieve DICOM images. The web application is
written in Java and relies on the JSF presentation framework.
It is deployed on a JBoss application server. The image
metadata indexed by the web application is stored in a
MySQL or Oracle database. The DICOM images themselves
are stored in a file system of the administrator’s choice. NBIA
provides a collection- and submission site-based authorization
model that is implemented using NCI’s Common Security
Module. This allows an administrator to create public access
and restricted access data sets as needed. Additionally, the
NBIA system includes a caGrid data service based upon the
caBIG NCIA_MODEL version 3 [20]. The grid service
provides the ability to retrieve DICOM images using the
caGrid Transfer service, allowing for multiple installations of
NBIA to seamlessly communicate and share images in a
federated manner.
NBIA integrates a separate software package, RSNA’s
Clinical Trial Processor (CTP), to manage the transfer of
images into the NBIA system. In a project employing
NBIA, CTP is installed at both the data acquisition sites
and an NBIA site. These sites are often called client and
server sites, respectively. CTP is configured to de-identify
data objects at the client sites to ensure that PHI never
leaves the originating institutions. At the client site, images
are both de-identified and tagged with provenance infor-
mation in private elements for use in indexing the images.
The CTP at the client site then transmits the data objects to
the CTP at the NBIA server site, which stores the images in
a file system and extracts information from the DICOM
elements for storage in the NBIA relational database.
NCI’s Cancer Imaging Program has used NBIA to create
more than 20 research image collections. These collections
and more that will follow are intended to make medical
imaging case studies available to a wide cross-disciplinary
research community. NBIA has also been used to establish a
nationwide infrastructure for sharing images, supporting
stratification of patients in adaptive clinical trials, cross-
disciplinary research on response measurement fundamentals,
and increasing the research community’s awareness of image
reliability analysis.
NBIA’s archive and open-source tools provide:
- Multiple research image data collections, encouraging development of reliable quantitative measurement of change over time by supplying longitudinal clinical response imaging case studies to a wide research community
- Real-time, multi-institutional image access, supporting protocol stratification strategies in adaptive trials
- Support for cross-disciplinary research on response measurement fundamentals and analysis of quantitative reproducibility studies
For clinical trial data residing in non-public-access
archives, these same software tools implement role-based
security to permit selected PHI to remain in place. In this
situation, access to such images requires formal permission
granted by the signing of a limited dataset agreement [21].
Clinical Trial Processor
CTP is a tool developed by the Radiological Society of North
America (RSNA) for autonomously processing data objects in
clinical trials. It is written entirely in Java and runs on Unix,
Linux, Solaris, Mac OS, and Windows. It runs either stand-
alone or as a Windows service on XP, Vista, and Windows 7.
The program’s interface is provided by an integrated web
server with several servlets that provide access to status and
configuration information. Complete documentation on CTP
is located on the RSNA MIRC Wiki [22].
Processing in CTP is organized into pipelines [19], each
consisting of a sequence of stages, where each pipeline stage
is designed to perform a specific function. CTP is highly
configurable, allowing administrators to construct pipelines
to meet a wide variety of requirements. CTP currently
provides 25 standard pipeline stages in four categories:
- Import Services receive data objects from external sources and queue them for subsequent processing.
- Processors receive a data object as it flows down the pipeline, take some action, and pass on the object to the next stage. Actions can range from simply logging the passage of the object to modification of the object. Processors are synchronous stages, not passing on the object until processing is complete.
- Storage Services receive a data object as it flows down the pipeline, store a copy of the object in some kind of storage system, and then pass the object on to the next stage. Storage Services are synchronous.
- Export Services receive a data object as it flows down the pipeline, queue a copy of the object for subsequent transmission to an external system, and then pass the object to the next stage. The queuing process is synchronous; the subsequent transmission occurs asynchronously.
CTP is designed to be easily extended by the addition
of new pipeline stages and database adapters.
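The pipeline model can be illustrated in a few lines of Python; this is only an abstraction of the flow described above, since CTP itself is written in Java and assembled from its own configuration.

```python
# Illustrative abstraction of a pipeline of synchronous stages: each
# stage receives the object, acts, and passes it on (or drops it). This
# mirrors the described flow, not CTP's actual Java classes.
from typing import Callable, Optional

Stage = Callable[[object], Optional[object]]   # None = object dropped

def run_pipeline(obj: object, stages: list[Stage]) -> Optional[object]:
    for stage in stages:
        obj = stage(obj)               # synchronous: next stage waits
        if obj is None:
            return None
    return obj

def log_stage(obj):                    # a trivial "processor" stage
    print("saw", obj)
    return obj

run_pipeline("dicom-object", [log_stage])
```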
To be useful, a clinical data object must contain
identifiers that relate it to other data objects. CTP supports
four types of data objects, three of which provide
standardized access to the identifiers and data they contain:
- FileObjects are data objects of indeterminate contents. This is the superclass of the other three types, but on its own, it is not useful because it does not provide access to the required identifiers.
- DicomObjects are DICOM datasets. This type provides all the necessary identifiers as defined in the DICOM standard.
- XmlObjects are XML documents. XML provides for the encapsulation of text-based data. Many XML schemas are in use in clinical trials today, and there is no standard definition of how the required identifiers are encoded. The CTP XmlObject attempts to find identifiers by looking in a sequence of commonly used schema locations.
- ZipObjects are zip files containing one or more data files plus a file called manifest.xml which contains the required identifiers. The manifest.xml file is located in the root of the zip file’s directory tree, and it obeys a standard schema. The ZipObject provides a way to encapsulate collections of related data objects in any format while still carrying the identifiers which allow them to be related to other objects in the trial.
Since data objects in clinical image collections are generally
produced by clinical systems, they almost always contain PHI.
Among the most important standard pipeline stages in CTP are
ones for de-identifying data objects. CTP provides four
standard pipeline stages for modifying data objects to remove
PHI and replace it with pseudonymous values:
- The DicomAnonymizer modifies DicomObjects in accordance with a script. The script is written in a simple language that provides many functions for handling specific types of DICOM elements. Both CTP and the independent clinical trial management software written by the American College of Radiology Imaging Network use this language. CTP provides a special servlet to simplify the process of defining a DicomAnonymizer script. This servlet allows the administrator to define the rules for de-identification of each individual DICOM element. Since de-identification is a complex technical field, the DICOM committee has released Supplement 142 to the standard, specifying de-identification profiles and options for various purposes. One of the authors (JK) has written script implementations of all the Supplement 142 profiles and options, and these are built into CTP. The CTP DICOM Anonymizer Configurator also supports user-defined profiles. The default de-identification script is the most stringent one defined in Supplement 142 (the Basic Profile). This provides access to a de-identification mechanism that is in common use and has been vetted to meet regulatory requirements for protecting patient privacy. The configurator servlet allows the administrator to select a profile as a starting point and modify it to meet any special needs of the trial.
- The DicomPixelAnonymizer modifies DicomObjects by blanking regions of the pixels in a DicomObject in accordance with a script. The script consists of a sequence of signatures and region sets. A signature is a boolean calculation based on the contents of the DicomObject’s elements. Each signature is accompanied by a list of rectangular regions to blank in images that match the signature. When processing a DicomObject, the DicomPixelAnonymizer computes each signature value in turn, chooses the first one that matches, and then blanks the regions associated with it.
- The XmlAnonymizer modifies XmlObjects in accordance with a script written in a language that is inspired by, but is much simpler than, XPath. CTP provides a special servlet to simplify the process of defining an XmlAnonymizer script.
- The ZipAnonymizer modifies the manifest.xml file in a ZipObject in accordance with a script written in a language identical to that used by the XmlAnonymizer. When de-identifying ZipObjects in a clinical trial, one must remember that since the ZipObject can contain files of any format, PHI may be contained in places that the ZipAnonymizer does not modify. For this reason, ZipObjects are most useful for encapsulating the analytic results of programs that operate on prior de-identified objects.
Import and export pipeline stages provide for the
reception and transmission of data objects. CTP includes
five standard import stages and five standard export stages
that support the common protocols (HTTP(S), DICOM, and
FTP) as well as manual import from directories and
archives:
- HttpImportService receives data objects via the HTTP and HTTPS protocols.
- PollingHttpImportService makes an outbound connection to a PolledHttpExportService and receives data objects in the input stream of the connection, thus avoiding the necessity of opening a port for inbound connections.
- DicomImportService implements a DICOM Storage SCP for the receipt of DICOM data objects.
- DirectoryImportService imports (and removes) data objects that appear in a directory.
- ArchiveImportService copies data objects from a directory tree and processes them, leaving the objects in the original location unmodified.
- HttpExportService transmits data objects via the HTTP and HTTPS protocols.
- PolledHttpExportService serves data objects in response to received connections.
- DicomExportService implements a DICOM Storage SCU for the transmission of DICOM data objects.
- FtpExportService transmits data objects via the FTP protocol, organizing them on the destination server by study identifier.
- DatabaseExportService provides a queued interface to an external database.
In situations where a port to the Internet cannot be
opened on the secure network at a principal investigator
site, two instances of CTP can be run, one on the secure
network and one in the DMZ, using the polled HTTP stages
to allow data objects in from the Internet without opening a
port.
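The polled-transfer pattern can be sketched as follows: the instance on the secure network makes only outbound requests, so no inbound port is opened there. The queue URL and the convention that an empty response means nothing is queued are assumptions for illustration, not CTP's actual wire protocol.

```python
# Minimal sketch of polled transfer: the secure-network side polls the
# DMZ side with outbound HTTP requests only. URL and empty-response
# convention are hypothetical, not CTP's wire protocol.
import time
import requests

DMZ_QUEUE = "https://dmz.example.org/next-object"   # hypothetical endpoint

def handle(payload: bytes) -> None:
    print(f"received {len(payload)} bytes")         # store/process object

def poll_forever(interval_s: float = 5.0) -> None:
    while True:
        resp = requests.get(DMZ_QUEUE, timeout=30)
        if resp.status_code == 200 and resp.content:
            handle(resp.content)
        time.sleep(interval_s)
```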
The DatabaseExportService interfaces with an external
database through an extension of the standard CTP Data-
baseAdapter class. In the NBIA project, the NCI wrote a
DatabaseAdapter (NCIADatabase [sic]) that receives parsed
data objects from the DatabaseExportService and extracts
information for storage in an external SQL database
(MySQL or Oracle). This mechanism provides a flexible
way to build complex databases without having to manage
the transfer, or even the parsing, of the data objects
themselves.
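A minimal sketch of the adapter idea follows, assuming pydicom and SQLite; the real NBIA adapter is a Java DatabaseAdapter subclass writing to MySQL or Oracle, so the table layout and the choice of extracted elements here are illustrative.

```python
# Sketch of a database adapter: receive an already-parsed DICOM dataset
# and extract selected elements into an external index. The table layout
# and element choices are illustrative, not the NBIA schema.
import sqlite3
import pydicom

con = sqlite3.connect("archive_index.db")
con.execute("""CREATE TABLE IF NOT EXISTS images (
                   sop_uid    TEXT PRIMARY KEY,
                   study_uid  TEXT,
                   series_uid TEXT,
                   modality   TEXT)""")

def index_object(ds: pydicom.Dataset) -> None:
    con.execute("INSERT OR REPLACE INTO images VALUES (?, ?, ?, ?)",
                (str(ds.SOPInstanceUID), str(ds.StudyInstanceUID),
                 str(ds.SeriesInstanceUID), ds.get("Modality")))
    con.commit()
```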
A Survey of Image Collections and Tools
Several research alliances are actively developing both
publicly accessible biomedical image databases and soft-
ware tools to support them. In some cases, the tools
themselves are accessible for download, allowing new
research groups to utilize them in posting their own
datasets. In other situations, the software is a customized
solution with more limited scalability to other use cases.
To gain a better understanding of the characteristics of
the various approaches, a search for biomedical imaging
tools and archives was carried out. Since any such survey
would be rapidly out of date, the information gathered is
posted on a Wiki [23]. The goal of this resource is to allow
members of the research imaging community to find image
collections and tools for creating new collections, to
participate in the review, and to ensure that posted
information remains as accurate and up to date as possible.
A well-maintained site that catalogues mostly open-source image analysis software tools is also available on another Wiki [24].
Discussion
The process of building the collections housed at the NIH NCI NBIA produced a number of lessons learned about effectively managing the collection, de-identification, and distribution of DICOM images for research. They are presented here as points to consider, not only for users of the NBIA and CTP software suite but also for anyone developing or assessing similar tools. This section presents the key lessons learned.
Support Multiple Means for Submitting Data
Data have been submitted to the NCI NBIA archive from
many sources via several communication protocols. Among
the most common ways that DICOM objects have been
imported into CTP at the client site are:
- Transmission via HTTP(S) on the Internet, usually from a tool such as RSNA’s FileSender
- Transmission via the DICOM protocol from a PACS or workstation at the submission site
- Physical delivery on CD or hard disk via mail, some in the format of DICOM CDs, others simply image files
Any software suite must be able to import data from all
these media. Although the transport protocol varies,
DICOM is the dominant format for the image data itself.
Occasionally, images have been received in a non-standard
format, but we have found that converting such images to
DICOM expands their utility.
Use DICOM Supplement 142 Profile Templates
Institutions and vendors vary widely in the ways they create
and de-identify images. The de-identification rules for a
collection depend on the intended use of the collection as well
as the initial state of the images as they are acquired. For
example, patient studies containing PHI must be de-identified
fully, but previously de-identified studies obtained from another
collection may require little or no additional modification. The
de-identification process must therefore be very flexible.
Before the publication of Supplement 142, developing
de-identification scripts for a variety of use cases required a
thorough understanding of DICOM, and the scripts them-
selves took substantial time to write and test. Having
implementations of the Supplement 142 profiles available
in the CTP de-identification stages greatly simplifies the
task and improves the confidence of the submitters and
curators that regulatory requirements are being met. It also
allows the de-identification rules to be changed quickly for
specific submissions when necessary. Proper use of the
Supplement 142 profiles also provides a historical record
within each DICOM object detailing the previous profiles
applied to de-identify the images. This practice, discussed
in further detail below, allows consumers of the data to
clearly evaluate how the image was de-identified and also
clarifies what additional steps may need to be taken if the
data are being repurposed for a new audience.
Do Not Overdo De-identification
Image collections generally contain data from many
patients, each often having multiple studies and series. To
maximize the benefits of such collections, the identifiers in
the data objects must retain the ability to distinguish among
patients, studies of a single patient, etc. Any implementa-
tion of the Supplement 142 profiles must be careful to
provide pseudonyms for such identifiers rather than fixed
values. For example, if every patient ID were set to the same value, then most DICOM software would treat the entire dataset as though it consisted of only a single patient.
Dates require special attention. Maintaining the temporal
relationships among studies of the same patient adds
significantly to the utility of an image collection, but
original calendar dates themselves are PHI and must
therefore be modified. This is addressed in the Supplement
142 “Retain Longitudinal Temporal Information Options.”
The simplest implementation is to offset dates by an
interval that is the same for all images in the collection.
Prior to the creation of Supplement 142, we had found it convenient to use intervals large enough that users of the collection would not question whether the dates had been modified. However, it was later discovered that offsetting
the dates by large increments can cause problems in some
DICOM software if the resulting dates are prior to the
1980s. Supplement 142 specifies that the Attribute Longi-
tudinal Temporal Information Modified (0028,0303) should
be populated with a value of “MODIFIED” to make it clear
that dates have indeed been altered. This is a simpler and
more effective solution.
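A minimal sketch of the fixed-offset approach, assuming pydicom; the 30-day offset and the particular date elements touched are illustrative assumptions, and per the text above the (0028,0303) flag is the recommended signal that dates were altered.

```python
# Sketch: shift common date elements by one fixed interval for the whole
# collection, then flag the modification per Supplement 142. The offset
# and the list of date keywords are illustrative.
from datetime import datetime, timedelta
import pydicom
from pydicom.tag import Tag

OFFSET = timedelta(days=-30)  # the same interval for every image

def shift_dates(ds: pydicom.Dataset) -> None:
    for keyword in ("StudyDate", "SeriesDate", "AcquisitionDate", "ContentDate"):
        value = ds.get(keyword)
        if value:
            shifted = datetime.strptime(str(value), "%Y%m%d") + OFFSET
            setattr(ds, keyword, shifted.strftime("%Y%m%d"))
    # (0028,0303) Longitudinal Temporal Information Modified
    ds.add_new(Tag(0x0028, 0x0303), "CS", "MODIFIED")
```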
Do Not Rely on DICOM to Indicate Burned-in PHI
PHI burned into the pixels of images poses a serious
problem for public research archives. Technologists or
PACS administrators are sometimes unaware that these
types of images exist in their local systems. A wide
variety of such images have been received for the NCI
collections, including not only clinical images containing
patient names in their pixels but also digitized billing
records in DICOM wrappers. Of significant concern is
the recent practice of scanning the patient exam request
document into the DICOM study series to record the
clinical need for the exam and validate billing. That scanned
image, usually a final series in the study, is often full of PHI
both in the DICOM tags and within the image. Most commercial software intended for de-identification fails to address the special content of that series.
Strategies for dealing with this issue are provided by the
Supplement 142 “Clean Pixel Data” and “Clean Graphics”
options, but the identification of the images themselves can
be a problem. Some DICOM elements that can be useful
are:
- (0008,0016) SOP Class UID: values indicating Secondary Capture and Ultrasound SOP Classes
- (0008,0008) Image Type: the values SECONDARY and SCREEN SAVE indicate a suspect image, but they are not definitive
- (0028,0301) Burned-in Annotation: the value YES is definitive, but this element is often not supplied in DICOM images, since it is optional for most objects and a relatively recent addition to the standard
- (0018,1016) Secondary Capture Device Manufacturer: the value of this element can be used to discriminate against certain image types that may contain PHI
- (0018,1018) Secondary Capture Device Manufacturer’s Model Name: the value of this element can be used to discriminate against certain image types that may contain PHI
Image collection tools must have a means for scanning
such elements and segregating images for special attention
based on defined criteria. CTP provides filter stages driven
by a script language that allows testing the values of all
DICOM elements and automatically quarantining objects
that fail the test.
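CTP's filter stages use their own script syntax; purely as an illustration, the element checks listed above might be expressed in Python with pydicom as follows (the set of checks and the quarantine decision are assumptions):

```python
# Sketch of the element checks above: flag images that may carry
# burned-in PHI so they can be quarantined for human review.
import pydicom

SECONDARY_CAPTURE_SOP = "1.2.840.10008.5.1.4.1.1.7"  # SC Image Storage

def is_suspect(ds: pydicom.Dataset) -> bool:
    if ds.get("BurnedInAnnotation") == "YES":                    # (0028,0301) definitive
        return True
    if str(ds.get("SOPClassUID", "")) == SECONDARY_CAPTURE_SOP:  # (0008,0016)
        return True
    image_type = ds.get("ImageType") or []                       # (0008,0008)
    return any(v in ("SECONDARY", "SCREEN SAVE") for v in image_type)
```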
Keep an Audit Trail of De-identification History
It is often necessary to know the de-identification history of an
image. DICOM Supplement 142 meets this need by defining
standard profiles, the codes for which can be used as an audit
trail. For example, if in the process of de-identification one used
the Basic Application Confidentiality Profile with the option to
Retain Longitudinal With Modified Dates, one would also
populate the De-identification Method Code Sequence
(0012,0064) with the corresponding Coding Scheme Desig-
nators for those changes. If the biomedical image community
were to adopt this standard, it would be much easier to
understand the history of how an image was de-identified and
to make decisions on whether further changes are needed as
images are repurposed for consumption by new audiences.
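A minimal sketch of populating that audit trail, assuming pydicom; the DCM code values shown follow the published codes for the Supplement 142 profiles and options, but they should be verified against the standard before use.

```python
# Sketch: record which de-identification profile and options were applied
# in (0012,0064) De-identification Method Code Sequence.
from pydicom.dataset import Dataset

def record_deid_method(ds: Dataset) -> None:
    ds.PatientIdentityRemoved = "YES"                   # (0012,0062)
    items = []
    for value, meaning in [
        ("113100", "Basic Application Confidentiality Profile"),
        ("113107",
         "Retain Longitudinal Temporal Information Modified Dates Option"),
    ]:
        item = Dataset()
        item.CodeValue = value
        item.CodingSchemeDesignator = "DCM"
        item.CodeMeaning = meaning
        items.append(item)
    ds.DeidentificationMethodCodeSequence = items       # (0012,0064)
```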
A separate audit trail of exactly what values have been
replaced may also be maintained, but must be protected since
by definition it may contain PHI. If this is done within the
DICOM image file itself, it must be encrypted, and data
elements are provided for that purpose; their use is deprecated,
however, since any encryption scheme becomes vulnerable
over time and such images may be archived indefinitely.
Supplement 142 warns about this, and accordingly if any such
audit trail is required, it should probably be maintained separately
from the images and both logically and physically protected.
Enable Local Mapping Between Anonymized Identifiers
and PHI
When questions arise about the integrity of the submitted data,
it is often necessary for an administrator at the submitting site
to examine the original data to determine whether the
problems are within the original data or if they were created
during the process of de-identification or transmission. To do
so, the anonymized identifiers obtained from the collection
curator must be translated back to the original PHI. The CTP
IDMap stage can be used to provide this translation of
identifiers. To have access to this function, a user must be
authenticated and have administrator privileges. This func-
tionality may also be necessary in situations where image data
are to be correlated with additional data types that have not or
will not be de-identified.
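The mapping itself can be as simple as a protected table on the secure network. A sketch follows, assuming SQLite; CTP's IDMap stage keeps its own internal store, so the table name and columns here are illustrative.

```python
# Sketch of a local pseudonym map. Because the table links anonymized
# identifiers back to PHI, the file must stay on the secure network and
# be restricted to authenticated administrators.
import sqlite3
from typing import Optional

con = sqlite3.connect("idmap.db")
con.execute("""CREATE TABLE IF NOT EXISTS idmap
               (anon_id TEXT PRIMARY KEY, original_id TEXT NOT NULL)""")

def remember(anon_id: str, original_id: str) -> None:
    con.execute("INSERT OR REPLACE INTO idmap VALUES (?, ?)",
                (anon_id, original_id))
    con.commit()

def lookup_original(anon_id: str) -> Optional[str]:
    row = con.execute("SELECT original_id FROM idmap WHERE anon_id = ?",
                      (anon_id,)).fetchone()
    return row[0] if row else None
```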
Provide End-to-End Transport Verification
In many clinical trials, each submission is accompanied by a
case report form or an IHE TCE manifest. In most
submissions to research image collections, however, no
manifest is available to identify the individual images, series,
etc. that have been transmitted. NBIA and CTP therefore
include special tools to verify that the submitted data has been
received and successfully processed.
One such tool, the CTP Database Verifier, can be used at the submitting site to ensure that all transmitted data made it all the way into the NBIA database. This tool tracks the de-identified UIDs of every object that is sent to the archive and then periodically queries the NBIA server via its relational database to confirm that each object was received and stored. This has saved both submitters and curators substantial effort. The
NBIA View Submission Report function is also useful for
comparing totals of data objects received by the system against
counts of the original submissions, although this tool is more
often used for general reporting and auditing of what has been
archived.
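At its core, the verification reduces to a set comparison between the UIDs that were sent and the UIDs the archive reports as stored. A minimal sketch, in which the stored set stands in for the result of querying the archive's database:

```python
# Sketch of end-to-end verification as a set difference: every object's
# de-identified SOP Instance UID is recorded when sent, then compared
# against the UIDs the archive reports as stored.
def find_missing(sent_uids: set, stored_uids: set) -> set:
    """Return UIDs transmitted but not yet confirmed received and stored."""
    return sent_uids - stored_uids

# Example with made-up UIDs; in practice stored_uids would come from a
# periodic query against the archive's database.
sent = {"1.2.3.4.5.1", "1.2.3.4.5.2"}
stored = {"1.2.3.4.5.1"}
assert find_missing(sent, stored) == {"1.2.3.4.5.2"}
```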
Provide Multiple Levels of Data Verification
We have used CTP’s filtering stages to verify that the
metadata of images matches the protocol of a study and to
quarantine images that fail before they are added to the
collection. We have also used the QC Tool in the NBIA
software to verify the content of the data manually. The tool
is designed to allow a curator to see both the images and
corresponding DICOM elements in a single view.
Because PHI can occur both within the image pixels
and the metadata elements, we have found that having
the ability to view both simultaneously substantially
decreases the level of effort involved in managing submitted
data.
We have also found that a built-in method for deleting images has been necessary more often than expected. It allows curators to easily remove data that initially eluded detection, whether for not matching the protocol or for containing PHI in unexpected places.
Carefully Estimate Resources Required
It is easy to underestimate the time and effort involved in
collecting and managing images for image collections. While
the maturity of CTP and NBIA has grown significantly over
the past few years, it still requires between 1 and 4 h for an
expert CTP user to provide training to a new site manager on
how the submission process works and to do preliminary
setup. In a large project, this justifies setting up a help-desk
function. Preliminary setup is typically followed by small-scale
submission tests to ensure the data arrives as expected
(modality, number of images per study, de-identification
completeness, etc.). Again, the use of CTP's implementation of the Supplement 142 profiles has greatly reduced the amount of setup time required. It does not, however, remove the need to check the implementation carefully with a small test submission before large-scale acquisition is started.
Although the combination of CTP and NBIA can be run
autonomously, it is important to provide human oversight, not
only to ensure that privacy regulations continue to be met as
data from new acquisition sites are received but also to ensure
that the data added to the collection are consistent with the
collection’s intended use. Tools such as the ones described
here reduce the workload of the collection’s human curator,
but they do not eliminate it. Thus, anyone considering hosting
a truly open biomedical image archive should also allocate
staff resources for the collections’ curators.
Conclusion
Publicly shared archives of image data are an increasingly
critical element of cross-disciplinary research, especially for
clinical biomedical research where diagnostic images of the
spectrum of human disease and its response to therapy are a
scarce commodity. As genetic biomedical understanding
develops, one of the significant contributions of clinical
imaging will be to produce very large collections that can be
subjected to statistical tests of validity. Without a greater
confidence in the image de-identification process, open-access
DICOM archives that can be queried to correlate with genetics
will never achieve their potential. Some international efforts
besides those described in this paper are ongoing with the
intent to achieve similar ends [25]. In the absence of
community consensus on image de-identification and user-
friendly tools and SOPs, researchers have been understand-
ably reluctant to create publicly accessible image archives.
This paper suggests that developments in standards and technology have removed key stumbling blocks to the creation of these valuable archives. The DICOM Committee, through Supplement 142, now offers a robust framework for de-identification that meets privacy regulations.
The incorporation of these guidelines into easy-to-use
image acquisition and management tools, coupled with the
increasing availability of open archive solutions, should
facilitate the creation of the image archives needed for the
next generation of biomedical research.
Acknowledgments The authors wish to acknowledge the support of
the Radiological Society of North America in developing and
promoting the deployment of CTP. We would also like to recognize
the extensive contributions of members of DICOM Working Group 18
in the development of Supplement 142.
References
1. National Institutes of Health. http://grants.nih.gov/grants/policy/
data_sharing/. Accessed 28 February 2011
2. Ohm P: Broken promises of privacy: responding to the surprising failure of anonymization. University of Colorado Law Legal Studies Research Paper No. 09–12, 2009. Available at SSRN: http://ssrn.com/abstract=1450006. Accessed 28 February 2011
3. Nelson B: Empty archives. Nature 461:160–163, 2009
4. Vickers AJ: Whose data set is it anyway? Sharing raw data from
randomized trials, Trials 2006. BioMed Central 7:15, 2006
5. Piwowar HA, Day RS, Fridsma DB: Sharing detailed research data is
associated with increased citation rate. PLoS One. 2(3):e308, 2007
6. National Institutes of Health. http://cancergenome.nih.gov/.
Accessed 28 February 2011
7. National Institutes of Health. http://biospecimens.cancer.gov/archive/
cahub/default.asp. Accessed 28 February 2011
8. Branstetter 4th, BF, Uttecht SD, Lionetti DM, Chang PJ:
SimpleDICOM suite: personal productivity tools for managing
DICOM objects. Radiographics. 27(5):1523–1530, 2007
9. National Cancer Institute, Cancer Imaging Program. https://wiki.
nci.nih.gov/display/CIP/Incorporation+of+DICOM+WG18
+Supplement+142+into+CTP. Accessed 28 February 2011
10. Noumeir R, Lemay A, Lina JM: Pseudonymization of radiology
data for research purposes. J Digit Imaging. 20(3):284–295, 2007
11. Hrynaszkiewicz I, Norton M, Vickers AJ, Altman DG: Preparing
raw clinical data for publication: guidance for journal editors,
authors, and peer reviewers. Trials 11:9, 2010
12. Health and Human Services. http://www.hhs.gov/ocr/privacy/
hipaa/understanding/summary/. Accessed 28 February 2011
13. National Institutes of Health. http://privacyruleandresearch.nih.
gov/pdf/FinalEnforcementRule06 . Accessed 28 February
2011
14. National Institutes of Health. http://privacyruleandresearch.nih.
gov/research_repositories.asp. Accessed 28 February 2011
15. González DR, Carpenter T, van Hemert JI, Wardlaw J: An open
source toolkit for medical imaging de-identification. Eur Radiol.
20(8):1896–1904, 2010
16. National Institutes of Health. https://imaging.nci.nih.gov/ncia/
login.jsf. Accessed 28 February 2011
17. National Institutes of Health. http://ncicb.nci.nih.gov/. Accessed
28 February 2011
18. National Institutes of Health. https://wiki.nci.nih.gov/dashboard.
action. Accessed 28 February 2011
19. National Institutes of Health. https://wiki.nci.nih.gov/display/CIP/
NBIA+at+CBIIT+Image+Collections. Accessed 28 February
2011
20. National Institutes of Health. https://cabig.nci.nih.gov/tools/sharable/
cagrid_overview?pid=primary.2006-07-07.4911641845&sid=
caGrid&status=True. Accessed 28 February 2011
21. National Institutes of Health. https://wiki.nci.nih.gov/display/Imaging/
Limited+dataset+−+user+agreement. Accessed 28 February 2011
22. National Institutes of Health. http://mircwiki.rsna.org/index.php?
title=CTP_Articles
23. National Institutes of Health. https://wiki.nci.nih.gov/display/CIP/
CIP+Survey+of+Biomedical+Imaging+Archives. Accessed 28
February 2011
24. National Institutes of Health. http://www.idoimaging.com/index.
shtml. Accessed 28 February 2011
25. Lien C-Y, Onken M, Eichelberg M, Kao T, Hein A: “Open source
tools for standardized privacy protection of medical images”,
Proc. SPIE 7967, 79670M, 2011. doi:10.1117/12.877989
Health Informatics Journal
2019, Vol. 25(4) 1538–1548
© The Author(s) 2018
DOI: 10.1177/1460458218779101
Same, same but different:
Perceptions of patients’ online
access to electronic health records
among healthcare professionals
Sofie Wass
Jönköping University, Sweden
Vivian Vimarlund
Jönköping University, Sweden; Linköping University, Sweden
Abstract
In this study, we explore how healthcare professionals in primary care and outpatient clinics perceive the
outcomes of giving patients online access to their electronic health records. The study was carried out as a case
study and included a workshop, six interviews and a survey that was answered by 146 healthcare professionals.
The results indicate that professionals working in primary care perceive that an increase in information-sharing
with patients can increase adherence, clarify important information to the patient and allow the patient to
quality-control documented information. Professionals at outpatient clinics seem less convinced about the
benefits of patient accessible electronic health records and have concerns about how patients manage the
information that they are given access to. However, the patient accessible electronic health record has not led
to a change in documentation procedures among the majority of the professionals. While the findings can be
connected to the context of outpatient clinics and primary care units, other contextual factors might influence
the results and more in-depth studies are therefore needed to clarify the concerns.
Keywords
electronic health records, healthcare service innovation and IT, organizational change and IT, patient-
centeredness, work impact
Introduction
In recent years, there has been an increasing focus on patient involvement and patient engagement through a more patient-centered approach.1–4 Various strategies have been developed which
emphasize the importance of patient-centered care and the new demands that are associated with
this type of care.5,6 A great deal of effort has been put into, for instance, digital services which
potentially enhance information-sharing and communication with patients.4,7 One example of this
Corresponding author:
Sofie Wass, Jönköping International Business School, Jönköping University, P.O. Box 1026, 551 11 Jönköping, Sweden.
Email: sofie.wass@ju.se
type of service is patient accessible electronic health records (PAEHRs) which have the potential
to improve healthcare delivery and health outcomes8,9 with benefits like improved recall and
understanding of healthcare information,10,11 increased adherence,12 and improved communication
between patients and healthcare professionals.10,11
However, there are divergent reports about the effects that PAEHRs have on healthcare professionals and their daily work.8,9,13–19 A review identified no negative effects for healthcare professionals who met with either outpatients or inpatients who could access the EHR.8 These effects are also reflected in a study of a major project, in which few physicians reported a perceived increase in workload because visits took longer or because they received more questions from their patients.9 Other studies report concerns, from both primary care physicians and specialist physicians, about patients becoming anxious or misinterpreting the content of the record.13–17 However,
there are studies that report that workload concerns, expressed prior to the system implementation,
diminished when the service was actually put into practice.9,18
In general, studies that describe the benefits of PAEHR and the effects on healthcare profession-
als seem to focus on multi-payer or market-based financial systems.9,12,20,21 In this study, we focus
on the perceptions of healthcare professionals who work in a publicly supported healthcare system
and compare the perceptions of professionals working in primary care and in outpatient clinics. We
seek answers to the following research questions: How do healthcare professionals perceive the
outcomes of PAEHRs, in their daily work activities? Do healthcare professionals perceive different
outcomes depending on the unit that they work in?
Case description
In February 2015, the Region of Jönköping County gave all patients, aged 18 or older, online
access to their EHRs. Through the national patient portal, patients can read their health-related
information, including medical notes, diagnosis and vaccinations. The medical notes that are
accessible on the system include information that was registered after 1 July 2014. If a patient
wants access to notes that were made before this date, they can request and receive these notes on
paper. The decision to make the information available in this manner was made to give healthcare
professionals the opportunity and the time necessary to modify the language that they used in the
medical notes, before patients access the records online. Healthcare professionals have 14 days
after documentation of the event to confirm and, if necessary, correct the medical notes before they
become accessible to the patient. After 14 days, all notes are accessible, whether they have been
confirmed or not. Healthcare professionals can use two keywords to keep information inaccessible
to the patient. These keywords can be used to withhold diagnoses which need further investigation,
and notes about sensitive life-situations; for example, violence against women. At the time of the
study, patients had been able to access the EHR for 15 months.
Methods
This study was performed as an explorative case study22 that included a workshop with six par-
ticipants, six interviews and a survey answered by 146 healthcare professionals. First, we con-
ducted a workshop to identify the expected benefits and drawbacks of the PAEHR. The workshop
participants included the project manager of the PAEHR, an eHealth strategist, a director of
communication, two physicians and the system owner. The participants were selected based on
their knowledge of the PAEHR and their ability to represent different actors in the organization.
We used a technique called “Pains and Gains” to structure the workshop.23
The participants were asked to develop a “persona”24 who represented a healthcare professional
whose patients could access the EHR. The persona was then used as a representation for health-
care professionals experiencing the introduction of the PAEHR and each participant was asked
to indicate the expected benefits and drawbacks that the persona would face due to the PAEHR.
The benefits and drawbacks were written on post-it notes and presented by each participant to
the group. Finally, the group discussed the benefits and drawbacks that were identified and
agreed on the most important outcomes.
During the second step of our research process, we interviewed four physicians, one nurse and
one therapist (from both primary care and clinics) to seek evidence that the results from the work-
shop had some generality. Additional information was gathered during the interviews, which com-
plemented the previously identified benefits and drawbacks. The interviewees were recruited from
a group of professionals that had answered a survey in an earlier study and had indicated their
willingness to participate in further studies. The interviewees were asked to answer questions
regarding their perceived benefits and drawbacks of the PAEHR and whether patient–professional
communication had changed since patient access to the EHRs was granted. The interviews were
recorded, subsequently transcribed and analyzed by means of inductive content analysis.25 The
first author reviewed the transcripts of the interviews and identified sentences that focused on the
benefits and drawbacks of the PAEHR. These different sentences and paragraphs were then labeled
with a code. By collating the codes that were related to each other in terms of their content, differ-
ent themes were derived from the interviews.
After analyzing the interviews, an online survey was created which was based on insights that
were gathered from the workshop and the interviews. The survey was sent out to healthcare
professionals at six different sites in the region: three primary care units and three outpatient
clinics. The respondents included healthcare professionals who were responsible for document-
ing information in the EHR, that is, 47 physicians, 50 nurses, 11 assistant nurses and 38 occupa-
tional therapists. The sites were selected to enable a comparison between the perceptions of
healthcare professionals working at primary care units and at outpatient clinics. Participation
was voluntary and anonymous and the individual answers were not accessible to any executives
at the Region of Jönköping County. The survey was distributed for 3 weeks in May 2016 and again in November 2016 to extend the data gathering. Two reminders were sent out and, in total, 146 healthcare professionals completed the survey (total response rate 45%): 75 percent of the respondents were women, aged between 24 and 67, with a median working experience of 16 years or more. Twenty surveys that did not include answers to all the questions were excluded. The respondents indicated on a 5-point Likert-type scale whether they agreed or disagreed with different statements. We examined the results across a three-level grouping, and comparisons between the respondents at primary care units and outpatient clinics were made. A summary of the data collection activities is presented in Figure 1.
Figure 1. Overview of the data collection activities.
The quality of case studies can be judged by a number of criteria.22 In the present study, con-
struct validity has been achieved through the use of multiple sources of evidence and by allowing
respondents to review collected data. To increase reliability, we strove to be transparent about how
the data were collected and we organized and structured the data into different categories.
Results
The results are presented in three sections. First, we present the results from the workshop and then
the results from the interviews and the survey.
Results from the workshop
The results from the workshop indicated that the participants expected the following benefits:
enhanced information-sharing, the possibility of establishing mutual understanding between the
patient and the care provider, increased patient involvement and a better prepared patient. The
expected drawbacks included the following: healthcare professionals being expected to be more
up-to-date about their patients’ situations and healthcare professionals being unable to document
desired information in the EHR. The participants of the workshop also mentioned the risk of
patients misinterpreting and getting upset about the information that is recorded in the EHR.
Results from the interviews
The interviewees noted a number of benefits and drawbacks with the PAEHR. The key themes that
emerged from the interviews are presented in Table 1 and described in more detail, with quotations
from the interviewees, in the following section.
Table 1. Benefits and drawbacks identified from the interviews.

Benefits | Respondent | Drawbacks | Respondent
Enhanced information access | P1, P3, N | Exposed and vulnerable patients | P1, T
Improved understanding | P1, P2, P4, T | Negative impact on work processes | P3, P4
Increased patient involvement | P1, P2, T, N | Worries and misunderstandings | P2, P3, P4
Positive impact on work processes | P1, P2, N | |

“P” represents physicians P1, P2 and so on. “N” represents the nurse and “T” represents the therapist.
The PAEHR was viewed as an initiative that made information easier to access for the patient.
The interviewees spoke about patients accessing the information where and whenever he or she
wants. “… it can be positive for some patients, especially those who cannot take in all the infor-
mation, then they can access and read it at home in peace and quiet.” (P3) Several respondents
mentioned improved understanding as a benefit for the patient. “If you have been to an appoint-
ment and met the doctor, to be able to log in and double-check that you have understood the doc-
tor …” (T) They mentioned the possibility of accessing the EHR to repeat what was said, “It can
be nerve-wracking to have a doctors’ appointment, and you can have it as an extra memory.” (P4)
It was also described as a way to ensure that there was a mutual understanding of the situation,
“… if you need to repeat [the information], then you can understand more of what we really had
a mutual understanding about.” (P1)
Respondents talked about patients who could get more active and involved in healthcare because
of the PAEHR. One respondent said,
So I believe that it gets more transparent, and that we walk away from the hierarchal, patriarchal world
even more. That you actually co-produce … I believe that, that is part of the person-centered care, that the
patient gets more active in his or her own healthcare. (T)
One physician spoke about educating the patient. “You have to give them the tools so that they
really feel safe. It is very much about patient development and patient education.” (P1) Another
respondent hoped that the increase in information-sharing through the service would result in
fewer phone calls:
When you can follow your case, then I think and hope that we can avoid a lot of unnecessary phone calls.
People wonder what is happening and then they can actually find it on their own … A lot of questions are
actually asked because people do not know. (P2)
One respondent also mentioned the possibility of patients identifying and correcting inaccurate
entries in the EHR. “One time we got a notice and then we corrected it. It was not my notes but a
doctors’ note and it feels really good. Because then you can remove incorrect information.” (N)
The type of language used to document in the EHR was also discussed. For instance, one inter-
viewee said, “We have a working language just like any other craftsman or -woman, and that we
must continue to use, but we also have to be pedagogical in that we can write a summary on what
you call ‘simple Swedish’.” (P1)
Concerning drawbacks, two respondents mentioned the risk of other people who might force the
patient to show them their EHR. This included issues related to abortion and domestic violence. “I
see a risk if you are a young woman and have contacted us for an abortion for instance, and if your
parents then force you to access your health record.” (P1) The therapist mentioned domestic vio-
lence. “In destructive relations, where you can be forced to show the information.” (T) Some
respondents were worried that the PAEHR might cause a further strain on the provision of health-
care in that it would result in more questions being asked by patients. One respondent explained
that it was less time-consuming to call patients instead of patients accessing the information online
and then asking questions. “We are scared that it will result in much more work when they intro-
duce test results and radiology, that it will be a lot of extra questions.” (P4) The drawbacks of the
PAEHR also included concerns about patients getting worried or misinterpreting information. One
physician explained that there is sometimes a need to continue with a medical investigation before
one informs the patient about their condition. “What I am worried about is that people might get
hurt because they get a decision before I am able to find out … I might want to talk to colleagues.”
(P4) Another physician said, “… it is not that simple, and they cannot interpret it in the right way
on their own. If you see a blood level on 13, what do you understand of that?” (P3)
Results from the survey
More than 50 percent of the survey respondents reported that the PAEHR was a “good” or “very
good” initiative. When we compared the answers from respondents working in primary care units
and outpatient clinics, the data showed that there were small differences between these two groups
(55% vs 53%). However, the more detailed questions showed that the respondents working in
primary care were more positive toward the use of the PAEHR in comparison to the respondents
working in outpatient clinics.
Half of the respondents working in primary care “agreed” or “somewhat agreed” that the PAEHR
made it easier to clarify what is important to the patient, while only 26 percent of the respondents
from outpatient clinics perceived that the PAEHR contributes to the clarification of what is important
to the patient. The same pattern can be observed with issues related to adherence and patients fol-
lowing the advice of healthcare professionals; 50 percent of the respondents in primary care versus
35 percent of respondents in outpatient clinics “agreed” or “somewhat agreed” that patient access to
the EHR contributed to increased adherence; 36 percent of the respondents in primary care per-
ceived that it was easier to communicate with the patients, while 20 percent of the respondents in
outpatient clinics “agreed” or “somewhat agreed.” In general, the results show that the respondents
in the outpatient clinics were less positive toward the PAEHR and they “disagreed” or “somewhat disagreed” to a larger extent on this point than respondents from primary care (Table 2).
Slightly more respondents in primary care “agreed” or “somewhat agreed” that patient access had made the patient more involved in his or her treatment and more prepared. Respondents working in primary care units were also more positive about issues related to quality-control than those working in outpatient units: 46 percent of the respondents in primary care versus 26 percent of the respondents in outpatient clinics “agreed” or “somewhat agreed” that patient access contributed to quality-control (Table 3).
Table 2. Healthcare professionals’ perceptions about benefits of the PAEHR.

The service makes it easier to … | Unit | Agree or somewhat agree | Neither agree nor disagree | Disagree or somewhat disagree
Clarify what is important to the patient | Primary | 50% (33) | 26% (17) | 24% (16)
Clarify what is important to the patient | Outpatient | 26% (21) | 38% (30) | 36% (29)
Increase adherence to the advice I provide the patient | Primary | 50% (33) | 35% (23) | 15% (10)
Increase adherence to the advice I provide the patient | Outpatient | 35% (28) | 35% (28) | 30% (24)
Communicate with the patient | Primary | 36% (24) | 36% (24) | 27% (18)
Communicate with the patient | Outpatient | 20% (16) | 39% (31) | 41% (33)
Table 3. Healthcare professionals’ perceptions about benefits related to the patient.

The service has made the patient … | Unit | Agree or somewhat agree | Neither agree nor disagree | Disagree or somewhat disagree
More involved in his or her treatment and/or rehabilitation | Primary | 35% (23) | 44% (29) | 21% (14)
More involved in his or her treatment and/or rehabilitation | Outpatient | 24% (19) | 55% (44) | 21% (17)
More prepared for an appointment | Primary | 29% (19) | 49% (32) | 23% (15)
More prepared for an appointment | Outpatient | 23% (18) | 44% (35) | 34% (27)
Able to quality-control what I document | Primary | 46% (30) | 38% (25) | 17% (11)
Able to quality-control what I document | Outpatient | 26% (21) | 46% (37) | 28% (22)
Table 4 shows that respondents working in outpatient clinics are more concerned about patients becoming upset, worried or misunderstanding information in the EHR. For instance, 45 percent of the respondents working in outpatient clinics “agreed” or “somewhat agreed” that the patient becomes upset, while only 26 percent of the respondents from primary care perceived patients becoming upset. A similar difference exists with respect to issues related to worries and misunderstandings; 53 percent of the respondents in outpatient clinics “agreed” or “somewhat agreed” that the patient becomes worried, versus 36 percent of the respondents working in primary care. Moreover, 49 percent of the respondents in outpatient clinics “agreed” or “somewhat agreed” that the patient misunderstands information in the health record, compared to 33 percent in primary care (Table 4).

Table 4. Healthcare professionals’ perceptions about drawbacks related to the patient.

The service has resulted in that the patient … | Unit | Agree or somewhat agree | Neither agree nor disagree | Disagree or somewhat disagree
Becomes upset about the information that can be read | Primary | 26% (17) | 55% (36) | 20% (13)
Becomes upset about the information that can be read | Outpatient | 45% (36) | 39% (31) | 16% (13)
Becomes worried about the information that can be read | Primary | 36% (24) | 47% (31) | 17% (11)
Becomes worried about the information that can be read | Outpatient | 53% (42) | 35% (28) | 13% (10)
Misunderstands the information in the health record | Primary | 33% (22) | 47% (31) | 20% (13)
Misunderstands the information in the health record | Outpatient | 49% (39) | 38% (30) | 14% (11)

The PAEHR seems to have little impact on the healthcare professionals’ work and we observed only small differences between respondents working in primary care units and in outpatient clinics; 8 percent of the respondents working in primary care reported that they spend more time on appointments versus 10 percent of the respondents working in outpatient clinics. Similarly, 23 percent (primary care) versus 24 percent (outpatient clinics) of the respondents “agreed” or “somewhat agreed” that they spend more time on writing or dictating notes; 26 percent of those working in primary care units and 31 percent of the respondents working in outpatient clinics “agreed” or “somewhat agreed” that they cannot document everything that they want in the medical notes (Table 5).

Table 5. Healthcare professionals’ perceptions about drawbacks of the PAEHR.

The service has resulted in that I … | Unit | Agree or somewhat agree | Neither agree nor disagree | Disagree or somewhat disagree
Use more time for the appointment/phone call | Primary | 8% (5) | 38% (25) | 55% (36)
Use more time for the appointment/phone call | Outpatient | 10% (8) | 45% (36) | 45% (36)
Use more time to dictate/write information | Primary | 23% (15) | 30% (20) | 47% (31)
Use more time to dictate/write information | Outpatient | 24% (19) | 38% (30) | 39% (31)
Cannot document what I want in the medical notes | Primary | 26% (17) | 29% (19) | 46% (30)
Cannot document what I want in the medical notes | Outpatient | 31% (25) | 24% (19) | 45% (36)

Most respondents had not changed the way they document information in the record (79% of those working in primary care and 65% of those working at outpatient clinics). Those who did reported that they had changed the way they record specific symptoms related to mental illness (17 respondents), obesity (11 respondents), cancer (9 respondents) and drug abuse (9 respondents). These professionals also noted changing the language they use, including less ‘provocative’ language (23 respondents), fewer abbreviations (17 respondents), and the use of Latin words (13 respondents). Only 5 percent of the respondents, from both groups, had used the special keywords to withhold information from patients. Respondents in outpatient clinics seem to be less aware of the keywords that can be used to withhold information from the patient; 48 percent did not know that the keywords existed, compared to 29 percent of the respondents in primary care (Table 6).

Table 6. Perceptions about the impact on documentation.

Question | Unit | Yes | No | No, I did not know that they existed
Have you changed the way you document information? | Primary | 21% (14) | 79% (52) | −
Have you changed the way you document information? | Outpatient | 35% (28) | 65% (52) | −
Have you ever used the keywords? | Primary | 5% (3) | 67% (44) | 29% (19)
Have you ever used the keywords? | Outpatient | 5% (4) | 48% (38) | 48% (38)
Discussion
In this study, we have explored how healthcare professionals perceive the outcomes of PAEHRs,
by comparing the perceptions of healthcare professionals working in primary care units and in
outpatient clinics. The results of the study indicate that professionals working in primary care per-
ceive that an increase in sharing information with patients can be beneficial. The majority of them
perceive this as a way to increase adherence and to clarify important information to the patient. It
is also seen as an opportunity for the patient to control what is documented in primary care notes.
This is consistent with the perceptions of patients who have described similar benefits, like
increased adherence12 and enhanced understanding and recall of health information.9–11 Healthcare
professionals at outpatient clinics, on the other hand, seem less convinced about the benefits of the
PAEHR. The survey results indicate that they are neither positive nor negative toward statements
about increased adherence, clarification of information and quality-control.
Previous studies show that PAEHRs make patients feel more involved in their own care.9–11,26 In this study, we also found indications of this during the workshop and in the interviews. However, the survey revealed that most healthcare professionals do not perceive that their patients are more involved in their treatment or rehabilitation, nor do they claim that their patients are more prepared for their appointments. Reports of an
increased workload were rare in both groups and the majority of the healthcare professionals
reported that there was no impact on the time that was used for setting up and attending to appoint-
ments or documenting information. Around one-fifth of the respondents reported that they used
more time to dictate or write notes, and that they had to be more up-to-date about their patients’
health records. This finding is consistent with the study conducted by Delbanco et al.,9 which
reported on little impact on the length of appointments or time used for documentation.
In line with previous studies,14–17 the results from the workshop, the interviews and the survey
reflect concerns about increased patient anxiety and misunderstandings of the information in the
EHR. This was especially perceived by healthcare professionals in outpatient clinics, where
approximately 50 percent “agreed” or “somewhat agreed” to the statements about patient anxiety,
misunderstandings and patients getting upset. In addition, half of the professionals working in
outpatient clinics were not aware that it is possible to withhold information from the patient with
respect to on-going diagnoses. Although healthcare professionals are concerned about patient anxi-
ety and misunderstandings of the information in the EHR, the patient access to the EHR has not led
to a change in documentation procedures among the majority of the healthcare professionals. In those cases where changes were made, specific symptoms, the use of abbreviations and Latin terms were items that were affected by the change. One explanation could be that outpatient clinics meet
patients with severe or even life-threatening diseases; however, they also meet chronically ill
patients who tend to be knowledgeable about their condition. In contrast to the concerns raised by
the healthcare professionals, Rexhepi et al.27 found that cancer patients preferred to access their
EHR even if some of the information contained therein was difficult to understand. They also note
that this information did not generate anxiety or undue concern.
One limitation of our study is that we do not compare the perceptions of different types of professionals. Despite this limitation, we believe that this study presents a relevant discussion of the perceptions of healthcare professionals working in both primary care units and outpatient clinics and incorporates several different types of professionals and their perceptions of PAEHRs. It
should also be noted that there is a difference in the response rate between the two groups, and that
additional studies are needed. Since case studies aim to reach analytical generalization,25 our study
provides knowledge about perceptions in a specific context. It is thus of some importance to further
investigate whether these findings can be replicated in other settings.
Conclusion
Healthcare organizations are moving toward increased information-sharing with patients through
the application of various services. Previous studies indicate that patients wish to get online access
to their healthcare information,10,28,29 but there are divergent reports on how healthcare profession-
als perceive an increase in information-sharing. Our study shows that healthcare professionals who
work in primary care find benefits like increased adherence, clarification of important information
and the possibility for patients to control what is documented. Professionals in outpatient clinics
seem to be less convinced about the benefits of PAEHRs. The concerns include patients becoming
upset, unduly worried and misinterpreting information. Nevertheless, patient access to the EHR
has not led to a change in documentation procedures among the majority of the healthcare profes-
sionals. The PAEHR was expected to increase patient involvement and allow patients to be more
prepared. However, the healthcare professionals did not perceive this. While these findings could
be connected to the context of outpatient clinics and primary care units, other contextual factors
might influence the results. More in-depth studies are therefore needed to clarify if and why there
are differences.
Since the concerns that were identified in this study are mainly about how patients will manage
their access to the information and not the impact this may have on the healthcare professionals’
work, it is important for managers to disseminate research findings with respect to patients’ experi-
ences, so as to ease the concerns of healthcare professionals. This seems to be especially important
in outpatient clinics. In addition, greater awareness of the possibility of withholding information from the patient about incomplete or tentative diagnoses that need further investigation might limit the professionals’ perception of being restricted in terms of what to document in the EHR.
Future research should further investigate whether professionals in outpatient clinics and primary care units have different concerns about their patients accessing the EHR, and the possible explanations for such differences. It will be of interest to clarify whether these can be connected to organizational factors, patient groups and the different practices of healthcare professionals.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publi-
cation of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
References
1. Koch S. Improving quality of life through eHealth—the patient perspective. In: Proceedings of the 24th medical informatics in Europe conference, Pisa, 26–29 August 2012, pp. 25–29.
2. Barry MJ and Edgman-Levitan S. Shared decision making—the pinnacle of patient-centered care.
N Engl J Med 2012; 366: 780–781.
3. Berwick DM, Nolan TW and Whittington J. The triple aim: care, health, and cost. Health Affairs 2008;
27: 759–769.
4. Dwamena F, Holmes-Rovner M, Gaulden CM, et al. Interventions for providers to promote a patient-
centred approach in clinical consultations. Cochrane Database Syst Rev 2012; 12: CD003267.
5. HealthIT.gov. Meaningful use regulations 2016. Washington, DC: HealthIT.gov, 2016.
6. European Commission. eHealth action plan 2012–2020—innovative healthcare for the 21st century.
Brussels: European Commission, 2012, pp. 1–14.
7. Tang C, Lorenzi N, Harle CA, et al. Interactive systems for patient-centered care to enhance patient
engagement. J Am Med Inform Assoc 2016; 23: 2–4.
8. Ross SE and Lin C-T. The effects of promoting patient access to medical records: a review. J Am Med
Inform Assoc 2003; 10: 129–138.
9. Delbanco T, Walker J, Bell SK, et al. Inviting patients to read their doctors’ notes: a quasi-experimental
study and a look ahead. Ann Intern Med 2012; 157: 461–470.
10. Woods SS, Schwartz E, Tuepker A, et al. Patient experiences with full electronic access to health records
and clinical notes through the My HealtheVet Personal Health Record Pilot: qualitative study. J Med
Internet Res 2013; 15: e65.
11. Esch T, Mejilla R, Anselmo M, et al. Engaging patients through open notes: an evaluation using mixed
methods. BMJ Open 2016; 6: e010034.
12. Wright E, Darer J, Tang X, et al. Sharing physician notes through an electronic portal is associated with
improved medication adherence: quasi-experimental study. J Med Internet Res 2015; 17: e226.
13. Prey JE, Restaino S and Vawdrey DK. Providing hospital patients with access to their medical records.
AMIA Annu Symp Proc 2014; 2014: 1884–1893.
14. Ross SE, Todd J, Moore LA, et al. Expectations of patients and physicians regarding patient-accessible
medical records. J Med Internet Res 2005; 7: e13.
15. Johnson AJ, Frankel RM, Williams LS, et al. Patient access to radiology reports: what do physicians
think? J Am Coll Radiol 2010; 7: 281–289.
16. Grünloh C, Cajander Å and Myreteg G. “The record is our work tool!”—physicians’ framing of a patient
portal in Sweden. J Med Internet Res 2016; 18: e167.
17. De Lusignan S, Mold F, Sheikh A, et al. Patients’ online access to their electronic health records and
linked online services: a systematic interpretative review. BMJ Open 2014; 4: 1–11.
18. Ålander T and Scandurra I. Experiences of healthcare professionals to the introduction in Sweden
of a public eHealth service: patients’ online access to their electronic health records. In: MEDINFO,
São Paulo, 19–23 August 2015, pp. 153–7. Amsterdam: IOS Press.
19. Oster NV, Jackson SL, Dhanireddy S, et al. Patient access to online visit notes: perceptions of doctors
and patients at an urban HIV/AIDS clinic. J Int Assoc Provid AIDS Care 2015; 14: 306–312.
20. Vodicka E, Mejilla R, Leveille SG, et al. Online access to doctors’ notes: patient concerns about privacy.
J Med Internet Res 2013; 15: e208.
21. Mafi JN, Mejilla R, Feldman H, et al. Patients learning to read their doctors’ notes: the importance of
reminders. J Am Med Inform Assoc 2016; 23: 951–955.
22. Yin RK. Case study research: design and methods. 5th ed. Thousand Oaks, CA: SAGE, 2014.
23. Gray D, Brown S and Macanufo J. Gamestorming: a playbook for innovators, rulebreakers, and change-
makers. Sebastopol, CA: O’Reilly, 2010.
24. Cooper A. The inmates are running the asylum. Indianapolis, IN: Sams Publishing; Macmillan, 1999.
25. Graneheim UH and Lundman B. Qualitative content analysis in nursing research: concepts, procedures
and measures to achieve trustworthiness. Nurse Educ Today 2004; 24: 105–112.
26. Wass S, Vimarlund V and Ros A. Exploring patients’ perceptions of accessing electronic health records: innovation in healthcare. Health Informatics J 2019; 25: 203–215. DOI: 10.1177/1460458217704258.
27. Rexhepi H, Åhlfeldt RM, Cajander Å, et al. Cancer patients’ attitudes and experiences of online access
to their electronic medical records: a qualitative study. Health Informatics J 2016; 24: 115–124.
28. Nazi KM, Hogan TP, McInnes DK, et al. Evaluating patient access to electronic health records: results
from a survey of veterans. Medical Care 2013; 51: S52–S56.
29. Baer D. Patient-physician e-mail communication: the Kaiser permanente experience. J Oncol Pract
2011; 7: 230–233.
Review
Requirements of Health Data Management Systems for Biomedical
Care and Research: Scoping Review
Leila Ismail1, PhD; Huned Materwala1, MS; Achim P Karduck2, PhD; Abdu Adem3, PhD
1Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, Abu
Dhabi, United Arab Emirates
2Faculty of Informatics, Furtwangen University, Furtwangen, Germany
3College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates
Corresponding Author:
Leila Ismail, PhD
Department of Computer Science and Software Engineering
College of Information Technology
United Arab Emirates University
Al Maqam Campus
Al Ain, Abu Dhabi, 15551
United Arab Emirates
Phone: 971 37673333 ext 5530
Email: leila@uaeu.ac.ae
Abstract
Background: Over the last century, disruptive incidents in the fields of clinical and biomedical research have yielded a tremendous
change in health data management systems. This is due to a number of breakthroughs in the medical field and the need for big
data analytics and the Internet of Things (IoT) to be incorporated in a real-time smart health information management system. In
addition, the requirements of patient care have evolved over time, allowing for more accurate prognoses and diagnoses. In this
paper, we discuss the temporal evolution of health data management systems and capture the requirements that led to the
development of a given system over a certain period of time. Consequently, we provide insights into those systems and give
suggestions and research directions on how they can be improved for a better health care system.
Objective: This study aimed to show that there is a need for a secure and efficient health data management system that will
allow physicians and patients to update decentralized medical records and to analyze the medical data for supporting more precise
diagnoses, prognoses, and public insights. Limitations of existing health data management systems were analyzed.
Methods: To study the evolution and requirements of health data management systems over the years, a search was conducted
to obtain research articles and information on medical lawsuits, health regulations, and acts. These materials were obtained from
the Institute of Electrical and Electronics Engineers, the Association for Computing Machinery, Elsevier, MEDLINE, PubMed,
Scopus, and Web of Science databases.
Results: Health data management systems have undergone a disruptive transformation over the years from paper to computer,
web, cloud, IoT, big data analytics, and finally to blockchain. The requirements of a health data management system revealed
from the evolving definitions of medical records and their management are (1) medical record data, (2) real-time data access, (3)
patient participation, (4) data sharing, (5) data security, (6) patient identity privacy, and (7) public insights. This paper reviewed
health data management systems based on these 7 requirements across studies conducted over the years. To our knowledge, this
is the first analysis of the temporal evolution of health data management systems giving insights into the system requirements
for better health care.
Conclusions: There is a need for a comprehensive real-time health data management system that allows physicians, patients,
and external users to input their medical and lifestyle data into the system. The incorporation of big data analytics will aid in
better prognosis or diagnosis of the diseases and the prediction of diseases. The prediction results will help in the development
of an effective prevention plan.
(J Med Internet Res 2020;22(7):e17508) doi: 10.2196/17508
KEYWORDS
big data; blockchain; data analytics; eHealth; electronic medical records; health care; health information management; Internet
of Things; medical research; mHealth
Introduction
The notion of health data management systems has evolved
during the last century. With the evolution of medical records
from paper charts to electronic health records (EHRs) [1], health
data management has undergone disruptive transitions to provide
more accurate and better patient care and make qualitative use
of these records. This shift is underpinned by the advancement
in information technologies that led to the development of
several notions of health data management systems. Those
health data management systems were often misaligned with
the goals of biomedical care and research. This misalignment
is caused particularly by the discrepancies between advanced
technologies and their adoption for biomedical care and research.
Consequently, it becomes vital to address this gap by developing
a new framework for the health data management system. In
this paper, we provide a broader history and evolution of health
data management systems underpinned by the changing
definition of medical records, discuss the issues prevailing
within, introduce the modern aspects of health data management
systems supporting the growing size of medical data, and discuss
insights and provide solutions aiming for a better health care
ecosystem.
The introduction of EHRs has transformed the health care
industry by providing more services, improving the quality of
patient care, and enhancing the data access ability in real time,
thereby creating a diverse set of health data management systems
[2]. Our understanding of EHRs is that they provide a longitudinal
view of a patient’s medical history over his or her lifetime
generated by one or more health care providers or medical
organizations delivering treatments to that patient. These
cohesive and summarized records include the patient’s
demographic and personal information, past and current
diagnoses and treatments, progress notes, laboratory and
radiology results, allergies, and immunizations [1]. An earlier
form of EHRs, referred to as paper charts, involved written
records of a patient’s diagnosis and treatments for the purpose
of medical teaching. The term was later revised to computer-based
patient records, then electronic medical records, and currently
EHRs. With the advancement in technological
developments and the goal to provide better and efficient health
care, health data management systems have evolved from a
computer-based approach to client-server–based, cloud, Internet
of Things (IoT), and finally blockchain-based systems.
With the rise of big health care data and the realization of using
medical data for governance and research, it becomes necessary
to integrate big data analytics within health data management
systems [3]. However, this brings new challenges: aggregating
and preprocessing data from multiple sources to develop insights,
and ensuring data security and privacy in the face of an increasing
number of data breaches and hacking incidents [4]. Further
challenges have been imposed on biomedical care and research
by the nature and types of medical data being generated at a
rapid pace. These challenges have created the need for a new
health data management framework.
This paper analyzes the requirements for better patient care and
predictive analysis that must be considered when implementing
a health data management system. Considering these
requirements will make the health care data management system
more accurate, efficient, and cost-effective. To our knowledge,
this is the first analysis of the temporal evolution of health data
management systems to give insights into the system
requirements for better health care.
The contributions of this paper are three-fold. First, the paper
provides a taxonomy of health data management systems based
on their technological advancement, and the inherent challenges
and issues are discussed therein. Second, we present the
reforming definitions of medical records and extract the
requirements of a health data management system. Third, the
paper provides insights into the health data management system
research and guidelines for the future research area.
Related Works
Health data management systems are evolving for better health
care. Literature reviews on these systems are classified into 2
categories: (1) electronic health (eHealth) [5-8] and (2) mobile
health (mHealth) [9].
Regarding eHealth, the study by Jamal et al [5] reviews the
impact of a computerized system on the quality of health care.
The results showed that a health information system, if properly
designed, can prevent medical errors and can support doctors
and medical providers in diagnosis. The study by Van De Belt
et al [6] reviews the definitions of health 2.0 and medicine 2.0
published between 2007 and 2009, arriving at a common definition
involving the web, patients, professionals, social networking,
health information content, and collaboration. In this study, we
reveal additional requirements needed for better health care:
privacy, security, public insights, and patient participation in
accessing and monitoring medical data. The studies by Oh et
al [7] and Cunningham et al [8] focus on the definitions of
eHealth from 1999 to 2004. The authors found that the themes
of health and technology are the most recurrent across definitions.
Concerning mHealth, Silva et al [9] provide a review of mHealth
apps and services, highlighting that coordination, integration,
and interoperability between different mHealth apps are important
for better health care, as is improved performance of mobile
devices in terms of battery, storage, computation, and network.
In this study, we reviewed health data management systems
based on the following 7 requirements across studies conducted
over the years: (1) medical record data, (2) real-time data access,
(3) patient participation, (4) data sharing, (5) data security, (6)
patient identity privacy, and (7) public insights.
Methods
For the analysis and study of the evolution of health data
management systems, we reviewed published research articles,
reports, medical lawsuits, health care regulations, and acts
concerning the methods of organizing medical record data and the
needs of a health data management system. The literature was searched
in the Institute of Electrical and Electronics Engineers,
Association for Computing Machinery, Elsevier, MEDLINE,
PubMed, Scopus, and Web of Science databases from 1793 to
2020. We selected the papers that included incidents that
involved the definitions of a health data management system,
triggered the introduction of a new system, and/or implemented
technologies for better health care. The analysis of these papers
shows that advances in technologies are being adopted for
accurate and efficient patient care.
Results
Taxonomy of Health Data Management Systems
Before satisfying the requirements of biomedical care and
research, the evolution of the underlying health data
management systems and their limitations must be understood.
The capabilities of the health data management system should ensure
that the requirements of patient care are met. Health data
management systems have undergone multiple transitions over
the years alongside the advancement in information technologies
as shown in Figure 1. During this evolution, several programs
were established and regulation acts were passed to improve
the quality of patient care. Table 1 presents the events that
triggered the evolution of health data management systems.
Table 2 presents the limitations of health data management
systems.
Figure 1. Evolution of the health data management system.
Table 1. Events that have triggered the evolution of health data management systems.

1793 (Board of Governors of the Society of the New York Hospital): A rule to record patients’ data for hospital expenditure justification was passed [10].

1805 (Board of Governors of the Society of the New York Hospital): A rule to record major medical cases for education was passed, provoked by a fatal dispute between an American statesman and an American politician. According to the rule, the recorded cases should be bound in a book for inspection by the governors, medical professionals and students, and the friends of the patients [11].

1830 (Board of Governors of the Society of the New York Hospital): A rule to maintain a record of all medical cases was passed [11].

1918 (American College of Surgeons): A hospital standardization program was established to standardize the format of medical records for improved patient care [12].

1928 (American College of Surgeons): The American Association of Record Librarians of North America was established to enhance the standards of medical records [13].

The 1960s (Lawrence Weed): The idea of using computers for medical records was proposed to allow doctors to track a patient’s medical history and provide evidence for the treatment; a problem-oriented medical records model was developed to standardize the method of electronic medical records (EMRs), providing a structure to help doctors record their notes [14].

The 1960s (not applicable): Paper charts were termed EMRs.

1965 (Centers for Medicare and Medicaid Services): A rule to record patients’ data by medical nurses for medical insurance reimbursement was passed with the introduction of the Medicare and Medicaid laws [15].

1965 (Lockheed Corporation): The first commercial computerized health data management system, known as the Clinical Information System, was developed for El Camino Hospital. The system included features for laboratory tests, appointment scheduling, and pharmacy management [16].

1967 (University of Utah, 3M, and Latter-Day Saints Hospital): The first clinical decision support system, known as Health Evaluation of Logical Processing, was developed to support clinical operations. The system helped doctors to identify cardiac contraction based on the analysis of a patient’s test results and to select an appropriate medication for infectious disease cases [17].

1968 (Massachusetts General Hospital and Harvard University): The first modular computer-based health data management system, known as the Computer Stored Ambulatory Record, was implemented. The system accommodated clinical vocabularies through clinical mapping to recognize different terms used for the same disease [18].

The 1980s (Indian Health Service): The master patient index (MPI) was introduced to keep track of patients’ medical data to reduce unnecessary testing and adverse drug effects [19].

1987 (Health Level Seven): Electronic standards were developed to address the standardization issues of health data management system development and adoption. The standards allowed the use of components from different vendors in a health data management system [20].

1991 (Institute of Medicine): The term computer-based patient records was introduced in a report studying the benefits of electronic management of health records [21].

1996 (US Congress): The Health Insurance Portability and Accountability Act was passed to safeguard patients’ medical records through role-based access control, automatic data backup, audit trails, and data encryption [22].

1999 (John Mitchell): The term eHealth (electronic health) was coined, referring to the integration of electronic communication and information technologies for the electronic transmission, storage, and retrieval of medical records both locally and remotely [23].

2000 (S Laxminarayan and Robert SH Istepanian): The term mHealth (mobile health) was coined, referring to wireless telemedicine using mobile telecommunications and multimedia technologies for the new mobile health care system [24].

2001 (Gunther Eysenbach): The definition of eHealth was expanded by incorporating business and public health into health services and by defining the outcomes and stakeholders of eHealth [25].

2004 (Stephen S Intille): The term uHealth (ubiquitous health) was coined, referring to the use of biometric sensors and medical devices to monitor and improve a patient’s health [26].

The 2000s (health care organizations): A formal definition of the term personal health records was proposed, allowing patients to access their medical history and to manage it by making parts of it available to selected participants through access control rights [27].

2003 (Institute of Medicine): The term electronic health records was coined [28].

2006 (Commonwealth of Massachusetts): The Massachusetts health care reform law was passed, mandating that residents have minimum medical insurance coverage and that employers with more than 10 full-time employees provide medical insurance coverage [29].

2006 (Elliott Fisher): The term Accountable Care Organizations was coined, referring to groups of doctors, hospitals, and other health care providers who volunteer to give high-quality care to their patients to avoid unnecessary duplication of services and reduce medical errors [30].

2009 (US Department of Justice, Office of Inspector General, and Health and Human Services): The Health Care Fraud Prevention and Enforcement Action was established to strengthen the existing programs to prevent and reduce Medicare and Medicaid fraud [31].

2010 (US President Barack Obama): The Patient Protection and Affordable Care Act was signed into law with the objective of expanding medical insurance coverage [32].
Table 2. Limitations of health data management systems.

Paper charts: Illegible handwriting resulting in incorrect treatments [33] and deaths [34,35]. Require physical storage and are susceptible to unplanned destruction such as flood, fire, rodents, and degradation. Physically cumbersome to read, understand, and search for specific information. The cost and time required for paper charts to be requested for duplication and then delivered are unacceptably high.

Computer-based: Medical records are managed by the physicians and cannot be accessed by the patients. Physicians visiting a patient have to note down or memorize the patient’s medical data to return to the hospital and record it digitally, which may lead to error.

Client-server–based: A patient has no traceability on how his or her data are used; issues of security, privacy, and a single point of failure. In addition, a cohesive view of a patient’s medical data from multiple hospitals is difficult. Requires repeating medical tests at times, which results in more time, cost, and effects on health conditions.

Cloud-based: Single point of failure, loss of data control and stewardship, the requirement of a steady internet connection, and data reliability [36,37].

IoT-based (IoT: Internet of Things): Data security and patient privacy are a major concern.

Big-data–based: The process of data aggregation from different storage sites is time consuming, complex, and expensive. The data are stored using different formats and require preprocessing. In addition, preserving the security of the data and the privacy of the patient identity while maintaining the usefulness of the data for analysis and studies is quite challenging.

Blockchain-based: The process of ledger update on multiple nodes is energy consuming [38] and suffers from the issue of low throughput [39].
Requirements of a Health Data Management System
Over the last century, the definition of health data management
has been revised numerous times to address the need for better
and more advanced patient care alongside technological advances.
We evaluated these differing definitions and rationalized the
one used in the remainder of the paper.
It is important to note that, as the term health data management
is rather recent, the listed definitions were taken from different
legislations and health data recording systems, even if the exact
phrase health data management was not used. Table 3 shows
the evolving definitions of health data management systems
from being purely medical practice and learning-based
definitions to being more patient-centric and research-based
definitions. We classified health data management systems
based on 7 requirements that underpin the evolution in the field
as shown in Figure 2. Each number in the figure represents a
definition stated in Table 3.
Table 3. The definitions of health data management systems. (HIMSS: Healthcare Information and Management Systems Society; AHIMA: American Health Information Management Association; HIPAA: Health Insurance Portability and Accountability Act.)

1. 1793, Siegler [10]: “[…] Names and Diseases of the Persons, received, deceased or discharged in the same, with the date of each event, and the Place from whence the Patients last came […]”

2. 1805, Siegler [10]: “The house physician, with the aid of his assistant, under the direction of the attending physician, shall keep a register of all medical cases which occur in the hospital, and which the latter shall think worthy of preservation, which book shall be neatly bound, and kept in the library for the inspection of the friends of the patients, the governors, physicians and surgeons, and the students attending the hospital.”

3. 1941, Sayles and Gordon [12]: “Accurate and complete medical records […] which includes identification data; complaint; personal and family history; history of the present illness; physical examination; special examinations such as consultations, clinical laboratory, x-ray and other examinations; provisional or working diagnosis; medical or surgical treatment; gross or microscopical pathological findings; progress notes; final diagnosis; condition on discharge; follow-up; and, in case of death, autopsy findings.”

4. 1968, Weed [14]: “The computer is making a major contribution […] the patient will gain from his physician an immediate sympathetic understanding […] inadequate analysis by the medical profession can be avoided.”

5. 1968, Weed [14]: “[…] orient data around each problem […] complete list of all the patient’s problems […] diagnosis and all other unexpected findings or symptoms […] The list is separated into active and inactive problems, and in this way, those of immediate importance are easily discernible […] orders, plans, progress notes and numerical data can be recorded under the numbered and titled problem […]”

6. 1993, Cynthia [40]: “Digital versions of paper charts that contain the medical and treatment history of the patients from one practice for providers to use for diagnosis and treatment”

7. 1997, Dick et al [21]: “Electronic patient record […] support users through availability of complete and accurate data, practitioner reminders and alerts, clinical decision support systems, links to bodies of medical knowledge, and other aids.”

8. 2001, Eysenbach [25]: “[…] medical informatics, public health and business, referring to health services and information delivered or enhanced through the Internet and related technologies […] an attitude, and a commitment for networked, global thinking, to improve health care locally, regionally, and worldwide by using information and communication technology.”

9. 2002, Cameron and Turtle-Song [41]: “The subjective component contains information about the problem […] objective information consists of those observations made by the counselor […] assessment section demonstrates how […] data are formulated, interpreted, and reflected upon, and the plan section summarized the treatment direction.”

10. 2003, Markle Foundation [42]: “[…] electronic application through which individuals can access, manage and share their health information, and that of others for whom they are authorized, in a private, secure, and confidential environment.”

11. 2003, HIMSS [1]: “[…] longitudinal electronic record of patient health information generated by one or more encounters […] patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and radiology reports […] automates and streamlines the clinician’s workflow. The EHRs has the ability to generate a complete record of a clinical patient encounter […] evidence-based decision support, quality management, and outcomes reporting.”

12. 2003, HIMSS [43]: “The Electronic Health Record (EHR) is a secure, real-time, point-of-care, patient-centric information resource […] decision making by providing access to patient health record information where and when they need it and by incorporating evidence-based decision support […] billing, quality management, outcomes reporting, resource planning, and public health disease surveillance and reporting.”

13. 2005, AHIMA [44]: “[…] lifelong resource of health information needed by individuals to make health decisions. Individuals own and manage the information […] is maintained in a secure and private environment, with the individual determining rights of access […]”

14. 2008, Böcking and Trojanus [45]: “Health data management […] acquiring, entering, processing, coding, outputting, retrieving, and storing of data gathered in the different areas of health care […] also embraces the validation and control of data according to legal or professional requirements.”

15. 2013, HIPAA [22]: “A major goal […] to protect the privacy of individuals’ health information […] adopt new technologies to improve the quality and efficiency of patient care.”
Figure 2. Requirements of a health data management system.
Medical Record Data
Medical record data, describing the identity and health of a
patient through personal and demographic information, the history
of the medical condition, ongoing treatments, laboratory tests,
and radiology results, are the common requirement of a health
data management system. Medical records have been a
primary component throughout the evolution of health data
management systems, whether in the form of printed documents
or digital records.
Real-Time Data
To improve the quality of patient care, the requirement of
real-time data access was highlighted in the definitions of health
data management systems. This requirement reduces medical
incidents caused by delays in data updates by physicians.
However, this requirement cannot be fulfilled by the paper-based
and computer-based health data management systems. This
requirement was introduced with the client-server–based
management system [46-52] that enables the physicians to
access and update the patient medical records in real time.
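To make the real-time requirement concrete, the following is a minimal sketch of a client-server–style service through which physicians and patients could read and update a record over the internet. It assumes a Python Flask service; the endpoint paths, the in-memory store, and the record fields are illustrative assumptions, not drawn from the reviewed systems.

```python
# Minimal sketch of real-time record access in a client-server system.
# Flask is used for illustration; paths and fields are hypothetical.
from datetime import datetime, timezone

from flask import Flask, jsonify, request

app = Flask(__name__)
records = {}  # patient_id -> list of timestamped entries (illustrative store)

@app.route("/records/<patient_id>", methods=["GET"])
def get_record(patient_id):
    # Clients read the latest state directly, with no batch delay.
    return jsonify(records.get(patient_id, []))

@app.route("/records/<patient_id>", methods=["POST"])
def update_record(patient_id):
    # A physician's update becomes visible to all clients immediately.
    entry = request.get_json()
    entry["updated_at"] = datetime.now(timezone.utc).isoformat()
    records.setdefault(patient_id, []).append(entry)
    return jsonify(entry), 201

if __name__ == "__main__":
    app.run(port=5000)
```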
Patient Participation
With the medical records maintained by the hospitals or
third-party cloud service providers, the patients cannot track
how their medical data are used. Consequently, patient
participation in accessing and monitoring medical data is a key
requirement to develop trust in health data management systems.
In addition to data access, the participation of patients in
providing health conditions and lifestyle data to the physicians
will aid in better prognosis and diagnosis. The introduction of
the IoT-based health data management system, involving sensors
and medical devices that monitor a patient’s health and lifestyle
conditions, enables patients to input their medical conditions
into the system [53-59]. An analysis of personal health records
management platforms based on users’ perception shows that
a simple easy-to-use system is required for patient engagement
and satisfaction [60].
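As a rough illustration of this data-input requirement, the sketch below appends simulated wearable readings to a patient-supplied portion of a record alongside clinician entries. The class, field names, and the simulated heart-rate feed are assumptions made for illustration only.

```python
# Sketch of patient-generated data input, as in an IoT-based system:
# a wearable posts periodic readings that are stored next to clinician
# entries. All names and value ranges are illustrative.
import random
import time
from dataclasses import dataclass, field

@dataclass
class PatientRecord:
    patient_id: str
    clinician_entries: list = field(default_factory=list)
    patient_entries: list = field(default_factory=list)  # patient-supplied

    def add_sensor_reading(self, source: str, value: float, unit: str):
        self.patient_entries.append(
            {"source": source, "value": value, "unit": unit, "ts": time.time()}
        )

record = PatientRecord("patient-001")
for _ in range(3):
    # Simulated heart-rate samples standing in for a real wearable feed.
    record.add_sensor_reading("wearable-hr", random.uniform(60, 100), "bpm")
print(f"{len(record.patient_entries)} patient-supplied readings stored")
```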
Sharing
Sharing of medical records is a vital requirement with the
patient’s treatment being spread across various health care
providers. This aids other physicians in studying the patient’s
medical history for better treatment and avoids repetition of
laboratory and radiology tests. On the basis of the list of
definitions in Table 3, we classified sharing based on the users
allowed to access the data into 3 different categories: (1) degree
1, where the information is shared within the same medical
organization where the patient is currently receiving treatment,
(2) degree 2, where the information is shared with the patient,
patient’s friends, and family, and (3) degree 3, where the
information is shared with other medical organizations and
government. The requirement of sharing is complemented by
the introduction of the cloud-based health data management
system [61-63]. However, to share medical record data between
different health care organizations and to efficiently use the
shared information, the systems should support interoperability.
Interoperability can be achieved by using a standard format to
store, manage, and share the medical data. There are several
standard formats to store medical data and images [64]. Some
of the major file formats used for medical images are Analyze
[65], Neuroimaging Informatics Technology Initiative [66],
Minc [67], and Digital Imaging and Communications in
Medicine [68]. Health Level 7 International, standardized by
the American National Standards Institute, is a health care
protocol for sharing medical data [20]. It includes the rules for
the integration, exchange, and management of EHRs. Wen et
al [69] assessed the interoperability of eHealth systems in
Taiwan for exchanging data, with the aim of reducing repeated
medical examinations and medications for better health care. They
concluded that the government should define policies to enforce
interoperability.
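The 3 sharing degrees can also be read as an access-control policy. The following is a minimal Python sketch under that reading; the role names and the role-to-degree mapping are illustrative assumptions, not drawn from the reviewed systems.

```python
# Sketch of the three sharing degrees as an access-control check.
# Roles and the policy mapping are illustrative assumptions.
from enum import IntEnum

class SharingDegree(IntEnum):
    SAME_ORGANIZATION = 1   # degree 1: staff of the treating hospital
    PATIENT_AND_FAMILY = 2  # degree 2: patient, friends, and family
    EXTERNAL = 3            # degree 3: other organizations and government

ROLE_TO_DEGREE = {
    "treating_clinician": SharingDegree.SAME_ORGANIZATION,
    "patient": SharingDegree.PATIENT_AND_FAMILY,
    "family_member": SharingDegree.PATIENT_AND_FAMILY,
    "external_researcher": SharingDegree.EXTERNAL,
}

def may_access(role: str, record_max_degree: SharingDegree) -> bool:
    """A requester may read a record only if the record's sharing policy
    extends at least to the requester's degree."""
    degree = ROLE_TO_DEGREE.get(role)
    return degree is not None and degree <= record_max_degree

# A record shared up to degree 2 is visible to the patient but not externally.
assert may_access("patient", SharingDegree.PATIENT_AND_FAMILY)
assert not may_access("external_researcher", SharingDegree.PATIENT_AND_FAMILY)
```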
Security
With the increasing incidence of data breaches and phishing attacks,
and the adoption of third-party service providers, the security
of patients’ sensitive data is essential. The 477 health data
breaches reported in 2017 affected 5,579,438 patient records,
whereas the 503 breaches reported in 2018 affected 15,085,302
records [70]. The requirement of security is even higher when patients’ medical
records are handled by a cloud service provider or when medical
sensors and devices are used to gather patients’ medical and
lifestyle data. According to a report by Intel Security, the use
of cloud services by the health care provider has reduced owing
to the lack of cyber security methods implemented by the cloud
service provider [71]. A report states that, on average, hospitals
lose track of around 30% of their networked medical devices,
making it much harder to protect against vulnerabilities [72].
More than 61% of all IoT devices and sensors on a hospital
network are at high risk of cyber-attack. In recent years,
blockchain technology [73,74] has gained wide popularity and
has penetrated into the domain of health care to address the need
for a more patient-centric supportive system for the
professionals, to connect disparate systems for improved patient
care, and to increase the accuracy of EHRs [75-81].
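One routinely expected safeguard is encryption of records at rest, alongside access control and audit trails. The sketch below illustrates this with the Python cryptography package’s Fernet (authenticated symmetric encryption); the key handling and record payload are illustrative, and a production system would hold keys in a dedicated key management service.

```python
# Sketch of encrypting a medical record at rest. Uses the "cryptography"
# package's Fernet (AES-based authenticated encryption); the payload and
# in-code key are illustrative placeholders only.
import json

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, held in a key management service
cipher = Fernet(key)

record = {"patient_id": "patient-001", "diagnosis": "hypertension"}
ciphertext = cipher.encrypt(json.dumps(record).encode("utf-8"))

# Only holders of the key can recover the plaintext; tampering is detected.
restored = json.loads(cipher.decrypt(ciphertext))
assert restored == record
```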
Privacy
The privacy of a patient’s identity in a health data management
system is crucial given the increasing incidence of medical
fraud and fake medications. The privacy of the patient
cannot be compromised, especially with the rise of data
analytics, where the medical record data of the patients are used
for analysis. The blockchain-based health data management
system aims to address this issue.
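A common building block for this requirement is pseudonymization, in which the patient identifier is replaced by a keyed hash before records are released for analysis, so analyses can link a patient’s data without revealing who the patient is. The sketch below uses Python’s standard hmac and hashlib modules; the secret key and record fields are illustrative.

```python
# Sketch of pseudonymizing patient identity before analytic release.
# The key and fields are illustrative assumptions.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # illustrative placeholder

def pseudonymize(patient_id: str) -> str:
    # A keyed hash (HMAC) is stable for linkage but not reversible
    # without the key, unlike a plain hash of a guessable identifier.
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"patient_id": "patient-001", "glucose_mg_dl": 140}
released = {"pseudonym": pseudonymize(record["patient_id"]),
            "glucose_mg_dl": record["glucose_mg_dl"]}
print(released)
```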
Public Insights
Prediction of health conditions is important to avoid
life-threatening situations. The increasing amount of health care
data [82], if properly analyzed, can facilitate the prediction of
health conditions. The process of gathering, organizing, storing,
and analyzing big data to discover correlations, hidden patterns,
and other valuable insights is known as big data analytics. Figure
3 shows the life cycle of big data analytics.
Figure 3. Lifecycle of big data analytics.
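As a minimal, self-contained illustration of this lifecycle (gather, organize, store, analyze), the sketch below fits a simple classifier to synthetic patient features and derives a population-level figure. The features, the risk label, and the model choice are illustrative assumptions using scikit-learn, not the methods of the studies cited here.

```python
# Sketch of the gather -> organize -> store -> analyze lifecycle on
# synthetic data; a simple model stands in for real predictive analytics.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Gather/organize: one row per patient (age, systolic BP, fasting glucose).
X = np.column_stack([
    rng.integers(20, 90, 500),   # age (years)
    rng.normal(125, 15, 500),    # systolic blood pressure (mm Hg)
    rng.normal(100, 20, 500),    # fasting glucose (mg/dL)
])
# Store/label: a synthetic "high risk" flag standing in for outcome data.
y = ((X[:, 1] > 140) | (X[:, 2] > 126)).astype(int)

# Analyze: fit a model and derive a population-level insight.
model = LogisticRegression(max_iter=1000).fit(X, y)
print("estimated high-risk share:", model.predict(X).mean())
```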
Prediction from health care data for public insights makes it
possible to actively improve public health and to react faster
to emerging situations [83-91]. Using personal health care data
requires, of course, a well-defined balance between the assurance
of the privacy of personal health care data and transparency, for
example, toward insurance companies. Insights into genetic
personal risk factors for chronic diseases should not lead to a
situation where a person is disadvantaged with respect to
insurance status. Moreover, the monitoring of the public health
situation has to be based on the health care data of individuals.
Consequently, research projects have recently addressed the
balance of personal health care data as a public good [92]. Figure
4 [92] shows the relationship between the 3 key stakeholders
for defining the balance between personal health care data and
the potential of these data as a public good. Companies could
be health insurance providers, hospitals, pharmaceutical
companies, and government organizations.
Figure 4. Personal health care data ecosystem.
Research on the diabetes mellitus crisis, the growth of cardiovascular
problems caused by nutrition patterns and lifestyle behavior in
many countries and regions of the world, changing patterns of
Alzheimer’s disease and dementia, the microbiome, and the abuse
of antibiotics would benefit tremendously from personal health
care data as a public good [93,94]. Bringing together the insights
of large initiatives such as the Health Data Exploration Project
and Computational Health Sciences [92,94] holds the key to
future advancement in the area of private and personal health
care data for the public good. Health care data analytics can
help researchers and government officials with better prediction
of chronic diseases, the development of effective therapeutic
drugs, more accurate patient care, and the development of a
nationwide effective prevention plan.
Table 4 shows health data management systems presented in
the taxonomy and evaluates them in terms of their adherence
to the defined requirements.
Table 4. Health data management systems in the literature vs the requirements. (Sharing degree 1: within the same hospital; degree 2: with the patient, patient’s friends, and family; degree 3: with other medical organizations and the government. IoT: Internet of Things.)

Paper-based: Allows recording of medical data for eventual use but encounters high delays in data access. Does not allow patients to track the use of their medical data or to provide their health conditions. Supports degree 1 and degree 2 sharing but not degree 3. Does not provide methods against cybersecurity attacks, does not provide methods for preserving a patient’s privacy, and does not support prediction.

Computer-based: Allows recording of medical data for eventual use but encounters high delays in data access. Does not allow patients to track the use of their medical data or to provide their health conditions. Supports all 3 degrees of sharing. Does not provide methods against cybersecurity attacks, does not provide methods for preserving a patient’s privacy, and does not support prediction.

Client-server–based: Allows recording of medical data for eventual use and data retrieval in real time. Allows patients to access and monitor their medical data but not to provide their health conditions. Supports all 3 degrees of sharing. Does not provide methods against cybersecurity attacks, does not provide methods for preserving a patient’s privacy, and does not support prediction.

Cloud-based: Allows recording of medical data for eventual use and data retrieval in real time. Allows patients to access and monitor their medical data but not to provide their health conditions. Supports all 3 degrees of sharing. Does not provide methods against cybersecurity attacks, but does not reveal a patient’s identity. Does not support prediction.

IoT-based: Allows recording of medical data for eventual use and data retrieval in real time. Allows patients to access and monitor their medical data and to provide their health conditions. Supports all 3 degrees of sharing. Does not provide methods against cybersecurity attacks and does not provide methods for preserving a patient’s privacy, but provides methods for the prediction of health conditions.

Big data analytics: Allows recording of medical data for eventual use and data retrieval in real time. Allows patients to access and monitor their medical data and to provide their health conditions. Supports all 3 degrees of sharing. Does not provide methods against cybersecurity attacks, but does not reveal a patient’s identity and provides methods for the prediction of health conditions.

Blockchain-based: Allows recording of medical data for eventual use and data retrieval in real time. Allows patients to access and monitor their medical data and to provide their health conditions. Supports all 3 degrees of sharing. Ensures the protection of medical data against cybersecurity attacks, does not reveal a patient’s identity, and provides methods for the prediction of health conditions.
Discussion
Principal Findings
This study revealed that there is a need for a secure and efficient
health data management system that will allow physicians and
patients to update decentralized medical records and to analyze
the medical data for supporting more precise diagnoses,
prognoses, biomedical research, and public insights. The early
form of health data management using the manual recording of
a patient’s diagnosis and treatment on sheets of paper was
introduced almost a century ago. Later, with the advancement
in technology, health data management systems evolved to web,
cloud, IoT, big data analytics, and blockchain-based systems.
The definition of medical records has reformed alongside this
temporal evolution of the system. The requirements for a health
data management system extracted from these definitions are
medical record data, real-time data, patient participation, sharing,
security, privacy, and public insights. The paper-based health
data management system fulfills the requirements of medical
record data and sharing. However, paper charts are prone to
misplacement, occupy large physical space, and involve a
time-consuming and expensive data sharing process. Over time,
the paper charts were replaced by electronic records in the
computer-based system with the same requirements.
To achieve the requirement of real-time data access in addition
to medical record data and sharing, a client-server–based health
data management system was introduced. This system allows
patients and health care providers to access medical data over
the internet using a mobile device or a desktop computer.
However, it suffers from the issues of single point of failure,
data fragmentation, system vulnerability, low scalability, and
high data security and patient privacy risks. To minimize the
infrastructural cost and to address the issue of data
fragmentation, the medical organizations and health care
providers transitioned to a cloud-based system. The cloud
service provider ensures the privacy of patient identity, but
the security of the data is not ensured, and the single point
of failure remains.
The requirement of patient participation to feed their medical
data and lifestyle conditions for better prognosis and diagnosis
was achieved with the introduction of the IoT-based
management system. However, with the increasing number of
data breaches and hacking of the medical sensors and devices,
there prevails a constant threat to the security of data and privacy
of a patient’s identity. With the advancement in big data
analytics, increasing amount of health care data are being studied
to gain insights for better prognosis and diagnosis of diseases.
However, the privacy of a patient’s identity still remains a
concern.
Blockchain technology, which has recently attracted the
attention of industry, shows potential in the field of health
care. A blockchain-based health data management system
satisfies all the requirements needed for better patient care.
However, it consumes a high amount of energy [95,96] and has
low throughput [39]. There are increasing research efforts to
solve these issues. For instance, to address the problem of energy
consumption, Milutinovic et al [97] proposed the proof of luck
consensus mechanism that ensures energy-efficient and
low-latency transaction validation. Ismail et al [98] and Dorri
et al [99] proposed scalable blockchain architectures for health
care that use a clustering approach to increase transactions
throughput.
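To illustrate the traceability and tamper evidence that make blockchain attractive here, the following is a minimal hash-chained ledger sketch: each block commits to its predecessor, so any edit to a stored record invalidates every later block. Consensus, networking, and realistic record payloads are omitted; all names are illustrative.

```python
# Minimal hash-chained ledger sketch (not a full blockchain: no
# consensus or peer network). Payloads and names are illustrative.
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    return hashlib.sha256(
        json.dumps(block, sort_keys=True).encode("utf-8")
    ).hexdigest()

def append_block(chain: list, record: dict) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"ts": time.time(), "record": record, "prev": prev})

def verify(chain: list) -> bool:
    # Each block must reference the hash of the block before it.
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
append_block(chain, {"patient": "patient-001", "event": "admission"})
append_block(chain, {"patient": "patient-001", "event": "lab result"})
assert verify(chain)
chain[0]["record"]["event"] = "tampered"  # any edit breaks the chain
assert not verify(chain)
```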
The main requirements of a health care data management system
are security and privacy, especially with the increasing number
of data breaching and hacking attacks. Furthermore, the adoption
of patient participation to feed health data to a health system is
increasing with the introduction of disruptive technologies, such
as the IoT and big data analytics. Big data analytics requires
the sharing of medical information among hospitals to get
insights and predictive analysis from the data. This paves the
way toward a health data management system as a support to
physicians and medical professionals for better diagnosis and
prognosis of chronic diseases. In addition, such a system allows
public insights to be derived from the data to develop a nationwide
prevention plan for certain diseases. The traceability feature of
the blockchain ensures that the data used for developing the
predictive models is accurate, leading to a precise prognosis,
diagnosis, and decision support system. Consequently, we
suggest an integrated blockchain-, IoT- and big data–based
health data management system to ensure the requirements of
smart health care: real-time access to data by physicians and
patients, health data input from patients through medical sensors
and lifestyle, security, privacy, and public insights. This
integrated health management system should be scalable and
energy-efficient, presenting new research challenges in the era
of smart health data management systems.
Conclusions
The objective of this paper was to highlight the requirements
of a health data management system for biomedical care and
research. In summary, it discussed the temporal evolution of
health data management systems from paper charts to
blockchain-based systems, along with the reformation of the
definition of what we call EHRs today. The system should
satisfy the requirements of medical record data, real-time access,
patient participation, data sharing, data security, patient identity
privacy, and public insights. The incorporation of big data
analytics aids in better prognosis and diagnosis of the diseases
and the prediction of risk for the development of chronic
diseases.
Acknowledgments
This work was supported by the Emirates Center for Energy and Environment Research of the United Arab Emirates University
under grant 31R101. The authors would like to thank the anonymous reviewers for their valuable comments, which helped them
improve the content, quality, and presentation of this paper.
Conflicts of Interest
None declared.
References
1. Healthcare Information and Management Systems Society. What Are Electronic Health Records (EHRs)? URL: https://www.himss.org/electronic-health-records [accessed 2020-02-13]
2. King J, Patel V, Jamoom EW, Furukawa MF. Clinical benefits of electronic health record use: national findings. Health
Serv Res 2014 Feb;49(1 Pt 2):392-404 [FREE Full text] [doi: 10.1111/1475-6773.12135] [Medline: 24359580]
3. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014;2(1):3
[FREE Full text] [doi: 10.1186/2047-2501-2-3] [Medline: 25825667]
4. Kuo MH, Sahama T, Kushniruk AW, Borycki EM, Grunwell DK. Health big data analytics: current perspectives, challenges
and potential solutions. Int J Big Data Intell 2014;1(1/2):114. [doi: 10.1504/ijbdi.2014.063835]
5. Jamal A, McKenzie K, Clark M. The impact of health information technology on the quality of medical and health care: a
systematic review. Health Inf Manag 2009;38(3):26-37. [doi: 10.1177/183335830903800305] [Medline: 19875852]
6. Van De Belt TH, Engelen LJ, Berben SA, Schoonhoven L. Definition of health 2.0 and medicine 2.0: a systematic review.
J Med Internet Res 2010 Jun 11;12(2):e18 [FREE Full text] [doi: 10.2196/jmir.1350] [Medline: 20542857]
7. Oh H, Rizo C, Enkin M, Jadad A. What is ehealth (3): a systematic review of published definitions. J Med Internet Res
2005 Feb 24;7(1):e1 [FREE Full text] [doi: 10.2196/jmir.7.1.e1] [Medline: 15829471]
8. Cunningham SG, Wake DJ, Waller A, Morris AD. Definitions of eHealth. In: Gaddi A, Manca M, editors. eHealth, Care
and Quality of Life. New York, USA: Springer; 2013:15-30.
9. Silva BM, Rodrigues JJ, de la Torre Díez I, López-Coronado M, Saleem K. Mobile-health: a review of current state in
2015. J Biomed Inform 2015 Aug;56:265-272 [FREE Full text] [doi: 10.1016/j.jbi.2015.06.003] [Medline: 26071682]
10. Siegler EL. The evolving medical record. Ann Intern Med 2010 Nov 16;153(10):671-677. [doi:
10.7326/0003-4819-153-10-201011160-00012] [Medline: 21079225]
11. Thomas E, John M, Charles R, Society of the New York Hospital. An Account of the New York Hospital. Medical Center
Archives 1811.
12. Sayles NB, Gordon LL. Health Information Management Technology: An Applied Approach. Chicago, USA: American
Health Information Management Association; 2013.
13. American Health Information Management Association. AHIMA History URL: http://bok.ahima.org/doc?oid=58133#.XnNMSIgzbIU [accessed 2020-02-13]
14. Weed LL. Medical records that guide and teach. N Engl J Med 1968 Mar 14;278(11):593-600. [doi:
10.1056/NEJM196803142781105] [Medline: 5637758]
15. Centers for Medicare & Medicaid Services. 1965. CMS’ Program History URL: https://www.cms.gov/About-CMS/Agency-information/History/ [accessed 2020-02-13]
16. Tripathi M. EHR evolution: policy and legislation forces changing the EHR. J AHIMA 2012 Oct;83(10):24-9; quiz 30.
[Medline: 23061349]
17. Gardner RM, Pryor T, Warner HR. The HELP hospital information system: update 1998. Int J Med Inform 1999
Jun;54(3):169-182. [doi: 10.1016/s1386-5056(99)00013-1] [Medline: 10405877]
18. Amatayakul MK. Electronic Health Records: A Practical Guide for Professionals and Organizations. Chicago, USA:
American Health Information Management; 2004.
19. IHS Markit. 1980. Master Patient Index (MPI) URL: https://www.ihs.gov/hie/masterpatientindex/ [accessed 2020-02-13]
20. Hammond WE. Health level 7: an application standard for electronic medical data exchange. Top Health Rec Manage 1991
Jun;11(4):59-66. [Medline: 10112038]
21. Dick RS, Steen EB, Detmer DE. The Computer-based Patient Record: An Essential Technology for Health Care. Washington,
DC: National Academies Press; 1997.
22. United States Department of Health and Human Services. Summary of the HIPAA Security Rule URL: https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html [accessed 2020-02-13]
23. John M. From Telehealth to E-health: The Unstoppable Rise of E-health. Australia: Department of Communications,
Information Technology and the Arts; 1999.
24. Laxminarayan S, Istepanian R. Unwired e-med: the next generation of wireless and internet telemedicine systems. IEEE
Trans Inf Technol Biomed 2000 Sep;4(3):189-193. [doi: 10.1109/titb.2000.5956074] [Medline: 11026588]
25. Eysenbach G. What is e-health? J Med Internet Res 2001;3(2):E20 [FREE Full text] [doi: 10.2196/jmir.3.2.e20] [Medline:
11720962]
26. Intille SS. Ubiquitous computing technology for just-in-time motivation of behavior change. Stud Health Technol Inform
2004;107(Pt 2):1434-1437. [doi: 10.3233/978-1-60750-949-3-1434] [Medline: 15361052]
27. Kim MI, Johnson KB. Personal health records: evaluation of functionality and utility. J Am Med Inform Assoc
2002;9(2):171-180 [FREE Full text] [doi: 10.1197/jamia.m0978] [Medline: 11861632]
28. Gillies J, Holt A. Anxious about electronic health records? No need to be. N Z Med J 2003 Sep 26;116(1182):U604.
[Medline: 14581956]
29. Holahan J, Blumberg L. Massachusetts health care reform: a look at the issues. Health Aff (Millwood) 2006;25(6):w432-w443.
[doi: 10.1377/hlthaff.25.w432] [Medline: 16973652]
30. Centers for Medicare & Medicaid Services. 2006. Accountable Care Organizations (ACOs) URL: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/ACO [accessed 2020-02-13]
31. US Department of Justice. 2016. Fact Sheet: The Health Care Fraud and Abuse Control Program Protects Consumers and Taxpayers by Combating Health Care Fraud URL: https://www.justice.gov/opa/pr/fact-sheet-health-care-fraud-and-abuse-control-program-protects-conusmers-and-taxpayers [accessed 2020-02-13]
32. HealthCare.gov. 2010. Patient Protection and Affordable Care Act URL: https://www.healthcare.gov/glossary/patient-protection-and-affordable-care-act/ [accessed 2020-02-13]
33. Sokol DK, Hettige S. Poor handwriting remains a significant problem in medicine. J R Soc Med 2006 Dec;99(12):645-646
[FREE Full text] [doi: 10.1258/jrsm.99.12.645] [Medline: 17139073]
34. Leonidas LL. Opinion – Inquirer.net. 2014. Death by Bad Handwriting URL: https://opinion.inquirer.net/79623/death-by-bad-handwriting [accessed 2020-02-13]
35. Charatan F. Family compensated for death after illegible prescription. Br Med J 1999 Dec 4;319(7223):1456 [FREE Full text] [doi: 10.1136/bmj.319.7223.1456] [Medline: 10582922]
36. Davis J. Healthcare IT News. 2017. eClinicalWorks Sued for Nearly $1 Billion for Inaccurate Medical Records URL: https://www.healthcareitnews.com/news/eclinicalworks-sued-nearly-1-billion-inaccurate-medical-records [accessed 2020-02-13]
37. Amazon Web Services. 2018. United States District Court Northern District of Illinois URL: https://s3.amazonaws.com/assets.fiercemarkets.net/public/004-Healthcare/external_Q12018/SurfsidevAllscripts [accessed 2020-02-13]
38. The Guardian. 2017. Bitcoin Mining Consumes More Electricity a Year Than Ireland URL: https://www.theguardian.com/technology/2017/nov/27/bitcoin-mining-consumes-electricity-ireland [accessed 2020-02-13]
39. Scherer M. UMEA University. 2017. Performance and Scalability of Blockchain Networks and Smart Contracts URL: https://umu.diva-portal.org/smash/get/diva2:1111497/FULLTEXT01 [accessed 2020-03-24]
40. Miller C. The electronic medical record: a definition and discussion. Top Health Inf Manage 1993 Feb;13(3):20-29. [Medline:
10124869]
41. Cameron S, Turtle-Song I. Learning to write case notes using the SOAP format. J Couns Dev 2002;80(3):286-292. [doi:
10.1002/j.1556-6678.2002.tb00193.x]
42. Markle: Advancing America’s Future. 2003. Personal Health Working Group Final Report URL: https://www.markle.org/
publications/1429-personal-health-working-group-final-report [accessed 2020-02-13]
43. Handler R, Holtmeier R, Metzger J, Overhage M, Taylor S, Underwood C. Healthcare Information and Management
Systems Society. 2003. HIMSS Electronic Health Record Definitional Model URL: http://www.providersedge.com/ehdocs/
ehr_articles/HIMSS_EMR_Definition_Model_v1-0 [accessed 2020-02-13]
44. AHIMA e-HIM Personal Health Record Work Group. Defining the personal health record. J AHIMA 2005 Jun;76(6):24-25.
45. Böcking W, Trojanus D. Health Data Management. In: Kirch W, editor. Encyclopedia of Public Health. New York, USA:
Springer; 2008.
46. Kohane IS, Greenspun P, Fackler J, Cimino C, Szolovits P. Building national electronic medical record systems via the
World Wide Web. J Am Med Inform Assoc 1996;3(3):191-207 [FREE Full text] [doi: 10.1136/jamia.1996.96310633]
[Medline: 8723610]
47. Rind DM, Kohane IS, Szolovits P, Safran C, Chueh HC, Barnett GO. Maintaining the confidentiality of medical records
shared over the internet and the world wide web. Ann Intern Med 1997 Jul 15;127(2):138-141. [doi:
10.7326/0003-4819-127-2-199707150-00008] [Medline: 9230004]
48. Schoenberg R, Safran C. Internet based repository of medical records that retains patient confidentiality. Br Med J 2000
Nov 11;321(7270):1199-1203 [FREE Full text] [doi: 10.1136/bmj.321.7270.1199] [Medline: 11073513]
49. Uckert F, Görz M, Ataian M, Prokosch HU. Akteonline-an electronic healthcare record as a medium for information and
communication. Stud Health Technol Inform 2002;90:293-297. [Medline: 15460705]
50. Grant RW, Wald JS, Poon EG, Schnipper JL, Gandhi TK, Volk LA, et al. Design and implementation of a web-based
patient portal linked to an ambulatory care electronic health record: patient gateway for diabetes collaborative care. Diabetes
Technol Ther 2006 Oct;8(5):576-586 [FREE Full text] [doi: 10.1089/dia.2006.8.576] [Medline: 17037972]
51. Ross SE, Moore LA, Earnest MA, Wittevrongel L, Lin C. Providing a web-based online medical record with electronic
communication capabilities to patients with congestive heart failure: randomized trial. J Med Internet Res 2004 May
14;6(2):e12 [FREE Full text] [doi: 10.2196/jmir.6.2.e12] [Medline: 15249261]
52. Marceglia S, Bonacina S, Braidotti A, Nardelli M, Pinciroli F. Towards a web-based system for family health record. AMIA
Annu Symp Proc 2006:1023 [FREE Full text] [Medline: 17238642]
53. Laplante PA, Kassab M, Laplante NL, Voas JM. Building caring healthcare systems in the internet of things. IEEE Syst J
2018;12(3):- [FREE Full text] [doi: 10.1109/JSYST.2017.2662602] [Medline: 31080541]
54. Wu F, Wu T, Yuce M. An internet-of-things (IoT) network system for connected safety and health monitoring applications.
Sensors (Basel) 2018 Dec 21;19(1):E21 [FREE Full text] [doi: 10.3390/s19010021] [Medline: 30577646]
55. Meinert E, van Velthoven M, Brindley D, Alturkistani A, Foley K, Rees S, et al. The internet of things in health care in
Oxford: protocol for proof-of-concept projects. JMIR Res Protoc 2018 Dec 4;7(12):e12077 [FREE Full text] [doi:
10.2196/12077] [Medline: 30514695]
56. Mavrogiorgou A, Kiourtis A, Perakis K, Pitsios S, Kyriazis D. IoT in healthcare: achieving interoperability of high-quality
data acquired by IoT medical devices. Sensors (Basel) 2019 Apr 27;19(9):1-24 [FREE Full text] [doi: 10.3390/s19091978]
[Medline: 31035612]
57. Valluru D, Jeya IJ. IoT with cloud based lung cancer diagnosis model using optimal support vector machine. Health Care
Manag Sci 2019 Jul 20:- epub ahead of print. [doi: 10.1007/s10729-019-09489-x] [Medline: 31327114]
58. Ramirez Lopez LJ, Puerta Aponte G, Rodriguez Garcia A. Internet of things applied in healthcare based on open hardware
with low-energy consumption. Healthc Inform Res 2019 Jul;25(3):230-235 [FREE Full text] [doi: 10.4258/hir.2019.25.3.230]
[Medline: 31406615]
59. Qu Y, Ming X, Qiu S, Zheng M, Hou Z. An integrative framework for online prognostic and health management using
internet of things and convolutional neural network. Sensors (Basel) 2019 May 21;19(10):1 [FREE Full text] [doi:
10.3390/s19102338] [Medline: 31117213]
60. Rau H, Wu Y, Chu C, Wang F, Hsu M, Chang C, et al. Importance-performance analysis of personal health records in
Taiwan: a web-based survey. J Med Internet Res 2017 Apr 27;19(4):e131 [FREE Full text] [doi: 10.2196/jmir.7065]
[Medline: 28450273]
61. Bahga A, Madisetti VK. A cloud-based approach for interoperable electronic health records (EHRs). IEEE J Biomed Health
Inform 2013 Sep;17(5):894-906. [doi: 10.1109/JBHI.2013.2257818] [Medline: 25055368]
62. Fernández-Cardeñosa G, de la Torre-Díez I, López-Coronado M, Rodrigues JJ. Analysis of cloud-based solutions on EHRs
systems in different scenarios. J Med Syst 2012 Dec;36(6):3777-3782. [doi: 10.1007/s10916-012-9850-2] [Medline:
22492177]
63. Zangara G, Corso PP, Cangemi F, Millonzi F, Collova F, Scarlatella A. A cloud based architecture to support electronic
health record. Stud Health Technol Inform 2014;207:380-389. [doi: 10.3233/978-1-61499-474-9-380] [Medline: 25488244]
64. Schulz S, Stegwee R, Chronaki C. Standards in healthcare data. In: Kubben P, Dumontier M, Dekker A, editors. Standards
in Healthcare Data. New York, USA: Springer; 2019:19-36.
65. MRC Cognition and Brain Sciences Unit. The Analyze Data Format URL: http://imaging.mrc-cbu.cam.ac.uk/imaging/
FormatAnalyze [accessed 2020-02-13]
66. NIfTI: Neuroimaging Informatics Technology Initiative. URL: https://nifti.nimh.nih.gov/ [accessed 2020-02-13]
67. Vincent RD, Neelin P, Khalili-Mahani N, Janke AL, Fonov VS, Robbins SM, et al. MINC 2.0: a flexible format for
multi-modal images. Front Neuroinform 2016;10:35 [FREE Full text] [doi: 10.3389/fninf.2016.00035] [Medline: 27563289]
68. Digital Imaging and Communications in Medicine. URL: https://www.dicomstandard.org/ [accessed 2020-02-13]
69. Wen H, Chang W, Hsu M, Ho C, Chu C. An assessment of the interoperability of electronic health record exchanges among
hospitals and clinics in Taiwan. JMIR Med Inform 2019 Mar 28;7(1):e12630 [FREE Full text] [doi: 10.2196/12630]
[Medline: 30920376]
70. Davis J. HealthITSecurity. 2019. 15 Million Patient Records Breached in 2018; Hacking, Phishing Surges URL: https:/
/healthitsecurity.com/news/15-million-patient-records-breached-in-2018-hacking-phishing-surges [accessed 2020-02-13]
71. Business Wire. New Intel Security Cloud Report Reveals IT Departments Find It Hard to Keep the Cloud Safe URL: https:/
/www.businesswire.com/news/home/20170212005011/en/ [accessed 2020-02-13]
72. Zorz Z. Help Net Security – Information Security News. 2019. Healthcare’s Blind spot: Unmanaged IoT and Medical
Devices URL: https://www.helpnetsecurity.com/2019/07/22/healthcare-iot/ [accessed 2020-02-13]
73. Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. New York, USA: BN Publishing; 2008.
74. Ismail L, Heba H, AlShamsi M, AlHammadi M, AlDhanhani N. Towards a Blockchain Deployment at UAE University:
Performance Evaluation and Blockchain Taxonomy. In: Proceedings of the 2019 International Conference on Blockchain
Technology. 2019 Presented at: ICBCT’19; March 15-18, 2019; Hawaii, USA p. 30-38. [doi: 10.1145/3320154.3320156]
75. Azaria A, Ekblaw A, Vieira T, Lippman A. MedRec: Using Blockchain for Medical Data Access and Permission Management.
In: Proceedings of the 2nd International Conference on Open and Big Data. 2016 Presented at: OBD’16; August 22-24,
2016; Vienna, Austria. [doi: 10.1109/obd.2016.11]
76. Li H, Zhu L, Shen M, Gao F, Tao X, Liu S. Blockchain-based data preservation system for medical data. J Med Syst 2018
Jun 28;42(8):141. [doi: 10.1007/s10916-018-0997-3] [Medline: 29956058]
77. Dagher GG, Mohler J, Milojkovic M, Marella PB. Ancile: privacy-preserving framework for access control and
interoperability of electronic health records using blockchain technology. Sustain Cities Soc 2018 May;39:283-297. [doi:
10.1016/j.scs.2018.02.014]
78. Fan K, Wang S, Ren Y, Li H, Yang Y. MedBlock: efficient and secure medical data sharing via blockchain. J Med Syst
2018 Jun 21;42(8):136. [doi: 10.1007/s10916-018-0993-7] [Medline: 29931655]
79. Yue X, Wang H, Jin D, Li M, Jiang W. Healthcare data gateways: found healthcare intelligence on blockchain with novel
privacy risk control. J Med Syst 2016 Oct;40(10):218. [doi: 10.1007/s10916-016-0574-6] [Medline: 27565509]
80. Dey T, Jaiswal S, SunderKrishnan S, Katre N. HealthSense: A Medical Use Case of Internet of Things and Blockchain.
In: Proceedings of the International Conference on Intelligent Sustainable Systems. 2017 Presented at: ICISS’17; December
7-8, 2017; Palladam, India. [doi: 10.1109/iss1.2017.8389459]
81. Uddin MA, Stranieri A, Gondal I, Balasubramanian V. Continuous patient monitoring with a patient centric agent: a block
architecture. IEEE Access 2018;6:32700-32726. [doi: 10.1109/access.2018.2846779]
82. Evariant: Healthcare’s Only Patient for Life Platform. What is Healthcare Data Management and Why is it Important? URL:
https://www.evariant.com/faq/why-is-healthcare-data-management-important [accessed 2020-02-13]
83. Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, et al. Automated analysis of free speech predicts psychosis
onset in high-risk youths. NPJ Schizophr 2015 Aug 26;1(1):15030-15037 [FREE Full text] [doi: 10.1038/npjschz.2015.30]
[Medline: 27336038]
84. Yu K, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, et al. Predicting non-small cell lung cancer prognosis by fully
automated microscopic pathology image features. Nat Commun 2016 Aug 16;7:12474 [FREE Full text] [doi:
10.1038/ncomms12474] [Medline: 27527408]
85. Cruz-Roa A, Gilmore H, Basavanhally A, Feldman M, Ganesan S, Shih NN, et al. Accurate and reproducible invasive
breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Sci Rep 2017 Apr
18;7:46450 [FREE Full text] [doi: 10.1038/srep46450] [Medline: 28418027]
86. Liu Y, Gadepalli K, Norouzi M, Dahl GE, Kohlberger T, Boyko A, et al. arXiv. 2017. Detecting Cancer Metastases on
Gigapixel Pathology Images URL: https://arxiv.org/abs/1703.02442 [accessed 2020-03-24]
87. Richter AN, Khoshgoftaar TM. Efficient learning from big data for cancer risk modeling: a case study with melanoma.
Comput Biol Med 2019 Jul;110:29-39. [doi: 10.1016/j.compbiomed.2019.04.039] [Medline: 31112896]
88. Narula S, Shameer K, Salem Omar AM, Dudley JT, Sengupta PP. Machine-learning algorithms to automate morphological and functional assessments in 2D echocardiography. J Am Coll Cardiol 2016 Nov 29;68(21):2287-2295 [FREE Full text] [doi: 10.1016/j.jacc.2016.08.062] [Medline: 27884247]
89. Kiral-Kornek I, Roy S, Nurse E, Mashford B, Karoly P, Carroll T, et al. Epileptic seizure prediction using big data and
deep learning: toward a mobile system. EBioMedicine 2018 Jan;27:103-111 [FREE Full text] [doi:
10.1016/j.ebiom.2017.11.032] [Medline: 29262989]
90. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health
records. NPJ Digit Med 2018;1:18.
91. Hsiao P, Chin-Ming CH, Li Y. Applied wearable devices for digital health based on novel cardiac force index of running performance: cross-sectional study. JMIR mHealth and uHealth (forthcoming). [doi: 10.2196/15331]
92. Health Data Exploration-Personal Data for the Public. 2014. Personal Data for the Public Good: New Opportunities to
Enrich Understanding of Individual and Population Health URL: http://hdexplore.calit2.net/wp-content/uploads/2015/08/
hdx_final_report_small [accessed 2020-02-13]
93. Knight Lab. Personal Health Data as Public Good URL: https://knightlab.ucsd.edu/wordpress/?page_id=19 [accessed
2020-02-13]
94. Bakar Institute. Bringing the Power of Computation to Today’s Spectrum of Data Will Yield Untold Health Insights and
Patterns URL: https://bakarinstitute.ucsf.edu/research/ [accessed 2020-02-13]
95. Digiconomist. Bitcoin Energy Consumption Index URL: https://digiconomist.net/bitcoin-energy-consumption [accessed
2020-02-13]
96. Crush Crypto. What is Practical Byzantine Fault Tolerance (PBFT)? URL: https://crushcrypto.com/
what-is-practical-byzantine-fault-tolerance/ [accessed 2020-02-13]
97. Milutinovic M, He W, Wu H, Kanwal M. Proof of Luck: An Efficient Blockchain Consensus Protocol. In: Proceedings of
the 1st Workshop on System Software for Trusted Execution. 2016 Presented at: SysTEX’16; December 12-16, 2016;
Trento, Italy. [doi: 10.1145/3007788.3007790]
98. Ismail L, Materwala H, Zeadally S. Lightweight blockchain for healthcare. IEEE Access 2019;7:149935-149951. [doi:
10.1109/access.2019.2947613]
99. Dorri A, Kanhere SS, Jurdak R. Towards an Optimized BlockChain for IoT. In: Proceedings of the Second International
Conference on Internet-of-Things Design and Implementation. 2017 Presented at: IoTDI’17; April 18-21, 2017; Pittsburgh,
Pennsylvania. [doi: 10.1145/3054977.3055003]
Abbreviations
EHR: electronic health record
eHealth: electronic health
IoT: Internet of Things
mHealth: mobile health
Edited by G Eysenbach; submitted 19.12.19; peer-reviewed by Z Sherali, A Behmanesh, CM Chu; comments to author 03.02.20;
revised version received 13.02.20; accepted 01.03.20; published 07.07.20
Please cite as:
Ismail L, Materwala H, Karduck AP, Adem A
Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review
J Med Internet Res 2020;22(7):e17508
URL: https://www.jmir.org/2020/7/e17508
doi: 10.2196/17508
PMID:
©Leila Ismail, Huned Materwala, Achim P Karduck, Abdu Adem. Originally published in the Journal of Medical Internet Research
(http://www.jmir.org), 07.07.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution
License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete
bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information
must be included.
Review Article
Big Data Management for Healthcare Systems:
Architecture, Requirements, and Implementation
Naoual El aboudi and Laila Benhlima
Department of Computer Science, Mohammadia School of Engineering, Mohammed V University, Rabat, Morocco
Correspondence should be addressed to Naoual El aboudi; nawal.elaboudi@gmail.com
Received 7 January 2018; Revised 22 May 2018; Accepted 27 May 2018; Published 21 June 2018
Academic Editor: Florentino Fdez-Riverola
Copyright © 2018 Naoual El aboudi and Laila Benhlima. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
The growing amount of data in the healthcare industry has made the adoption of big data techniques inevitable in order to improve the quality of healthcare delivery. Despite the integration of big data processing approaches and platforms into existing data management architectures for healthcare systems, these architectures still struggle to anticipate emergency cases. The main contribution of this paper is an extensible big data architecture based on both stream computing and batch computing, intended to further enhance the reliability of healthcare systems by generating real-time alerts and making accurate predictions about patient health condition. Based on the proposed architecture, a prototype implementation has been built for healthcare systems in order to generate real-time alerts. The suggested prototype is based on Spark and MongoDB.
1. Introduction
The proportion of elderly people in society is growing worldwide [1]; this phenomenon, known as population aging, has many implications for healthcare services, especially in terms of costs. Faced with this situation, relying on classical systems may result in a decline in quality of life for millions of people. Seeking to overcome this problem, a range of healthcare systems have been designed. Their common principle is transferring, on a periodical basis, medical parameters like blood pressure, heart rate, glucose level, body temperature, and ECG signals to an automated system that monitors patients' health condition in real time. Such systems provide quick assistance when needed, since data is analyzed continuously. Automating health monitoring favors a proactive approach that relieves medical facilities by saving costs related to hospitalization, and it also enhances healthcare services by improving waiting time for consultations. Recently, the number of data sources in the healthcare industry has grown rapidly as a result of the widespread use of mobile and wearable sensor technologies, which has flooded the healthcare area with a huge amount of data. Therefore, it becomes challenging to perform healthcare data analysis based on traditional methods, which are unfit to handle the high volume of diversified medical data. In general, the healthcare domain has four categories of analytics: descriptive, diagnostic, predictive, and prescriptive analytics; a brief description of each of them is given below.
Descriptive Analytics. It consists of describing current situations and reporting on them. Several techniques are employed to perform this level of analytics; for instance, descriptive statistics tools like histograms and charts are among the techniques used in descriptive analytics.
Diagnostic Analytics. It aims to explain why certain events occurred and what factors triggered them. For example, diagnostic analysis attempts to understand the reasons behind the regular readmission of some patients by using methods such as clustering and decision trees.
Predictive Analytics. It reflects the ability to predict future events; it also helps in identifying trends and determining probabilities of uncertain outcomes. An illustration of its role is predicting whether a patient will develop complications. Predictive models are often built using machine learning techniques.
Hindawi Advances in Bioinformatics, Volume 2018, Article ID 4059018, 10 pages. https://doi.org/10.1155/2018/4059018
Figure 1: Analytics for healthcare domain.
Prescriptive Analytics. Its goal is to propose suitable actions leading to optimal decision-making. For instance, prescriptive analysis may suggest rejecting a given treatment when there is a high probability of a harmful side effect. Decision trees and Monte Carlo simulation are examples of methods applied to perform prescriptive analytics. Figure 1 illustrates the analytics phases for the healthcare domain [2]. The integration of big data technologies in healthcare analytics may lead to better performance of medical systems.
In fact, big data refers to large datasets that combine the following characteristics (see [3]): volume, which refers to high amounts of data; velocity, which means that data is generated at a rapid pace; variety, which emphasizes that data comes in different formats; and, finally, veracity, which means that data originates from trustworthy sources.
Another characteristic of big data is variability. It indicates variations that occur in the data flow rates. Indeed, velocity does not provide a consistent description of the data due to its periodic peaks and troughs. Another important aspect of big data is complexity; it arises from the fact that big data is often produced by a multitude of sources, which implies performing many operations over the data; these operations include identifying relationships and cleansing and transforming data flowing from different origins.
Moreover, Oracle introduced value as a key attribute of big data. According to Oracle, big data has a "low value density," which means that raw data has a low value compared to its high volume. Nevertheless, analysis of large volumes of data may yield high value.
In the context of healthcare, high volumes of data are generated by multiple medical sources, including, for example, biomedical images, lab test reports, physician written notes, and health condition parameters allowing real-time patient health monitoring. In addition to its huge volume and its diversity, healthcare data flows at high speed. As a result, big data approaches offer tremendous opportunities regarding the efficiency of healthcare systems.
The contribution of this research paper is to propose an extensible big data architecture for healthcare applications formed by several components capable of storing, processing, and analyzing a high amount of data in real-time and batch modes. This paper demonstrates the potential of using big data analytics in the healthcare domain to find useful information in highly valuable data.
The paper is organized as follows: In Section 2, a background of big data computing approaches and big data platforms is provided. Recent contributions on big data for healthcare systems are reviewed in Section 3. In Section 4, the components of the proposed big data architecture for healthcare are described. The implementation process is reported in Section 5. The conclusion is finally drawn in Section 6, along with recommendations for future research.
2. Background
2.1. An Overview of Big Data Approaches. Big data technologies have received great attention due to their successful handling of high-volume data compared to traditional approaches. Big data frameworks support all kinds of data, structured, semistructured, and unstructured, while providing several features. Those features include predictive model design and big data mining tools that allow a better decision-making process through the selection of relevant information.
Big data processing can be performed in two manners: batch processing and stream processing; see [4]. The first method is based on analyzing data over a specified period of time; it is adopted when there are no constraints regarding the response time. On the other hand, stream processing is suitable for applications requiring real-time feedback. Batch processing aims to process a high volume of data by collecting and storing batches to be analyzed in order to generate results.
Batch mode requires ingesting all data before processing it within a specified time. MapReduce represents a widely adopted solution in the field of batch computing; see [5]. It operates by splitting data into small pieces that are distributed to multiple nodes in order to obtain intermediate results. Once data processing by the nodes has terminated, the outcomes are aggregated in order to generate the final results. Seeking to optimize the use of computational resources, MapReduce allocates processing tasks to nodes close to the data location. This model has encountered a lot of success in many applications, especially in the fields of bioinformatics and healthcare. The batch processing framework has many characteristics, such as the ability to access all data and to perform many complex computation operations, and its latency is measured in minutes or more.
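To make the map/shuffle/reduce flow concrete, here is a minimal, framework-free Python sketch; the patient records and field names are illustrative assumptions, not drawn from the papers above. It computes a per-patient mean heart rate the way a MapReduce job would.

```python
from collections import defaultdict

# Hypothetical input: (patient_id, heart_rate) records, e.g. read from storage blocks.
records = [("p1", 72), ("p2", 95), ("p1", 80), ("p2", 101), ("p1", 76)]

def map_phase(records):
    """Map: emit (key, value) pairs, here (patient_id, heart_rate)."""
    for patient_id, heart_rate in records:
        yield patient_id, heart_rate

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values, here into a mean heart rate."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

print(reduce_phase(shuffle(map_phase(records))))  # {'p1': 76.0, 'p2': 98.0}
```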
Stream Computing. In real applications such as healthcare, intelligent transportation, and finance, a high amount of data is produced in a continuous manner. When the need to process such data streams in real time arises, data analysis must take into consideration the continuous evolution of data and the permanent change in the statistical characteristics of data streams, referred to as concept drift; see [6]. Indeed, storing a large amount of data for further processing may be challenging in terms of memory resources. Moreover, real applications tend to produce noisy data containing missing values along with redundant features, making data analysis complicated, as it requires significant computational time. Stream processing reduces this computational burden by performing simple and fast computations for one data element or for a window of recent data; such computations take seconds at most.
Big data stream mining methods, including classification, frequent pattern mining, and clustering, relieve computational effort through rapid extraction of the most relevant information; this objective is often achieved by mining data in a distributed manner. Those methods belong to one of the two following classes: data-based techniques and task-based techniques; see [7]. Data-based techniques summarize the entire dataset or select a subset of the continuous flow of streaming data to be processed. Sampling is one of these techniques; it consists of choosing a small subset of data to be processed according to a statistical criterion (a sketch of one common variant follows this paragraph). Another data-based method is load shedding, which drops a part of the incoming data, while the sketching technique establishes a random projection on the feature set. The synopsis data structures method and the aggregation method also belong to the family of data-based techniques: the first summarizes data streams, and the second represents a number of elements by one element using a statistical measure.
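As a concrete example of the sampling technique, the following Python sketch shows classical reservoir sampling; the stream contents and sample size are illustrative, and the papers above do not prescribe this particular algorithm.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    After n items, each item remains in the reservoir with probability k/n,
    which is the statistical criterion reservoir sampling guarantees.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)       # pick a slot among the i+1 items seen so far
            if j < k:
                reservoir[j] = item     # replace an old sample with the new item
    return reservoir

# Example: sample 5 readings out of a simulated stream of 10,000 sensor values.
print(reservoir_sample((x for x in range(10_000)), k=5, seed=42))
```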
Task-based techniques update existing methods or design new ones to reduce the computational time of data stream processing. They are categorized into approximation algorithms, which generate outputs with an acceptable error margin; sliding windows, which analyze recent data under the assumption that it is more useful than older data (a sketch follows below); and algorithm output granularity, which processes data according to the available memory and time constraints.
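A minimal Python sketch of the sliding window idea follows; the window size, the alert threshold, and the use of a windowed mean are illustrative assumptions.

```python
from collections import deque

class SlidingWindowMonitor:
    """Maintain a fixed-size window of recent readings and flag abnormal means."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # old readings fall out automatically
        self.threshold = threshold

    def push(self, reading):
        self.window.append(reading)
        mean = sum(self.window) / len(self.window)
        return mean, mean > self.threshold

monitor = SlidingWindowMonitor(size=5, threshold=100)  # e.g. heart rate in bpm
for value in [82, 90, 97, 104, 111, 118]:
    mean, alert = monitor.push(value)
    print(f"reading={value} window_mean={mean:.1f} alert={alert}")
```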
Big data approaches are essential for modern healthcare analytics; they allow real-time extraction of relevant information from a large amount of patient data. As a result, alerts are generated when the prediction model detects possible complications. This process helps prevent health emergencies from occurring; it also assists medical care professionals in decision-making regarding disease diagnosis and provides special care recommendations.
2.2. Big Data Processing Frameworks. For batch processing, the MapReduce framework is widely adopted; it allows distributed analysis of big data on a cluster of machines. Simple computations are performed through two functions, map and reduce. MapReduce relies on a master/slave architecture: the master node allocates processing tasks to slave nodes and divides data into blocks, and then it structures the data into a set of key/value pairs as the input of map tasks. Each worker assigns a map task to slaves and reads the appropriate input data; after that, it writes the generated results of the map task into intermediate files. The reducer worker takes the results generated by the map task as the input of the reduce task; finally, the results are written into final output files. Hadoop is an open source framework that stores and analyzes data in a parallel manner through clusters.
It is composed of two main components: Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS stores data by duplicating it across many nodes; on the other hand, Hadoop MapReduce implements the MapReduce programming model: its master node stores metadata information such as the locations of duplicated blocks, and it identifies the locations of data nodes to recover missing blocks in failure cases. The data is split into several blocks, and the processing operations are performed on the same machine where the data resides. With Hadoop, other data storage tools can be used instead of HDFS, such as HBase, Cassandra, and relational databases. Data warehousing may be performed by other tools, for instance, Pig and Hive, while Mahout is employed for machine learning purposes. When stream processing is required, Hadoop may not be a suitable choice, since all input data must be available before starting MapReduce tasks. Recently, Storm from Twitter, S4 from Yahoo, and Spark were presented to process incoming stream data. Each solution has its own principle.
Storm. It is an open source framework for analyzing data in real time; see [8]. It is formed by spouts and bolts: a spout produces data or loads data from an input queue, and a bolt processes input streams and generates output streams. In a Storm program, a combination of bolts and spouts is named a topology. Storm has three node types: the master node, named Nimbus; the worker node; and ZooKeeper. The master node distributes and coordinates the execution of the topology, while the worker node is responsible for executing spouts and bolts. Finally, ZooKeeper synchronizes distributed coordination.
S4. It is a distributed stream processing engine, inspired by the MapReduce model, for processing data streams; see [9]. It was implemented by Yahoo in Java. Data streams are fed to S4 as events.
Spark. It is applied to both batch and stream processing; therefore, Spark may be considered a powerful framework compared with other tools such as Hadoop and Storm; see [10]. It can access several data sources like HDFS, Cassandra, and HBase. Spark provides several interesting features, for example, iterative machine learning algorithms through the MLlib library, which provides efficient algorithms with high speed; structured data analysis using Hive; graph processing based on GraphX; and Spark SQL, which restores data from many sources and manipulates it using SQL. Before processing data streams, Spark divides them into small portions and transforms them into a set of RDDs (Resilient Distributed Datasets) named a DStream (Discretized Stream).
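As an illustration of the DStream model just described, here is a minimal PySpark Streaming sketch. It assumes a local Spark installation and a hypothetical socket source on localhost:9999 emitting patient_id,heart_rate lines; the 100 bpm alert threshold is likewise illustrative.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "VitalsStream")
ssc = StreamingContext(sc, batchDuration=5)   # cut the stream into 5-second RDD batches

# Hypothetical source: lines like "p1,104" (patient_id,heart_rate) on a local socket.
lines = ssc.socketTextStream("localhost", 9999)

readings = lines.map(lambda line: line.split(",")) \
                .map(lambda parts: (parts[0], int(parts[1])))

# Flag readings above a hypothetical alert threshold within each micro-batch.
alerts = readings.filter(lambda kv: kv[1] > 100)
alerts.pprint()

ssc.start()
ssc.awaitTermination()
```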
Table 1: Big data processing solutions.

Framework | Type            | Latency         | Developed by                | Stream primitive | Stream source
Hadoop    | Batch           | Minutes or more | Yahoo                       | Key-value        | HDFS
Storm     | Streaming       | Subseconds      | Twitter                     | Tuples           | Spouts
Spark     | Batch/streaming | Few seconds     | Berkeley AMPLab             | DStream          | HDFS
S4        | Streaming       | Few seconds     | Yahoo                       | Events           | Networks
Flink     | Batch/streaming | Few seconds     | Apache Software Foundation  | Key-value        | Kafka
Apache Flink. It is an open source solution that analyzes data in both batch and real-time modes [11]. The programming models of Flink and MapReduce share many similarities. Flink allows iterative processing and real-time computation on stream data collected by tools such as Flume and Kafka. Apache Flink provides several features, like FlinkML, a machine learning library offering many learning algorithms for fast and scalable big data applications.
MongoDB. It is a NoSQL database capable of storing a high amount of data. MongoDB relies on the JSON standard (JavaScript Object Notation) to store records; JSON is an open, human- and machine-readable format that makes data interchange easier compared to classical formats such as rows and tables. In addition, JSON scales better, since join-based queries are not needed: the relevant data of a given record is contained in a single JSON document. Spark is easily integrated with MongoDB; see [12].
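A minimal pymongo sketch of the single-document idea described above, assuming a hypothetical local MongoDB instance; the database, collection, and field names are illustrative.

```python
from pymongo import MongoClient

# Hypothetical local MongoDB instance; database and collection names are illustrative.
client = MongoClient("mongodb://localhost:27017")
readings = client["healthcare"]["readings"]

# One patient record as a single JSON-like document: no joins needed to read it back.
readings.insert_one({
    "patient_id": "p1",
    "vitals": {"heart_rate": 104, "blood_pressure": "130/85"},
    "recorded_at": "2018-05-22T10:15:00Z",
})

# Query documents whose embedded heart rate exceeds a hypothetical alert threshold.
for doc in readings.find({"vitals.heart_rate": {"$gt": 100}}):
    print(doc["patient_id"], doc["vitals"])
```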
Table 1 summarizes big data processing solutions.
3. Big Data-Based Healthcare Systems
The potential offered by big data approaches in healthcare analytics has attracted the attention of many researchers. In [13], recent advances in big data for health informatics and their role in tackling disease management are presented, for instance, diagnosis, prevention, and treatment of several illnesses. The study demonstrates that data privacy and security represent challenging issues in healthcare systems. Raghupathi et al. exposed in [14] the architectural framework and challenges of big data healthcare analytics. In another study (see [15]), the importance of security and privacy issues in successfully implementing big data healthcare systems is demonstrated. Belle et al. discuss in [16] the role of big data in improving the quality of care delivery by aggregating and processing the large volume of data generated by healthcare systems. In [17], data mining techniques for healthcare analytics are presented, especially those used in healthcare applications like survival analysis and patient similarity. Bochicchio et al. proposed in [18] a big data healthcare analytics framework for supporting multidimensional mining over big healthcare data. The objective of this framework is to analyze the huge volume of data by applying data mining methods. Sakr et al. presented in [19] a composite big data healthcare analytics framework, called Smarthealth, whose goal is to overcome the challenges raised by healthcare big data via ICT technologies. In [20], the authors presented Wiki-Health, a big data platform that processes data produced by health sensors. This platform is formed by the three following layers: application; query and analysis; and data storage. The application layer ensures data access, data collection, security, and data sharing. The query and analysis layer provides data management and data analysis, while the data storage layer is in charge of storing data, as its name suggests. Challenges regarding the design of such platforms, especially in terms of data privacy and data security, are highlighted in [21]. Baldominos et al. designed in [22] an intelligent big data healthcare management solution aimed at retrieving and aggregating data and predicting future values.
Based on big data technologies, a few data processing systems for the healthcare domain have been designed in order to handle the important amount of data streams generated by medical devices; a brief description of the major ones is provided below.
The Borealis-based Heart Rate Variability Monitor, presented in [23], belongs to the category of big data processing systems for healthcare; it processes data originating from various sources in order to perform the desired monitoring activities. It is composed of a stream transmitter that represents an interface between the sensors collecting data and the Borealis application; it encapsulates the collected data into the Borealis format in order to obtain a single stream. Then, the final stream is transferred to the Borealis application for processing purposes. This system also includes a graphical user interface (GUI) that allows physicians to select the patients whose health condition is going to be the subject of close monitoring. Moreover, the graphical interface permits the medical staff to choose the parameters they want to focus on for a monitoring task. Furthermore, it allows visualization of the Borealis application outcomes. The system has many drawbacks; for instance, it does not include a machine learning component capable of making accurate predictions on patient health condition. Furthermore, adding an alarming component would enhance emergency case detection.
The Hadoop-based medical emergency management system using IoT technology relies on sensors measuring medical parameters through different processes [24]. Those sensors may be devices mounted on the patient's body or other types of medical devices capable of providing remote measuring. Before being transferred to the component called the intelligent building (IB), the collected data flows through the primary medical device (PMD). Next, the IB starts by aggregating the input stream thanks to its collection unit; then, the resulting data is transferred to the Hadoop Processing Unit (HPU) to perform statistical analyses of the parameters measured by the sensors based on the MapReduce paradigm. The map function verifies sensor readings; this verification is performed by comparing them with their corresponding normal thresholds (a sketch of this logic follows below). If readings are considered normal, they are stored in the database without further processing. On the other hand, if they are alarming, an alert is triggered and transmitted to the application layer. Meanwhile, when sensors return values that are neither normal nor alarming, it is necessary to analyze them closely. The results of such analysis are collected by the aggregation result unit through a reducer from different data nodes; then, they are sent to the final decision server. Finally, the decision server receives the current results, applies machine learning classifiers and medical expert knowledge to process past patient data for more accurate decisions, and generates outputs based on the Hadoop Processing Unit results. This system is based on the Hadoop ecosystem, which is adapted for batch processing; however, it does not support stream processing. Therefore, it is recommended to use Spark instead, in order to improve the system's processing time using data stream mining approaches.
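The map-side threshold check can be sketched in Python as follows; the normal ranges, the 10% uncertainty margin, and the three outcome labels are illustrative assumptions rather than values from [24].

```python
# A minimal sketch of the map-side triage described above; thresholds, field
# names, and the three-way outcome labels are illustrative assumptions.
NORMAL_RANGES = {
    "heart_rate": (60, 100),           # bpm
    "body_temperature": (36.1, 37.8),  # degrees Celsius
}
UNCERTAIN_MARGIN = 0.10  # readings within 10% outside the range need closer analysis

def triage_reading(sensor, value):
    """Classify one sensor reading as 'normal', 'uncertain', or 'alarming'."""
    low, high = NORMAL_RANGES[sensor]
    if low <= value <= high:
        return "normal"       # store without further processing
    span = high - low
    if low - UNCERTAIN_MARGIN * span <= value <= high + UNCERTAIN_MARGIN * span:
        return "uncertain"    # forward to the reducer / decision server
    return "alarming"         # trigger an alert to the application layer

for sensor, value in [("heart_rate", 85), ("heart_rate", 103), ("body_temperature", 39.5)]:
    print(sensor, value, "->", triage_reading(sensor, value))
```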
A prototype healthcare big data processing system based on Spark [25] is proposed to analyze the high amount of data generated by healthcare big data process systems. It is formed by two logical parts: a big data application service and a big data supporting platform performing data analysis. The first logical part visualizes the processing results and plays the role of an interface between applications and data warehouse big data tools such as Hive or Spark SQL. The second one is responsible for computing operations and distributed storage, allowing high storage capabilities. This solution is based on Spark, which is very promising since it handles batch computing, stream computing, and ad hoc queries. The system has many drawbacks; for instance, it does not include big data mining and big data analytics in the experimental platform, which hampers the prediction possibilities that are vital for improving the quality of patient outcomes.
In this paper, we summarize the added value of big data technologies for healthcare analytics by presenting an extensible big data architecture for healthcare analytics that combines the advantages of both batch and stream computing to generate real-time alerts and make accurate predictions about patient health condition. In this research, an architecture for the management and analysis of medical data was designed based on big data methods and can be implemented via a combination of several big data technologies. Designing systems capable of handling both batch and real-time processing is a complex task and requires an effective conceptual architecture for implementing the system.
4. An Extensible Big Data Architecture for Healthcare
We are developing a system that has the advantage of being generic and can deal with various situations such as early disease diagnosis and emergency detection. In this study, we propose a new architecture aimed at handling medical big data originating from heterogeneous sources in different formats. Data management in this architecture is illustrated through the following scenario.
Figure 2: The layer architecture.
New medical data is sent simultaneously to both the batch layer and the streaming layer. In batch mode, data is stored in data nodes; then, it is transmitted to the semantic module, which assigns meaning to data using the ontology store; after that, cleaning and filtering operations are applied to the resulting data before processing it. In the next step, the prepared data is analyzed through different phases: feature selection and feature extraction. Finally, the prepared data is used to design models predicting patients' future health condition. This mode is solicited periodically on an offline basis. In the stream scenario, data comes from multiple sources such as medical sensors connected to the patient's body, measuring several medical parameters like blood pressure. Then, the collected data is synchronized based on time, and its missing values are handled.
Based on the sliding window technique, the adaptive preprocessor splits data into blocks, and then it extracts relevant information for the predictor component in order to build a predictive model for every window tuple. Figure 2 represents the layer architecture of the proposal.
4.1. Batch Processing Layer. Batch computing is performed on data extracted from the prepared data store through different phases.
4.1.1. Data Acquisition. When continuously monitoring a patient's health condition, several types of data are generated. Medical data may include structured data like traditional Electronic Health Records (EHRs), semistructured data such as logs produced by some medical devices, and unstructured data generated, for example, by biomedical imagery.
Electronic Health Records (EHR). An EHR contains a complete patient medical history stored in a digital format; it is formed by a multitude of medical data describing the patient's health status, like demographics, medications, diagnoses, laboratory tests, doctor's notes, radiology documents, clinical information, and payment notes. Thus, the EHR represents a valuable source of information for the purpose of healthcare analytics. Furthermore, EHRs allow exchanging data within the healthcare professionals' community.
Biomedical Images. Biomedical imaging is considered a powerful tool for disease detection and care delivery. Nevertheless, processing this kind of image is challenging, as they include noisy data that needs to be discarded in order to help physicians make accurate decisions.
Social Network Analysis. Performing social network analysis requires gathering data from social media like social networking sites. The next step consists of extracting knowledge that could affect healthcare predictive analysis, such as discovering infectious illnesses. In general, social network data is marked by an uncertainty that makes its use in designing predictive models risky.
Sensing Data. Sensors of different types are employed in healthcare monitoring solutions. Those devices are essential in monitoring a patient's health, as they measure a wide range of medical indicators such as body temperature, blood pressure, respiratory rate, heart rate, and cardiovascular status. In order to ensure efficient health monitoring, the patient's living area may be full of devices like surveillance cameras, microphones, and pressure sensors. Consequently, the data volume generated by health monitoring systems tends to increase tremendously, which requires adopting sophisticated methods during the processing phase.
Mobile Phone. Nowadays, the mobile phone represents one of the most popular technological devices in the world. Compared to their early beginnings, mobile phones have transformed from a basic communication tool into a complex device offering many features and services. They are currently equipped with several sensors like satellite positioning services, accelerometers, and cameras. Due to their multiple capabilities and wide use, mobile phones are ideal candidates for health data collection, allowing the design of many successful healthcare applications like pregnancy monitoring [26], child nutrition [27], and heart frequency monitoring [28].
The objective of the data acquisition phase is to read the data gathered from healthcare sensors in several formats; the data then flows through the semantic module before being normalized.
Semantic Module. It is based on ontologies, which constitute efficient tools for representing actionable knowledge in the field of biomedicine. In fact, ontologies have the ability to capture biomedical knowledge in a formal, powerful, and incremental way. They also allow automation and interoperability between different clinical information systems. Automation has a major benefit: it helps medical personnel process large amounts of patient data, especially considering that this personnel is often overwhelmed by a series of healthcare tasks. Introducing automation into healthcare applications contributes to providing assistance to human medical staff, which enhances its overall performance. It should be highlighted that automation will help humans perform their duties rather than replace them. Interoperability is an important issue when dealing with medical data. In fact, healthcare databases lack homogeneity, as they adopt different structures and terminologies. Therefore, it is difficult to share information and integrate healthcare data. In this context, ontologies may play a determinant role by establishing a common structure and semantics, which allows sharing and reuse of data across different systems. In other words, by defining a standard ontology format, it becomes possible to map heterogeneous databases into a common structure and terminology. For instance, the Web Ontology Language (OWL) represents the standard interchange format for ontology data and employs XML syntax.
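As a small illustration of ontology-based mapping, the following sketch uses the rdflib Python library to map two hypothetical source field names onto one shared ontology property; the namespace, class, and property names are invented for the example.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical shared ontology namespace; the property and class names are illustrative.
ONTO = Namespace("http://example.org/health-ontology#")

# Two source databases use different field names for the same concept.
FIELD_MAP = {"hr": ONTO.heartRate, "heart_rate_bpm": ONTO.heartRate}

g = Graph()
g.bind("onto", ONTO)

def ingest(record_id, source_record):
    """Map a heterogeneous record onto the common ontology vocabulary."""
    subject = ONTO[record_id]
    g.add((subject, RDF.type, ONTO.Observation))
    for field, value in source_record.items():
        g.add((subject, FIELD_MAP[field], Literal(value)))

ingest("obs1", {"hr": 88})               # record from database A
ingest("obs2", {"heart_rate_bpm": 102})  # record from database B

print(g.serialize(format="turtle"))
```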
4.1.2. Data Preparation. Processing raw data without preparation routines may require extra computational resources that are not affordable in a big data context. Thus, it is recommended to make sure data is prepared properly, in order to obtain accurate predictive models and to enhance the reliability of data mining techniques. Data preparation consists of two steps: data cleaning and data filtering.
Data Filtering. Data filtering in the presence of large-size data is achieved by discarding information that is not useful for healthcare monitoring, based on a defined criterion.
Data Cleaning. It encompasses several components such as normalization, noise reduction, and missing data management.
Several methods are utilized in order to eliminate noisy
data and to find out the values of missing data. In fact,
medical records often include noisy information and may
have missing data. Determining missing values in healthcare
data is a critical process. Making errors in filling miss-
ing values may affect the quality of extracted knowledge
and lead to incorrect results. In healthcare domain, the
handling of missing data should be performed with maxi-
mum precision as wrong decisions may have serious con-
sequences. Data mining field has many powerful algorithms
aimed at handling missing values, for instance, Expectation-
Maximization (EM) algorithm andmultiple Imputation algo-
rithm.
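As a hedged illustration of the multiple-imputation idea, the sketch below uses scikit-learn's IterativeImputer, a MICE-style chained-equations imputer (scikit-learn does not ship an EM imputer, so this stands in for the algorithms named above); the toy vitals matrix is invented for the example.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy vitals matrix: rows are observations, columns could be e.g.
# [systolic_bp, heart_rate, glucose]; np.nan marks missing readings.
X = np.array([
    [120.0, 72.0, np.nan],
    [np.nan, 80.0, 110.0],
    [130.0, np.nan, 105.0],
    [125.0, 75.0, 100.0],
])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)  # missing cells replaced by model-based estimates
print(X_filled)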
Noise Treatment. In general, noisy data is treated according to two main approaches. The first consists of correcting noisy values using data polishing techniques; these techniques are difficult to implement and are applied only when the amount of noise is small. The second is based on noise filters, which identify and eliminate noisy instances in the training data without requiring modifications to the adopted data mining methods.
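A minimal sketch of the second, filter-based approach: a rolling-median filter flags readings that deviate strongly from their local neighbourhood and drops them, leaving the downstream mining method untouched. The tolerance of 20 units and the sample readings are assumptions for illustration.

import pandas as pd

readings = pd.Series([118, 121, 119, 250, 120, 117, 122])  # 250 is a spike
local_median = readings.rolling(window=3, center=True, min_periods=1).median()
is_noise = (readings - local_median).abs() > 20  # tolerance is an assumption
clean = readings[~is_noise]  # noisy instances are simply removed
print(clean.tolist())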
For instance, electronic medical records (EMRs) illustrate well the need for data cleaning, as they may provide noisy data containing incomplete information. Data sparsity in EMRs originates in the irregular collection of parameters over time, since patient parameters are recorded only when patients are present in hospitals. In the case of biomedical imagery, many processing techniques have been applied in order to reduce noise.
Generally, the preparation of biomedical images starts with the identification (segmentation) of significant objects. Data preparation is more challenging when dealing with raw social media data: in addition to its huge volume and informal content, this kind of data includes users' personal information, so data cleaning is a key success factor in social network analysis. When the data preparation step ends, the processed data is stored in the prepared data store.
4.1.3. Feature Extraction and Feature Selection. The proliferation of devices designed to collect medical data has, in recent years, tremendously increased both the number of features and the number of instances in healthcare monitoring. Selecting the most significant features therefore becomes crucial when facing such high-volume data, and several techniques have been proposed to manage this issue, especially when handling thousands of features [29]. Feature extraction, on the other hand, is an approach that derives a reduced number of attributes from the original ones. Applying feature selection and extraction methods requires a statistical tools store. When this phase terminates, the selected feature subset is used to build the predictive model.
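The sketch below illustrates filter-style feature selection with scikit-learn's SelectKBest, one family of techniques surveyed in [29]; the synthetic dataset and the choice of k are assumptions.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a wide healthcare feature matrix
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Keep the 10 features with the highest ANOVA F-score against the label
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # (200, 50) -> (200, 10)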
4.1.4. Predictive Model Design. The objective of this component is to build a model capable of producing predictions for new observations based on previous data; the quality of a given predictive model is evaluated by its accuracy. These models are developed with tools available in the statistical and machine learning store provided by the suggested architecture. The results of batch processing are stored in the model store.
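As a sketch of this batch model-building step, the snippet below trains a classifier on prepared (here synthetic) data and reports accuracy, the quality measure named above; logistic regression is used because Section 5 adopts it.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy on held-out data is the model-quality criterion used above
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))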
4.2. Data Storage. Data storage is one of the most challenging tasks in a big data framework, especially for healthcare monitoring systems, which involve large amounts of data; traditional data analysis is unfit to manage such systems. This component may be HDFS, a NoSQL database such as MongoDB, SQL databases, or a combination of them, making it scalable and ensuring high storage capacity. In the proposed system, the patient data collected from heterogeneous sources can be classified into structured data such as EHRs, unstructured data such as biomedical images, and semistructured data such as XML and JSON documents. These data are stored in the raw data store in the target databases, while streaming data, such as social media feeds, are stored in the stream temp data store.
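A brief pymongo sketch of the raw data store: structured EHR-style rows and semi-structured sensor records can land in the same MongoDB collection, which is what makes a document store attractive here. The host, database, and collection names are assumptions.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
raw_store = client["healthcare"]["raw_data_store"]

raw_store.insert_one({                 # semi-structured sensor record
    "Idpatient": "PA12",
    "measuretype": "blood pressure",
    "value": 100,
    "measuredate": "2018-04-20",
})
raw_store.insert_one({                 # structured EHR-style record
    "Idpatient": "PA12",
    "diagnosis": "type 2 diabetes",
    "admitted": False,
})
print(raw_store.count_documents({"Idpatient": "PA12"}))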
4.3. Stream Processing Layer. The stream processing layer is composed of a data synchronization module, an adaptive preprocessor module, and an adaptive predictor module.
Data Synchronization. The role of the data synchronization module is to ensure that data is processed in the correct temporal order. In addition, the synchronization process dismisses inconsistent measurements and takes care of missing values; inconsistent values are detected by defining thresholds on the incoming parameters.
Adaptive Learning. In many applications, it is assumed that the data preprocessing task is performed by the learning algorithms, or at least that data has already been preprocessed before its arrival. In the majority of cases, such expectations do not match reality. This is particularly true for our proposed system, which extracts streaming data from the stream temp store. The need to adapt in the face of data changes led to the development of adaptive systems, and adaptability is a key factor in the long-term success of a big data system. Preprocessing is not performed independently; rather, it is a component of the adaptive system. Moreover, in order to stay reliable and maintain a certain degree of accuracy, predictive models must adapt when data changes occur. As a result, the prediction process may be considered part of the adaptive system, which combines two distinct parts: the adaptive preprocessor and the adaptive predictor.
Adaptive Preprocessor. Processing starts by splitting the arriving data flow into time windows. A sliding-window technique is adopted to split data streams into overlapping subsets of tuples (every tuple is included in multiple windows). For a given window, the average of every measure is computed and compared to a predefined user threshold. If a particular average exceeds the alarming threshold value, it is stored and an emergency alert is generated; otherwise, it is simply stored. When the threshold comparisons are finished, the information extraction phase selects the relevant features, which are transmitted to the adaptive predictor component; see [30].
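A plain-Python sketch of this windowing logic, under assumed window size, step, and threshold values: overlapping windows are produced over a measurement stream, each window's average is computed, and an alert is raised when the average crosses the user threshold.

def sliding_windows(stream, size=4, step=2):
    # Windows overlap whenever step < size, so each tuple
    # appears in multiple windows, as described above.
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]

blood_pressure = [118, 121, 119, 140, 145, 150, 122, 120]
THRESHOLD = 130  # user-defined alarming value (an assumption)

for window in sliding_windows(blood_pressure):
    avg = sum(window) / len(window)
    if avg > THRESHOLD:
        print(f"ALERT: window average {avg:.1f} exceeds {THRESHOLD}")
    # in the architecture the value is stored either way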
Adaptive Predictor. In order to maintain a certain level of accuracy, predictors have to be updated as data changes; otherwise, they simply become less reliable over time as the data evolves. The predictive model should therefore take newly arrived samples into consideration while retaining the ability to generate predictions in real time.
This adaptivity requires a connection between the adaptive preprocessor and the adaptive predictor. Through that connection, the predictor sends feedback to the preprocessor about whether an update is needed, and the preprocessing unit then provides it, when necessary, with raw data via a given mapping function. The results of stream processing are transferred into the stream processing results store.
4.4. Query Processor. The query processor determines the status of patients by combining the responses to queries sent to both the stream processing results store and the batch processing results store.
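A hedged sketch of that merge step: the latest stream-side alert and the latest batch-side prediction for a patient are fetched and combined into a single status document. Collection and field names are assumptions, not the paper's schema.

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["healthcare"]

def patient_status(patient_id):
    # Most recent alert from the stream processing results store
    latest_alert = db["stream_processing_results"].find_one(
        {"Idpatient": patient_id}, sort=[("measuredate", -1)])
    # Most recent prediction from the batch processing results store
    latest_prediction = db["batch_processing_results"].find_one(
        {"Idpatient": patient_id}, sort=[("run_date", -1)])
    return {"patient": patient_id,
            "alert": latest_alert,
            "prediction": latest_prediction}

print(patient_status("PA12"))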
4.5. Visualization Layer. The analytics layer produces multiple outputs, including, for instance, patient health monitoring reports and predictive decision reports. In a healthcare context, real-time visualization gives priority to the most critical information in order to optimize decision-making and avert emergencies. Examples of relevant information include patient dashboards tracking daily health condition, real-time alerts, and proactive messages generated by predictions. Figure 3 shows the proposed big data architecture for healthcare.
Figure 3: Big data architecture for healthcare systems.
5. Implementation Process for Detecting Emergency Cases
In this prototype system, we aim to detect potential dangers to patients. Spark Streaming and MongoDB were chosen to implement the emergency detection module that appears in the visualization layer of the proposed architecture. The system employs Spark to read data from MongoDB in the batch layer; the batch jobs run at a regular time interval specified by the user. Spark Streaming is used for processing real-time data streams: it receives data directly from medical sources and detects abnormal situations based on user-defined thresholds, then sends alerts to MongoDB, which are used to notify doctors about emergencies. Spark MLlib and Spark Streaming are adopted for real-time monitoring and online learning to predict whether a patient's current state is dangerous, a supervised classification problem for which the logistic regression model was selected. Figure 4 illustrates the implementation process of our proposal.
Figure 4: The implementation process of our proposal.
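The following is a hedged PySpark sketch of the detection flow using the current Structured Streaming API (the paper's exact Spark Streaming code is not given): incoming readings are parsed, values above a threshold are flagged, and foreachBatch writes the alerts to MongoDB for notification. The socket source, message format, and threshold are stand-ins for the real medical sources.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split

spark = SparkSession.builder.appName("emergency-detection").getOrCreate()

# Each incoming line is assumed to look like "PA12,blood_pressure,145"
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())
readings = lines.select(
    split(col("value"), ",").getItem(0).alias("patient"),
    split(col("value"), ",").getItem(1).alias("measure"),
    split(col("value"), ",").getItem(2).cast("double").alias("reading"))

alerts = readings.filter(col("reading") > 130)  # user-defined threshold

def write_alerts(batch_df, batch_id):
    # foreachBatch runs this on the driver for every micro-batch
    from pymongo import MongoClient
    docs = [row.asDict() for row in batch_df.collect()]
    if docs:
        MongoClient("mongodb://localhost:27017")["healthcare"]["alerts"] \
            .insert_many(docs)

query = alerts.writeStream.foreachBatch(write_alerts).start()
query.awaitTermination()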
5.1. Diabetic Patient Case Study. Chronic patients must pay attention to numerous aspects of their daily life, including diet, physical activity, medical analyses, and blood glucose levels. Medical care for such patients is a challenging process, since many checks are performed during a single day; for instance, some diabetics measure their blood pressure several times on a daily basis. The objective of the proposed system is to allow doctors to monitor a diabetic patient's health condition in real time. First, the real-time alert detection reads directly from all the incoming data streams provided by the sensor readings; then, for every window of the data stream, healthcare measures are compared with user-defined thresholds, through MapReduce jobs, in order to decide whether the current parameters are abnormal. In the following step, the average value of every medical measure is calculated and written into MongoDB for notification purposes. Figure 5 illustrates real-time monitoring of the blood pressure parameter, and Figure 6 visualizes patient measures.
The effectiveness of the proposal was evaluated by conducting experiments on a cluster of 3 identically configured nodes, each with an Intel Core i7-4770 processor (3.40 GHz, 4 cores, 16 GB RAM, running Ubuntu 12.04 LTS with a 64-bit Linux 3.11.0 kernel). Figure 6 shows the visualization of the patient medical parameters measured by a given sensor, selected by the user through a GUI designed for that purpose.
Figure 5: Real-time monitoring of the blood pressure parameter.
Figure 6: Visualization of measured patient parameters.
To evaluate the scalability of the proposal, we used an open-source EHR generator to produce patient data in HL7 FHIR format, which is loaded into MongoDB as JSON documents such as the one shown in Box 1.
{
  "Idpatient": "PA12",
  "measuretype": "blood pressure",
  "value": "100mg",
  "threshold": "90-120mg",
  "measuredate": "2018-04-20"
}
Box 1: JSON document representing patient parameters in MongoDB.
6. Conclusion
In this paper, popular healthcare monitoring systems based on big data have been reviewed, and an overview of recent big data processing approaches and technologies has been provided. A big data processing architecture for the healthcare industry has then been presented; designed according to big data approaches, it is capable of handling in real time the high volume of data generated by different medical sources. The contribution of the proposed solution is twofold: first, a generic big data architecture for healthcare based on both batch computing and stream computing, simultaneously providing accurate predictions and online patient dashboards; second, a prototype implementation based on Spark and MongoDB that detects alarming cases and generates real-time alerts. In future work, we plan to handle missing values with the Expectation-Maximization (EM) algorithm and to implement the semantic module.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] WHO, “Global Health and Aging,” 2011, http://www.who.int/ageing/publications/global_health.
[2] A. Gandomi and M. Haider, “Beyond the hype: big data con-
cepts, methods, and analytics,” International Journal of Informa-
tion Management, vol. 35, no. 2, pp. 137–144, 2015.
[3] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile
Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014.
[4] S. Shahrivari, “Beyond batch processing: towards real-time and streaming big data,” Computers, vol. 3, no. 4, pp. 117–129, 2014.
[5] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[6] N. Tatbul, “Streaming data integration: Challenges and opportunities,” in Proceedings of the 2010 IEEE 26th International Conference on Data Engineering Workshops, ICDEW 2010, pp. 155–158, USA, March 2010.
[7] D. Singh and C. K. Reddy, “A survey on platforms for big data
analytics,” Journal of Big Data, vol. 2, no. 1, 2015.
[8] R. Evans, “Apache Storm, a Hands on Tutorial,” in Proceedings
of the 2015 IEEE International Conference on Cloud Engineering
(IC2E), pp. 2-2, Tempe, AZ, USA, March 2015.
[9] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari, “S4: Dis-
tributed stream computing platform,” in Proceedings of the
10th IEEE International Conference on Data Mining Workshops
(ICDMW ’10), pp. 170–177, Sydney, Australia, December 2010.
[10] M. Zaharia, R. S. Xin, P. Wendell et al., “Apache Spark: A unified engine for big data processing,” Communications of the ACM, vol. 59, no. 11, pp. 56–65, 2016.
[11] E. Friedman and K. Tzoumas, Introduction to Apache Flink: Stream Processing for Real Time and Beyond, O’Reilly Media, 2016.
[12] D. Hows, P. Membrey, E. Plugge, and T. Hawkins, “Introduction to MongoDB,” in The Definitive Guide to MongoDB, Apress, Berkeley, CA, 2015.
[13] T. Huang, L. Lan, X. Fang, P. An, J. Min, and F. Wang, “Promises and challenges of big data computing in health sciences,” Big Data Research, vol. 2, no. 1, pp. 2–11, 2015.
[14] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: promise and potential,” Health Information Science and Systems, vol. 2, article 3, 2014.
[15] I. Olaronke and O. Oluwaseun, “Big data in healthcare: Prospects, challenges and resolutions,” in Proceedings of the 2016 Future Technologies Conference, FTC 2016, pp. 1152–1157, USA, December 2016.
[16] A. Belle, R. Thiagarajan, S. M. R. Soroushmehr, F. Navidi, D.
A. Beard, and K. Najarian, “Big data analytics in healthcare,”
BioMed Research International, vol. 2015, Article ID 370194,
2015.
[17] J. Sun and C. K. Reddy, “Big data analytics for healthcare,” in
Proceedings of the 19th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, p. 1525, Chicago, Ill,
USA, August 2013.
[18] M. Bochicchio, A. Cuzzocrea, and L. Vaira, “A big data analytics framework for supporting multidimensional mining over big healthcare data,” in Proceedings of the 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, pp. 508–513, USA, December 2016.
[19] S. Sakr and A. Elgammal, “Towards a Comprehensive Data
Analytics Framework for Smart Healthcare Services,” Big Data
Research, vol. 4, pp. 44–58, 2016.
[20] Y. Li, C. Wu, L. Guo, C.-H. Lee, and Y. Guo, “Wiki-health: A
big data platform for health sensor data management,” Cloud
Computing Applications for Quality Health Care Delivery, pp.
59–77, 2014.
[21] N. Poh, S. Tirunagari, and D. Windridge, “Challenges in designing an online healthcare platform for personalised patient analytics,” in Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Big Data, CIBD 2014, USA, December 2014.
[22] U. Çetintemel, D. Abadi, Y. Ahmad et al., “The Aurora and Borealis Stream Processing Engines,” in Data Stream Management, Data-Centric Systems and Applications, pp. 337–359, Springer Berlin Heidelberg, Berlin, Heidelberg, 2016.
[23] X. Jiang, S. Yoo, and J. Choi, “DSMS in ubiquitous-healthcare: A
Borealis-based heart rate variability monitor,” in Proceedings of
the 2011 4th International Conference on Biomedical Engineering
and Informatics, BMEI 2011, vol. 4, pp. 2144–2147, October 2011.
[24] M. M. Rathore, A. Ahmad, and A. Paul, “The Internet of Things based medical emergency management using Hadoop ecosystem,” in Proceedings of the 14th IEEE SENSORS, IEEE, Busan, South Korea, November 2015.
[25] W. Liu, Q. Li, Y. Cai, Y. Li, and X. Li, “A prototype of healthcare big data processing system based on Spark,” in Proceedings of the 8th International Conference on BioMedical Engineering and Informatics, BMEI 2015, pp. 516–520, China, October 2015.
[26] M. Bachiri, A. Idri, J. L. Fernández-Alemán, and A. Toval,
“Mobile personal health records for pregnancy monitoring
functionalities: Analysis and potential,” Computer Methods and
Programs in Biomedicine, vol. 134, pp. 121–135, 2016.
[27] A. Guyon, A. Bock, L. Buback, and B. Knittel, “Mobile-based nutrition and child health monitoring to inform program development: An experience from Liberia,” Global Health Science and Practice, vol. 4, no. 4, pp. 661–674, 2016.
[28] P. Pelegris, K. Banitsas, T. Orbach, and K. Marias, “A novel method to detect heart beat rate using a mobile phone,” in Conference Proceedings: IEEE Engineering in Medicine and Biology Society, pp. 5488–5491, 2010.
[29] A. Jović, K. Brkić, and N. Bogunović, “A review of feature
selection methods with applications,” in Proceedings of the 38th
International Convention on Information and Communication
Technology, Electronics and Microelectronics, MIPRO 2015, pp.
1200–1205, Croatia, May 2015.
[30] S. Ramírez-Gallego, B. Krawczyk, S. García, M. Woźniak, and F. Herrera, “A survey on data preprocessing for data stream mining: current status and future directions,” Neurocomputing, vol. 239, pp. 39–57, 2017.
Chiasson, M. W., and Davidson, E., “Taking Industry Seriously in Information Systems Research,” MIS Quarterly, vol. 29, no. 4, December 2005, p. 591.
Introduction
This proposal presents a comprehensive analysis and recommendations to bring the information governance of Jira Healthcare into compliance with industry Best Practices and legal compliance and to better manage our records to streamline the workload of the organization staff, doctors, and nurses to improve patient outcomes.
This proposal will address the following critical issues:
· Regulatory requirements found in the Health Insurance Portability and Accountability Act of 1996 (HIPAA), 21 CFR Part 11 (electronic records for pharmaceuticals), and others that may be identified later
· Best Practices that are applicable to the healthcare industry
· Risk Management and Mitigation
· Information Security and Governance
· Records and E-Records Management
· Metrics to evaluate Information Governance (IG) Performance
· Patient record keeping
· Email and social media strategy
· Cloud Computing strategy
· Outline of a Proposed IG Strategic Plan
Information governance is a sensitive part of any organization in the current market and a major influence on a company's functioning and performance. Any organization needs proper information governance to reduce its operational problems, but it is especially critical for ours because we deal with human life and death. Our practice is having serious issues with the flow of information within the practice and across its interfaces with other health organizations and with our patients. This is a major threat, since one part of the practice might fail to receive information on time, with serious repercussions for our patients. Chiasson and Davidson (2005) explained that a lack of proper information flow can result in poor management and planning, which implies that information must be kept consistent throughout the workplace; in our case, poor management can also harm patients' health.
In the digital era, the confidentiality and privacy of personal data are paramount. For our company and our patients' trust, we are compelled to comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). This act was passed to improve the efficiency and quality of the US healthcare system through improved information sharing; as well as increasing the use of e-records, HIPAA contains provisions to protect the security and privacy of protected health information. Present issues, such as duplication of workplace information, threaten the integrity of patient records and might discourage patients from reaching out and opening up about their personal issues. Embracing technology within the work environment makes it easier to store, retrieve, and convey information to all parties within the workplace, which is relevant to how they perform (Ratna, 2019). Technology is essential, but it is important to develop information governance policies early in order to prevent serious missteps and non-compliance and to ensure proper functioning and performance. It is also important to strengthen information security within the workplace, since information technology brings an increased risk of compromise.
As addressed above, this Information Governance program is a new project that Jira Healthcare has decided to develop and make an integral part of our business model. For the more than 50 years that Jira Healthcare has been operating, we have adopted many changes to keep pace with technology and the times; the establishment of this new Information Governance department is one of them. As a practice, we have struggled with the big changes of the digital era, in which data and information drive a large part of our business and streaming data must be real-time, accurate, accessible, and well protected. Our goal in this change is to successfully create this new department, which will work specifically on governing information throughout the practice.
References
Chiasson, M., & Davidson, E. (2005). Taking Industry Seriously in Information Systems Research. MIS Quarterly, 29(4), 591-605. doi:10.2307/25148701
Ratna, H. (2019). The Importance of Effective Communication in Healthcare Practice. Harvard Public Health Review, 23, 1-6. doi:10.2307/48546767