Write an annotated bibliography for an Information Governance milestone project.

Please write six peer-reviewed annotated bibliography entries for an Information Governance milestone project in the healthcare sector.

The sources should be relevant to the following topics:

 i.  Program and technology recommendations, including:

1.  Metrics

2.  Data that matters to the executives in that industry, the roles for those executives, and some methods for getting this data into their hands.

3.  Regulatory, security, and privacy compliance expectations for your company

4.  Email and social media strategy

5.  Cloud Computing strategy

The future of technological law: The machine state

James G.H. Griffin

Law School, University of Exeter, Exeter, UK

Advances in technology will challenge and change the current manner in which legal
regulation occurs. It has always been possible to describe governance and law as a
form of technology in itself, but the growth of digital technologies provides a new
means by which to regulate the population. This article posits the theory that the
inherent characteristics of technology will become inherent within the digitisation of
law. As law becomes an increasingly digital entity, it will become more concerned
with perfect reproduction of law upon the person, and so more encompassing in its
scope. In addition, the increasing use of digital technologies in augmented reality, in
3D and 4D printing both in solid and biological matter, poses a fundamental change
in the regulatory relationship between the State and the individual – a challenge the
State will need to address.

Keywords: Susskind; Heidegger; Giddens; technology; copyright

    1. Introduction

    If you really want to take a step back . . . I think what happens is that people in their industry
    become very self-serving . . . instead of thinking what is good for the world . . . I think you
    could become a bit myopic when you only deal with your own industry because you don’t
    get a larger picture of what’s happening and so maybe if I really was to extend it out and
    get a little more philosophical about it, I would say that as humans one of the things we
    want to do is to take the ideas that are in my brain and put them in your brain; and so, we
    have developed amazing things to do this.1

    The development of technology leads inexorably to the development of a ‘machine state.’
    Technology is responsible for enabling innovation, the industrial revolution, the Internet,
    war, peace and the extermination machinery of Auschwitz. Within these events, the machi-
    nations of technology, both unthinking and deliberate, are such that they cannot be ignored.
    In this brief thesis, machination refers to the way in which a machine is capable of influen-
    cing not just our thought, our conversation, our dialogue, our ‘being’ within society and the
    relations between individuals themselves, but the manner in which individuals relate to the
    regulating State and how the State relates to them. Technologies have always influenced us
    in this way, but as technology develops, as we enter into a period of augmented reality, of
    3D and 4D scanning and printing of both non-biological and biological matter, of DNA pro-
    gramming, we are entering a period of hybridised reality and hybridised law. This is the
    reality of thought through technology, of not just seeing the world through technology,
    through Google Glass,2 but also a world where technology will increasingly become a
    part of our inner being. It is a bio-synthesis, a machination of technology that will similarly
    lead to a hybridised form of technology and law.

    © 2014 Taylor & Francis

    *Email: j.g.h.griffin@exeter.ac.uk

    International Review of Law, Computers & Technology, 2014
    Vol. 28, No. 3, 299–315, http://dx.doi.org/10.1080/13600869.2014.932520

    The regulation of tomorrow, if not today, is a technical machination. The scope of law
    and the scope of technology are ever increasing, the machinations of the technology leading
    to a future of regulation beyond anything yet envisaged: A ‘machine State’ both utopian and
    dystopian. This short thesis will focus squarely upon the most divisive, pervasive and
    important type of regulation, that of copyright law,3 and why the nature of technology
    will shape copyright law and in turn how that shaping of copyright law will fundamentally
    change the development of the State. To do so, this paper adopts the following structure:
    (1) a theoretical consideration of the assumptions made about technology in society;
    (2) an empirical study into those assumptions; (3) the consequence of this for future calls for
    legal reform; (4) the consequence of those calls for the development of society; and, lastly,
    (5) a conclusion placing the thesis of this essay within context.

    2. The assumptions made about technology

    There is much literature available that considers the possible functions of technology, and
    the degree to which it does or does not inform our beliefs and our desires.4 This paper
    does not concern itself directly with that wider question, for the purpose of this paper is
    to consider the machination of technology in the legal regulation of society. First, a
    word is required on the meaning and purpose of ‘technology.’ It is used in the ‘historical’
    Heideggerian sense, namely, to cover everything from traditional technologies such as
    spades and hammers through to the technologies of leadership, i.e. bureaucracy (Heidegger
    1954). The technology of regulation is not by any means new, but perhaps we should con-
    sider more overtly the means by which the tools of that technology operate.

    The tools of which technology is comprised have self-evidently played a critical role in
    the development of society. It is tools around which prehistoric society developed – an
    example is the technology of sticks, axes and so forth.5 It is a means by which culture is
    created and recorded. The tools of technology form an integral part of society, to the
    extent that over time the physical function can become forgotten and neglected. Similarly,
    Heidegger talks of how a person using a hammer as a tool will become one with a tool
    (Heidegger 1927).6 However, we can go further when considering this in relation to the
    State. Landowners are not so likely to consider their physical land as a tool, i.e. of
    bureaucratic or State regulation. Likewise, Schools are a physical tool causing a physical
    change upon the developing mind, and yet this is not the usual way to consider a School.
    The School undeniably is a tool, a machinic assemblage (Deleuze and Guattari 1980)7
    that leads to physical changes in the developing mind.8 The divergence shows how the
    physical tool element in technocratic regulation has become side-lined. The dialogic and
    discursive, intangible elements become the focus: the effect of those elements rather than
    the physical nature of the tools themselves.

    The crucial element in this discussion is not the well-rehearsed nature of being, but the
    importance of the tool. We can strip concepts such as land ownership bare of their semantic
    and legal connotations. The tool (once identified) is a critical core component. It does not
    extend just to land and Schools. Capitalism is a complex physical tool, a representation of
    objects. Using something as a unit of exchange may also be described as a technology – the
    technology of money. Naturally, the technology of money is something that regulation has
    encouraged. It has been encouraged in a way that permits free competition and enables the
    individual to easily buy and sell. Most importantly, it is a technology that has allowed the
    individual to interact with the law on a daily basis – for instance, through contracts
    governed by contract law. Indeed, the technology itself forms a multifaceted matrix, and
    interfaces with the many other types of technology. The selling of objects is often dependent
    upon the underlying technology involved in their manufacture. Moving beyond capitalism,
    the body itself is even a physical tool – part of the ‘bio-power’, we might argue, of which
    Foucault wrote (1976, 1978 – 1979). As with the notion of sexuality, which Foucault ident-
    ified in The History of Sexuality (1976), we could suggest that there is within society a
    desire to move away from the baseness of the idea of the tool. Indeed, to call someone a
    ‘tool’ is even considered as an insult in the West. Yet, this lack of openness and engagement
    has led to a situation – in which we now find ourselves – of having made assumptions and
    overbearing decisions about the nature of the world in which we live. We make assumptions
    of the most basic things, yet those assumptions may have an enormous technological impact
    in our daily life.

    The historical interplay between the individual and technology is complex. Much has
    been written about it. For instance, there are views about how technology has affected
    the development of the human mind,9 the way in which technology affects society10, and
    affects the way in which we see the world around us.11 However, what we tend not to
    think about is how the relationship of law and technology affects the nature of legal
    regulation.12 These assumptions can have a considerable impact upon the
    involvement of the people within society. Jon Bing, in whose memory and honour this
    issue is published, was a great advocator of access to justice. Throughout his many articles
    and works he was keen to consider the impact of technology upon both the public and legal
    professions. His final article, for instance, focused upon the impact of technical regional
    databases upon our understanding of global legal concepts. His approach is neatly
    summed up in the following quote:

    We need to utilise the advantages of global spread of legal information. We need to find pos-
    sibilities of exploiting the advantage of other jurisdictions having legal material which may be
    of interest. Current principles of using material across frontiers have been forged in a situation
    where it has been difficult to exploit case law or legislative reviews from other countries. . . .
    We should be guided by the vision of WorldLII, and look for knowledge-based solutions
    that seek out and consolidate material upon request of the professional user. If this is realised,
    we will see that the dynamics of the legal system itself, where a legal argument takes into con-
    sideration prior decisions, may over time work itself into a more harmonised view as courts and
    other institutions puzzle together not only the pieces of their national systems, but also try to
    make them fit within a bigger, international picture. (Bing 2010)

    The technology itself, the regulation of that technology, and most importantly, the gaps in
    between, have a direct impact upon the way in which we interface with our law and our
    regulators. Copyright regulation is increasingly important in this relationship because of
    the way in which technology is being characterised as a technological work. Computer
    software can be covered by copyright because it comprises human-readable code (the source
    code); works made using that software will also often be considered copyrightable
    works; and even the hardware can be. With the growth in technologies such as 3D printing
    and scanning, objects can begin to be copyrightable under existing law in a way in which
    they previously might not have been. The advent of 3D printing with DNA also poses a
    challenge in that content containing DNA will, under current law, also be protectable
    through copyright law so long as it meets the usual (low) subsistence requirements.13

    The more digital technology enables and regulates uses, the more the reach of copyright
    will spread, protecting the software that is interpreted as an underlying literary work.14
    This increasing reach of copyright law, which is occurring through the increasing
    infiltration of digital technology in everyday life, is an event that could substantially change
    the relationship of the individual with the State. It changes the dialogue, with the
    machinations of the technology acting as an intermediary through which the State can extend the
    scope of regulation, or be a new means for individuals to evade it. This is the new
    machine State.

    So, copyright protection is achieved through forms of technology and laws regulating
    technologies, but copyright principles have been increasingly utilised in a technological
    manner that can affect individuals’ ability to be able to interact and influence those laws.
    The ability of the individual to be able to interface with technology will influence the
    freedom with which they can be involved within dialogue – the dialogue between the tech-
    nology and the State regulation of that technology. Hence the statement:

    . . . when people try to do really innovative things, with business models around creative
    content, it is difficult to automate it because no one understands the value of the use.15

    Such value needs to be captured through technology and, symbiotically, through law – in the
    manner that books were historically protected by the scarcity of the printing press alongside
    legal protection. The ability of certain groups, particularly the creators of content, to be
    engaged in the technological copyright dialogue has also come under the spotlight in the
    minds of the publishers and distributors themselves:

    So they [creators] kind of find themselves on the bottom as well – they have the recording but
    don’t really know what the requirements are to be able to properly license the product.16

    Quite often if you license from an artist they are not quite sure about whether or not they have
    done a cover version of somebody else’s musical work, and then they can’t actually give you per-
    mission in that they can give you permission in their recording [but not the original work] – they
    don’t understand that really.17

    It is not just judicial recognition,18 but also technical recognition, of these groups that
    affects the manner in which individuals compare and access copyright law. As recognition
    is affected by the arguments of parties in cases, it is important that the judiciary
    and regulators are aware of these differing creative flows.19 For instance, it is clear from
    the empirical research that was carried out during this study that there is a tendency of
    many firms to favour the collection of information concerning viewing habits. To this
    end they will favour and prefer the use of legislation that protects object identifiers and
    law that is increasingly technological in nature. This in turn will affect the use of other
    types of copyright legislation. Consequently, this is why the judiciary and regulators
    need to be aware of different creative flows of different types of companies, because
    their competing technical demands in terms of the types of law that they favour will
    impact the ability of certain individuals to be able to be involved within the broader
    social dialogue.

    3. Empirical evidence of the assumptions about technology in the copyright sector

    This paper is, in part, based on empirical research carried out between 2011
    and 2013. Twenty qualitative interviews were carried out among
    book publishers and music distributors. Most of the interviews lasted
    around one hour. The legal knowledge of these groups was considerable
    and reflects the amount of legal interface that takes place on a daily basis – revealing
    far more legal knowledge than that likely to be held by the general public. This in itself
    represents a challenge when considering the issues about access to the legal system for
    individuals creating or reusing copyright works. What the interviews revealed was a
    desire to shift towards using legal technology in digital form to regulate copyright
    works, with a focus on that aspect to the detriment of consideration of the wider impact
    upon society.

    The firms that were interviewed varied considerably in size. Some were self-employed
    individuals; others had several hundred employees. As a generalisation, small firms
    preferred the current operation of copyright law and technology, whereas
    larger companies were more open to change. The change favoured by large
    companies is a shift away from existing copyright protection to focus upon the information
    concerning the usage of copyright content – and it is here where legal protection currently
    falls short. This is probably not what the reader would expect, and this paper will suggest
    that this is due to underlying technological change.

    In relation to the point concerning the assumptions made about technologies generally,
    it became apparent that some right holders were primarily working within the existing fra-
    mework and were not keen to operate outside of that framework. These were companies
    that favoured the existing legal and technological structure:

    I can’t say that I have had any big cases where things ended up on the internet. I think maybe
    ten years ago that happened frequently but quite fast people learned that was not the way to

    It is the interplay between the technology and the regulation that has led to this situation. It
    is by no means the first time; specifically within the context of cultural works, we could cite
    the historical regulation of the book trade, which was predicated around the technological
    characteristics of the printing press. Outside of that context, we can go back to prehistory
    and focus upon the technology of those times. The important aspect is how existing tech-
    nology influences and informs the perceptions of certain key players as to the scope of

    I don’t know what to say, other than for me it [property rights in copyright] is a given.21

    The existence of the technology provides a horizon (Habermas 1984, 1987) against which
    the norm of use is predicated. The above quote was in reply to a question about the central-
    ity of property to copyright regulation, and represented the usual response as to whether
    copyright required property rights in order to function. Despite the lucidity of replies
    when it came to discussing the details of case law, this was not so to the same extent
    with the analysis of the property right itself. To recall, property is both a technical and
    legal concept, and the technology, so to speak, behind the right is something that provides
    the basis for the State – individual dialogue. Property as technology matches in key respects
    property as law, at least so far as right-holders of copyright works are concerned. Each
    mechanical reproduction of tangible or intangible objects can be perceived as being
    owned, whether or not reproduced by that right holder.

    The technology of property has allowed right holders to be able to exert their rights over
    certain property in order to make financial returns. It is that link through to the technology
    of money that has allowed the right holders to be able to produce and distribute works.
    When the property technology and the money technology begin to fail to interface and
    to engage with the public, a lack of returns ensues for the right holder:

    The record company is . . . if our bank manager was hearing . . . about this sort of conversation
    that we probably are gonna have, he probably would withdraw all his support from my
    over-draft because there is no . . . I can’t see any reason why any record company should be
    existing anymore. We do because that’s what we do.22

    This naturally has had some negative effects for certain right holders:

    We never ever gonna see a great album again. You are never gonna hear another . . . a new L.P.
    You never gonna hear a new band, no, so there is no justification for doing it [piracy]. So our
    record label, we are earning from downloads, a sort of like seven or eight per-cent of what we
    have lost on physical sells.23

    And again going back to the music industry I’m sure and I can see it happening in published
    books, is that people are becoming increasingly reluctant compose music, produce books if
    there is no perceived value in it for them. I am not sure how you protect . . . how you
    change that by making new laws. I think you have to simply strengthen the laws that there
    are and to bring the internet and people, you know Google and so on to heel.24

    The future for such right holders is one that reveals a decline in the ability to use the tra-
    ditional technologies of legal property for distribution of copyright works due to a split
    between that legal regulatory technology, the technology of the physical format, and the
    public interfacing with them. In this particular example, the public shifted to other
    formats, to other means of distribution, which will include piracy of the copyrighted pro-
    prietary works. Importantly, the technology available has enabled this. Despite legal
    cases such as MP3.com,25 Napster26 or Grokster,27 file sharing of copyright work continues

    The question this paper seeks to address is whether this disjunct between the technol-
    ogies of distribution, the technologies of the law of the State and the technologies available
    to the individual will remain. One historical parallel that has existed is that the lack of ‘bite’
    that the law has had with technology has historically mirrored the ability of the technology
    to be able to avoid regulation. Difficulty in making copies was mirrored in law by a lack of
    regulation – after all, why would there be a need for regulation if the technology did not
    enable the reproduction to take place? The difference initially arose with the Internet in
    that reproduction could take place on a large scale – of which, of course, much has been
    written (inter alia; Benkler 2006; Boyle 1997; Lessig 1999; Vaidhyanathan 2001).
    However, the underlying technology is changing and once again the alignment of the
    technology with the legal rules is beginning to take place.

    4. The desire for surveillance

    The start of the alignment between technology and legal rules concerns State management
    and surveillance of the Internet. China was one of the countries that originally focused upon
    the issue of surveillance using the Golden Shield system, which predicted the actions of the
    populace through IP monitoring (Feir 1997). Of course, since then, the systems used for
    surveillance have revealed a much broader approach to the relation with State regulation.29

    With that increasing breadth come three impacts upon the general relationship between the
    technologies of regulation and the technologies of distribution. First, the technologies con-
    cerned are increasingly digital in nature and thus can more easily interface with each other.
    Second, the technologies involved in distribution are being more implicated in the everyday
    activities of individuals moving away from just the central core of copyright. Third, the
    technologies of surveillance enable the technologies of prediction. These factors combined
    mean an increasing parallel between the technology of State regulation and the digital
    technologies used in the distribution of works. The parallel is in part a consequence between the
    surveillance of the State and the desired surveillance of the customer of copyright works,
    and the capitalist-compliant underpinning of the Internet. In the same way that the Internet
    as an open system is beneficial for companies who wish to observe the actions of users, so
    the openness of the Internet is an advantage to States who wish to observe what the users are
    doing. Much initial technological development went into the systems of surveillance that
    are now likely to feed into the systems of observation for commercial advantage:

    They created essentially . . . part of what I was describing they are trying to put all the infor-
    mation into a single database, they have already done this. Do you know that company at
    all? It’s called Decibel, they are based here. It was started by an American because he was
    working with both the City of New York with Mayor Giuliani and the FBI and this was
    around the 9/11 period, essentially what they are saying was ‘how does this shit happen,
    like the first bombing of the world Trade Centre, how come we didn’t know that because
    we should have known this.’ If this guy lives here and he is working with that guy and we
    know that guy knows the other one in some country in the middle east, how come that we
    can’t put that simple relationship together, if A knows B and B knows C. So he built a very
    sophisticated database so that they could start to make the connections in terrorist networks
    . . . to put all the people together we know this is a known terrorist, who are his friends and
    business associates lets load them in a data base and they are related to another group of
    people and all of a certain you see relationships . . . so he built this. He then took this and he
    said, because he was a big jazz ‘aficionado’ with a massive collection, he said I should do
    this for . . . [his] . . . own collection of music.30
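
The chaining of known relationships that the interviewee describes (A knows B, B knows C) can be illustrated with a small graph search. This is a minimal sketch only: the article gives no detail of how the database in question was built, so the function names `build_graph` and `connection_path` and the sample pairs are hypothetical.

```python
from collections import deque


def build_graph(pairs):
    """Build an undirected 'knows' graph from (person, person) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connection_path(graph, start, goal):
    """Return one shortest chain of acquaintances from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted only to make the traversal order deterministic.
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical data: A knows B, B knows C; D and E form a separate group.
known = [("A", "B"), ("B", "C"), ("D", "E")]
graph = build_graph(known)
```

Here `connection_path(graph, "A", "C")` recovers the indirect link through B, while a query between unconnected groups yields `None`. The same structure would serve equally for people or for recordings and their contributors, which is the reuse the quotation describes.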

    The technology of regulation has therefore been able to maintain step with the technological
    developments. Of course, in recent history, arguments of right holders have focused on
    how the law has not kept pace with technological change (Lessig 1999; Vaidhyanathan
    2001) – however, as can be observed, what has instead happened is that regulation of
    digital technologies has been achieved through greater use of digital technologies by the
    law enforcement agencies. In turn, it is this use of technology by the enforcement agencies
    that is leading to similar methods being employed by right holders to monitor users, and to
    consequently then lobby for change which is more technologically centric, and more onto-
    logically correct, for technical regulation.

    This marks a step change away from some of the traditional views about the nature of law
    and the relationship of it with the broader populace. The judiciary have been perceived as a
    key means through which the population perceive the law as current, by the judicial appli-
    cation of it. This is the notion of ‘living law’ (Brandeis 1915 – 16; Ehrlich 1936).
    However, what is happening today is not the judiciary keeping the law ‘current’ – instead,
    it is the action of public and private enforcement agencies. The law has remained steadfastly
    analogue in a digital age. The digital creation and application of law has been achieved in the
    field of enforcement. This creates a substantive law deficit in the digital realm.

    The substantive law deficit is one that occurs on two levels. First, there is the issue that
    the vagueness of analogue rules causes uncertainties alien to a digital world. Digital tech-
    nologies have traceable content, a detectable stream of information, which is not acknowl-
    edged under our analogue laws but is through their application by enforcement. This is
    leading to some changes – for instance, in the UK private copying is (finally) becoming
    legalised31 – but the larger issues of infringement remain, namely, the uncertainties
    caused through the current qualitative tests of the taking of a substantial part and derivation.
    When enforcement takes place, this is enforcement of a vague law, whose vagueness is the
    antithesis of digital technology. For example, it may be unclear whether a character in a
    novel has copyright but an enforcement agency may decide the character does and begin
    infringement proceedings. This creates a noticeable deficit between the law and the enfor-
    cement of law, made more noticeable in the online context where, for instance, fan fiction
    might not be infringing and yet copyright enforcement agencies may decide that it is. Such
    works being online means that they are more identifiable as targets of prosecution threats –
    and there are no legal provisions to deal with this.32

    The second, related, deficit concerns the difference of scope between the digital and the
    analogue. Digital works of any sort are typically accessed rather than sold outright, and this has led to an increased
    emphasis upon the licensing of works (Efroni 2011; Rifkin 2000). Furthermore, conver-
    gence and network effects are at play within digital technologies, which means that those
    technologies are constantly multiplying and extending their scope. A method of enforcing
    law and order generally is likely to spread to other areas, as has indeed happened with State
    surveillance methods spreading to more general Internet usage. Likewise, the notion of
    licensing of works would have begun with the need to access data on a computer to be
    able to use it, and so licensing arose as the technically most appropriate means of regulation
    rather than the traditional proprietary sale. Indeed, this has led to calls for reform based
    around the notion of licensing:

    Very specifically, yes, for me for copyright law I would do very specific things.

    One, I would give unlimited access to citizens and compensate rights-holders with some sort of
    a levy or tax scheme and I wouldn’t restrict it to music, it should be for everything, because you
    can’t do everything at once although later you would be able to stack things and obviously you
    can listen to music and read a book at the same time but essentially it’s 24 hours in a day – it’s
    not hard to cover it up we can see what they are doing . . . and instead of looking at it based on
    file size or anything else, it’s just about time you spent with something which gives you a pro
    rata share of that kind of license fee or whatever.

    Second, I think on top of that create a scheme where people then build businesses so that it’s not
    the end of the monetisation.

    Third, I think copyright should be modified so that the term is actually reduced for the exclu-
    sivity period and then there is a longer tail of, OK you can just collect some money but you no
    longer have a real say of the destiny of your work . . . or what people can do with it . . . how they
    can chop it up you get a fee for it . . . you don’t own it for the next 50 years plus now.33
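
The time-based levy proposed in the first point of this quotation amounts to a simple pro rata calculation. The sketch below is one possible reading of it, not a scheme set out in the article: the function name `pro_rata_shares`, the levy pool figure and the usage data are all hypothetical.

```python
def pro_rata_shares(levy_pool, seconds_by_work):
    """Split a levy pool between works in proportion to time spent on each.

    levy_pool: total money collected through the (hypothetical) levy.
    seconds_by_work: mapping of work identifier to seconds of use.
    """
    total = sum(seconds_by_work.values())
    if total <= 0:
        # No recorded use: nothing to distribute.
        return {work: 0.0 for work in seconds_by_work}
    return {
        work: levy_pool * seconds / total
        for work, seconds in seconds_by_work.items()
    }


# Hypothetical usage: three hours of music and one hour of reading.
shares = pro_rata_shares(100.0, {"album": 3 * 3600, "novel": 1 * 3600})
```

Under these assumed figures the album would receive three quarters of the pool and the novel one quarter, regardless of file size, which is the feature the interviewee emphasises.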

    The calls for these reforms stem directly from the deficits between the State, the technology,
    and the public. These deficits have been characteristic of the disjuncture between traditional
    law and the recent developments in technology. It has primarily been an issue of enforce-
    ment but now the means to enforce the law have caught up. What does this imply about the
    current and future direction of the law? If we cast our minds back to the mid-1990s, there
    was much debate about the introduction of laws for ‘Digital Rights Management’ mechan-
    isms, or to give them another name, technological protection measures.34 Even more debate
    ensued as to how these laws would be enforced. In the event, whilst many bottles of ink
    were spilled discussing how these laws could (and indeed, did) extend beyond the tra-
    ditional boundaries of copyright law (inter alia Boyle 1997; Lessig 1999, 2001, 2004;
    Reese 2002 – 03; Vaidhyanathan 2001), it remained the case that the mechanisms were cir-
    cumvented, that (illegal) copies were distributed, and that the right holders had to turn to
    other means, to secondary liability,35 authorisation rights,36 filtering37 and ISP legislation38
    to attempt to make it difficult for the works to then be distributed. These were attempts by
    analogue laws to regulate what the digital technology could and could not do, and it is best
    characterised as a reasonably futile attempt to make the general public adhere to copyright

    Recent State surveillance techniques mark a shift in regulation in that the means of
    surveillance are not analogue but exist embedded within the digital realm. It is, in effect,
    a non-democratic enforcement of digital law; a non-democratic detailing of the analogue
    domestic law. It is a growth of analogue law that because of network effects, because of
    the interconnected nature of digital technologies, has become deeply enshrined within
    the connected technologies. It is an effect that we can also expect to see extend into
    other laws. We can start to observe it within the field of copyright law where, as mentioned
    above, the surveillance tools are being deployed to predict and guess what users will want to
    see, to know what their habits are, to know what they think before they themselves know
    what to think – and the calls in turn for this to be further protected by digital law, digital

    The current legal rules that may protect such measures were passed in the same statutes
    as the DRM laws – notably the US Digital Millennium Copyright Act 199839 and the EU’s
    Copyright Directive 200140 – and protect what is known as ‘Copyright Management
    Information’ (CMI).41 We need first of all to note that the notion of ‘copyright infor-
    mation’ is a bit of a misnomer in that any work involving something copyright related
    will likely qualify for protection.42 The names of authors, what can and cannot be
    copied, can be stored in this information, as can be more ‘active’ information, for instance
    digital watermarks and other forms of tracking technologies.43 However, the legislation
    concerned remains analogue law, in that it does not directly interface with the original
    digital content. Indeed, it was many years subsequent to the introduction of the specific pro-
    visions that they appeared to have any applicability, and they have been the ugly step-sister
    to the main set of DRM provisions with which they were enacted.

    This situation has begun to change, with an increasing emphasis upon the digital nature
    that regulation needs to hold if it is to be effective. It needs, in essence, to be digital
    regulation – not an analogue law about the subject matter of digital content, but an actual
    interfacing digital law, that is to say, a truly digital law. The proposed EU copyright code
    (at the time of writing in 2014) is significant in this regard – the European Commission
    consultation paper refers to ‘identifiers’44 which may initially sound redolent of the CMI pro-
    visions, but there is a distinction – these identifiers are designed to interface directly with
    the digital content. With the CMI provisions there was more of an analogue gap between
    the law and the implementation through code, but the identifiers mentioned in the European
    Commission paper are quite evidently moving towards a direct digital interface – to quote,
    ‘to create a linked platform, enabling automated licensing across different sectors.’45

    Whilst the formal machineries of Government are working towards an understanding of
    the interface with code, private industry is making greater headway into areas beyond State
    surveillance. As noted earlier, the open structure of the Internet is such that it mirrors
    commercial interests in surveillance. Much of the State surveillance technology was pro-
    duced by private companies and it is natural that this has now been utilised by other
    private companies to assess the actions of users in establishing, for instance, what sort of
    products a consumer may wish to purchase – which is how Netflix operates:46

    Web 3.0 is where technology starts to work for us instead of the other way around . . . so instead
    of me logging into Spotify to figure out what I want. It’s already created stuff for me, but again
    all that’s underpinned by information and the lack of information in this industry is stifling. We
    can’t get to a 3.0 world unless we have basic information.47

    It is that future, that of being able to assess what users want before they know it, which will
    provide the source of financial revenue rather than the sale of copyright content. It is as

    International Review of Law, Computers & Technology 307

    important as the act of licensing itself. The information about the use of content needs to be
    protected, yet there is as yet no direct means by which all of this information itself can be
    protected through legal regulation:

    Going back to that iTunes audit that we had . . . their number one asset . . . apart from
    obviously their eco system of selling i-products to the world . . . the number one asset for
    iTunes is the database, so they jealously guard it and they have very draconian rules on
    how the metadata should be presented.48

    This could lead – and is likely to lead – to calls for direct protection of this sort of infor-
    mation, but care is required for the network effects of the technology are such that there
    could be consequences upon future uses. For instance, the existence of tracking technology
    in observing how users utilise content49 could extend to cover the use of content in the real
    world – especially with the rise in demand for devices such as Google Glass.50 If that sort of
    device can recognise real-world objects – which it surely will in time – then it is not such a
    jump to realise that licensing of real-world products may be required to view them, that real-
    world objects will be altered at the viewpoint of the user – to enhance the object, or to
    reduce the usability of the object. The world is likely to become increasingly fluid in its
    relationship between the real world and the virtual world,51 and this fluidity is important
    if private enforcement of the virtual world is not to trample all over the real world.
    Already this overlap has become apparent with the programmes of State surveillance,
    but if our everyday activity is affected then, quite clearly, there is a need for direct regulat-
    ory intervention to protect the interests of the user.

    5. The future – symbiosis

    The transposition of State surveillance technologies by private companies further into the
    realm of everyday activities reveals an increasing technocratisation not just of our relation-
    ship with the State, or even with private companies, but also between ourselves. The tech-
    nology of State regulation, of the distributor, and of the individual is moving ever closer
    together. The implications of this are way beyond anything we have experienced so far,
    and probably beyond our current imagination. The digital future is starting here, the
    future in which the technology around us directly interfaces with us. Bio-power, the
    brain child of Foucault (1976, 1978 – 1979), is about to take a radical turn, a turn where
    we can begin to identify a powerful symbiosis between the individual and the technology
    of the State. Google Glass is one such example, where the glasses are a digital interface
    between us and the world, but other examples abound. For instance, 3D printing could
    lead to a situation where the same content identifiers are used to influence the manner in
    which objects can be 3D ‘scanned’ and then reproduced with a printer (Ernesto 2013; Whit-
    warn 2012). They could even be used where content is printed biologically, i.e. using what
    is known as 4D printing where the printing material also prints itself52 – which could be
    done with man-made materials but also biological material using programmed DNA.
    This is the future symbiosis of which we need to be aware, if we are to make informed
    choices for the future technology of regulation and the future of biosynthesis.53 If regulation
    is likely to become symbiotic with the human body, then we should also plan for that as we
    plan for the near future.

    We can observe the basis for the ever-increasing technological symbiosis in the prehis-
    tory of man. The tools, the use of them, the development of State and the regulation of the
    State, all is down to the use of technologies, just as Heidegger identified in his hammer
    example where the tool and the man enter a form of symbiosis (Heidegger 1927).54
    Technology is the basis and the means by which individuals have come to interact with
    one another, from the technology of speech to the technology of the Internet; from the
    technology of peace to the technology of war. War machines even bring together human
    minds,55 eliminating difference – the unending, ceaseless, sometimes unedifying and
    ugly truth of technology. The unavoidable future is an ever greater union between man
    and machine, between the technology machines of the State and of the people. Our archi-
    tectures, our ways of being, will become increasingly virtual and biological, and thus it is
    necessary for an informed future of regulation to be ontologically correct, for the law to be
    able to engage with this way of being: to engage with the technology of and within the body,
    and to be aware of its thinking and unthinking machinations.

    The technology affects the way in which individuals communicate and rationalise with
    each other, and the way in which the State itself has developed. State regulation has
    often been invasive in some form or other, be that through principles relating to censorship,
    copyright, privacy – the technical State of digital technocracy. The rise first of all of CMI,
    then of Identifiers, and then more invasive and involving direct digital regulation, indicates
    a phase shift towards the observation of the individual and the actions of that individual.
    Furthermore, the technological feed of the future, web 3.0, will provide users with an indi-
    vidualised Internet, but in all probability through surveillance algorithms, which will lead
    the user in particular directions. The debate as to the rights and wrongs in terms of driving
    individuals towards them ‘thinking’ what they want has been well covered by Adorno and
    Horkheimer (1947), and their discussants. Future regulation should be aware of the contin-
    ued convergence of technologies in addition to the subject matter. The architecture of
    control will become increasingly important.

    6. Conclusion: the machine state

    Assumptions have been common in the development of the State – from the assumptions
    made about the importance of the technologies, through to the assumptions made about
    the necessity of the technologies of capitalism and economy. These assumptions are insi-
    dious, in that they can lead to failings in dialogue within society between various groups
    (Derrida 1988). Unger (1976), a proponent of group pluralism, identified the importance
    of groups to the development of a society. Technology has played a critical role in the
    formation of these groups, in the ways in which they interact. Likewise, Foucault, in The
    Order of Things (1970, Chapter One esp.), is also indirectly referencing back to the impor-
    tance of technologies within the individual’s perception of the world and of history – even
    if the perception may not be realised or may be side-lined by more traditional interpretations. The
    assumptions that are made about technology have a direct impact in the initial and
    ongoing formation of groups. Bing discussed how the gathering of information could
    distort our understanding of our immediate world view. It can also influence us as to the for-
    mation of future groups that are the cause and reason for the technology. The structure of the
    society, as created by the technology, is the consequence of earlier groups and so future struc-
    tures are a combination of these groups. There is an analogy with Giddens’s view of structura-
    tion, with the groups’ own dynamics and social life causing change to the structure of society
    and vice versa (Giddens 1984). The machination of the technology provides another process
    to consider, perhaps even the most central for it is upon technologies that all societies depend.

    So, whilst technology has played a critical function in the development of group plur-
    ality, technology in future may also be said to play a critical role. It is likely to form the basis
    of the development of further group pluralities, because law will form new pluralities
    between groups if it takes a directly interventionist form. The closer the integration, the
    greater we can assume the influence of the law – particularly in relation to digital
    factors, where the more certain flows of digital technologies and more certain flows of
    law will influence the forming and evolution of pluralist groups. For example, if the law
    becomes increasingly ‘coded’ in form and meaning, then the language of that code, the
    architecture of the code,56 the dialogue of the code, will form and influence future
    debate and future groups. If a dialogue evolves in a language and form technically alien
    or opposed to another language or form, then the technology itself, with its network
    effects, will lead to clashes and collisions. We see this in the history of clashes between civi-
    lisations in different stages of technical development; those tribes who were obliterated by
    Western civilisations. We also see it in the current development of coding groups, not just
    computer coding groups but groups that define their own coda, their own values and judge-
    ments, arising phoenix-like from the ashes of the failed technologies of others: competing
    clans of computer users (Amiga v Atari, Apple v PC), competing re-users of content, users v
    creators, creators v publishers, so the codas of groups will clash. The complex technological
    interrelations are key. An example of how subtle dialogue change can take place is with
    regard to the ways in which re-users alter existing content. Re-users may wish to edit or
    make changes to software, but if that software is protected with a DRM or TPM mechanism,
    then in many circumstances the re-user will need to know how to either circumvent the
    technical mechanism or use other software to do so, which still requires some technical
    skill. A technical meritocracy emerges – perhaps the default position of any technical
    order. The structure of DRM and TPM is likewise influenced by legal provisions. Taking
    this further, consider how certain groups in society may be unable to interact with
    technology on an equal footing, due to economic or other social issues (e.g. denying
    access to hardware necessary to perform certain actions); likewise, consider the situation
    of Google Glass, where it is possible that certain levels of hardware will need to be used
    by individuals in certain physical locations in order to be able to fully interact with a hybri-
    dised reality.

    In essence, the interflow of social groupings and technology could be characterised as
    losing its subtlety. In early human history the technologies would act as an initiator, an
    enabler, of societal groupings, e.g. the stone axes and other such technical tools. If bureauc-
    racy is being characterised as technology (as it was by Heidegger 1954), bureaucracy marks
    a gradual shift in that it begins to classify certain acts, certain aspects, groups and layers in
    society as less desirable, shifting the development of society in certain nuanced ways.
    Limits to regulation exist, for instance in the extent to which it can interface with the
    human mind, to its enforceability in general. However, technology that utilises the forms
    of enforcement mechanisms of the sort possible with widespread surveillance can be far
    more invasive both in terms of the precision and detail of its regulation using that technol-
    ogy, in terms of its invasiveness in the actual perception of the world by individuals,57 and
    in terms of the creation of new technical zones (Dyer 2012) of the physical world.58

    Bronowski (1973) had suggested in his work looking at science and society that complete
    control in the scientific way over life was only possible in a ‘push button order,’59 namely to
    destroy life completely. However, what we can see is that a form of push-button order is
    possible through the nuanced interventions of digital regulation – a complex world of 0s
    and 1s, or a complex push-button order. This is not to say that digital technology is inher-
    ently nefarious, but that it is a characteristic of the technology. Technology itself is invari-
    ably about control, internal and external, and so if a technology develops that inherently
    enables complex communications with individuals, then it can inherently lead to more
    control within that field.

    The current regulatory system with the emphasis upon surveillance has thus far one sig-
    nificant issue, in that the surveillance system and its direct interface with the technologies of
    the populace is one that, although it will influence the actions of the populace, is not regu-
    lation that has stemmed from the traditional sources of law, namely the democratic insti-
    tutions. As argued earlier, the democratic bodies responsible for passing laws have passed
    laws that are analogue in nature, in that the technologies of such laws themselves do not
    directly interface with the actions of individuals. The surveillance system can directly
    enter into dialogue, instruction, with code used by individuals – and because of the increas-
    ing symbiosis between this technology and the individual, there exists the risk of non-demo-
    cratic control of individual thought. Hence, a starting point for any future regulation would be
    some consideration of whether such a situation of direct interface with code is desirable,
    whether it is something that should be legislated against, or whether there should be a
    direct attempt by legislators to directly regulate through code. With the growth of technol-
    ogies such as Google Glass, 3D and 4D printing technologies, and biotech printing, it is argu-
    able that ontologically speaking there is a need for the State to engage if it is to remain
    relevant, to appear rational, in the way in which it regulates its people.

    If that is to be so, then the manner in which the State interfaces with digital technologies
    will need to be considered. Already we can see the gradual establishment of such an approach
    within fields such as Internet Governance, Pornography and Copyright Licensing. Within
    each field bodies either exist or will exist – e.g. inter alia ICANN,60 the Internet Watch Foun-
    dation,61 and the proposed Copyright Hub.62 Direct regulation therefore exists in the form of
    needing to register new domain names, of the possibility of that registration being revoked
    due to the content contained on the computer hardware, or the deletion of content or licensing
    links for, e.g., the failure to pay licensing fees. However, the exact nature of the intervention,
    the form of that physical technological intervention, is not the sort of issue that tends to
    receive public debate but it should be – for it is a form of dialogue that can influence
    other group interactions. Ideally, then, there should be a means by which to consider the
    digital language of a legislative code – that is, not just the legal code, but also the directly
    interfacing code, be that software or hardware based.

    The technological spider web of regulation thus has multiple and multifaceted impacts
    upon the development of society and of the State itself. Technology today, let alone tomor-
    row, is already in essence within one large techno-biosphere, with increasing convergence
    and homogeneity among the various component elements. Inherent convergence, and the
    inherent network effects, of technologies are such that the digital geography of the space
    of the State will become of ever-increasing importance. Much has been written of digital
    architectures, and the architectures of the real world, as expressed in code (e.g. Lessig
    1999). However, we need to step beyond this, to consider how the State will directly inter-
    face with code and how that code will interface with each and every human being. The
    techno-biosphere is all encompassing, and an all-embracing representation of the dialogue
    of human society.

    However, the issue of network effects poses a significant challenge to the rationality of
    the State through the eyes of the populace. The machination of the technology is simple – it
    is a means by which reproduction becomes possible, not just in the production of things but
    in the technology itself. It is a perfect replicator, which stands in direct contrast to the human
    being, existing as we do as a result of accidental mutations, of imperfect creation and imper-
    fect death. Technology exists akin to a hive colony; its beauty is in perfect reproduction,
    perfect harmonisation. In contrast, the beauty of the human mind is in its difference, in
    its ability to create and make from a harmonious blank canvas. Technology may gradually
    undermine these differences, these mutations, for they do not represent the perfect core of
    technology, of infinite perfect reproduction. If we think of the future, of the possibility of
    printing in DNA strands, of rDNA, and the ability to print life from technology in the
    same way as we procreate through our imperfect acts of human reproduction, what will
    this pose for the future of the human race? Technology and its perfected-ness does not
    need to value our creative values, for just the purity of reproduction is valuable, and there-
    fore we are introducing a system of regulation into the human life which, if left unchecked,
    will provoke a clash of values, between that of perfect and imperfect reproduction. This is
    the ultimate machination of technology, of the machine State. It is the ultimate realisation of
    the science – society debate, but one which realises that technology does imply a degree of
    determinism, not through its use or even its structure, but because of its existence as a means
    of enabling perfect reproduction. Google Glass, all these technologies through which we
    will interface with the world, through which we may be ultimately reproduced in the
    virtual world, will become a means by which the virtual world will ultimately affect our
    own real world. The perfection of reproduction, the innate requirement of sameness, will
    come to inflict itself upon us if there is no realisation by the analogue State of the conse-
    quences of digital control. In 1973 Bronowski, when he discussed the ‘push-button
    order’,63 described a world of blunt technologies, where 1 and 0 related to the nature
    of the technology vis-à-vis the existence of a human life, but today that technology is
    more nuanced, more able to interface with the thoughts of a human and the inputs to the
    human brain. Furthermore, what of the future, of 3D printing of biological matter with
    DNA, of computers built with DNA rather than binary code, which may or may not of
    themselves value change due to their DNA makeup? It is these challenges, both current
    and potential, and the machinations of the technology, that the State needs to address. To
    quote a music distributor on the relationship of technology and the person:

    Starting with the lyrics several hundred thousand years ago, it . . . [was not realised by earlier]
    . . . humans that what makes homo-sapiens specialist is our ability to do complex language
    through voice, [being] able to get my ideas or more complex ideas from my brain to your
    brain . . . as we moved from writing, to the printing press, into things like recording devices
    . . . creating new forms of getting things from my brain to yours . . . More creative ways of
    saying what’s in my brain instead of merely facts . . . we moved to higher levels.64

    Acknowledgements

    This research is in part based around qualitative empirical interviews that were funded by BILETA,
    and a paper given at the 2013 BILETA conference held at the University of Liverpool. The BILETA
    funding was for the project ‘Property in Copyright’. My thanks to all those who contributed comments
    and thoughts on the paper and its underlying thesis, in particular the two anonymous reviewers and the
    editor of this journal.

    Notes

    1. Quote from research interview. The interviews were conducted on an anonymous basis, as
    explained within the body text under Section 3.
    2. See Google Glass documentation at http://www.google.co.uk/glass/start/ and see
    http://googleglassforum.net/. For an example of this technology, see Matthews, http://www.geek.
    3. For reasons explained in Section 2, it is the argument of this paper that copyright law will
    become the most important law of the future, pre-empting most other laws.
    4. For further discussion see below, this section, paragraph 4.
    5. See inter alia Watson (2005), Borstein (1992), Shumaker, Walkup and Beck (2011).
    6. Heidegger (1927) Note SUNY edition trans. J Stambaugh (2010) at 100 – 101.
    7. Deleuze and Guattari (1980) at 4 and at 504 – 508. The Rhizome analysis in Chapter 1 could
    apply here.
    8. This analysis is similar to the notion of the visible and the articulable in Foucault (1966).
    9. Meant primarily in the Heideggerian sense that technology is an outcome of our technological
    view of the world – Heidegger (1954) but one could also focus upon the use of tools more
    generally within animal species, e.g. inter alia Shumaker, Walkup and Beck, Animal Tool
    Behavior, supra n.5.
    10. Stiegler, Technics and Time 1: The Fault of Epimetheus (1998), B Latour, Reassembling the
    Social: An Introduction to Actor-Network-Theory (2005).
    11. Ihde (1990), i.e. embodiment relations (quite literally too – he discusses glasses at 73 and 94).
    12. Most works cited above skirt around the issue of legal regulation. An exception would be
    Susskind (1996).
    13. For a comprehensive overview see Caddick, Davies, and Harbottle (1998), Chapter 3.
    14. Caddick, Davies, and Harbottle (eds), ‘Copinger and Skone James on Copyright’ ibid., at §3 – 30.
    15. Empirical interview, see note 1.
    16. Ibid.
    17. Ibid.
    18. A reference to the rule of recognition – Hart, Concept of Law (1961).
    19. See a parallel discussion on information flows, Elkin-Koren (1996), N Elkin-Koren (2002).
    20. Empirical interview, see note 1.
    21. Ibid.
    22. Ibid.
    23. Ibid.
    24. Ibid.
    25. UMG Recordings, Inc. v. MP3.com, Inc., 92 F. Supp. 2d 349 (S.D.N.Y. 2000)
    26. A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004 (CA 9, 2001)
    27. MGM Studios, Inc. v. Grokster, Ltd., 545 U.S. 913 (US Supreme Court, 2005)
    28. See inter alia Lunney (2014), Lessig (2004, 67).
    29. Consider the revelations of Snowden (Guardian 2013) and Manning (Wikileaks 2010).
    30. Empirical interview, see note 1.
    31. See http://www.ipo.gov.uk/types/hargreaves/hargreaves-copyright/hargreaves-copyrighttechreview.htm
    for recent developments.
    32. See Griffin and Nair (2013).
    33. Empirical interview, see note 1.
    34. In particular see Litman (2001), Lessig (1999), Vaidhyanathan (2001).
    35. In the UK, under s.24-s.27 CDPA 1988, in the US, under the vicarious and contributory liability
    36. In the UK, s.16 CDPA 1988, in the US, under 17 USC §106.
    37. Consider in the EU, C-70/10 Scarlet v SABAM [2011] ECR I-11959 and in the US, MGM v
    Grokster 518 F.Supp.2d 1197 (CD Cal, 2007).
    38. In the UK, the Digital Economy Act 2010, and in the US, there is the Copyright Alert System of
    the Center for Copyright Information – see http://www.copyrightinformation.org/the-
    39. ‘The Digital Millennium Copyright Act’, Pub. L. 105 – 304, 28 October 1998, 112 Stat. 2860.
    40. Directive 2001/29 of the European Parliament and of the Council of 22 May 2001 on the
    harmonisation of certain aspects of copyright and related rights in the information society, OJ L
    41. See 17 USC §1202, Art 7 EUCD.
    42. The CMI provisions refer to ‘copyright works’, which in practice will be the whole copyright
    work as distributed rather than broken down to each copyright element. There is, however, no
    concrete authority on the point.
    43. For a discussion of those types of watermark see Ferrill and Moyer (2002), Jones (1999) and
    Page (1998).
    44. European Commission (2013) at 15.
    45. Ibid.
    46. A Madrigal, How Netflix reverse engineered Hollywood, The Atlantic, available at http://www.
    Z Bulygo, ‘How Netflix Uses Analytics To Select Movies, Create Content, and Make
    Multimillion Dollar Decisions’ KISSmetrics blog, available at
    http://blog.kissmetrics.com/how-netflix-uses-analytics/, A Leonard, How Netflix is turning
    viewers into puppets, Salon, available at
    http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/
    47. Empirical interview, see note 1.
    48. Ibid.
    49. For example, see http://www.fastcodesign.com/3025318/asides/eyetracking-study-reveals-
    50. See Google Glass documentation at http://www.google.co.uk/glass/start/ and see
    http://googleglassforum.net/. For an example of this technology, see Matthews, http://www.geek.
    51. Consider virtual fluid architecture – Novak (1992).
    52. See for instance Skylar Tibbits TED lecture of how this works – TED Lectures (2013) see
    https://www.youtube.com/watch?v=0gMCZFHv9v8 esp. at 4”10’ for a demonstration.
    53. Cf Fukuyama, 2002, focusing on changes to the human body.
    54. Supra n. 6.
    55. Chiefly a reference to Deleuze and Guattari (1980) supra n 7 Chapter 12. Consider also
    Nietzsche (1882) at §109, §110.
    56. Recall Lessig (1999) in particular the appendix chapter concerning architecture.
    57. Both in terms of augmented technologies and the underlying physical changes caused in the
    body (Fukuyama, 2002).
    58. Consider, for example, Google Glass being the only means by which to access augmented
    reality screens placed around city centres, access dependent on monthly subscription fee.
    Note the reference to Dyer in the main text is a work about the Tarkovsky film Сталкер
    (Stalker 1979).
    59. Bronowski (1973) at 374.
    60. The Internet Corporation for Assigned Names and Numbers – see www.icann.org
    61. The Internet Watch Foundation – see https://www.iwf.org.uk/
    62. See the proposals here: http://www.ipo.gov.uk/hargreaves-copyright-dce
    63. Bronowski, The Ascent of Man supra n.59.
    64. Empirical interview, see note 1.

  • References
    Adorno, Theodor and Horkheimer, Max. (1974). Dialectic of Enlightenment (1947) trans. J Cumming
    Benkler, Yochai. 2006. Wealth of Networks.
    Bing, Jon. 2010. Let There Be Lite: A Brief History of Legal Information Retrieval EJLT 1 (1).
    Boorstin, Daniel. 1992. The Creators.
    Boyle, James. 1997. Shamans, Software and Spleens.
    Brandeis, Louis. 1915 – 16. The Living Law 10 Illinois Law Review 461.
    Bronowski, Jacob. 1973. The Ascent of Man.
    Bulygo, Zach. “How Netflix Uses Analytics To Select Movies, Create Content, and Make Multimillion Dollar Decisions.” KISSmetrics blog, Available at http://blog.kissmetrics.com/
    Caddick, Nicholas, Davies, Gillian and Harbottle, Gwilym (eds). (2010). Copinger and Skone James on Copyright. 16th edition.
    Deleuze, Gilles and Guattari, Félix. 1980. A Thousand Plateaus, trans by Massumi, Brian (1987).
    Derrida, Jacques. 1988. Limited Inc.
    Dyer, Geoff. 2012. Zona.
    Ehrlich, Eugen. 1936. Fundamental Principles of the Sociology of Law.
    Efroni, Zohar. 2011. Access-Right.
    Elkin-Koren, Nina. 1996. “Cyberlaw and Social Change: A Democratic Approach to Copyright Law in Cyberspace.” Cardozo Arts and Entertainment Law Journal 14: 215.
    Elkin-Koren, Nina. 2002. “The Rule of the Law and the Rule of the Code.” In The Commodification of Information, edited by N. Elkin-Koren and N. Netanel. The Hague: Kluwer.
    Ernesto. 2013. 3d Printing Aims to Stop Next-gen Pirates, TorrentFreak, at https://torrentfreak.com/3d-


    European Commission. Public Consultation on the review of the EU Copyright rules. (2013) http://ec.
    Feir, Scott. 1997. “Regulations Restricting Internet Access: Attempted Repair of Rupture in China’s Great Wall Restraining the Free Exchange of Ideas.” Pacific Rim Law and Policy Journal 6: 361.
    Ferrill, Elizabeth and Moyer, Matthew. 2002. “A Survey of Digital Watermarking”, at Elizabeth.ferill.com/papers/watermarking last accessed in 2002 – no longer available.
    Foucault, Michel. The Order of Things, trans Routledge (1970).
    Foucault, Michel. 1976. The History of Sexuality, trans Random House (1978).
    Foucault, Michel. 1978 – 9. (Ed. Senellart, trans Burchell). The Birth of Biopolitics, Lectures at the Collège de France (2004/2008).
    Fukuyama, Francis. 2002. Our Posthuman Future.
    Giddens, Anthony. 1984. The Constitution of Society.
    Habermas, Jürgen. 1984 & 1987. Theory of Communicative Action. Vols I & II.
    Hart, Herbert. 1961. Concept of Law.
    Heidegger, Martin. 1954. The Question Concerning Technology and Other Essays, trans. 1977.
    Heidegger, Martin. Being and Time. 1927. SUNY edition trans. J Stambaugh (2010).
    Ihde, Don. 1990. Technology and the Lifeworld: From garden to earth.
    Jones, Richard. 1999. “Wet Footprints? Digital Watermarks: A Trail to the Copyright Infringer on the Internet.” Pepperdine Law Review 26: 559.
    Latour, Bruno. 2005. Reassembling the Social: An Introduction to Actor-Network-Theory.
    Leonard, Andrew. How Netflix is Turning Viewers into Puppets, Salon, available at http://www.salon.
    Lessig, Lawrence. 1999. Code.
    Lessig, Lawrence. 2001. The Future of Ideas.
    Lessig, Lawrence. 2004. Free Culture.
    Litman, Jessica. 2001. Digital Copyright.
    Lunney, Glynn. 2014. Empirical Copyright: A Case Study of File Sharing and Music Copyright, Tulane Public Law Research Paper No. 14-2.
    Madrigal, Alexis. How Netflix Reverse Engineered Hollywood, The Atlantic, available at http://www.
    Matthews, Lee. Google Glass Becomes Your Personal Translator, Geek.com, available at http://
    Nietzsche, Friedrich. 1974. The Gay Science (1882) trans. W Kaufmann.
    Novak, Marcos. 1992. Liquid Architectures in Cyberspace, in Benedikt, Cyberspace.
    Page, Thomas. 1998. “Digital Watermarking as a Form of Copyright Protection.” Computer Law and Security Report 14 (6): 390.
    Reese, Anthony. 2002 – 2003. “The First Sale Doctrine in the Era of Digital Networks.” Boston College Law Review 44: 577.
    Rifkin, Jeremy. 2000. The Age of Access.
    Shumaker, Robert W., Walkup, Kristina R., and Beck, Benjamin B. (2011). Animal Tool Behaviour: The Use and Manufacture of Tools by Animals.
    Stiegler, Bernard. 1998. Technics and Time 1: The Fault of Epimetheus.
    Susskind, Richard. 1996. The Future of the Law.
    Tarkovsky, Andrei. 1979. Сталкер [Stalker, film].
    TED Lectures. 2013. How 4d printing works, see https://www.youtube.com/watch?v=
    Unger, Roberto. 1976. Law in Modern Society.
    Vaidhyanathan, Siva. 2001. Copyrights and Copywrongs.
    Watson, Peter. 2005. Ideas.
    Whitwam, Ryan. 2012. How DRM will Infest the 3d Printing Revolution at http://www.extremetech.


    Copyright of International Review of Law, Computers & Technology is the property of
    Routledge and its content may not be copied or emailed to multiple sites or posted to a listserv
    without the copyright holder’s express written permission. However, users may print,
    download, or email articles for individual use.


    Information Governance and Assurance;
    Reducing Risk, Promoting Policy
    Publication info: Records Management Journal ; Bradford  Vol. 24, Iss. 3,  (2014): 253-255.

    This is one of the few books that brings together the concepts of records and information management and information security and is a really solid introduction to the way in which the various information disciplines, whether concerned with security and protection or reuse and optimisation, need to come together to ensure that information remains useful, yet is appropriately secured to minimise risk.

    Early chapters of the book focus on introducing the concepts of governance and assurance and the UK law and regulations that drive these requirements; there was a definite bias toward those laws that specifically concern information, in particular, the Data Protection Act, and I would have liked to have seen more about those laws that affect the way in which information is managed in the broader context, e.g. employment law, as well as a reference to the implications and issues of working in a global or international context, which can present some quite significant challenges when implementing an Information Governance Framework.

    I really like that this book referenced information in all of its forms, including data, which is all too often considered as an entirely separate entity, yet remains a challenge when attempting to implement policy or demonstrate or assure compliance. Data are the focus of a whole chapter, and it is a great introduction to the concepts of data management for those who have worked more around information policy than the operational delivery of data and information services, systems and solutions.

    The chapter that focusses on the identification and assessment of threats is really useful and this is followed up with a subsequent chapter on the security and protective measures that can be implemented to mitigate the threat and any associated risk to the information. Again, this is a useful introduction to the concepts of information risk management and information security.

    While there are a couple of case studies, I would have liked this book to include some practical examples or potential methodologies that bring together and integrate these information disciplines. Chapter 6, which focusses on frameworks and “how it all fits together”, identifies all of the various components that are referenced in the broad spectrum of “information governance and assurance” and suggests an approach but does not sufficiently demonstrate its effectiveness. There are many real challenges that will need to be overcome if a truly integrated approach to the management, governance and assurance of information and data is to be achieved within an enterprise environment, and it would have been useful to have some tools and techniques that have been proven elsewhere for consideration by the reader.

    The challenge that the author faces is that this is such a broad subject that to try to go into any degree of detail is not really practical, and this means that much of the content literally introduces a concept rather than going into any great detail. I do not think that this is a bad thing though; instead, I feel that this book demonstrates the necessary integration of functions that have previously been seen (and treated) as distinctly different in an organisation. It highlights that it is no longer practical to produce information policies, to develop and implement security controls and to operate and support key information management services in isolation from each other. Instead, it is absolutely necessary to develop a framework approach that highlights the importance of each of these roles and functions and seeks to establish the way in which they can work together to manage information as an asset in an enterprise context.

    Overall, I think that this is a useful addition to the books that are currently available that attempt to address this subject area; having worked across the spectrum of information management, records management, information assurance, information governance, risk and compliance and information security, I was already familiar with much of the content. Through necessity, I had come by this knowledge the hard way, and I feel that this book is a really solid introductory resource for those currently working in a specific discipline or those starting their careers. It introduces integrated concepts and highlights the various information management principles that need to be considered if we are to truly manage information at an enterprise level and as a key business asset, something that has long been a (significant) challenge for many information professionals, irrespective of their specific discipline.


    Subject: Integrated approach; Books; Information professionals; Information management

    Publication title: Records Management Journal; Bradford

    Volume: 24

    Issue: 3

    Pages: 253-255

    Number of pages: 3

    Publication year: 2014

    Publication date: 2014

    Publisher: Emerald Group Publishing Limited

    Place of publication: Bradford

    Country of publication: United Kingdom, Bradford

    Publication subject: Business And Economics–Management


    Database copyright  2021 ProQuest LLC. All rights reserved.
    Terms and Conditions Contact ProQuest

    ISSN: 09565698

    e-ISSN: 17587689

    Source type: Scholarly Journals

    Language of publication: English

    Document type: Journal Article

    DOI: http://dx.doi.org/10.1108/RMJ-08-2014-0034

    ProQuest document ID: 2439157141

    Document URL: https://search.proquest.com/scholarly-journals/information-governance-assurance-


    Copyright: © Emerald Group Publishing Limited 2014

    Last updated: 2020-09-02

    Database: ABI/INFORM Global







    Image Data Sharing for Biomedical Research—Meeting
    HIPAA Requirements for De-identification


    John B. Freymann & Justin S. Kirby & John H. Perry &
    David A. Clunie & C. Carl Jaffe

    Published online: 29 October 2011
    # Society for Imaging Informatics in Medicine 2011

    Abstract Data sharing is increasingly recognized as critical
    to cross-disciplinary research and to assuring scientific
    validity. Despite National Institutes of Health and National
    Science Foundation policies encouraging data sharing by
    grantees, little data sharing of clinical data has in fact
    occurred. A principal reason often given is the potential of
    inadvertent violation of the Health Insurance Portability and
    Accountability Act privacy regulations. While regulations
    specify the components of private health information that
    should be protected, there are no commonly accepted
    methods to de-identify clinical data objects such as images.
    This leads institutions to take conservative risk-averse
    positions on data sharing. In imaging trials, where images
    are coded according to the Digital Imaging and Communications
    in Medicine (DICOM) standard, the complexity of
    the data objects and the flexibility of the DICOM standard
    have made it especially difficult to meet privacy protection
    objectives. The recent release of DICOM Supplement 142
    on image de-identification has removed much of this
    impediment. This article describes the development of an
    open-source software suite that implements DICOM Supplement
    142 as part of the National Biomedical Imaging
    Archive (NBIA). It also describes the lessons learned by the
    authors as NBIA has acquired more than 20 image
    collections encompassing over 30 million images.

    Keywords Data sharing · De-identification · Anonymization · Cross-disciplinary research · Open access · Open source · DICOM · Supplement 142 · Image archive · HIPAA · PHI · Common rule

    This project has been funded in whole or in part with federal funds
    from the National Cancer Institute, National Institutes of Health, under
    Contract No. HHSN261200800001E. The content of this publication
    does not necessarily reflect the views or policies of the Department of
    Health and Human Services, nor does mention of trade names,
    commercial products, or organizations imply endorsement by the U.S.

    J. B. Freymann
    SAIC-Frederick, Inc.,
    EPN, Room 3006, 6130 Executive Blvd,
    Rockville, MD 20892, USA

    J. S. Kirby
    SAIC-Frederick, Inc.,
    EPN, Suite 317, 6130 Executive Blvd,
    Rockville, MD 20892, USA
    e-mail: kirbyju@mail.nih.gov

    J. H. Perry
    Radiological Society of North America,
    820 Jorie Blvd,
    Oak Brook, IL 60523, USA
    e-mail: johnperry@dls.net

    J Digit Imaging (2012) 25:14–24
    DOI 10.1007/s10278-011-9422-x

    D. A. Clunie
    CoreLab Partners, Inc.,
    100 Overlook Center,
    Princeton, NJ 08540, USA
    e-mail: dclunie@dclunie.com

    C. C. Jaffe
    Boston University School of Medicine,
    FGH Building 3rd Floor, 820 Harrison Ave.,
    Boston, MA 02118, USA
    e-mail: carl.jaffe@bmc.org

    J. B. Freymann (*)
    SAIC-Frederick, Inc.,
    EPN, Room 3006, 6130 Executive Blvd,
    Bethesda, MD 20892-7412, USA
    e-mail: freymannj@mail.nih.gov


    Advancing imaging research to serve as a critical element
    in clinical therapeutic trials requires that imaging methods
    be developed, optimized, and validated using commercial
    clinical imaging instruments. This applies particularly to
    quantitative imaging as a bio-marker for drug development
    or measurement of drug response. For example, there is a
    critical need to harmonize data collection and analysis across
    the different commercial platforms used in clinical practice to
    ensure robust correlation of image-derived parameters with
    clinical outcome. In addition, data integration with other
    laboratory-based molecular bio-markers requires a fundamen-
    tal understanding of the physical and biological measurement
    uncertainty in order to convert data to knowledge or support a
    medical intervention. The National Cancer Institute (NCI)
    Cancer Imaging Program has supported research initiatives to
    improve the performance and reproducibility of imaging
    methods, including development of imaging technology,
    software tools for clinical decision making, and development
    of molecular probes to incorporate the molecular basis for
    clinical decision making. Central to these efforts is a
    fundamental need for a widely adoptable, image-focused
    informatics infrastructure along with data archives that
    provide a common framework for data exchange and
    shareable methods to validate current and emerging imaging
    agents and methods.

    Public funding agencies have long recognized the
    importance of data sharing in cross-disciplinary research.
    National Institutes of Health (NIH), for example, has had a
    final statement for grantees on sharing research data since
    2003 and a published guidance for grant recipients since
    2006 [1]. Nevertheless, little data sharing has occurred
    outside the framework of prearranged links between
    research groups. One reason for the unwillingness of
    institutions to share clinical research data is the variety of
    local interpretations of Health Insurance Portability and
    Accountability Act (HIPAA) regulations enforced by the HHS
    Office for Civil Rights. In this environment, the most
    comfortable stance for institutional IT departments has
    been to adopt risk-averse postures [2].

    In the science community, mainstream stakeholders like
    NIH, FDA researchers, PhRMA, and the device industry
    continue to emphasize the importance of data and image
    sharing in policy statements. New societal attitudes toward
    funding science have focused renewed attention on data
    sharing as a way to break down silos, accelerate progress,
    and reduce research redundancy [3]. Besides giving access to a
    greater universe of data for research purposes and
    assuring the validity of scientific claims, data sharing
    benefits individual researchers by
    producing more citations [4, 5]. Biomedical research
    involving clinical data provides a particularly strong justification
    for encouraging data sharing, since the bedrock of
    disease-based clinical genetics and cellular discovery rests on
    data derived from human subjects. Moreover, genetic
    research must rely on large population sample sizes, making
    conclusions derived from such data too costly for other
    investigators to replicate. The data from each individual are obtained
    at great cost and effort. If such data are sequestered in
    small isolated collections and cannot be cross-queried, the
    research community suffers. Investments in large-scale
    national and international bio-specimen genetic projects are
    underway by the NIH, including The Cancer Genome Atlas
    [6] and the Cancer Human Biobank [7]. To be adequately
    studied and analyzed, such tissue-specimen genetic data
    must be accompanied by the individual’s clinical data, a key
    component of which could include non-invasive imaging
    obtained for diagnostic purposes. Sharing such images
    requires informed consent by the patient and robust removal
    of protected health information (PHI) from the images.

    At a technical level, the field of diagnostic imaging has
    benefited from a long historical investment in the Digital
    Imaging and Communications in Medicine (DICOM) standard
    by equipment manufacturers and devoted personnel in the
    professional radiological societies [8]. In the context of image
    sharing, DICOM Working Group 18 has recently developed
    Supplement 142 (ftp://medical.nema.org/medical/dicom/final/
    sup142_ft , accessed 28 February 2011) that provides
    important guidance for de-identification of images and related
    data objects.

    This manuscript describes the challenges faced and
    lessons learned during development and production imple-
    mentation of an open-source suite of software that imple-
    ments Supplement 142 for de-identification in the context
    of an NCI-sponsored public biomedical image archive,
    National Biomedical Imaging Archive (NBIA). These tools
    have matured through extensive field use over the past
    several years and offer a method sufficiently tested to
    assure de-identification, transfer, management, and distri-
    bution of DICOM images and XML objects. While this
    software suite is freely available for download and use [9],
    the focus of this paper is not to advocate for these specific
    implementations but rather to provide guidance for evalu-
    ating tools appropriate to a given context.

    Technical Issues in Multi-center Data Sharing

    Clinical trials and other research-driven image collection
    activities often produce a combination of image and non-
    image data objects. Preserving the interrelationships between
    these objects while de-identifying their PHI is challenging.
    Images are typically encapsulated in DICOM datasets that
    contain identifiers for a trial, a patient, a study, a series (of
    images), etc. Increasingly, non-image data objects are encapsulated
    in XML files. All data objects in a given research set
    must share common identifiers if the correspondences among
    them are to be preserved. Since the original identifiers inserted
    into the data objects when they were created can be PHI, they
    are almost always replaced by pseudonymous values (PHI
    encrypted by an appropriate authority) that maintain the
    relationships among the data objects but break the connection
    to the specific human trial participant [10]. When multiple
    data object types are present in a trial, the de-identification
    mechanism must support all the data types such that the
    identifying links between them are maintained.
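
The idea of replacing original identifiers with pseudonymous values that still link related objects can be sketched as follows. This is a minimal illustration, not the method used by any particular tool: it swaps in a keyed hash (HMAC-SHA256) for the "encryption by an appropriate authority" mentioned above, and the key, field names, and identifier prefixes are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the de-identifying authority (assumption).
SECRET_KEY = b"replace-with-a-site-specific-secret"


def pseudonym(original_id: str, kind: str, key: bytes = SECRET_KEY) -> str:
    """Map an original identifier to a stable pseudonymous value.

    The same input always yields the same output, so objects that shared
    an identifier before de-identification still share one afterwards.
    """
    message = f"{kind}:{original_id}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{kind}-{digest[:16]}"


def deidentify_ids(obj: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of obj with patient and study identifiers replaced."""
    out = dict(obj)
    out["patient_id"] = pseudonym(obj["patient_id"], "PAT", key)
    out["study_id"] = pseudonym(obj["study_id"], "STU", key)
    return out


# An image object and a non-image object from the same (made-up) participant.
image = {"type": "DICOM", "patient_id": "MRN-00123", "study_id": "ST-77"}
report = {"type": "XML", "patient_id": "MRN-00123", "study_id": "ST-77"}

deid_image = deidentify_ids(image)
deid_report = deidentify_ids(report)
```

Because both object types pass through the same mapping, the image and the XML report remain linked after de-identification, while the original medical record number no longer appears in either.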

    It is possible to discern subtle differences in the mean-
    ings of the words “de-identification” and “anonymization,”
    but in this paper, they will be used as synonyms, with the
    former being preferred. In a multi-center clinical image
    collection project, images are generally received by a data
    system via the DICOM protocol, usually from a PACS
    workstation or modality. Non-image data objects are
    generally transferred to the clinical trial system via HTTP.
    Once the data objects have been received, they are de-
    identified and then transmitted to a principal investigator
    site, contract research organization, or a centralized archive,
    usually in another location, via the Internet.

    Data Transmission

    Although clinical image data are de-identified at the
    originating institution before transmission, many trials
    require that the data be transmitted using Secure Sockets
    Layer to provide encryption. Some trials use Transport
    Layer Security (TLS) to provide both data encryption and
    client/server authentication.
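
As a rough sketch of what "encryption plus client/server authentication" means in practice, the following uses Python's standard `ssl` module to build one context for the receiving site and one for a sending site. It only configures the contexts; the certificate file paths mentioned in the comments are hypothetical and would need to be supplied before any real connection is made.

```python
import ssl


def make_server_context(require_client_cert: bool = True) -> ssl.SSLContext:
    """Build a TLS context for the site that receives trial data."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if require_client_cert:
        # Client authentication: reject senders that present no valid certificate.
        ctx.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would also call, with site-specific (hypothetical) paths:
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations("trusted_sites_ca.pem")
    return ctx


def make_client_context() -> ssl.SSLContext:
    """Build a TLS context for an image acquisition site that sends data."""
    # Server authentication and hostname checking are on by default here.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # To authenticate itself, the client would also load its own certificate:
    #   ctx.load_cert_chain("site.crt", "site.key")
    return ctx


server_ctx = make_server_context()
client_ctx = make_client_context()
```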

    Most clinical image data transfer on the Internet requires the
    penetration of at least one firewall. Most projects employ
    software that makes outbound connections from the secure
    network at the image acquisition sites to the principal
    investigator site. This relieves the image acquisition sites from
    having to open a port to the Internet, but it requires one port to
    be open to the Internet at the principal investigator site—a
    requirement that some IT departments are unwilling to support
    (see Fig. 1). Some clinical trial transfer packages allow two
    programs to run together at the principal investigator site to
    pull data into the secure network from the DMZ without
    having to open a port to the secure network. A DMZ
    (demilitarized zone) is an interface sub-network that exposes
    an organization’s external services to a larger untrusted
    network. It provides an additional layer of security to an
    organization’s local network. Others use virtual private
    network technology to allow image acquisition sites to access
    the secure network at the principal investigator site. Most
    clinical trial data transfer packages support all those options.

    Once in the secure network at the principal investigator
    site, data objects must be validated (checked that they belong
    to a specific trial), curated (checked that the data file structure
    allows the object to be viewed as an image), organized, and stored.
    This process, which varies from project to project, requires
    software that is flexible enough to allow human intervention
    in the process. In all projects, access to the stored data must be
    controlled. In large image archive acquisition projects,
    multiple layers of storage in staging servers may be involved
    prior to data being made available more generally.
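
One way to picture the validate, curate, and hold-for-review flow described above is the small sketch below. The class, field names, and checks are hypothetical stand-ins; in particular, the "curation" test here only checks that an image object carries some pixel data, which is far weaker than a real viewability check.

```python
from dataclasses import dataclass, field


@dataclass
class IntakeQueue:
    """Sorts received objects into accepted and held-for-review lists."""

    trial_id: str
    accepted: list = field(default_factory=list)
    held: list = field(default_factory=list)

    def receive(self, obj: dict) -> str:
        # Validation: the object must carry the identifier of this trial.
        if obj.get("trial_id") != self.trial_id:
            self.held.append((obj, "unknown trial identifier"))
            return "held"
        # Curation (stand-in): an image object must have pixel data to view.
        if obj.get("kind") == "image" and not obj.get("pixels"):
            self.held.append((obj, "image has no viewable pixel data"))
            return "held"
        self.accepted.append(obj)
        return "accepted"

    def release(self, index: int) -> None:
        """A human reviewer approves a held object after inspection."""
        obj, _reason = self.held.pop(index)
        self.accepted.append(obj)


queue = IntakeQueue(trial_id="TRIAL-A")
results = [
    queue.receive({"trial_id": "TRIAL-A", "kind": "image", "pixels": [0, 1]}),
    queue.receive({"trial_id": "TRIAL-B", "kind": "image", "pixels": [0, 1]}),
    queue.receive({"trial_id": "TRIAL-A", "kind": "image", "pixels": []}),
]
```

Keeping failed objects in a separate list with a recorded reason, rather than discarding them, is what leaves room for the human intervention the text calls for.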


    The objective of de-identification is to ensure that data
    objects cannot be connected to a specific human subject
    [11]. The HIPAA Privacy Rule [12] defines two approaches
    to removal of PHI: one that leaves the decision as to what
    constitutes PHI to a nominal expert and the other that pre-
    defines 18 categories of identifiers to specifically remove or
    conceal, i.e.,

    The following identifiers of the individual or of relatives,
    employers, or household members of the individual must
    be removed: (1) Names; (2) all geographic subdivisions
    smaller than a state, except for the initial three digits of
    the ZIP code if the geographic unit formed by combining
    all ZIP codes with the same three initial digits contains
    more than 20,000 people; (3) all elements of dates except
    year, and all ages over 89 or elements indicative of such
    age; (4) telephone numbers; (5) fax numbers; (6) email
    addresses; (7) social security numbers; (8) medical record
    numbers; (9) health plan beneficiary numbers; (10)
    account numbers; (11) certificate or license numbers;
    (12) vehicle identifiers and license plate numbers; (13)
    device identifiers and serial numbers; (14) URLs; (15) IP
    addresses; (16) biometric identifiers; (17) full-face photo-
    graphs and any comparable images; (18) any other
    unique, identifying characteristic or code, except as
    permitted for re-identification in the Privacy Rule.

    Note the ambiguity of item 18. The Federal Register in
    2006 presents the rule [13], and NIH guidance is provided
    under the title “Research Repositories, Databases, and the
    HIPAA Privacy Rule” [14]. In research data, such informa-
    tion is typically replaced with pseudonymous values that
    allow trial subjects, studies, and data objects to be related to
    one another but not connected to a specific human being.
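
Three of the listed identifier categories (ZIP codes, dates, and ages) involve generalizing a value rather than simply deleting it. The sketch below shows one possible reading of those three items. The set of low-population ZIP prefixes is a placeholder for illustration only and is not an authoritative list; substituting "000" for such prefixes and grouping ages over 89 into a single "90+" category are common handlings assumed here, not something the passage above specifies.

```python
from datetime import date

# Placeholder values for illustration only (assumption): 3-digit ZIP prefixes
# whose combined population is 20,000 or fewer must be looked up from census data.
SMALL_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str, small_zip3: set = SMALL_ZIP3) -> str:
    """Keep only the first three ZIP digits, unless the area is too small."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in small_zip3 else zip3


def generalize_date(d: date) -> int:
    """Drop every element of a date except the year."""
    return d.year


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single aggregated category."""
    return "90+" if age > 89 else str(age)
```

For example, `generalize_zip("20892-7412")` keeps only the prefix `"208"`, and `generalize_age(93)` is reported as `"90+"`.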

    To fully de-identify a DICOM image, PHI must be
    removed from both the metadata elements and the pixels of
    the image itself. De-identifying metadata is complicated by
    the fact that manufacturers and even end users of medical
    imaging equipment often use DICOM elements in a way that
    legitimately extends or does not conform to the standard,
    resulting in PHI sometimes being found where not normally
    expected. In addition, manufacturers sometimes place PHI in
    private elements, the contents of which are unspecified in the
    DICOM standard, and not reliably clarified in conformance
    statements. These complications require a de-identification
    system to be flexible enough to be configured to handle
    special circumstances as they arise [15].
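
A configurable metadata pass of the kind described here might look like the following sketch. It models a dataset as a plain dictionary keyed by (group, element) tag pairs instead of using a DICOM library, relies on the DICOM convention that private elements have odd group numbers, and uses a tiny hypothetical rule table. Keeping unlisted standard elements by default is a simplification made for brevity; a conservative profile would enumerate far more attributes.

```python
REMOVE, EMPTY, KEEP = "remove", "empty", "keep"

# Hypothetical, site-configurable rules keyed by (group, element) tag.
RULES = {
    (0x0010, 0x0010): EMPTY,   # Patient's Name
    (0x0010, 0x0030): EMPTY,   # Patient's Birth Date
    (0x0008, 0x0080): REMOVE,  # Institution Name
}

# Private tags that a site has reviewed and decided to retain (empty by default).
KEEP_PRIVATE = set()


def is_private(tag: tuple) -> bool:
    """Private DICOM elements are those with an odd group number."""
    return tag[0] % 2 == 1


def deidentify_metadata(elements: dict, rules: dict = RULES,
                        keep_private: set = KEEP_PRIVATE) -> dict:
    """Return a new element dictionary with the configured rules applied."""
    out = {}
    for tag, value in elements.items():
        if is_private(tag):
            # Contents of private elements are unspecified, so drop them
            # unless the site has explicitly allow-listed the tag.
            if tag in keep_private:
                out[tag] = value
            continue
        action = rules.get(tag, KEEP)
        if action == REMOVE:
            continue
        out[tag] = "" if action == EMPTY else value
    return out


sample = {
    (0x0010, 0x0010): "DOE^JANE",         # Patient's Name
    (0x0008, 0x0080): "Example Hospital",  # Institution Name
    (0x0008, 0x0060): "CT",                # Modality
    (0x0009, 0x1001): "vendor note",       # a private element
}
cleaned = deidentify_metadata(sample)
```

Exposing the rule table and the private-tag allow-list as parameters is what lets a site adapt the pass when PHI turns up in an unexpected element.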

    The removal of PHI burned into the pixels of diagnostic
    images is even more difficult. This can be performed
    completely manually (http://www.dclunie.com/pixelmed/
    accessed 28 February 2011), but several groups have developed
    approaches for discovering text information burned into the
    pixels of an image. In most of these efforts, image processors
    use optical character recognition to flag possible PHI. As yet,
    none seems provably robust enough to be acceptable for
    automatic processing without a human observer in the loop.
    The DICOM standard provides an element used to indicate that
    an image contains PHI, but the element is not universally
    supported, and in any case, it does not indicate where in the
    image the PHI is located. The best approach appears to be using
    the DICOM metadata elements to identify those images
    particularly at risk of containing burned-in PHI, such as specific
    modalities including ultrasound, or those images with elements
    suggesting that they are screen captures (e.g., of 3D recon-
    structions or other post-processed images). In some cases,
    specific templates for the locations of burned in text can be
    applied based on the device manufacturer and model. Care
    needs to be taken to address PHI present in the high (unused)
    bits of the pixel data that may be used as overlays.
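
The metadata-driven triage and the high-bit cleanup mentioned above can be sketched as two small helpers. The dictionary keys mirror DICOM attribute names (Burned In Annotation, Modality, Image Type), but treating an Image Type containing "SECONDARY" as a sign of a screen capture is an assumption made for this example, and the bit mask assumes the stored pixel value occupies the low-order bits of each word.

```python
# Modalities treated as high risk for burned-in text (assumption: ultrasound only).
HIGH_RISK_MODALITIES = {"US"}


def needs_pixel_review(meta: dict) -> bool:
    """Flag an image for human review of its pixels.

    A False result means "not flagged by these heuristics", not "free of PHI".
    """
    if str(meta.get("BurnedInAnnotation", "")).upper() == "YES":
        return True
    if meta.get("Modality") in HIGH_RISK_MODALITIES:
        return True
    # Heuristic for screen captures and other post-processed images.
    return "SECONDARY" in meta.get("ImageType", [])


def clear_unused_high_bits(pixels: list, bits_stored: int) -> list:
    """Zero every bit above the stored pixel value, where overlays may hide."""
    mask = (1 << bits_stored) - 1
    return [p & mask for p in pixels]
```

For a 12-bit image stored in 16-bit words, `clear_unused_high_bits(pixels, 12)` keeps the low 12 bits of each value and discards the rest.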

    DICOM Supplement 142

    The DICOM standard provides important guidance for de-
    identification. In DICOM PS 3.15, Annex E, “Attribute
    Confidentiality Profiles,” the standard defines the Basic
    Application Level Confidentiality Profile, which specifies
    requirements for applications that de-identify and/or re-
    identify dataset attributes and (in Table E.1-1) lists a set of
    attributes that are subject to the profile (ftp://medical.nema.
    org/medical/dicom/2009/09_15pu , accessed 28 February
    2011). This profile was added in Supplement 55 in 2002
    (ftp://medical.nema.org/medical/dicom/final/sup55_ft ,
    accessed 28 February 2011), but it has proven to be insufficient
    for robust de-identification. During the development of the IHE
    Teaching File and Clinical Trial Export (TCE) profile (http://
    accessed 28 February 2011), additional standard material was
    added to elaborate on the issues of de-identification and
    pseudonymization, but it too does not define a comprehensive
    and detailed approach.

    Accordingly, Supplement 142 (ftp://medical.nema.org/
    medical/dicom/supps/sup142_pc , accessed 28 February
    2011) was developed, to provide more detailed guidance for
    de-identification of data objects for various purposes. The
    supplement is built on a Basic Profile that takes a very
    conservative approach to removing or replacing any
    information about the identity of the patient, their family
    members, any personnel involved in the procedure, the
    organizations involved in ordering or performing the
    procedure, and additional information that might be
    combined to associate the object with the patient.

    Fig. 1 CTP software performs custom scriptable
    de-identification behind the institution’s firewall. The files
    are then securely transferred through the Internet to the host
    NBIA, where they are re-inspected for DICOM validity
    and thorough de-identification before they are made publicly
    available

    J Digit Imaging (2012) 25:14–24 17

    Supplement 142 also provides several options appropriate
    to special situations. Two classes of options are defined: those
    that require significant and burdensome effort to remove
    additional information (and which may not be justified in low
    risk scenarios) and those that define retention of information
    that would otherwise be removed, but without which a
    particular type of research would be impossible. Common
    examples of the latter include the need to retain date
    information in therapeutic oncology trials, without which
    dates of progression or response cannot be determined, the
    need to retain patient characteristics related to body size for
    whole body PET studies, without which standardized uptake
    values cannot be computed, and the need to retain image and
    device (but not patient) unique identifiers that may be required
    for the audit trail. In such cases, the additional information that
    is needed for the conduct of the trial may not be permitted by
    regulation, and therefore, additional permission is required
    either from the subject or from the institutional review board
    (IRB) or ethics committee. The options defined in Supplement
    142 are intended to provide a small and tractable set of
    standard definitions with accompanying justification, such
    that each IRB and consent form can reference the standard
    categories, rather than debating the merits of individual
    DICOM data elements.

    The options defined in the supplement are:

    & Clean Pixel Data Option: removal or distortion of the
    actual pixel data where there is identification information
    burned in as annotation text

    & Clean Recognizable Visual Features Option: removal or
    distortion of the actual pixel data where there is possibility
    of visually identifying the individual in the images

    & Clean Graphics Option: removal of identification
    information encoded as graphics, text annotations, or
    overlays (excluding Structured Report SOP classes)

    & Clean Structured Content Option: removal of identifi-
    cation information in Structured Report SOP classes

    & Clean Descriptors Option: removal of identification
    information from descriptive tags which contain unstruc-
    tured plain text values over which an operator has control

    & Retain Longitudinal Temporal Information Options: re-
    tention or modification of tags that contain dates or times

    & Retain Patient Characteristics Option: retention of
    physical characteristics of the patient that are descrip-
    tive rather than identifying information (e.g., metabolic
    measures, body weight, etc.)

    & Retain Device Identity Option: retention of information
    about the identity of the device used to perform the

    & Retain UIDs Option: retention of the unique identifiers
    for studies, series, instances, and other entities in the
    DICOM model

    & Retain Safe Private Option: retention of private attributes
    known to be safe

    Supplement 142 was drafted by leading industry experts in
    DICOM Working Group 18. In particular, those involved in
    international pharmaceutical clinical trials for regulatory sub-
    missions were broadly consulted, and indeed, the work effort
    was initiated as a consequence of discussion during a Drug
    Information Association Medical Imaging Stakeholders Call
    for Action in 2007. Global regulations were considered,
    including not just the HIPAA Privacy Rule but also the
    European Privacy Directive. Supplement 142 provides a
    platform for consistent de-identification that meets global
    regulatory requirements and is thus a substantial contribution
    to medical research.


    The NBIA [16] is an open-source software suite developed
    under the aegis of the caBIG program of the NCI’s Center for
    Bioinformatics [17] and Information Technology [18]. The
    software has been installed at numerous institutions for use in
    sharing image collections. This section introduces the
    software and describes its use in the acquisition, management,
    and distribution of image collections by the NCI’s Cancer
    Imaging Program and other institutions running the software.

    National Biomedical Imaging Archive Project

    NBIA [19] is a highly scalable, DICOM-based image archive
    that provides full submission-to-retrieval functionality opti-
    mized for the requirements of the in vivo medical imaging
    clinical and research communities. It combines image
    acquisition and processing capabilities with submission
    reporting and quality control tools to facilitate inter-
    institution data sharing. NBIA provides query access to more
    than 90 DICOM tag elements. These can be queried through
    three levels of search interfaces as well as an API. It integrates
    cine-view, thumbnails, and full DICOM element previews. A
    saved-query feature provides a unique reference keyword for
    direct linkage to data sets from publications, etc. Data
    download is supported through a Java download manager
    for larger collections. Non-DICOM metadata can be contained
    in XML or Zip files and linked at the image series level when
    appropriate. Images can be grouped within collections for
    specific research purposes, and the NBIA supports pop-up
    menus that can provide short summaries of these collections
    or link to external information sites such as Wikis or other
    web sites.

    The NBIA web application allows users to search for,
    manage, and retrieve DICOM images. The web application is
    written in Java and relies on the JSF presentation framework.
    It is deployed on a JBoss application server. The image
    metadata indexed by the web application is stored in a
    MySQL or Oracle database. The DICOM images themselves
    are stored in a file system of the administrator’s choice. NBIA
    provides a collection- and submission site-based authorization
    model that is implemented using NCI’s Common Security
    Module. This allows an administrator to create public access
    and restricted access data sets as needed. Additionally, the
    NBIA system includes a caGrid data service based upon the
    caBIG NCIA_MODEL version 3 [20]. The grid service
    provides the ability to retrieve DICOM images using the
    caGrid Transfer service, allowing for multiple installations of
    NBIA to seamlessly communicate and share images in a
    federated manner.

    NBIA integrates a separate software package, RSNA’s
    Clinical Trial Processor (CTP), to manage the transfer of
    images into the NBIA system. In a project employing
    NBIA, CTP is installed at both the data acquisition sites
    and an NBIA site. These sites are often called client and
    server sites, respectively. CTP is configured to de-identify
    data objects at the client sites to ensure that PHI never
    leaves the originating institutions. At the client site, images
    are both de-identified and tagged with provenance infor-
    mation in private elements for use in indexing the images.
    The CTP at the client site then transmits the data objects to
    the CTP at the NBIA server site, which stores the images in
    a file system and extracts information from the DICOM
    elements for storage in the NBIA relational database.

    NCI’s Cancer Imaging Program has used NBIA to create
    more than 20 research image collections. These collections
    and more that will follow are intended to make medical
    imaging case studies available to a wide cross-disciplinary
    research community. NBIA has also been used to establish a
    nationwide infrastructure for sharing images, supporting
    stratification of patients in adaptive clinical trials, cross-
    disciplinary research on response measurement fundamentals,
    and increasing the research community’s awareness of image
    reliability analysis.

    NBIA’s archive and open-source tools provide:

    & Multiple research image data collections, encouraging
    development of reliable quantitative measurement of
    change over time by supplying longitudinal clinical
    response imaging case studies to a wide research

    & Real-time, multi-institutional image access, supporting
    protocol stratification strategies in adaptive trials

    & Support for cross-disciplinary research on response
    measurement fundamentals and analysis of quantitative
    reproducibility studies

    For clinical trial data residing in non-public-access
    archives, these same software tools implement role-based
    security to permit selected PHI to remain in place. In this
    situation, access to such images requires formal permission
    granted by the signing of a limited dataset agreement [21].

    Clinical Trial Processor

    CTP is a tool developed by the Radiological Society of North
    America (RSNA) for autonomously processing data objects in
    clinical trials. It is written entirely in Java and runs on Unix,
    Linux, Solaris, Mac OS, and Windows. It runs either stand-
    alone or as a Windows service on XP, Vista, and Windows 7.
    The program’s interface is provided by an integrated web
    server with several servlets that provide access to status and
    configuration information. Complete documentation on CTP
    is located on the RSNA MIRC Wiki [22].

    Processing in CTP is organized into pipelines [19], each
    consisting of a sequence of stages, where each pipeline stage
    is designed to perform a specific function. CTP is highly
    configurable, allowing administrators to construct pipelines
    to meet a wide variety of requirements. CTP currently
    provides 25 standard pipeline stages in four categories:

    & Import Services receive data objects from external
    sources and queue them for subsequent processing.

    & Processors receive a data object as it flows down the
    pipeline, take some action, and pass on the object to
    the next stage. Actions can range from simply logging the
    passage of the object to modification of the object.
    Processors are synchronous stages, not passing on the
    object until processing is complete.

    & Storage Services receive a data object as it flows down
    the pipeline, store a copy of the object in some kind of
    storage system, and then pass the object on to the next
    stage. Storage Services are synchronous.

    & Export Services receive a data object as it flows down the
    pipeline, queue a copy of the object for subsequent
    transmission to an external system, and then pass the object
    to the next stage. The queuing process is synchronous; the
    subsequent transmission occurs asynchronously.

    CTP is designed to be easily extended by the addition
    of new pipeline stages and database adapters.
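
    The stage categories above can be pictured with a minimal sketch.
    It is written in Python purely for illustration (CTP itself is
    written in Java), and every class and function name in it is
    hypothetical rather than part of the CTP API. Each stage receives
    an object, acts on it, and passes it on; the export stage queues a
    copy synchronously and transmits later.

```python
from collections import deque


class Stage:
    """Base stage: return the object to pass it on, or None to stop it."""

    def process(self, obj):
        return obj


class LoggingProcessor(Stage):
    """Processor that records the passage of each object."""

    def __init__(self):
        self.seen = []

    def process(self, obj):
        self.seen.append(obj["id"])
        return obj


class MemoryStorage(Stage):
    """Storage stage that keeps a copy, then passes the object on."""

    def __init__(self):
        self.stored = {}

    def process(self, obj):
        self.stored[obj["id"]] = dict(obj)
        return obj


class QueueingExport(Stage):
    """Export stage: queuing is synchronous, transmission happens later."""

    def __init__(self):
        self.queue = deque()

    def process(self, obj):
        self.queue.append(dict(obj))
        return obj

    def flush(self, send):
        # In a real system this would run asynchronously.
        while self.queue:
            send(self.queue.popleft())


def run_pipeline(stages, objects):
    """Push each object through the stages in order."""
    for obj in objects:
        for stage in stages:
            obj = stage.process(obj)
            if obj is None:
                break


log, store, export = LoggingProcessor(), MemoryStorage(), QueueingExport()
run_pipeline([log, store, export], [{"id": "a"}, {"id": "b"}])
sent = []
export.flush(sent.append)
```

    Returning None from a stage is one simple way to model a filter
    that removes an object from the flow; that convention is an
    assumption of this sketch.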

    To be useful, a clinical data object must contain
    identifiers that relate it to other data objects. CTP supports
    four types of data objects, three of which provide
    standardized access to the identifiers and data they contain:

    & FileObjects are data objects of indeterminate contents.
    This is the superclass of the other three types, but on its
    own, it is not useful because it does not provide access
    to the required identifiers.

    & DicomObjects are DICOM datasets. This type provides
    all the necessary identifiers as defined in the DICOM

    & XmlObjects are XML documents. XML provides for the
    encapsulation of text-based data. Many XML schemas are
    in use in clinical trials today, and there is no standard
    definition of how the required identifiers are encoded. The
    CTP XmlObject attempts to find identifiers by looking in
    a sequence of commonly used schema locations.

    & ZipObjects are zip files containing one or more data
    files plus a file called manifest.xml which contains the
    required identifiers. The manifest.xml file is located in
    the root of the zip file’s directory tree, and it obeys a
    standard schema. The ZipObject provides a way to
    encapsulate collections of related data objects in any
    format while still carrying the identifiers which allow
    them to be related to other objects in the trial.

    Since data objects in clinical image collections are generally
    produced by clinical systems, they almost always contain PHI.
    Among the most important standard pipeline stages in CTP are
    ones for de-identifying data objects. CTP provides four
    standard pipeline stages for modifying data objects to remove
    PHI and replace it with pseudonymous values:

    & The DicomAnonymizer modifies DicomObjects in
    accordance with a script. The script is written in a
    simple language that provides many functions for
    handling specific types of DICOM elements. Both
    CTP and the independent clinical trial management
    software written by the American College of Radiology
    Imaging Network use this language. CTP provides a
    special servlet to simplify the process of defining a
    DicomAnonymizer script. This servlet allows the
    administrator to define the rules for de-identification
    of each individual DICOM element. Since de-
    identification is a complex technical field, the DICOM
    committee has released Supplement 142 to the standard,
    specifying de-identification profiles and options for
    various purposes. One of the authors (JK) has written
    script implementations of all the Supplement 142
    profiles and options, and these are built into CTP. The
    CTP DICOM Anonymizer Configurator also supports
    user-defined profiles. The default de-identification
    script is the most stringent one defined in Supplement
    142 (the Basic Profile). This provides access to a de-
    identification mechanism that is in common use and has
    been vetted to meet regulatory requirements for protect-
    ing patient privacy. The configurator servlet allows the
    administrator to select a profile as a starting point and
    modify it to meet any special needs of the trial.

    & The DicomPixelAnonymizer modifies DicomObjects
    by blanking regions of the pixels in a DicomObject in
    accordance with a script. The script consists of a
    sequence of signatures and region sets. A signature is
    a boolean calculation based on the contents of the
    DicomObject’s elements. Each signature is accompanied
    by a list of rectangular regions to blank in images
    that match the signature. When processing a DicomObject,
    the DicomPixelAnonymizer computes each signature
    value in turn, chooses the first one that matches,
    and then blanks the regions associated with it.
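
    The first-match behavior described for the DicomPixelAnonymizer
    can be sketched as follows. This is an illustration only, not the
    CTP script language: signatures are modeled as Python predicates
    over a dictionary of element values, regions as hypothetical
    (left, top, width, height) tuples, and the image as a list of
    pixel rows.

```python
def blank_regions(pixels, regions):
    """Zero each (left, top, width, height) rectangle in a list of pixel rows."""
    for left, top, width, height in regions:
        for row in pixels[top:top + height]:
            for x in range(left, min(left + width, len(row))):
                row[x] = 0
    return pixels


def anonymize_pixels(elements, pixels, script):
    """Blank the regions of the first matching signature; otherwise change nothing."""
    for signature, regions in script:
        if signature(elements):
            return blank_regions(pixels, regions)
    return pixels


# Hypothetical script: the more specific signature is listed first.
script = [
    (lambda e: e.get("Manufacturer") == "ACME" and e.get("Modality") == "US",
     [(0, 0, 4, 1)]),
    (lambda e: e.get("Modality") == "US",
     [(0, 0, 2, 2)]),
]

image = [[9] * 4 for _ in range(3)]
result = anonymize_pixels({"Manufacturer": "ACME", "Modality": "US"}, image, script)
```

    Because only the first matching signature is applied, the order of
    entries matters: specific device signatures belong ahead of
    general ones.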

    & The XmlAnonymizer modifies XmlObjects in accor-
    dance with a script written in a language that is inspired
    by, but is much simpler than, XPath. CTP provides a
    special servlet to simplify the process of defining an
    XmlAnonymizer script.

    & The ZipAnonymizer modifies the manifest.xml file in a
    ZipObject in accordance with a script written in a language
    that is identical to that used by the XmlAnonymizer. When
    de-identifying ZipObjects in a clinical trial, one must
    remember that since the ZipObject can contain files of any
    format, PHI may be contained in places that the ZipAno-
    nymizer does not modify. For this reason, ZipObjects are
    most useful for encapsulating the analytic results of
    programs that operate on prior de-identified objects.

    Import and export pipeline stages provide for the
    reception and transmission of data objects. CTP includes
    five standard import stages and five standard export stages
    that support the common protocols (HTTP(S), DICOM, and
    FTP) as well as manual import from directories and

    & HttpImportService receives data objects via the HTTP
    and HTTPS protocols.

    & PollingHttpImportService makes an outbound connection
    to a PolledHttpExportService and receives data objects in
    the input stream of the connection, thus avoiding the
    necessity of opening a port for inbound connections.

    & DicomImportService implements a DICOM Storage
    SCP for the receipt of DICOM data objects.

    & DirectoryImportService imports (and removes) data
    objects that appear in a directory.

    & ArchiveImportService copies data objects from a
    directory tree and processes them, leaving the objects
    in the original location unmodified.

    & HttpExportService transmits data objects via the HTTP
    and HTTPS protocols.

    & PolledHttpExportService serves data objects in response
    to received connections.

    & DicomExportService implements a DICOM Storage
    SCU for the transmission of DICOM data objects.

    & FtpExportService transmits data objects via the FTP
    protocol, organizing them on the destination server by
    study identifier.

    & DatabaseExportService provides a queued interface to
    an external database.

    In situations where a port to the Internet cannot be
    opened on the secure network at a principal investigator
    site, two instances of CTP can be run, one on the secure
    network and one in the DMZ, using the polled HTTP stages
    to allow data objects in from the Internet without opening a
    port.
    The DatabaseExportService interfaces with an external
    database through an extension of the standard CTP Data-
    baseAdapter class. In the NBIA project, the NCI wrote a
    DatabaseAdapter (NCIADatabase [sic]) that receives parsed
    data objects from the DatabaseExportService and extracts
    information for storage in an external SQL database
    (MySQL or Oracle). This mechanism provides a flexible
    way to build complex databases without having to manage
    the transfer, or even the parsing, of the data objects

    A Survey of Image Collections and Tools

    Several research alliances are actively developing both
    publicly accessible biomedical image databases and soft-
    ware tools to support them. In some cases, the tools
    themselves are accessible for download, allowing new
    research groups to utilize them in posting their own
    datasets. In other situations, the software is a customized
    solution with more limited scalability to other use cases.

    To gain a better understanding of the characteristics of
    the various approaches, a search for biomedical imaging
    tools and archives was carried out. Since any such survey
    would be rapidly out of date, the information gathered is
    posted on a Wiki [23]. The goal of this resource is to allow
    members of the research imaging community to find image
    collections and tools for creating new collections, to
    participate in the review, and to ensure that posted
    information remains as accurate and up to date as possible.
    A well-maintained site that catalogues mostly open-source
    image software analytic tools is also available on another
    Wiki [24].


    The process of building the collections housed at the NIH
    NCI NBIA produced a number of lessons learned with
    regard to effectively managing the process of collection, de-
    identification, and distribution of DICOM images for
    research. They are presented here as points to consider
    not only to users of the NBIA and CTP software suite but
    also to anyone developing or assessing similar tools. This
    section presents the key lessons learned.

    Support Multiple Means for Submitting Data

    Data have been submitted to the NCI NBIA archive from
    many sources via several communication protocols. Among
    the most common ways that DICOM objects have been
    imported into CTP at the client site are:

    & Transmission via HTTP(S) on the Internet, usually from
    a tool such as RSNA’s FileSender

    & Transmission via the DICOM protocol from a PACS or
    workstation at the submission site

    & Physical delivery on CD/Hard Disk via mail, some in
    the format of DICOM CDs, others simply image files

    Any software suite must be able to import data from all
    these media. Although the transport protocol varies,
    DICOM is the dominant format for the image data itself.
    Occasionally, images have been received in a non-standard
    format, but we have found that converting such images to
    DICOM expands their utility.

    Use DICOM Supplement 142 Profile Templates

    Institutions and vendors vary widely in the ways they create
    and de-identify images. The de-identification rules for a
    collection depend on the intended use of the collection as well
    as the initial state of the images as they are acquired. For
    example, patient studies containing PHI must be de-identified
    fully, but previously de-identified studies obtained from another
    collection may require little or no additional modification. The
    de-identification process must therefore be very flexible.

    Before the publication of Supplement 142, developing
    de-identification scripts for a variety of use cases required a
    thorough understanding of DICOM, and the scripts them-
    selves took substantial time to write and test. Having
    implementations of the Supplement 142 profiles available
    in the CTP de-identification stages greatly simplifies the
    task and improves the confidence of the submitters and
    curators that regulatory requirements are being met. It also
    allows the de-identification rules to be changed quickly for
    specific submissions when necessary. Proper use of the
    Supplement 142 profiles also provides a historical record
    within each DICOM object detailing the previous profiles
    applied to de-identify the images. This practice, discussed
    in further detail below, allows consumers of the data to
    clearly evaluate how the image was de-identified and also
    clarifies what additional steps may need to be taken if the
    data are being repurposed for a new audience.

    Do Not Overdo De-identification

    Image collections generally contain data from many
    patients, each often having multiple studies and series. To
    maximize the benefits of such collections, the identifiers in
    the data objects must retain the ability to distinguish among
    patients, studies of a single patient, etc. Any implementa-
    tion of the Supplement 142 profiles must be careful to
    provide pseudonyms for such identifiers rather than fixed
    values. For example, if every patient ID were set to the
    same value, then most DICOM software would treat the
    entire dataset as though it consisted of only a single patient.
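
    One way to obtain distinct but stable pseudonyms, rather than a
    fixed value, is a keyed hash of the original identifier. The
    sketch below is an assumption made for illustration and does not
    describe how CTP generates its replacement values; the function
    name pseudonym, the "ANON-" prefix, and the 12-character
    truncation are all hypothetical choices.

```python
import hashlib
import hmac


def pseudonym(original_id, secret_key, prefix="ANON-"):
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same input and key always give the same pseudonym, so studies of
    one patient stay grouped, while different patients stay distinct.
    """
    digest = hmac.new(secret_key, original_id.encode("utf-8"), hashlib.sha256)
    return prefix + digest.hexdigest()[:12]
```

    The key must stay at the submitting site, since anyone holding it
    could test candidate identifiers against the pseudonyms.
    Truncating the digest shortens the value at the cost of a small
    collision probability, which should be weighed against the size
    of the collection.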

    Dates require special attention. Maintaining the temporal
    relationships among studies of the same patient adds
    significantly to the utility of an image collection, but
    original calendar dates themselves are PHI and must
    therefore be modified. This is addressed in the Supplement
    142 “Retain Longitudinal Temporal Information Options.”
    The simplest implementation is to offset dates by an
    interval that is the same for all images in the collection.
    Prior to the creation of Supplement 142, we had found it
    convenient to use intervals large enough that users of the
    collection do not question whether the dates have been
    modified. However, it was later discovered that offsetting
    the dates by large increments can cause problems in some
    DICOM software if the resulting dates are prior to the
    1980s. Supplement 142 specifies that the Attribute Longi-
    tudinal Temporal Information Modified (0028,0303) should
    be populated with a value of “MODIFIED” to make it clear
    that dates have indeed been altered. This is a simpler and
    more effective solution.
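
    The constant-offset approach can be sketched in a few lines. The
    offset value and function name below are hypothetical; the point
    is only that shifting every date by the same interval preserves
    the spacing between studies, and that the shifted object is
    marked as modified.

```python
from datetime import datetime, timedelta

OFFSET_DAYS = -1000  # hypothetical offset, identical for the whole collection


def shift_date(da_value, offset_days=OFFSET_DAYS):
    """Shift a DICOM DA value (YYYYMMDD) by a fixed number of days."""
    shifted = datetime.strptime(da_value, "%Y%m%d") + timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


baseline = shift_date("20100115")
follow_up = shift_date("20100415")

# Marker recommended by Supplement 142 to show the dates were altered.
marker = {(0x0028, 0x0303): "MODIFIED"}
```

    A modest offset such as this one keeps the shifted dates well
    clear of the pre-1980s range that was found to cause trouble in
    some DICOM software.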

    Do Not Rely on DICOM to Indicate Burned-in PHI

    PHI burned into the pixels of images poses a serious
    problem for public research archives. Technologists or
    PACS administrators are sometimes unaware that these
    types of images exist in their local systems. A wide
    variety of such images have been received for the NCI
    collections, including not only clinical images containing
    patient names in their pixels but also digitized billing
    records in DICOM wrappers. Of significant concern is
    the recent practice of scanning the patient exam request
    document into the DICOM study series to record the
    clinical need for the exam and validate billing. That scanned
    image, usually a final series in the study, is often full of PHI
    both in the DICOM tags and within the image. Most
    commercial software intended for de-identification fails to
    address the special content of that series.

    Strategies for dealing with this issue are provided by the
    Supplement 142 “Clean Pixel Data” and “Clean Graphics”
    options, but the identification of the images themselves can
    be a problem. Some DICOM elements that can be useful are:

    & (0008,0016) SOP Class UID: value indicating Second-
    ary Capture and Ultrasound SOP Classes

    & (0008,0008) Image Type: The values SECONDARY
    and SCREEN SAVE indicate a suspect image, but they
    are not definitive

    & (0028,0301) Burned-in Annotation: The value YES is
    definitive, but this element is often not supplied in
    DICOM images, since it is optional for most objects
    and a relatively recent addition to the standard

    & (0018,1016) Secondary Capture Device Manufacturer:
    The value of this element can be used to discriminate
    against certain image types that may contain PHI

    & (0018,1018) Secondary Capture Device Manufacturer’s
    Model name: The value of this element can be used to
    discriminate against certain image types that may
    contain PHI

    Image collection tools must have a means for scanning
    such elements and segregating images for special attention
    based on defined criteria. CTP provides filter stages driven
    by a script language that allows testing the values of all
    DICOM elements and automatically quarantining objects
    that fail the test.
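
    A filter of this kind might look like the following sketch. It is
    not the CTP script language; it is a Python illustration in which
    an object's elements are a dictionary keyed by (group, element)
    tag, and the function names are hypothetical. The SOP Class UIDs
    are the standard values for Secondary Capture and Ultrasound
    storage, but the choice of which classes to treat as suspect is
    an assumption that each project would need to set for itself.

```python
SECONDARY_CAPTURE = "1.2.840.10008.5.1.4.1.1.7"
ULTRASOUND = {
    "1.2.840.10008.5.1.4.1.1.6.1",  # Ultrasound Image Storage
    "1.2.840.10008.5.1.4.1.1.3.1",  # Ultrasound Multi-frame Image Storage
}


def burned_in_phi_reasons(elements):
    """List the reasons an object deserves manual review (empty if none)."""
    reasons = []
    sop_class = elements.get((0x0008, 0x0016), "")
    if sop_class == SECONDARY_CAPTURE or sop_class.startswith(SECONDARY_CAPTURE + "."):
        reasons.append("secondary capture SOP class")
    elif sop_class in ULTRASOUND:
        reasons.append("ultrasound SOP class")
    image_type = elements.get((0x0008, 0x0008), [])
    if "SECONDARY" in image_type or "SCREEN SAVE" in image_type:
        reasons.append("suspect Image Type")
    if elements.get((0x0028, 0x0301)) == "YES":
        reasons.append("Burned-in Annotation is YES")
    return reasons


def route(elements):
    """Quarantine any object with at least one risk indicator."""
    return "quarantine" if burned_in_phi_reasons(elements) else "pass"
```

    An empty result means only that none of these indicators fired.
    As noted above, the absence of the Burned-in Annotation element
    proves nothing, so a passing object may still need visual review.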

    Keep an Audit Trail of De-identification History

    It is often necessary to know the de-identification history of an
    image. DICOM Supplement 142 meets this need by defining
    standard profiles, the codes for which can be used as an audit
    trail. For example, if in the process of de-identification one used
    the Basic Application Confidentiality Profile with the option to
    Retain Longitudinal With Modified Dates, one would also
    populate the De-identification Method Code Sequence
    (0012,0064) with the corresponding Coding Scheme Desig-
    nators for those changes. If the biomedical image community
    were to adopt this standard, it would be much easier to
    understand the history of how an image was de-identified and
    to make decisions on whether further changes are needed as
    images are repurposed for consumption by new audiences.
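
    For the example just given, the audit entries might be assembled
    as in the sketch below. The code values shown are, to the best of
    our knowledge, those assigned in the DICOM Controlled Terminology
    (coding scheme DCM) for the Basic Profile and the modified-dates
    option, but they should be checked against the current standard
    before use. The dictionary layout and function name are
    hypothetical.

```python
# Code values believed to match the DICOM de-identification method codes;
# verify against the current standard before relying on them.
DEID_CODES = {
    "113100": "Basic Application Confidentiality Profile",
    "113107": "Retain Longitudinal Temporal Information Modified Dates Option",
}


def method_code_sequence(code_values):
    """Build items for the De-identification Method Code Sequence (0012,0064)."""
    return [
        {
            "CodeValue": value,
            "CodingSchemeDesignator": "DCM",
            "CodeMeaning": DEID_CODES[value],
        }
        for value in code_values
    ]


audit_trail = method_code_sequence(["113100", "113107"])
```

    A later consumer can read these items to see which profile and
    options were applied without inspecting individual elements.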

    A separate audit trail of exactly what values have been
    replaced may also be maintained, but must be protected since
    by definition it may contain PHI. If this is done within the
    DICOM image file itself, it must be encrypted, and data
    elements are provided for that purpose; their use is deprecated,
    however, since any encryption scheme becomes vulnerable
    over time and such images may be archived indefinitely.
    Supplement 142 warns about this, and accordingly if any such
    audit trail is required, it should probably be maintained separately
    from the images and both logically and physically protected.

    Enable Local Mapping Between Anonymized Identifiers
    and PHI

    When questions arise about the integrity of the submitted data,
    it is often necessary for an administrator at the submitting site
    to examine the original data to determine whether the
    problems are within the original data or if they were created
    during the process of de-identification or transmission. To do
    so, the anonymized identifiers obtained from the collection
    curator must be translated back to the original PHI. The CTP
    IDMap stage can be used to provide this translation of
    identifiers. To have access to this function, a user must be
    authenticated and have administrator privileges. This func-
    tionality may also be necessary in situations where image data
    are to be correlated with additional data types that have not or
    will not be de-identified.

    Provide End-to-End Transport Verification

    In many clinical trials, each submission is accompanied by a
    case report form or an IHE TCE manifest. In most
    submissions to research image collections, however, no
    manifest is available to identify the individual images, series,
    etc. that have been transmitted. NBIA and CTP therefore
    include special tools to verify that the submitted data has been
    received and successfully processed.

    One such tool, the CTP Database Verifier, can be used at
    the submitting site to ensure that all transmitted data made it all
    the way into the NBIA database. This tool tracks the de-
    identified UIDs of every object that is sent to the archive and
    then periodically queries the NBIA server via its relational
    database to confirm the object was received and stored. This
    has saved both submitters and curators substantial effort. The
    NBIA View Submission Report function is also useful for
    comparing totals of data objects received by the system against
    counts of the original submissions, although this tool is more
    often used for general reporting and auditing of what has been

    Provide Multiple Levels of Data Verification

    We have used CTP’s filtering stages to verify that the
    metadata of images matches the protocol of a study and to
    quarantine images that fail before they are added to the
    collection. We have also used the QC Tool in the NBIA
    software to verify the content of the data manually. The tool
    is designed to allow a curator to see both the images and
    corresponding DICOM elements in a single view.
    Because PHI can occur both within the image pixels
    and the metadata elements, we have found that having
    the ability to view both simultaneously substantially
    decreases the level of effort involved in managing submitted

    We have also found that having a built-in method for
    deleting images has been necessary more often than
    expected. This allows curators to easily remove data that
    have eluded detection for not matching the protocol or for
    containing PHI in unexpected places.
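
    The first, automated level of verification can be pictured as a rule that compares selected metadata elements with the values a study protocol allows and quarantines anything that does not match. The sketch below illustrates that idea in Python; it does not use CTP's actual filter syntax, and the protocol values and image records are hypothetical.

```python
def route_image(metadata, protocol):
    """Return 'accept' if every constrained element matches, else 'quarantine'."""
    for element, allowed_values in protocol.items():
        # A missing element is treated as a mismatch (an assumption of this sketch).
        if metadata.get(element) not in allowed_values:
            return "quarantine"
    return "accept"


# Hypothetical protocol: element name -> set of allowed values.
protocol = {"Modality": {"CT"}, "BodyPartExamined": {"CHEST", "LUNG"}}

# Hypothetical metadata extracted from three submitted images.
images = [
    {"SOPInstanceUID": "1.2.3.1", "Modality": "CT", "BodyPartExamined": "CHEST"},
    {"SOPInstanceUID": "1.2.3.2", "Modality": "MR", "BodyPartExamined": "CHEST"},
    {"SOPInstanceUID": "1.2.3.3", "Modality": "CT"},
]

accepted = [i["SOPInstanceUID"] for i in images if route_image(i, protocol) == "accept"]
quarantined = [i["SOPInstanceUID"] for i in images if route_image(i, protocol) == "quarantine"]
print(accepted)     # ['1.2.3.1']
print(quarantined)  # ['1.2.3.2', '1.2.3.3']
```

    A rule like this only covers the metadata. Burned-in PHI in the pixels still needs the manual review described above.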

    Carefully Estimate Resources Required

    It is easy to underestimate the time and effort involved in
    collecting and managing images for image collections. While
    the maturity of CTP and NBIA has grown significantly over
    the past few years, it still requires between 1 and 4 h for an
    expert CTP user to provide training to a new site manager on
    how the submission process works and to do preliminary
    setup. In a large project, this justifies setting up a help-desk
    function. Preliminary setup is typically followed by small-scale
    submission tests to ensure the data arrives as expected
    (modality, number of images per study, de-identification
    completeness, etc.). Again, the use of CTP’s implementation of
    the Supplement 142 profiles has greatly reduced the amount of
    setup time required. It does not, however, completely remove
    the need for careful checking by a small test submission of the
    implementation before large-scale acquisition is started.

    Although the combination of CTP and NBIA can be run
    autonomously, it is important to provide human oversight, not
    only to ensure that privacy regulations continue to be met as
    data from new acquisition sites are received but also to ensure
    that the data added to the collection are consistent with the
    collection’s intended use. Tools such as the ones described
    here reduce the workload of the collection’s human curator,
    but they do not eliminate it. Thus, anyone considering hosting
    a truly open biomedical image archive should also allocate
    staff resources for the collections’ curators.


    Publicly shared archives of image data are an increasingly
    critical element of cross-disciplinary research, especially for
    clinical biomedical research where diagnostic images of the
    spectrum of human disease and its response to therapy are a
    scarce commodity. As genetic biomedical understanding
    develops, one of the significant contributions of clinical
    imaging will be to produce very large collections that can be
    subjected to statistical tests of validity. Without a greater
    confidence in the image de-identification process, open-access
    DICOM archives that can be queried to correlate with genetics
    will never achieve their potential. Some international efforts
    besides those described in this paper are ongoing with the
    intent to achieve similar ends [25]. In the absence of
    community consensus on image de-identification and user-
    friendly tools and SOPs, researchers have been understand-
    ably reluctant to create publicly accessible image archives.

    This paper suggests that developments in standards and
    technology have removed key stumbling blocks to the
    creation of these valuable archives. The DICOM Commit-
    tee, through Supplement 142, now offers a robust frame-
    work for de-identification meeting the privacy regulations.
    The incorporation of these guidelines into easy-to-use
    image acquisition and management tools, coupled with the
    increasing availability of open archive solutions, should
    facilitate the creation of the image archives needed for the
    next generation of biomedical research.

    Acknowledgments The authors wish to acknowledge the support of
    the Radiological Society of North America in developing and
    promoting the deployment of CTP. We would also like to recognize
    the extensive contributions of members of DICOM Working Group 18
    in the development of Supplement 142.


    1. National Institutes of Health. http://grants.nih.gov/grants/policy/
    data_sharing/. Accessed 28 February 2011

    2. Ohm, Paul, Broken Promises of Privacy: Responding to the
    Surprising Failure of Anonymization (August 13, 2009). Univer-
    sity of Colorado Law Legal Studies Research Paper No. 09–12.
    Available at SSRN: http://ssrn.com/abstract=1450006. Accessed
    28 February 2011

    3. Nelson B: Empty archives. Nature 461–10:160–163, 2009

    4. Vickers AJ: Whose data set is it anyway? Sharing raw data from
    randomized trials, Trials 2006. BioMed Central 7:15, 2006

    5. Piwowar HA, Day RS, Fridsma DB: Sharing detailed research data is
    associated with increased citation rate. PLoS One. 2(3):e308, 2007

    6. National Institutes of Health. http://cancergenome.nih.gov/.
    Accessed 28 February 2011

    7. National Institutes of Health. http://biospecimens.cancer.gov/archive/
    cahub/default.asp. Accessed 28 February 2011

    8. Branstetter 4th, BF, Uttecht SD, Lionetti DM, Chang PJ:
    SimpleDICOM suite: personal productivity tools for managing
    DICOM objects. Radiographics. 27(5):1523–1530, 2007

    9. National Cancer Institute, Cancer Imaging Program. https://wiki.
    +Supplement+142+into+CTP. Accessed 28 February 2011

    10. Noumeir R, Lemay A, Lina JM: Pseudonymization of radiology
    data for research purposes. J Digit Imaging. 20(3):284–295, 2007

    11. Hrynaszkiewicz I, Norton M, Vickers AJ, Altman DG: Preparing
    raw clinical data for publication: guidance for journal editors,
    authors, and peer reviewers. Trials 11:9, 2010

    12. Health and Human Services. http://www.hhs.gov/ocr/privacy/
    hipaa/understanding/summary/. Accessed 28 February 2011

    13. National Institutes of Health. http://privacyruleandresearch.nih.
    gov/pdf/FinalEnforcementRule06 . Accessed 28 February

    14. National Institutes of Health. http://privacyruleandresearch.nih.
    gov/research_repositories.asp. Accessed 28 February 2011

    15. González DR, Carpenter T, van Hemert JI, Wardlaw J: An open
    source toolkit for medical imaging de-identification. Eur Radiol.
    20(8):1896–1904, 2010

    16. National Institutes of Health. https://imaging.nci.nih.gov/ncia/
    login.jsf. Accessed 28 February 2011

    17. National Institutes of Health. http://ncicb.nci.nih.gov/. Accessed
    28 February 2011

    18. National Institutes of Health. https://wiki.nci.nih.gov/dashboard.
    action. Accessed 28 February 2011

    19. National Institutes of Health. https://wiki.nci.nih.gov/display/CIP/
    NBIA+at+CBIIT+Image+Collections. Accessed 28 February

    20. National Institutes of Health. https://cabig.nci.nih.gov/tools/sharable/
    caGrid&status=True. Accessed 28 February 2011

    21. National Institutes of Health. https://wiki.nci.nih.gov/display/Imaging/
    Limited+dataset+−+user+agreement. Accessed 28 February 2011

    22. National Institutes of Health. http://mircwiki.rsna.org/index.php?

    23. National Institutes of Health. https://wiki.nci.nih.gov/display/CIP/
    CIP+Survey+of+Biomedical+Imaging+Archives. Accessed 28
    February 2011

    24. National Institutes of Health. http://www.idoimaging.com/index.
    shtml. Accessed 28 February 2011

    25. Lien C-Y, Onken M, Eichelberg M, Kao T, Hein A: “Open source
    tools for standardized privacy protection of medical images”,
    Proc. SPIE 7967, 79670M, 2011. doi:10.1117/12.877989

    24 J Digit Imaging (2012) 25:14–24

    • Image Data Sharing for Biomedical Research—Meeting HIPAA Requirements for De-identification
    • Abstract
      Technical Issues in Multi-center Data Sharing
      Data Transmission
      DICOM Supplement 142
      National Biomedical Imaging Archive Project
      Clinical Trial Processor
      A Survey of Image Collections and Tools
      Support Multiple Means for Submitting Data
      Use DICOM Supplement 142 Profile Templates
      Do Not Overdo De-identification
      Do Not Rely on DICOM to Indicate Burned-in PHI
      Keep an Audit Trail of De-identification History
      Enable Local Mapping Between Anonymized Identifiers and PHI
      Provide End-to-End Transport Verification
      Provide Multiple Levels of Data Verification
      Carefully Estimate Resources Required


    Health Informatics Journal
    2019, Vol. 25(4) 1538–1548

    © The Author(s) 2018

    DOI: 10.1177/1460458218779101


    Same, same but different:
    Perceptions of patients’ online
    access to electronic health records
    among healthcare professionals

    Sofie Wass
    Jönköping University, Sweden

    Vivian Vimarlund
    Jönköping University, Sweden; Linköping University, Sweden

    In this study, we explore how healthcare professionals in primary care and outpatient clinics perceive the
    outcomes of giving patients online access to their electronic health records. The study was carried out as a case
    study and included a workshop, six interviews and a survey that was answered by 146 healthcare professionals.
    The results indicate that professionals working in primary care perceive that an increase in information-sharing
    with patients can increase adherence, clarify important information to the patient and allow the patient to
    quality-control documented information. Professionals at outpatient clinics seem less convinced about the
    benefits of patient accessible electronic health records and have concerns about how patients manage the
    information that they are given access to. However, the patient accessible electronic health record has not led
    to a change in documentation procedures among the majority of the professionals. While the findings can be
    connected to the context of outpatient clinics and primary care units, other contextual factors might influence
    the results and more in-depth studies are therefore needed to clarify the concerns.

    Keywords: electronic health records, healthcare service innovation and IT, organizational change and IT, patient-
    centeredness, work impact


    Corresponding author:
    Sofie Wass, Jönköping International Business School, Jönköping University, P.O. Box 1026, 551 11 Jönköping, Sweden.
    Email: sofie.wass@ju.se

    Original Article

    In recent years, there has been an increasing focus on patient involvement and patient
    engagement through a more patient-centered approach.1–4 Various strategies have been developed which
    emphasize the importance of patient-centered care and the new demands that are associated with
    this type of care.5,6 A great deal of effort has been put into, for instance, digital services which
    potentially enhance information-sharing and communication with patients.4,7 One example of this
    type of service is patient accessible electronic health records (PAEHRs) which have the potential
    to improve healthcare delivery and health outcomes8,9 with benefits like improved recall and
    understanding of healthcare information,10,11 increased adherence,12 and improved communication
    between patients and healthcare professionals.10,11

    However, there exist divergent reports about the effects that PAEHRs have on healthcare profes-
    sionals and their daily work.8,9,13–19 A review identifies no negative effects for healthcare profes-
    sionals who met with either outpatients or inpatients that could access the EHR.8 These effects are
    also reflected in a study of a major project, where few physicians reported that they perceived an
    increased workload, because visits took longer time or they received more questions from their
    patients.9 Other studies report on concerns, from both primary care physicians and specialist physi-
    cians, about patients becoming anxious or misinterpreting the content of the record.13–17 However,
    there are studies that report that workload concerns, expressed prior to the system implementation,
    diminished when the service was actually put into practice.9,18

    In general, studies that describe the benefits of PAEHR and the effects on healthcare profession-
    als seem to focus on multi-payer or market-based financial systems.9,12,20,21 In this study, we focus
    on the perceptions of healthcare professionals who work in a publicly supported healthcare system
    and compare the perceptions of professionals working in primary care and in outpatient clinics. We
    seek answers to the following research questions: How do healthcare professionals perceive the
    outcomes of PAEHRs, in their daily work activities? Do healthcare professionals perceive different
    outcomes depending on the unit that they work in?

    Case description

    In February 2015, the Region of Jönköping County gave all patients, aged 18 or older, online
    access to their EHRs. Through the national patient portal, patients can read their health-related
    information, including medical notes, diagnosis and vaccinations. The medical notes that are
    accessible on the system include information that was registered after 1 July 2014. If a patient
    wants access to notes that were made before this date, they can request and receive these notes on
    paper. The decision to make the information available in this manner was made to give healthcare
    professionals the opportunity and the time necessary to modify the language that they used in the
    medical notes, before patients access the records online. Healthcare professionals have 14 days
    after documentation of the event to confirm and, if necessary, correct the medical notes before they
    become accessible to the patient. After 14 days, all notes are accessible, whether they have been
    confirmed or not. Healthcare professionals can use two keywords to keep information inaccessible
    to the patient. These keywords can be used to withhold diagnoses which need further investigation,
    and notes about sensitive life-situations; for example, violence against women. At the time of the
    study, patients had been able to access the EHR for 15 months.
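
    The release rules in this case description can be summarized as a small decision function. The sketch below is only an illustration under stated assumptions: the function and constant names are hypothetical, a note is treated as released on day 14 (the text says only "after 14 days"), and confirmation is not assumed to release a note earlier because the text does not say whether it does.

```python
from datetime import date, timedelta

ONLINE_START = date(2014, 7, 1)     # notes registered before this date are provided on paper
REVIEW_WINDOW = timedelta(days=14)  # time available to confirm or correct a note


def visible_online(registered, today, withheld=False):
    """Return True if a note should be readable by the patient online."""
    if withheld:
        # One of the two keywords was used to keep the note inaccessible.
        return False
    if registered < ONLINE_START:
        return False
    # After the review window, notes are released whether confirmed or not.
    return today - registered >= REVIEW_WINDOW


print(visible_online(date(2015, 3, 1), date(2015, 3, 10)))  # False
print(visible_online(date(2015, 3, 1), date(2015, 3, 15)))  # True
```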


    This study was performed as an explorative case study22 that included a workshop with six par-
    ticipants, six interviews and a survey answered by 146 healthcare professionals. First, we con-
    ducted a workshop to identify the expected benefits and drawbacks of the PAEHR. The workshop
    participants included the project manager of the PAEHR, an eHealth strategist, a director of
    communication, two physicians and the system owner. The participants were selected based on
    their knowledge of the PAEHR and their ability to represent different actors in the organization.
    During the workshop, we used a technique called “Pains and Gains” to structure the workshop.23
    The participants were asked to develop a “persona”24 who represented a healthcare professional
    whose patients could access the EHR. The persona was then used as a representation for health-
    care professionals experiencing the introduction of the PAEHR and each participant was asked
    to indicate the expected benefits and drawbacks that the persona would face due to the PAEHR.
    The benefits and drawbacks were written on post-it notes and presented by each participant to
    the group. Finally, the group discussed the benefits and drawbacks that were identified and
    agreed on the most important outcomes.

    During the second step of our research process, we interviewed four physicians, one nurse and
    one therapist (from both primary care and clinics) to seek evidence that the results from the work-
    shop had some generality. Additional information was gathered during the interviews, which com-
    plemented the previously identified benefits and drawbacks. The interviewees were recruited from
    a group of professionals that had answered a survey in an earlier study and had indicated their
    willingness to participate in further studies. The interviewees were asked to answer questions
    regarding their perceived benefits and drawbacks of the PAEHR and whether patient–professional
    communication had changed since patient access to the EHRs was granted. The interviews were
    recorded, subsequently transcribed and analyzed by means of inductive content analysis.25 The
    first author reviewed the transcripts of the interviews and identified sentences that focused on the
    benefits and drawbacks of the PAEHR. These different sentences and paragraphs were then labeled
    with a code. By collating the codes that were related to each other in terms of their content, differ-
    ent themes were derived from the interviews.

    After analyzing the interviews, an online survey was created which was based on insights that
    were gathered from the workshop and the interviews. The survey was sent out to healthcare
    professionals at six different sites in the region: three primary care units and three outpatient
    clinics. The respondents included healthcare professionals who were responsible for document-
    ing information in the EHR, that is, 47 physicians, 50 nurses, 11 assistant nurses and 38 occupa-
    tional therapists. The sites were selected to enable a comparison between the perceptions of
    healthcare professionals working at primary care units and at outpatient clinics. Participation
    was voluntary and anonymous and the individual answers were not accessible to any executives
    at the Region of Jönköping County. The survey was distributed for 3 weeks in May
    2016 and again in November 2016 to extend the data gathering. Two reminders were sent out and, in
    total, 146 healthcare professionals completed the survey (total response rate 45%): 75 percent of
    the respondents were women, aged between 24 and 67, with a median working experience of
    16 years or more. Twenty surveys, which did not include answers to all the questions, were
    excluded. The respondents indicated on a 5-point Likert-type scale whether they agreed or
    disagreed with different statements. We examined the results on a three-level scale and
    compared the respondents at primary care units with those at outpatient clinics. A
    summary of the data collection activities is presented in Figure 1.
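
    Reducing a 5-point scale to three levels can be sketched as a simple mapping followed by a count. The three output levels below follow the column headings used in the result tables. The five input labels and the sample responses are assumptions made for illustration, since the exact wording of the scale is not given here.

```python
from collections import Counter

# Assumed 5-point labels mapped to the three levels reported in the tables.
COLLAPSE = {
    "agree": "agree or somewhat agree",
    "somewhat agree": "agree or somewhat agree",
    "neither agree nor disagree": "neither agree nor disagree",
    "somewhat disagree": "disagree or somewhat disagree",
    "disagree": "disagree or somewhat disagree",
}


def collapse(responses):
    """Count responses per three-level category."""
    return Counter(COLLAPSE[r] for r in responses)


# Hypothetical answers to one statement, not data from the study.
sample = ["agree", "somewhat agree", "neither agree nor disagree", "disagree"]
counts = collapse(sample)
print(counts["agree or somewhat agree"])        # 2
print(counts["disagree or somewhat disagree"])  # 1
```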

    Figure 1. Overview of the data collection activities.

    The quality of case studies can be judged by a number of criteria.22 In the present study, con-
    struct validity has been achieved through the use of multiple sources of evidence and by allowing
    respondents to review collected data. To increase reliability, we strove to be transparent about how
    the data were collected and we organized and structured the data into different categories.


    The results are presented in three sections. First, we present the results from the workshop and then
    the results from the interviews and the survey.

    Results from the workshop

    The results from the workshop indicated that the participants expected the following benefits:
    enhanced information-sharing, the possibility of establishing mutual understanding between the
    patient and the care provider, increased patient involvement and a better prepared patient. The
    expected drawbacks included the following: healthcare professionals being expected to be more
    up-to-date about their patients’ situations and healthcare professionals being unable to document
    desired information in the EHR. The participants of the workshop also mentioned the risk of
    patients misinterpreting and getting upset about the information that is recorded in the EHR.

    Results from the interviews

    The interviewees noted a number of benefits and drawbacks with the PAEHR. The key themes that
    emerged from the interviews are presented in Table 1 and described in more detail, with quotations
    from the interviewees, in the following section.

    Table 1. Benefits and drawbacks identified from the interviews.

    Benefits                            Respondent      Drawbacks                           Respondent

    Enhanced information access         P1, P3, N       Exposed and vulnerable patients     P1, T
    Improved understanding              P1, P2, P4, T   Negative impact on work processes   P3, P4
    Increased patient involvement       P1, P2, T, N    Worries and misunderstandings       P2, P3, P4
    Positive impact on work processes   P1, P2, N

    “P” represents physicians P1, P2 and so on. “N” represents the nurse and “T” represents the therapist.

    The PAEHR was viewed as an initiative that made information easier to access for the patient.
    The interviewees spoke about patients accessing the information wherever and whenever they
    want. “… it can be positive for some patients, especially those who cannot take in all the
    information, then they can access and read it at home in peace and quiet.” (P3) Several respondents
    mentioned improved understanding as a benefit for the patient. “If you have been to an appoint-
    ment and met the doctor, to be able to log in and double-check that you have understood the doc-
    tor …” (T) They mentioned the possibility of accessing the EHR to repeat what was said, “It can
    be nerve-wracking to have a doctor’s appointment, and you can have it as an extra memory.” (P4)
    It was also described as a way to ensure that there was a mutual understanding of the situation,
    “… if you need to repeat [the information], then you can understand more of what we really had
    a mutual understanding about.” (P1)

    Respondents talked about patients who could get more active and involved in healthcare because
    of the PAEHR. One respondent said,

    So I believe that it gets more transparent, and that we walk away from the hierarchal, patriarchal world
    even more. That you actually co-produce … I believe that, that is part of the person-centered care, that the
    patient gets more active in his or her own healthcare. (T)

    One physician spoke about educating the patient. “You have to give them the tools so that they
    really feel safe. It is very much about patient development and patient education.” (P1) Another
    respondent hoped that the increase in information-sharing through the service would result in
    fewer phone calls:

    When you can follow your case, then I think and hope that we can avoid a lot of unnecessary phone calls.
    People wonder what is happening and then they can actually find it on their own … A lot of questions are
    actually asked because people do not know. (P2)

    One respondent also mentioned the possibility of patients identifying and correcting inaccurate
    entries in the EHR. “One time we got a notice and then we corrected it. It was not my notes but a
    doctor’s note and it feels really good. Because then you can remove incorrect information.” (N)
    The type of language used to document in the EHR was also discussed. For instance, one inter-
    viewee said, “We have a working language just like any other craftsman or -woman, and that we
    must continue to use, but we also have to be pedagogical in that we can write a summary on what
    you call ‘simple Swedish’.” (P1)

    Concerning drawbacks, two respondents mentioned the risk that other people might force the
    patient to show them their EHR. This included issues related to abortion and domestic violence. “I
    see a risk if you are a young woman and have contacted us for an abortion for instance, and if your
    parents then force you to access your health record.” (P1) The therapist mentioned domestic vio-
    lence. “In destructive relations, where you can be forced to show the information.” (T) Some
    respondents were worried that the PAEHR might cause a further strain on the provision of health-
    care in that it would result in more questions being asked by patients. One respondent explained
    that it was less time-consuming to call patients instead of patients accessing the information online
    and then asking questions. “We are scared that it will result in much more work when they intro-
    duce test results and radiology, that it will be a lot of extra questions.” (P4) The drawbacks of the
    PAEHR also included concerns about patients getting worried or misinterpreting information. One
    physician explained that there is sometimes a need to continue with a medical investigation before
    one informs the patient about their condition. “What I am worried about is that people might get
    hurt because they get a decision before I am able to find out … I might want to talk to colleagues.”
    (P4) Another physician said, “… it is not that simple, and they cannot interpret it in the right way
    on their own. If you see a blood level on 13, what do you understand of that?” (P3)

    Results from the survey

    More than 50 percent of the survey respondents reported that the PAEHR was a “good” or “very
    good” initiative. When we compared the answers from respondents working in primary care units
    and outpatient clinics, the data showed that there were small differences between these two groups
    (55% vs 53%). However, the more detailed questions showed that the respondents working in
    primary care were more positive toward the use of the PAEHR in comparison to the respondents
    working in outpatient clinics.

    Half of the respondents working in primary care “agreed” or “somewhat agreed” that the PAEHR
    made it easier to clarify what is important to the patient, while only 26 percent of the respondents
    from outpatient clinics perceived that the PAEHR contributes to the clarification of what is important
    to the patient. The same pattern can be observed with issues related to adherence and patients
    following the advice of healthcare professionals; 50 percent of the respondents in primary care versus
    35 percent of respondents in outpatient clinics “agreed” or “somewhat agreed” that patient access to
    the EHR contributed to increased adherence; 36 percent of the respondents in primary care
    perceived that it was easier to communicate with the patients, while 20 percent of the respondents in
    outpatient clinics “agreed” or “somewhat agreed.” In general, the results show that the respondents
    in the outpatient clinics were less positive toward the PAEHR and “disagreed” or “somewhat
    disagreed” to a larger extent than respondents from primary care (Table 2).

    Slightly more respondents in primary care “agreed” or “somewhat agreed” that patient access
    had made the patient more involved in his or her treatment and more prepared. The difference
    was larger for issues related to quality-control, where respondents working in primary care units
    were more positive than the respondents working in outpatient clinics: 46 percent of the
    respondents in primary care versus 26 percent of the respondents in outpatient clinics “agreed”
    or “somewhat agreed” that patient access contributed to quality-control (Table 3).

    Table 2. Healthcare professionals’ perceptions about benefits of the PAEHR.

    The service makes it        Primary care units/   Agree or          Neither agree   Disagree or
    easier to …                 outpatient clinics    somewhat agree    nor disagree    somewhat disagree

    Clarify what is             Primary               50% (33)          26% (17)        24% (16)
    important to the patient    Outpatient            26% (21)          38% (30)        36% (29)

    Increase adherence to       Primary               50% (33)          35% (23)        15% (10)
    the advice I provide the    Outpatient            35% (28)          35% (28)        30% (24)

    Communicate with the        Primary               36% (24)          36% (24)        27% (18)
                                Outpatient            20% (16)          39% (31)        41% (33)

    Table 3. Healthcare professionals’ perceptions about benefits related to the patient.

    The service has             Primary care units/   Agree or          Neither agree   Disagree or
    made the patient …          outpatient clinics    somewhat agree    nor disagree    somewhat disagree

    More involved in his        Primary               35% (23)          44% (29)        21% (14)
    or her treatment            Outpatient            24% (19)          55% (44)        21% (17)
    and/or rehabilitation

    More prepared for           Primary               29% (19)          49% (32)        23% (15)
    an appointment              Outpatient            23% (18)          44% (35)        34% (27)

    Able to quality-control     Primary               46% (30)          38% (25)        17% (11)
    what I                      Outpatient            26% (21)          46% (37)        28% (22)

    Table 4 shows that respondents working in outpatient clinics are more concerned about patients
    becoming upset, worried or misunderstanding information in the EHR. For instance, 45 percent of
    the respondents working in outpatient clinics “agreed” or “somewhat agreed” that the patient
    becomes upset, while only 26 percent of the respondents from primary care perceived patients
    becoming upset. A similar difference exists with respect to issues related to worries and misunder-
    standings; 53 percent of the respondents in outpatient clinics “agreed” or “somewhat agreed” that
    the patient becomes worried, versus 36 percent of the respondents working in primary care.
    Moreover, 49 percent of the respondents in outpatient clinics “agreed” or “somewhat agreed” that
    the patient misunderstands information in the health record, compared to 33 percent in primary
    care (Table 4).

    The PAEHR seems to have little impact on the healthcare professionals’ work and we observed
    only small differences between respondents working in primary care units and in outpatient clin-
    ics; 8 percent of the respondents working in primary care reported that they spend more time on
    appointments versus 10 percent of the respondents working in outpatient clinics. Similarly,
    23 percent (primary care) versus 24 percent (outpatient clinics) of the respondents “agreed” or
    “somewhat agreed” that they spend more time on writing or dictating notes; 26 percent of those
    working in primary care units and 31 percent of the respondents working in outpatient clinics
    “agreed” or “somewhat agreed” that they cannot document everything that they want in the
    medical notes (Table 5).

    Most respondents had not changed the way they document information in the record (79% of
    those working in primary care and 65% of those working at outpatient clinics). Those who did
    reported that they had changed the way they record specific symptoms related to mental illness (17
    respondents), obesity (11 respondents), cancer (9 respondents) and drug abuse (9 respondents).
    These professionals also noted changing the language they use, including less ‘provocative’
    language (23 respondents), fewer abbreviations (17 respondents), and the use of Latin words (13
    respondents). Only 5 percent of the respondents, from both groups, had used the special keywords
    to withhold information from patients. Respondents in outpatient clinics seem to be less aware of
    the use of the keywords to withhold information from the patient; 48 percent did not know that the
    keywords existed, compared to 29 percent of the respondents in primary care (Table 6).

    Table 4. Healthcare professionals’ perceptions about drawbacks related to the patient.

    The service has resulted    Primary care units/   Agree or          Neither agree   Disagree or
    in that the patient …       outpatient clinics    somewhat agree    nor disagree    somewhat disagree

    Becomes upset about         Primary               26% (17)          55% (36)        20% (13)
    the information that can    Outpatient            45% (36)          39% (31)        16% (13)
    be read

    Becomes worried about       Primary               36% (24)          47% (31)        17% (11)
    the information that can    Outpatient            53% (42)          35% (28)        13% (10)
    be read

    Misunderstands the          Primary               33% (22)          47% (31)        20% (13)
    information in the          Outpatient            49% (39)          38% (30)        14% (11)
    health record

    Table 5. Healthcare professionals’ perceptions about drawbacks of the PAEHR.

    The service has resulted    Primary care units/   Agree or          Neither agree   Disagree or
    in that I …                 outpatient clinics    somewhat agree    nor disagree    somewhat disagree

    Use more time for the       Primary               8% (5)            38% (25)        55% (36)
    appointment/phone call      Outpatient            10% (8)           45% (36)        45% (36)

    Use more time to            Primary               23% (15)          30% (20)        47% (31)
    dictate/write information   Outpatient            24% (19)          38% (30)        39% (31)

    Cannot document what        Primary               26% (17)          29% (19)        46% (30)
    I want in the medical       Outpatient            31% (25)          24% (19)        45% (36)


    In this study, we have explored how healthcare professionals perceive the outcomes of PAEHRs,
    by comparing the perceptions of healthcare professionals working in primary care units and in
    outpatient clinics. The results of the study indicate that professionals working in primary care per-
    ceive that an increase in sharing information with patients can be beneficial. The majority of them
    perceive this as a way to increase adherence and to clarify important information to the patient. It
    is also seen as an opportunity for the patient to control what is documented in primary care notes.
    This is consistent with the perceptions of patients who have described similar benefits, like
    increased adherence12 and enhanced understanding and recall of health information.9–11 Healthcare
    professionals at outpatient clinics, on the other hand, seem less convinced about the benefits of the
    PAEHR. The survey results indicate that they are neither positive nor negative toward statements
    about increased adherence, clarification of information and quality-control.

    In previous studies, results show that PAEHRs make patients feel more involved in their own
    care.9,10,11,26 In this study, we also found indications of this during the workshop and in the
    interviews. However, the perceptions reported in the survey revealed that most healthcare professionals
    do not perceive that their patients are more involved in their treatment or rehabilitation, and
    neither do they claim that their patients are more prepared for their appointments. Reports of an
    increased workload were rare in both groups and the majority of the healthcare professionals
    reported that there was no impact on the time that was used for setting up and attending to appoint-
    ments or documenting information. Around one-fifth of the respondents reported that they used
    more time to dictate or write notes, and that they had to be more up-to-date about their patients’
    health records. This finding is consistent with the study conducted by Delbanco et al.,9 which
    reported little impact on the length of appointments or time used for documentation.

    In line with previous studies,14–17 the results from the workshop, the interviews and the survey
    reflect concerns about increased patient anxiety and misunderstandings of the information in the
    EHR. This was especially perceived by healthcare professionals in outpatient clinics, where
    approximately 50 percent “agreed” or “somewhat agreed” with the statements about patient anxiety,
    misunderstandings and patients getting upset. In addition, half of the professionals working in
    outpatient clinics were not aware that it is possible to withhold information from the patient with
    respect to on-going diagnoses. Although healthcare professionals are concerned about patient anxi-
    ety and misunderstandings of the information in the EHR, the patient access to the EHR has not led
    to a change in documentation procedures among the majority of the healthcare professionals. In
    those cases where changes were made, specific symptoms, the use of abbreviations and Latin terms
    were items that were affected by the change. One explanation could be that outpatient clinics meet
    patients with severe or even life-threatening diseases; however, they also meet chronically ill
    patients who tend to be knowledgeable about their condition. In contrast to the concerns raised by
    the healthcare professionals, Rexhepi et al.27 found that cancer patients preferred to access their
    EHR even if some of the information contained therein was difficult to understand. They also note
    that this information did not generate anxiety or undue concern.

    Table 6. Perceptions about the impact on documentation.

    Questions                             | Primary care units/outpatient clinics | Yes      | No       | No, I did not know that they existed
    Have you changed the way you document | Primary                               | 21% (14) | 79% (52) | −
    Have you changed the way you document | Outpatient                            | 35% (28) | 65% (52) | −
    Have you ever used the keywords?      | Primary                               | 5% (3)   | 67% (44) | 29% (19)
    Have you ever used the keywords?      | Outpatient                            | 5% (4)   | 48% (38) | 48% (38)

    One limitation of our study is that we do not compare the perceptions of different types of pro-
    fessionals. Despite this limitation, we believe that this study presents a relevant discussion of the
    perceptions of healthcare professionals working in both primary care units and outpatient clinics
    and incorporates several different types of professionals and their perceptions of PAEHRs. It
    should also be noted that there is a difference in the response rate between the two groups, and that
    additional studies are needed. Since case studies aim to reach analytical generalization,25 our study
    provides knowledge about perceptions in a specific context. It is thus of some importance to further
    investigate whether these findings can be replicated in other settings.


    Healthcare organizations are moving toward increased information-sharing with patients through
    the application of various services. Previous studies indicate that patients wish to get online access
    to their healthcare information,10,28,29 but there are divergent reports on how healthcare profession-
    als perceive an increase in information-sharing. Our study shows that healthcare professionals who
    work in primary care find benefits like increased adherence, clarification of important information
    and the possibility for patients to control what is documented. Professionals in outpatient clinics
    seem to be less convinced about the benefits of PAEHRs. The concerns include patients becoming
    upset, unduly worried and misinterpreting information. Nevertheless, patient access to the EHR
    has not led to a change in documentation procedures among the majority of the healthcare profes-
    sionals. The PAEHR was expected to increase patient involvement and allow patients to be more
    prepared. However, the healthcare professionals did not perceive this. While these findings could
    be connected to the context of outpatient clinics and primary care units, other contextual factors
    might influence the results. More in-depth studies are therefore needed to clarify if and why there
    are differences.

    Since the concerns that were identified in this study are mainly about how patients will manage
    their access to the information and not the impact this may have on the healthcare professionals’
    work, it is important for managers to disseminate research findings with respect to patients’ experi-
    ences, so as to ease the concerns of healthcare professionals. This seems to be especially important
    in outpatient clinics. In addition, greater awareness of the possibility of withholding
    information from the patient about incomplete or tentative diagnoses that need further investigation
    until confirmed might reduce the professionals’ perception of being restricted in terms of
    what to document in the EHR.

    Future research should further investigate whether professionals in outpatient
    clinics and primary care units have different concerns about their patients accessing the EHR
    and identify possible explanations for such results. It will be of interest to clarify whether this can be
    connected to organizational factors, patient groups and the different practices of healthcare


    Declaration of conflicting interests

    The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publi-
    cation of this article.


    The author(s) received no financial support for the research, authorship and/or publication of this article.


    1. Koch S. Improving quality of life through eHealth—the patient perspective. In: Proceedings of The 24th
    medical informatics in Europe conference, Pisa, 26–29 August 2012, pp. 25–29.

    2. Barry MJ and Edgman-Levitan S. Shared decision making—the pinnacle of patient-centered care.
    N Engl J Med 2012; 366: 780–781.

    3. Berwick DM, Nolan TW and Whittington J. The triple aim: care, health, and cost. Health Affairs 2008;
    27: 759–769.

    4. Dwamena F, Holmes-Rovner M, Gaulden CM, et al. Interventions for providers to promote a patient-
    centred approach in clinical consultations. Cochrane Database Syst Rev 2012; 12: CD003267.

    5. HealthIT.gov. Meaningful use regulations 2016. Washington, DC: HealthIT.gov, 2016.

    6. European Commission. eHealth action plan 2012–2020—innovative healthcare for the 21st century.
    Brussels: European Commission, 2012, pp. 1–14.

    7. Tang C, Lorenzi N, Harle CA, et al. Interactive systems for patient-centered care to enhance patient
    engagement. J Am Med Inform Assoc 2016; 23: 2–4.

    8. Ross SE and Lin C-T. The effects of promoting patient access to medical records: a review. J Am Med
    Inform Assoc 2003; 10: 129–138.

    9. Delbanco T, Walker J, Bell SK, et al. Inviting patients to read their doctors’ notes: a quasi-experimental
    study and a look ahead. Ann Intern Med 2012; 157: 461–470.

    10. Woods SS, Schwartz E, Tuepker A, et al. Patient experiences with full electronic access to health records
    and clinical notes through the My HealtheVet Personal Health Record Pilot: qualitative study. J Med
    Internet Res 2013; 15: e65.

    11. Esch T, Mejilla R, Anselmo M, et al. Engaging patients through open notes: an evaluation using mixed
    methods. BMJ Open 2016; 6: e010034.

    12. Wright E, Darer J, Tang X, et al. Sharing physician notes through an electronic portal is associated with
    improved medication adherence: quasi-experimental study. J Med Internet Res 2015; 17: e226.

    13. Prey JE, Restaino S and Vawdrey DK. Providing hospital patients with access to their medical records.
    AMIA Annu Symp Proc 2014; 2014: 1884–1893.

    14. Ross SE, Todd J, Moore LA, et al. Expectations of patients and physicians regarding patient-accessible
    medical records. J Med Internet Res 2005; 7: e13.

    15. Johnson AJ, Frankel RM, Williams LS, et al. Patient access to radiology reports: what do physicians
    think? J Am Coll Radiol 2010; 7: 281–289.

    16. Grünloh C, Cajander Å and Myreteg G. “The record is our work tool!”—physicians’ framing of a patient
    portal in Sweden. J Med Internet Res 2016; 18: e167.

    17. De Lusignan S, Mold F, Sheikh A, et al. Patients’ online access to their electronic health records and
    linked online services: a systematic interpretative review. BMJ Open 2014; 4: 1–11.

    18. Ålander T and Scandurra I. Experiences of healthcare professionals to the introduction in Sweden
    of a public eHealth service: patients’ online access to their electronic health records. In: MEDINFO,
    São Paulo, 19–23 August 2015, pp. 153–157. Amsterdam: IOS Press.

    19. Oster NV, Jackson SL, Dhanireddy S, et al. Patient access to online visit notes: perceptions of doctors
    and patients at an urban HIV/AIDS clinic. J Int Assoc Provid AIDS Care 2015; 14: 306–312.

    20. Vodicka E, Mejilla R, Leveille SG, et al. Online access to doctors’ notes: patient concerns about privacy.
    J Med Internet Res 2013; 15: e208.

    21. Mafi JN, Mejilla R, Feldman H, et al. Patients learning to read their doctors’ notes: the importance of
    reminders. J Am Med Inform Assoc 2016; 23: 951–955.

    22. Yin RK. Case study research: design and methods. 5th ed. Thousand Oaks, CA: SAGE, 2014.

    23. Gray D, Brown S and Macanufo J. Gamestorming: a playbook for innovators, rulebreakers, and
    changemakers. Sebastopol, CA: O’Reilly, 2010.

    24. Cooper A. The inmates are running the asylum. Indianapolis, IN: Sams Publishing; Macmillan, 1999.

    25. Graneheim UH and Lundman B. Qualitative content analysis in nursing research: concepts, procedures
    and measures to achieve trustworthiness. Nurse Educ Today 2004; 24: 105–112.

    26. Wass S, Vimarlund V and Ros A. Exploring patients’ perceptions of accessing electronic health records:
    innovation in healthcare. Health Informatics J 2019; 25(1): 203–215. DOI: 10.1177/1460458217704258.

    27. Rexhepi H, Åhlfeldt RM, Cajander Å, et al. Cancer patients’ attitudes and experiences of online access
    to their electronic medical records: a qualitative study. Health Informatics J 2016; 24: 115–124.

    28. Nazi KM, Hogan TP, McInnes DK, et al. Evaluating patient access to electronic health records: results
    from a survey of veterans. Medical Care 2013; 51: S52–S56.

    29. Baer D. Patient-physician e-mail communication: the Kaiser Permanente experience. J Oncol Pract
    2011; 7: 230–233.


    Requirements of Health Data Management Systems for Biomedical
    Care and Research: Scoping Review

    Leila Ismail1, PhD; Huned Materwala1, MS; Achim P Karduck2, PhD; Abdu Adem3, PhD
    1Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, Abu
    Dhabi, United Arab Emirates
    2Faculty of Informatics, Furtwangen University, Furtwangen, Germany
    3College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates

    Corresponding Author:
    Leila Ismail, PhD
    Department of Computer Science and Software Engineering
    College of Information Technology
    United Arab Emirates University
    Al Maqam Campus
    Al Ain, Abu Dhabi, 15551
    United Arab Emirates
    Phone: 971 37673333 ext 5530
    Email: leila@uaeu.ac.ae


    Background: Over the last century, disruptive incidents in the fields of clinical and biomedical research have yielded a tremendous
    change in health data management systems. This is due to a number of breakthroughs in the medical field and the need for big
    data analytics and the Internet of Things (IoT) to be incorporated in a real-time smart health information management system. In
    addition, the requirements of patient care have evolved over time, allowing for more accurate prognoses and diagnoses. In this
    paper, we discuss the temporal evolution of health data management systems and capture the requirements that led to the
    development of a given system over a certain period of time. Consequently, we provide insights into those systems and give
    suggestions and research directions on how they can be improved for a better health care system.

    Objective: This study aimed to show that there is a need for a secure and efficient health data management system that will
    allow physicians and patients to update decentralized medical records and to analyze the medical data for supporting more precise
    diagnoses, prognoses, and public insights. Limitations of existing health data management systems were analyzed.

    Methods: To study the evolution and requirements of health data management systems over the years, a search was conducted
    to obtain research articles and information on medical lawsuits, health regulations, and acts. These materials were obtained from
    the Institute of Electrical and Electronics Engineers, the Association for Computing Machinery, Elsevier, MEDLINE, PubMed,
    Scopus, and Web of Science databases.

    Results: Health data management systems have undergone a disruptive transformation over the years from paper to computer,
    web, cloud, IoT, big data analytics, and finally to blockchain. The requirements of a health data management system revealed
    from the evolving definitions of medical records and their management are (1) medical record data, (2) real-time data access, (3)
    patient participation, (4) data sharing, (5) data security, (6) patient identity privacy, and (7) public insights. This paper reviewed
    health data management systems based on these 7 requirements across studies conducted over the years. To our knowledge, this
    is the first analysis of the temporal evolution of health data management systems giving insights into the system requirements
    for better health care.

    Conclusions: There is a need for a comprehensive real-time health data management system that allows physicians, patients,
    and external users to input their medical and lifestyle data into the system. The incorporation of big data analytics will aid in
    better prognosis or diagnosis of the diseases and the prediction of diseases. The prediction results will help in the development
    of an effective prevention plan.

    (J Med Internet Res 2020;22(7):e17508) doi: 10.2196/17508

    J Med Internet Res 2020 | vol. 22 | iss. 7 | e17508 | p. 1https://www.jmir.org/2020/7/e17508
    (page number not for citation purposes)








    big data; blockchain; data analytics; eHealth; electronic medical records; health care; health information management; Internet
    of Things; medical research; mHealth


    The notion of health data management systems has evolved
    during the last century. With the evolution of medical records
    from paper charts to electronic health records (EHRs) [1], health
    data management has undergone disruptive transitions to provide
    more accurate and better patient care and make qualitative use
    of these records. This shift is underpinned by the advancement
    in information technologies that led to the development of
    several notions of health data management systems. Those
    health data management systems were often misaligned with
    the goals of biomedical care and research. This misalignment
    is caused particularly by the discrepancies between advanced
    technologies and their adoption for biomedical care and research.
    Consequently, it becomes vital to address this gap by developing
    a new framework for the health data management system. In
    this paper, we provide a broader history and evolution of health
    data management systems underpinned by the changing
    definition of medical records, discuss the issues prevailing
    within, introduce the modern aspects of health data management
    systems supporting the growing size of medical data, and discuss
    insights and provide solutions aiming for a better health care system.

    The introduction of EHRs has transformed the health care
    industry by providing more services, improving the quality of
    patient care, and enhancing the data access ability in real time,
    thereby creating a diverse set of health data management systems
    [2]. Our understanding of EHRs is that they provide a longitudinal
    view of a patient’s medical history over his or her lifetime
    generated by one or more health care providers or medical
    organizations delivering treatments to that patient. These
    cohesive and summarized records include the patient’s
    demographic and personal information, past and current
    diagnoses and treatments, progress notes, laboratory and
    radiology results, allergies, and immunizations [1]. However,
    an earlier form of EHRs, referred to as paper charts, involved
    written records of a patient’s diagnosis and treatments for the
    purpose of medical teaching. The term was later revised
    to computer-based patient records, electronic medical records,
    and currently EHRs. With the advancement in technological
    developments and the goal to provide better and more efficient health
    care, health data management systems have evolved from a
    computer-based approach to client-server–based, cloud,
    Internet of Things (IoT), and finally blockchain-based systems.

    With the rise of big health care data and the realization of using
    medical data for governance and research, it becomes necessary
    to integrate big data analytics within health data management
    systems [3]. However, this brings new challenges of data
    aggregation and preprocessing from multiple sources to develop
    insights, data security, and privacy to cope with an increasing
    number of data breaches and hacking incidents [4]. Further
    challenges have been imposed on biomedical care and research
    by the nature and types of medical data being generated at a

    rapid pace. These challenges have developed the need for a new
    health data management framework.

    This paper analyzes the requirements for better patient care and
    predictive analysis that must be considered when implementing
    a health data management system. Considering these
    requirements will make the health care data management system
    more accurate, efficient, and cost-effective. To our knowledge,
    this is the first analysis of the temporal evolution of health data
    management systems to give insights into the system
    requirements for better health care.

    The contributions of this paper are three-fold. First, the paper
    provides a taxonomy of health data management systems based
    on their technological advancement, and the inherent challenges
    and issues are discussed therein. Second, we present the
    reforming definitions of medical records and extract the
    requirements of a health data management system. Third, the
    paper provides insights into the health data management system
    research and guidelines for future research.

    Related Works
    Health data management systems are evolving for better health
    care. Literature reviews on these systems are classified into 2
    categories: (1) electronic health (eHealth) [5-8] and (2) mobile
    health (mHealth) [9].

    Regarding eHealth, the study by Jamal et al [5] reviews the
    impact of a computerized system on the quality of health care.
    The results showed that a health information system, if properly
    designed, can prevent medical errors and can support doctors
    and medical providers in diagnosis. The study by Van De Belt
    et al [6] reviews the definitions of health and medicine over 2
    years (from 2007 to 2009), coming up with a common definition
    involving the web, patients, professionals, social networking,
    health information content, and collaboration. In this study, we
    reveal additional requirements needed for better health care:
    privacy, security, public insights, and patient participation in
    accessing and monitoring medical data. The studies by Hans et
    al [7] and Cunningham et al [8] focus on the definitions of
    eHealth from 1999 to 2004. The authors found that the themes
    health and technologies are most recurrent in all definitions.

    Concerning mHealth, Silva et al [9] provide a review of mHealth
    apps and services. The review highlights that coordination, integration,
    and interoperability between different mHealth apps are important
    for better health care as well as improved performance of mobile
    devices in terms of device battery, storage, computation, and

    In this study, we reviewed health data management systems
    based on the following 7 requirements across studies conducted
    over the years: (1) medical record data, (2) real-time data access,
    (3) patient participation, (4) data sharing, (5) data security, (6)
    patient identity privacy, and (7) public insights.
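
    The 7 requirements listed above can be treated as a checklist against which a given system is assessed. The following is a minimal sketch in Python, not part of the reviewed studies: the names Requirement, SystemProfile, missing, and coverage are hypothetical and introduced only for illustration, and the example entry is invented rather than drawn from any reviewed system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Requirement(Enum):
    """The 7 requirements named in the text, in the order they are listed."""
    MEDICAL_RECORD_DATA = "medical record data"
    REAL_TIME_DATA_ACCESS = "real-time data access"
    PATIENT_PARTICIPATION = "patient participation"
    DATA_SHARING = "data sharing"
    DATA_SECURITY = "data security"
    PATIENT_IDENTITY_PRIVACY = "patient identity privacy"
    PUBLIC_INSIGHTS = "public insights"


@dataclass
class SystemProfile:
    """Hypothetical record of which requirements a system satisfies."""
    name: str
    satisfied: set = field(default_factory=set)

    def missing(self):
        # Enum iteration follows definition order, so the result keeps
        # the order in which the requirements are listed in the text.
        return [r for r in Requirement if r not in self.satisfied]

    def coverage(self):
        # Fraction of the 7 requirements that are satisfied.
        return len(self.satisfied) / len(Requirement)


# Invented example entry, for illustration only.
example = SystemProfile(
    name="example system",
    satisfied={Requirement.MEDICAL_RECORD_DATA, Requirement.DATA_SHARING},
)

for requirement in example.missing():
    print("not satisfied:", requirement.value)
```

    A structure of this kind only records whether a requirement is met; how each requirement is judged for a particular system is left to the reviewer.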






    For the analysis and study of the evolution of health data
    management systems, we reviewed published research articles,
    reports, medical lawsuits, and health care regulations and acts
    concerning the methods of organizing medical record data and the needs
    of a health data management system. The literature was searched
    in the Institute of Electrical and Electronics Engineers,
    Association for Computing Machinery, Elsevier, MEDLINE,
    PubMed, Scopus, and Web of Science databases from 1793 to
    2020. We selected the papers that included incidents that
    involved the definitions of a health data management system,
    triggered the introduction of a new system, and/or implemented
    technologies for better health care. The analysis of these papers
    shows that advances in technologies are being adopted for
    accurate and efficient patient care.


    Taxonomy of Health Data Management Systems
    Before satisfying the requirements of biomedical care and
    research, the evolution of the underlying health data
    management systems and their limitations must be understood.
    The capabilities of a health data management system should ensure
    that the requirements of patient care are met. Health data
    management systems have undergone multiple transitions over
    the years alongside the advancement in information technologies
    as shown in Figure 1. During this evolution, several programs
    were established and regulation acts were passed to improve
    the quality of patient care. Table 1 presents the events that
    triggered the evolution of health data management systems.
    Table 2 presents the limitations of health data management systems.

    Figure 1. Evolution of the health data management system.





    Table 1. Events that have triggered the evolution of health data management systems.

    Evolutionary changeResponsible authorityYear

    Rule to record patients’ data for hospital expenditure justification was passed [10].

    Board of Governors of the Society of
    the New York Hospital


    Rule to record major medical cases for education was passed provoked by a fatal dispute between
    an American statesman and an American politician. According to the rule, the recorded cases

    Board of Governors of the Society of
    the New York Hospital


    should be bounded in a book for inspection by the governors, medical professionals and students,
    and the friends of the patients [11].

    Rule to maintain a record of all the medical cases [11].Board of Governors of the Society of
    the New York Hospital


    A hospital standardization program was established to standardize the format of medical records
    for improved patient care [12].

    American College of Surgeons1918

    American Association of Record Librarians of North America was established to enhance the
    standards of medical records [13].

    American College of Surgeons1928

    The idea to use computers for medical records was proposed to allow doctors to track a patient’s
    medical history and provide evidence for the treatment [14].

    A problem-oriented medical records model was developed to standardize the method of EMRsa

    that provided a structure to help doctors record their notes [14].

    Lawrence WeedThe 1960s

    Paper charts were termed as EMRs.N/AbThe 1960s

    Rule to record patients’ data by medical nurses for medical insurance reimbursement with the
    introduction of Medicare and Medicaid laws [15].

    Centers for Medicare and Medicaid


    First commercial computerized health data management system known as Clinical Information
    System was developed for El Camino Hospital. The system included features for laboratory tests,
    appointment scheduling, and pharmacy management [16].

    Lockheed Corporation1965

    First clinical decision support system known as Health Evaluation of Logical Processing was
    developed to support clinical operations. The system helped doctors to identify cardiac contraction

    University of Utah, 3M and Latter-
    Day Saints Hospital


    based on a patient’s test results’ analysis and to select an appropriate medication for infectious
    disease cases [17].

    The first modular computer-based health data management system known as Computer Stored
    Ambulatory Record was implemented. The system accommodated clinical vocabularies through
    clinical mapping to recognize different terms used for the same disease [18].

    Massachusetts General Hospital and
    Harvard University


    MPIc was introduced to keep track of patients’ medical data to reduce unnecessary testing and
    adverse drug effects [19].

    Indian Health ServiceThe 1980s

    Electronic standards were developed to address the standardization issues of health data manage-
    ment system development and adaption. The standards allowed the use of components from
    different vendors in a health data management system [20].

    Health Level Seven1987

    The term computer-based patient records was introduced in a report studying the benefits of
    electronic management of health records [21].

    Institute of Medicine1991

    The Health Insurance Portability and Accountability Act was passed to safeguard patients’
    medical records by involving role-based access control, automatic data backup, audit trails, and
    data encryption [22].

    US Congress1996

    The term eHealthd was coined that refers to the integration of electronic communication and in-
    formation technologies for electronic transmission, storage, and retrieval of medical records both
    locally and remotely [23].

    John Mitchell1999

    The term mHealthe was coined that refers to wireless telemedicine using mobile telecommunica-
    tions and multimedia technologies for the new mobile health care system [24].

    S Laxminarayan and Robert SH Is-


    The definition of eHealth was expanded by incorporating business and public health to health
    services and defining the outcomes and stakeholders of eHealth [25].

    Gunther Eysenbach2001

    The term uHealthf was coined that refers to the use of biometric sensors and medical devices to
    monitor and improve a patient’s medical health [26].

    Stephen S Intille2004

    Proposed a formal definition of the term personal health records that allows patients to access
    their medical history and to manage it by making part of it available to selected participants by
    defining access control rights [27].

    Health care organizationsThe 2000s

    The term electronic health records [28] was coined.Institute of Medicine2003

    J Med Internet Res 2020 | vol. 22 | iss. 7 | e17508 | p. 4https://www.jmir.org/2020/7/e17508
    (page number not for citation purposes)




    Evolutionary change | Responsible authority | Year

    The Massachusetts health care reform law was passed, mandating that residents have minimum
    medical insurance coverage and that employers with more than 10 full-time employees provide
    medical insurance coverage [29].

    Commonwealth of Massachusetts, 2006

    The term Accountable Care Organizations was coined to refer to a group of doctors, hospitals,
    and other health care providers who volunteer to give high-quality care to their patients to avoid
    unnecessary duplication of services and reduce medical errors [30].

    Elliott Fisher, 2006

    The Health care Fraud Prevention and Enforcement Action was established to strengthen the
    existing programs to prevent and reduce Medicare and Medicaid fraud [31].

    US Department of Justice, Office of
    Inspector General, and Health and
    Human Services


    The Patient Protection and Affordable Care Act was signed into law with the objective of
    expanding medical insurance coverage [32].

    US President Barack Obama, 2010

    EMR: electronic medical record.
    N/A: not applicable.
    MPI: master patient index.
    eHealth: electronic health.
    mHealth: mobile health.
    uHealth: ubiquitous health.

    Table 2. Limitations of health data management systems.

    Limitation | Health data management

    Illegible handwriting results in incorrect treatments [33] and deaths [34,35]. Paper charts require physical storage and are
    susceptible to unplanned destruction by flood, fire, rodents, and degradation. They are physically cumbersome to read,
    understand, and search for specific information. The cost and time required for paper charts to be requested for
    duplication and then delivered are unacceptably high.

    Paper charts

    Medical records are managed by the physicians and cannot be accessed by the patients. Physicians visiting a patient
    have to note down or memorize the patient’s medical data to return to the hospital and record it digitally, which may
    lead to error.


    A patient cannot trace how his or her data are used. Security, privacy, and a single point of failure are further issues.
    In addition, a cohesive view of a patient’s medical data from multiple hospitals is difficult to obtain. Medical tests
    must be repeated at times, which costs additional time and money and affects health conditions.


    Single point of failure, loss of data control and stewardship, a requirement of steady internet connection, and data reli-
    ability [36,37].


    Data security and patient privacy are a major concern.


    The process of data aggregation from different storage sites is time consuming, complex, and expensive. The data are
    stored using different formats and require preprocessing. In addition, preserving the security of the data and the privacy
    of the patient identity while maintaining the usefulness of the data for analysis and studies is quite challenging.


    The process of ledger update on multiple nodes is energy consuming [38] and suffers from the issue of low throughput [39].


    IoT: Internet of Things.

    Requirements of a Health Data Management System


    Over the last century, the definition of health data management
    has undergone numerous reformations to address the need for
    better and advanced patient care alongside technological
    advances. We evaluated these differing examinations and
    rationalized the definition used in the remainder of the paper.
    It is important to note that, as the term health data management
    is rather recent, the listed definitions were taken from different
    legislations and health data recording systems, even if the exact
    phrase health data management was not used. Table 3 shows
    the evolving definitions of health data management systems
    from being purely medical practice and learning-based
    definitions to being more patient-centric and research-based
    definitions. We classified health data management systems
    based on 7 requirements that underpin the evolution in the field
    as shown in Figure 2. Each number in the figure represents a
    definition stated in Table 3.

    Table 3. The definitions of health data management systems.


    “[…] Names and Diseases of the Persons, received, deceased or discharged in the same, with the date of
    each event, and the Place from whence the Patients last came […]”

    Siegler [10], 1793 (definition 1)

    “The house physician, with the aid of his assistant, under the direction of the attending physician, shall keep
    a register of all medical cases which occur in the hospital, and which the latter shall think worthy of
    preservation, which book shall be neatly bound, and kept in the library for the inspection of the friends of
    the patients, the governors, physicians and surgeons, and the students attending the hospital.”

    Siegler [10], 1805 (definition 2)

    “Accurate and complete medical records […] which includes identification data; complaint; personal and
    family history; history of the present illness; physical examination; special examinations such as consultations,
    clinical laboratory, x-ray and other examinations; provisional or working diagnosis; medical or surgical
    treatment; gross or microscopical pathological findings; progress notes; final diagnosis; condition on dis-
    charge; follow-up; and, in case of death, autopsy findings.”

    Sayles and Gordon


    “The computer is making a major contribution […] the patient will gain from his physician an immediate
    sympathetic understanding […] inadequate analysis by the medical profession can be avoided.”

    Weed [14], 1968 (definition 4)

    “[…] orient data around each problem […] complete list of all the patient’s problems […] diagnosis and all
    other unexpected findings or symptoms […] The list is separated into active and inactive problems, and in
    this way, those of immediate importance are easily discernible […] orders, plans, progress notes and numer-
    ical data can be recorded under the numbered and titled problem […]”

    Weed [14], 1968 (definition 5)

    “Digital versions of paper charts that contain the medical and treatment history of the patients from one
    practice for providers to use for diagnosis and treatment”

    Cynthia [40], 1993 (definition 6)

    “Electronic patient record […] support users through availability of complete and accurate data, practitioner
    reminders and alerts, clinical decision support systems, links to bodies of medical knowledge, and other

    Dick et al [21], 1997 (definition 7)

    “[…] medical informatics, public health and business, referring to health services and information delivered
    or enhanced through the Internet and related technologies […] an attitude, and a commitment for networked,
    global thinking, to improve health care locally, regionally, and worldwide by using information and com-
    munication technology.”

    Eysenbach [25], 2001 (definition 8)

    “The subjective component contains information about the problem […] objective information consists of
    those observations made by the counselor […] assessment section demonstrates how […] data are formulated,
    interpreted, and reflected upon, and the plan section summarized the treatment direction.”

    Cameron and Turtle-Song [41]


    “[…] electronic application through which individuals can access, manage and share their health information,
    and that of others for whom they are authorized, in a private, secure, and confidential environment.”

    Markle Foundation


    “[…] longitudinal electronic record of patient health information generated by one or more encounters […]
    patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations,
    laboratory data and radiology reports […] automates and streamlines the clinician’s workflow. The EHRs
    has the ability to generate a complete record of a clinical patient encounter […] evidence-based decision
    support, quality management, and outcomes reporting.”

    HIMSS [1], 2003 (definition 11)

    “The Electronic Health Record (EHR) is a secure, real-time, point-of-care, patient-centric information re-
    source […] decision making by providing access to patient health record information where and when they
    need it and by incorporating evidence-based decision support […] billing, quality management, outcomes
    reporting, resource planning, and public health disease surveillance and reporting.”

    HIMSS [43], 2003 (definition 12)

    “[…] lifelong resource of health information needed by individuals to make health decisions. Individuals
    own and manage the information […] is maintained in a secure and private environment, with the individual
    determining rights of access […]”

    AHIMA [44], 2005 (definition 13)

    “Health data management […] acquiring, entering, processing, coding, outputting, retrieving, and storing
    of data gathered in the different areas of health care […] also embraces the validation and control of data
    according to legal or professional requirements.”

    Böcking and Trojanus [45]


    “A major goal […] to protect the privacy of individuals’ health information […] adopt new technologies
    to improve the quality and efficiency of patient care.”

    HIPAA [22], 2013 (definition 15)

    HIMSS: Healthcare Information and Management Systems Society.
    AHIMA: American Health Information Management Association.
    HIPAA: Health Insurance Portability and Accountability Act.

    Figure 2. Requirements of a health data management system.

    Medical Record Data
    Medical record data describe the identity and health of a
    patient: personal and demographic details, history
    of the medical condition, ongoing treatment, laboratory tests,
    and radiology results. Such data are the common requirement of a health
    data management system. Medical records have been a
    primary component throughout the evolution of health data
    management systems, whether in the form of printed documents
    or digital records.

    Real-Time Data
    To improve the quality of patient care, the requirement of
    real-time data access was highlighted in the definitions of health
    data management systems. This requirement reduces the medical
    incidents owing to the delay in data updates by the physicians.
    However, this requirement cannot be fulfilled by the paper-based
    and computer-based health data management systems. This
    requirement was introduced with the client-server–based
    management system [46-52] that enables the physicians to
    access and update the patient medical records in real time.

    Patient Participation
    With the medical records maintained by the hospitals or
    third-party cloud service providers, the patients cannot track
    how their medical data are used. Consequently, patient
    participation in accessing and monitoring medical data is a key
    requirement to develop trust in health data management systems.
    In addition to data access, the participation of patients in
    providing health conditions and lifestyle data to the physicians
    will aid in better prognosis and diagnosis. The introduction of
    IoT-based health data management system involving sensors
    and medical devices that monitor a patient’s health and lifestyle
    conditions enables the patient to input their medical conditions
    to the system [53-59]. An analysis of personal health records
    management platforms based on users’ perception shows that
    a simple easy-to-use system is required for patient engagement
    and satisfaction [60].

    Sharing
    Sharing of medical records is a vital requirement with the
    patient’s treatment being spread across various health care
    providers. This is to aid other physicians to study the patient’s
    medical history for better treatment and to avoid repetition of
    laboratory and radiology tests. On the basis of the list of
    definitions in Table 3, we classified sharing based on the users
    allowed to access the data into 3 different categories: (1) degree
    1, where the information is shared within the same medical
    organization where the patient is currently receiving treatment,
    (2) degree 2, where the information is shared with the patient,
    patient’s friends, and family, and (3) degree 3, where the
    information is shared with other medical organizations and
    government. The requirement of sharing is complemented by
    the introduction of the cloud-based health data management
    system [61-63]. However, to share medical record data between
    different health care organizations and to efficiently use the
    shared information, the systems should support interoperability.
    Interoperability can be achieved by using a standard format to
    store, manage, and share the medical data. There are several
    standard formats to store medical data and images [64]. Some
    of the major file formats used for medical images are Analyze
    [65], Neuroimaging Informatics Technology Initiative [66],
    Minc [67], and Digital Imaging and Communications in
    Medicine [68]. Health Level 7 International, standardized by
    the American National Standards Institute, is a health care
    protocol for sharing medical data [20]. It includes the rules for
    the integration, exchange, and management of EHRs. Wen et
    al [69] assessed the interoperability of eHealth systems in
    Taiwan for exchanging data. This is to reduce repeated medical
    examinations and medications for better health care. They
    concluded that the government should define policies to enforce
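
    The three sharing degrees defined in this section can be sketched as a small access check. This is an illustrative Python sketch only: the `SharingDegree` enum, the requester categories, and the `may_access` function are hypothetical names introduced here, not part of any system surveyed in the paper, and the sketch assumes that a system declares the set of degrees it supports.

```python
from enum import IntEnum
from typing import Set


class SharingDegree(IntEnum):
    """Sharing degrees as classified in the text."""
    SAME_ORGANIZATION = 1   # degree 1: the organization currently treating the patient
    PATIENT_AND_FAMILY = 2  # degree 2: the patient, the patient's friends, and family
    EXTERNAL = 3            # degree 3: other medical organizations and government


# Hypothetical mapping from requester category to the degree it requires.
REQUIRED_DEGREE = {
    "treating_organization": SharingDegree.SAME_ORGANIZATION,
    "patient": SharingDegree.PATIENT_AND_FAMILY,
    "family": SharingDegree.PATIENT_AND_FAMILY,
    "other_organization": SharingDegree.EXTERNAL,
    "government": SharingDegree.EXTERNAL,
}


def may_access(requester: str, supported: Set[SharingDegree]) -> bool:
    """Return True if the system supports the degree this requester needs."""
    required = REQUIRED_DEGREE.get(requester)
    return required is not None and required in supported
```

    For example, a system that supports only degree 1 would return `True` for `may_access("treating_organization", {SharingDegree.SAME_ORGANIZATION})` and `False` for a patient or a government requester.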

    Security
    With increasing incidents of data breaches and phishing attacks,
    and the adoption of a third-party service provider, the security
    of the patients’ sensitive and important data is essential.
    Compared with 477 health data breaches affecting 5,579,438
    patient records in 2017, 503 breaches affecting 15,085,302
    records were reported in 2018 [70]. The requirement of security
    is even higher when patients’ medical
    records are handled by a cloud service provider or when medical
    sensors and devices are used to gather patients’ medical and
    lifestyle data. According to a report by Intel Security, health
    care providers have reduced their use of cloud services owing
    to the lack of cybersecurity measures implemented by cloud
    service providers [71]. Another report states that, on average, hospitals
    lose track of around 30% of their networked medical devices,
    making it much harder to protect against vulnerabilities [72].
    More than 61% of all IoT devices and sensors on a hospital
    network are at high risk of cyber-attack. In recent years,
    blockchain technology [73,74] has gained wide popularity and
    has penetrated into the domain of health care to address the need
    for a more patient-centric supportive system for the
    professionals, to connect disparate systems for improved patient
    care, and to increase the accuracy of EHRs [75-81].
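
    The breach figures quoted from [70] can be put into proportion with a few lines of arithmetic. The Python sketch below uses only the four numbers given in the text; the variable names are chosen here for illustration. It shows that the number of breaches grew only slightly, while the number of affected records grew to roughly 2.7 times the 2017 figure.

```python
# Figures quoted in the text from reference [70].
breaches_2017, records_2017 = 477, 5_579_438
breaches_2018, records_2018 = 503, 15_085_302

# Relative growth from 2017 to 2018.
breach_growth = (breaches_2018 - breaches_2017) / breaches_2017
record_growth = (records_2018 - records_2017) / records_2017

# Average number of patient records affected per reported breach.
avg_2017 = records_2017 / breaches_2017
avg_2018 = records_2018 / breaches_2018

print(f"Breaches grew by {breach_growth:.1%}")
print(f"Affected records grew by {record_growth:.1%}")
print(f"Records per breach: {avg_2017:,.0f} (2017) vs {avg_2018:,.0f} (2018)")
```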

    Privacy
    The privacy requirement of a patient’s identity in a health data
    management system is crucial with the increasing number of
    medical frauds and fake medications. The privacy of the patient
    cannot be compromised, especially with the rise of data
    analytics, where the medical record data of the patients are used
    for analysis. The blockchain-based health data management
    system aims to address this issue.
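
    One common way to keep patient identity out of data sets used for analytics is to replace direct identifiers with a keyed pseudonym. The Python sketch below illustrates that general idea only; it is not the blockchain-based approach referred to above, and the field names (`patient_id`, `name`) and function names are assumptions made for this example. Pseudonymization alone does not guarantee anonymity, because the remaining fields may still allow reidentification.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of the patient identifier."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identity(record: dict, secret_key: bytes) -> dict:
    """Copy a record, replacing direct identifiers with a pseudonym."""
    # Hypothetical field names: 'patient_id' and 'name' are treated as direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in ("patient_id", "name")}
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned
```

    Because the same key and identifier always yield the same pseudonym, records of one patient can still be linked for analysis without exposing the identifier itself.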

    Public Insights
    Prediction of health conditions is important to avoid
    life-threatening situations. The increasing amount of health care
    data [82], if properly analyzed, can facilitate the prediction of
    health conditions. The process of gathering, organizing, storing,
    and analyzing big data to discover correlations, hidden patterns,
    and other valuable insights is known as big data analytics. Figure
    3 shows the life cycle of big data analytics.

    Figure 3. Lifecycle of big data analytics.

    Prediction from health care data for public insights makes it
    possible to actively improve public health and to react faster to a situation
    [83-91]. Using personal health care data requires, of course, a
    well-defined balance between preserving the privacy of
    personal health care data and providing transparency, for
    example, toward insurance companies. Insights into personal
    genetic risk factors for chronic diseases should not lead to a
    situation where a person is disadvantaged in terms of
    insurance status. Moreover, the monitoring of the public health
    situation has to be based on the health care data of individuals.
    Consequently, research projects have recently addressed the
    balance of personal health care data as a public good [92]. Figure
    4 [92] shows the relationship between the 3 key stakeholders
    for defining the balance between personal health care data and
    the potential of these data as a public good. Companies could
    be health insurance providers, hospitals, pharmaceutical
    companies, and government organizations.

    Figure 4. Personal health care data ecosystem.

    Efforts to address the diabetes mellitus crisis, the growth of
    cardiovascular problems caused by nutrition patterns and lifestyle
    behavior in many countries and regions of the world, changing
    patterns of Alzheimer disease and dementia, the abuse of
    antibiotics, and microbiome research would benefit tremendously
    from personal health care data as a public good [93,94]. Bringing
    together the insights of large initiatives such as the Health Data
    Exploration Project and Computational Health Sciences [92,94] is
    key to future advancement in the area of private and personal health
    care data for the public good. Health care data analytics can
    help researchers and government officials to better predict
    chronic diseases, develop effective therapeutic drugs, deliver
    more accurate patient care, and develop an effective
    nation-wide prevention plan.

    Table 4 shows health data management systems presented in
    the taxonomy and evaluates them in terms of their adherence
    to the defined requirements.

    Table 4. Health data management systems in the literature vs the requirements.


    Columns: Medical record data | Real-time data | Patient participation (data input, data access) | Sharing (degree 1, degree 2, degree 3) | Security | Privacy

    [The cell entries of this table did not survive extraction.]

    IoT: Internet of Things.


    Principal Findings
    This study revealed that there is a need for a secure and efficient
    health data management system that will allow physicians and
    patients to update decentralized medical records and to analyze
    the medical data for supporting more precise diagnoses,
    prognoses, biomedical research, and public insights. The early
    form of health data management using the manual recording of
    a patient’s diagnosis and treatment on sheets of paper was
    introduced almost a century ago. Later, with the advancement
    in technology, health data management systems evolved to web,
    cloud, IoT, big data analytics, and blockchain-based systems.
    The definition of medical records has been reformed alongside this
    temporal evolution of the system. The requirements for a health
    data management system extracted from these definitions are
    medical record data, real-time data, patient participation, sharing,
    security, privacy, and public insights. The paper-based health
    data management system fulfills the requirements of medical
    record data and sharing. However, paper charts are prone to
    misplacement, occupy large physical space, and involve a
    time-consuming and expensive data sharing process. Over time,
    the paper charts were replaced by electronic records in the
    computer-based system with the same requirements.

    To achieve the requirement of real-time data access in addition
    to medical record data and sharing, a client-server–based health
    data management system was introduced. This system allows
    patients and health care providers to access medical data over
    the internet using a mobile device or a desktop computer.
    However, it suffers from the issues of single point of failure,
    data fragmentation, system vulnerability, low scalability, and
    high data security and patient privacy risks. To minimize the
    infrastructural cost and to address the issue of data
    fragmentation, the medical organizations and health care
    providers transitioned to a cloud-based system. The cloud
    service provider ensures the privacy of patient
    identity, but the security of the data is not ensured, and the
    single point of failure remains.

    The requirement of patient participation to feed their medical
    data and lifestyle conditions for better prognosis and diagnosis
    was achieved with the introduction of the IoT-based
    management system. However, with the increasing number of
    data breaches and attacks on medical sensors and devices,
    there is a constant threat to the security of data and the privacy
    of a patient’s identity. With advances in big data
    analytics, an increasing amount of health care data is being studied
    to gain insights for better prognosis and diagnosis of diseases.
    However, the privacy of a patient’s identity still remains a

    The blockchain technology, which recently attracted the
    attention of industries, shows potential in the field of health
    care. A blockchain-based health data management system
    satisfies all the requirements needed for better patient care.
    However, it consumes a high amount of energy [95,96] and has
    low throughput [39]. There are increasing research efforts to
    solve these issues. For instance, to address the problem of energy
    consumption, Milutinovic et al [97] proposed the proof of luck
    consensus mechanism that ensures energy-efficient and
    low-latency transaction validation. Ismail et al [98] and Dorri
    et al [99] proposed scalable blockchain architectures for health
    care that use a clustering approach to increase transactions

    The main requirements of a health care data management system
    are security and privacy, especially with the increasing number
    of data breaches and hacking attacks. Furthermore, the adoption
    of patient participation to feed health data to a health system is
    increasing with the introduction of disruptive technologies, such
    as the IoT and big data analytics. Big data analytics requires
    the sharing of medical information among hospitals to get
    insights and predictive analysis from the data. This paves the
    way toward a health data management system as a support to
    physicians and medical professionals for better diagnosis and
    prognosis of chronic diseases. In addition, such a system makes it
    possible to derive public insights from data to develop a nation-wide
    prevention plan for certain diseases. The traceability feature of
    the blockchain ensures that the data used for developing the
    predictive models is accurate, leading to a precise prognosis,
    diagnosis, and decision support system. Consequently, we
    suggest an integrated blockchain-, IoT- and big data–based
    health data management system to ensure the requirements of
    smart health care: real-time access to data by physicians and
    patients, health data input from patients through medical sensors
    and lifestyle, security, privacy, and public insights. This
    integrated health management system should be scalable and
    energy-efficient, which presents new research challenges for
    smart health data management systems.
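
    The traceability property mentioned above rests on each ledger entry committing to the hash of the previous one, so that any later change to a stored record becomes detectable. The Python sketch below illustrates only this hash-chaining idea under simplifying assumptions: it is a single in-memory list with no consensus mechanism, no distribution across nodes, and no access control, and all function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder previous-hash for the first block


def block_hash(index: int, prev_hash: str, payload: dict) -> str:
    """Hash the block contents in a deterministic JSON encoding."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> None:
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

    If an entry is altered after it has been appended, `is_valid` returns `False`, which is the tamper evidence that makes the provenance of data used for predictive models checkable.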

    The objective of this paper was to highlight the requirements
    of a health data management system for biomedical care and
    research. In summary, it discussed the temporal evolution of
    health data management systems from paper charts to
    blockchain-based systems, along with the reformation of the
    definition of what we call EHRs today. The system should
    satisfy the requirements of medical record data, real-time access,
    patient participation, data sharing, data security, patient identity
    privacy, and public insights. The incorporation of big data
    analytics aids in better prognosis and diagnosis of the diseases
    and the prediction of risk for the development of chronic

    This work was supported by the Emirates Center for Energy and Environment Research of the United Arab Emirates University
    under grant 31R101. The authors would like to thank the anonymous reviewers for their valuable comments, which helped them
    improve the content, quality, and presentation of this paper.

    Conflicts of Interest
    None declared.


    1. Healthcare Information and Management Systems Society. What Are Electronic Health Records (EHRs)? URL:
    https://www.himss.org/electronic-health-records [accessed 2020-02-13]

    2. King J, Patel V, Jamoom EW, Furukawa MF. Clinical benefits of electronic health record use: national findings. Health
    Serv Res 2014 Feb;49(1 Pt 2):392-404 [FREE Full text] [doi: 10.1111/1475-6773.12135] [Medline: 24359580]

    3. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014;2(1):3
    [FREE Full text] [doi: 10.1186/2047-2501-2-3] [Medline: 25825667]

    4. Kuo MH, Sahama T, Kushniruk AW, Borycki EM, Grunwell DK. Health big data analytics: current perspectives, challenges
    and potential solutions. Int J Big Data Intell 2014;1(1/2):114. [doi: 10.1504/ijbdi.2014.063835]

    5. Jamal A, McKenzie K, Clark M. The impact of health information technology on the quality of medical and health care: a
    systematic review. Health Inf Manag 2009;38(3):26-37. [doi: 10.1177/183335830903800305] [Medline: 19875852]

    6. Van De Belt TH, Engelen LJ, Berben SA, Schoonhoven L. Definition of health 2.0 and medicine 2.0: a systematic review.
    J Med Internet Res 2010 Jun 11;12(2):e18 [FREE Full text] [doi: 10.2196/jmir.1350] [Medline: 20542857]

    7. Oh H, Rizo C, Enkin M, Jadad A. What is ehealth (3): a systematic review of published definitions. J Med Internet Res
    2005 Feb 24;7(1):e1 [FREE Full text] [doi: 10.2196/jmir.7.1.e1] [Medline: 15829471]

    8. Cunningham SG, Wake DJ, Waller A, Morris AD. Definitions of eHealth. In: Gaddi A, Manca M, editors. eHealth, Care
    and Quality of Life. New York, USA: Springer; 2013:15-30.

    9. Silva BM, Rodrigues JJ, de la Torre Díez I, López-Coronado M, Saleem K. Mobile-health: a review of current state in
    2015. J Biomed Inform 2015 Aug;56:265-272 [FREE Full text] [doi: 10.1016/j.jbi.2015.06.003] [Medline: 26071682]

    10. Siegler EL. The evolving medical record. Ann Intern Med 2010 Nov 16;153(10):671-677. [doi:
    10.7326/0003-4819-153-10-201011160-00012] [Medline: 21079225]

    11. Thomas E, John M, Charles R, Society of the New York Hospital. An Account of the New York Hospital. Medical Center
    Archives 1811.

    12. Sayles NB, Gordon LL. Health Information Management Technology: An Applied Approach. Chicago, USA: American
    Health Information Management Association; 2013.

    13. American Health Information Management Association. AHIMA History URL:
    http://bok.ahima.org/doc?oid=58133#.XnNMSIgzbIU [accessed 2020-02-13]

    14. Weed LL. Medical records that guide and teach. N Engl J Med 1968 Mar 14;278(11):593-600. [doi:
    10.1056/NEJM196803142781105] [Medline: 5637758]

    15. Centers for Medicare & Medicaid Services. 1965. CMS’ Program History URL:
    https://www.cms.gov/About-CMS/Agency-information/History/ [accessed 2020-02-13]

    16. Tripathi M. EHR evolution: policy and legislation forces changing the EHR. J AHIMA 2012 Oct;83(10):24-9; quiz 30.
    [Medline: 23061349]

    17. Gardner RM, Pryor T, Warner HR. The HELP hospital information system: update 1998. Int J Med Inform 1999
    Jun;54(3):169-182. [doi: 10.1016/s1386-5056(99)00013-1] [Medline: 10405877]

    18. Amatayakul MK. Electronic Health Records: A Practical Guide for Professionals and Organizations. Chicago, USA:
    American Health Information Management; 2004.

    19. IHS Markit. 1980. Master Patient Index (MPI) URL: https://www.ihs.gov/hie/masterpatientindex/ [accessed 2020-02-13]

    20. Hammond WE. Health level 7: an application standard for electronic medical data exchange. Top Health Rec Manage 1991
    Jun;11(4):59-66. [Medline: 10112038]

    21. Dick RS, Steen EB, Detmer DE. The Computer-based Patient Record: An Essential Technology for Health Care. Washington,
    DC: National Academies Press; 1997.

    22. United States Department of Health and Human Services. Summary of the HIPAA Security Rule URL: https://www.hhs.gov/
    hipaa/for-professionals/security/laws-regulations/index.html [accessed 2020-02-13]

    23. John M. From Telehealth to E-health: The Unstoppable Rise of E-health. Australia: Department of Communications,
    Information Technology and the Arts; 1999.

    24. Laxminarayan S, Istepanian R. Unwired e-med: the next generation of wireless and internet telemedicine systems. IEEE
    Trans Inf Technol Biomed 2000 Sep;4(3):189-193. [doi: 10.1109/titb.2000.5956074] [Medline: 11026588]

    25. Eysenbach G. What is e-health? J Med Internet Res 2001;3(2):E20 [FREE Full text] [doi: 10.2196/jmir.3.2.e20] [Medline:

    26. Intille SS. Ubiquitous computing technology for just-in-time motivation of behavior change. Stud Health Technol Inform
    2004;107(Pt 2):1434-1437. [doi: 10.3233/978-1-60750-949-3-1434] [Medline: 15361052]

    27. Kim MI, Johnson KB. Personal health records: evaluation of functionality and utility. J Am Med Inform Assoc
    2002;9(2):171-180 [FREE Full text] [doi: 10.1197/jamia.m0978] [Medline: 11861632]

    28. Gillies J, Holt A. Anxious about electronic health records? No need to be. N Z Med J 2003 Sep 26;116(1182):U604.
    [Medline: 14581956]

    29. Holahan J, Blumberg L. Massachusetts health care reform: a look at the issues. Health Aff (Millwood) 2006;25(6):w432-w443.
    [doi: 10.1377/hlthaff.25.w432] [Medline: 16973652]

    30. Centers for Medicare & Medicaid Services. 2006. Accountable Care Organizations (ACOs) URL: https://www.cms.gov/
    Medicare/Medicare-Fee-for-Service-Payment/ACO [accessed 2020-02-13]

    31. US Department of Justice. 2016. Fact Sheet: The Health Care Fraud and Abuse Control Program Protects Conusmers and
    Taxpayers by Combating Health Care Fraud URL: https://www.justice.gov/opa/pr/
    fact-sheet-health-care-fraud-and-abuse-control-program-protects-conusmers-and-taxpayers [accessed 2020-02-13]

    32. HealthCare. 2010. Patient Protection and Affordable Care Act URL: https://www.healthcare.gov/glossary/
    patient-protection-and-affordable-care-act/ [accessed 2020-02-13]

    33. Sokol DK, Hettige S. Poor handwriting remains a significant problem in medicine. J R Soc Med 2006 Dec;99(12):645-646
    [FREE Full text] [doi: 10.1258/jrsm.99.12.645] [Medline: 17139073]

    34. Leonidas LL. Opinion – Inquirer.net. 2014. Death by Bad Handwriting URL: https://opinion.inquirer.net/79623/
    death-by-bad-handwriting [accessed 2020-02-13]

    35. Charatan F. Family compensated for death after illegible prescription. Br Med J 1999 Dec 4;319(7223):1456 [FREE Full
    text] [doi: 10.1136/bmj.319.7223.1456] [Medline: 10582922]

    36. Davis J. Healthcare IT News. 2017. eClinicalWorks Sued for Nearly $1 Billion for Inaccurate Medical Records URL:
    https://www.healthcareitnews.com/news/eclinicalworks-sued-nearly-1-billion-inaccurate-medical-records [accessed

    37. Amazon Web Services. 2018. United States District Court Northern District of Illinois URL: https://s3.amazonaws.com/
    assets.fiercemarkets.net/public/004-Healthcare/external_Q12018/SurfsidevAllscripts [accessed 2020-02-13]

    38. The Guardian. 2017. Bitcoin Mining Consumes More Electricity a Year Than Ireland URL: https://www.theguardian.com/
    technology/2017/nov/27/bitcoin-mining-consumes-electricity-ireland [accessed 2020-02-13]

    39. Scherer M. UMEA University. 2017. Performance and Scalability of Blockchain Networks and Smart Contracts URL:
    https://umu.diva-portal.org/smash/get/diva2:1111497/FULLTEXT01 [accessed 2020-03-24]

    40. Miller C. The electronic medical record: a definition and discussion. Top Health Inf Manage 1993 Feb;13(3):20-29. [Medline:

    41. Cameron S, Turtle-Song I. Learning to write case notes using the SOAP format. J Couns Dev 2002;80(3):286-292. [doi:

    42. Markle: Advancing America’s Future. 2003. Personal Health Working Group Final Report URL: https://www.markle.org/
    publications/1429-personal-health-working-group-final-report [accessed 2020-02-13]

    43. Handler R, Holtmeier R, Metzger J, Overhage M, Taylor S, Underwood C. Healthcare Information and Management
    Systems Society. 2003. HIMSS Electronic Health Record Definitional Model URL: http://www.providersedge.com/ehdocs/
    ehr_articles/HIMSS_EMR_Definition_Model_v1-0 [accessed 2020-02-13]

    44. AHIMA e-HIM Personal Health Record Work Group. Defining the Personal Health Record. J AHIMA 2005 Jun
    11;156(24):786-786. [doi: 10.1136/vr.156.24.786-a]

    45. Böcking W, Trojanus D. Health Data Management. In: Kirch W, editor. Encyclopedia of Public Health. New York, USA:
    Springer; 2008.

    46. Kohane IS, Greenspun P, Fackler J, Cimino C, Szolovits P. Building national electronic medical record systems via the
    World Wide Web. J Am Med Inform Assoc 1996;3(3):191-207 [FREE Full text] [doi: 10.1136/jamia.1996.96310633]
    [Medline: 8723610]

    47. Rind DM, Kohane IS, Szolovits P, Safran C, Chueh HC, Barnett GO. Maintaining the confidentiality of medical records
    shared over the internet and the world wide web. Ann Intern Med 1997 Jul 15;127(2):138-141. [doi:
    10.7326/0003-4819-127-2-199707150-00008] [Medline: 9230004]

    48. Schoenberg R, Safran C. Internet based repository of medical records that retains patient confidentiality. Br Med J 2000
    Nov 11;321(7270):1199-1203 [FREE Full text] [doi: 10.1136/bmj.321.7270.1199] [Medline: 11073513]

    49. Uckert F, Görz M, Ataian M, Prokosch HU. Akteonline-an electronic healthcare record as a medium for information and
    communication. Stud Health Technol Inform 2002;90:293-297. [Medline: 15460705]

    50. Grant RW, Wald JS, Poon EG, Schnipper JL, Gandhi TK, Volk LA, et al. Design and implementation of a web-based
    patient portal linked to an ambulatory care electronic health record: patient gateway for diabetes collaborative care. Diabetes
    Technol Ther 2006 Oct;8(5):576-586 [FREE Full text] [doi: 10.1089/dia.2006.8.576] [Medline: 17037972]

    51. Ross SE, Moore LA, Earnest MA, Wittevrongel L, Lin C. Providing a web-based online medical record with electronic
    communication capabilities to patients with congestive heart failure: randomized trial. J Med Internet Res 2004 May
    14;6(2):e12 [FREE Full text] [doi: 10.2196/jmir.6.2.e12] [Medline: 15249261]

    52. Marceglia S, Bonacina S, Braidotti A, Nardelli M, Pinciroli F. Towards a web-based system for family health record. AMIA
    Annu Symp Proc 2006:1023 [FREE Full text] [Medline: 17238642]

    53. Laplante PA, Kassab M, Laplante NL, Voas JM. Building caring healthcare systems in the internet of things. IEEE Syst J
    2018;12(3):- [FREE Full text] [doi: 10.1109/JSYST.2017.2662602] [Medline: 31080541]

    54. Wu F, Wu T, Yuce M. An internet-of-things (IoT) network system for connected safety and health monitoring applications.
    Sensors (Basel) 2018 Dec 21;19(1):E21 [FREE Full text] [doi: 10.3390/s19010021] [Medline: 30577646]

    55. Meinert E, van Velthoven M, Brindley D, Alturkistani A, Foley K, Rees S, et al. The internet of things in health care in
    Oxford: protocol for proof-of-concept projects. JMIR Res Protoc 2018 Dec 4;7(12):e12077 [FREE Full text] [doi:
    10.2196/12077] [Medline: 30514695]

    56. Mavrogiorgou A, Kiourtis A, Perakis K, Pitsios S, Kyriazis D. IoT in healthcare: achieving interoperability of high-quality
    data acquired by IoT medical devices. Sensors (Basel) 2019 Apr 27;19(9):1-24 [FREE Full text] [doi: 10.3390/s19091978]
    [Medline: 31035612]

    57. Valluru D, Jeya IJ. IoT with cloud based lung cancer diagnosis model using optimal support vector machine. Health Care
    Manag Sci 2019 Jul 20:- epub ahead of print. [doi: 10.1007/s10729-019-09489-x] [Medline: 31327114]

    58. Ramirez Lopez LJ, Puerta Aponte G, Rodriguez Garcia A. Internet of things applied in healthcare based on open hardware
    with low-energy consumption. Healthc Inform Res 2019 Jul;25(3):230-235 [FREE Full text] [doi: 10.4258/hir.2019.25.3.230]
    [Medline: 31406615]

    59. Qu Y, Ming X, Qiu S, Zheng M, Hou Z. An integrative framework for online prognostic and health management using
    internet of things and convolutional neural network. Sensors (Basel) 2019 May 21;19(10):1 [FREE Full text] [doi:
    10.3390/s19102338] [Medline: 31117213]

    60. Rau H, Wu Y, Chu C, Wang F, Hsu M, Chang C, et al. Importance-performance analysis of personal health records in
    Taiwan: a web-based survey. J Med Internet Res 2017 Apr 27;19(4):e131 [FREE Full text] [doi: 10.2196/jmir.7065]
    [Medline: 28450273]

    61. Bahga A, Madisetti VK. A cloud-based approach for interoperable electronic health records (EHRs). IEEE J Biomed Health
    Inform 2013 Sep;17(5):894-906. [doi: 10.1109/JBHI.2013.2257818] [Medline: 25055368]

    62. Fernández-Cardeñosa G, de la Torre-Díez I, López-Coronado M, Rodrigues JJ. Analysis of cloud-based solutions on EHRs
    systems in different scenarios. J Med Syst 2012 Dec;36(6):3777-3782. [doi: 10.1007/s10916-012-9850-2] [Medline:

    63. Zangara G, Corso PP, Cangemi F, Millonzi F, Collova F, Scarlatella A. A cloud based architecture to support electronic
    health record. Stud Health Technol Inform 2014;207:380-389. [doi: 10.3233/978-1-61499-474-9-380] [Medline: 25488244]

    64. Schulz S, Stegwee R, Chronaki C. Standards in healthcare data. In: Kubben P, Dumontier M, Dekker A, editors. Standards
    in Healthcare Data. New York, USA: Springer; 2019:19-36.

    65. MRC Cognition and Brain Sciences Unit. The Analyze Data Format URL: http://imaging.mrc-cbu.cam.ac.uk/imaging/
    FormatAnalyze [accessed 2020-02-13]

    66. NIfTI: Neuroimaging Informatics Technology Initiative. URL: https://nifti.nimh.nih.gov/ [accessed 2020-02-13]
    67. Vincent RD, Neelin P, Khalili-Mahani N, Janke AL, Fonov VS, Robbins SM, et al. MINC 2.0: a flexible format for

    multi-modal images. Front Neuroinform 2016;10:35 [FREE Full text] [doi: 10.3389/fninf.2016.00035] [Medline: 27563289]
    68. Digital Imaging and Communications in Medicine. URL: https://www.dicomstandard.org/ [accessed 2020-02-13]
    69. Wen H, Chang W, Hsu M, Ho C, Chu C. An assessment of the interoperability of electronic health record exchanges among

    hospitals and clinics in Taiwan. JMIR Med Inform 2019 Mar 28;7(1):e12630 [FREE Full text] [doi: 10.2196/12630]
    [Medline: 30920376]

    70. Davis J. HealthITSecurity. 2019. 15 Million Patient Records Breached in 2018; Hacking, Phishing Surges URL: https:/
    /healthitsecurity.com/news/15-million-patient-records-breached-in-2018-hacking-phishing-surges [accessed 2020-02-13]

    71. Business Wire. New Intel Security Cloud Report Reveals IT Departments Find It Hard to Keep the Cloud Safe URL: https:/
    /www.businesswire.com/news/home/20170212005011/en/ [accessed 2020-02-13]

    72. Zorz Z. Help Net Security – Information Security News. 2019. Healthcare’s Blind spot: Unmanaged IoT and Medical
    Devices URL: https://www.helpnetsecurity.com/2019/07/22/healthcare-iot/ [accessed 2020-02-13]

    73. Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. New York, USA: BN Publishing; 2008.
    74. Ismail L, Heba H, AlShamsi M, AlHammadi M, AlDhanhani N. Towards a Blockchain Deployment at UAE University:

    Performance Evaluation and Blockchain Taxonomy. In: Proceedings of the 2019 International Conference on Blockchain
    Technology. 2019 Presented at: ICBCT’19; March 15-18, 2019; Hawaii, USA p. 30-38. [doi: 10.1145/3320154.3320156]

    75. Azaria A, Ekblaw A, Vieira T, Lippman A. MedRec: Using Blockchain for Medical Data Access and Permission Management.
    In: Proceedings of the 2nd International Conference on Open and Big Data. 2016 Presented at: OBD’16; August 22-24,
    2016; Vienna, Austria. [doi: 10.1109/obd.2016.11]

    76. Li H, Zhu L, Shen M, Gao F, Tao X, Liu S. Blockchain-based data preservation system for medical data. J Med Syst 2018
    Jun 28;42(8):141. [doi: 10.1007/s10916-018-0997-3] [Medline: 29956058]

    77. Dagher GG, Mohler J, Milojkovic M, Marella PB. Ancile: privacy-preserving framework for access control and
    interoperability of electronic health records using blockchain technology. Sustain Cities Soc 2018 May;39:283-297. [doi:

    78. Fan K, Wang S, Ren Y, Li H, Yang Y. MedBlock: efficient and secure medical data sharing via blockchain. J Med Syst
    2018 Jun 21;42(8):136. [doi: 10.1007/s10916-018-0993-7] [Medline: 29931655]

    79. Yue X, Wang H, Jin D, Li M, Jiang W. Healthcare data gateways: found healthcare intelligence on blockchain with novel
    privacy risk control. J Med Syst 2016 Oct;40(10):218. [doi: 10.1007/s10916-016-0574-6] [Medline: 27565509]

    80. Dey T, Jaiswal S, SunderKrishnan S, Katre N. HealthSense: A Medical Use Case of Internet of Things and Blockchain.
    In: Proceedings of the International Conference on Intelligent Sustainable Systems. 2017 Presented at: ICISS’17; December
    7-8, 2017; Palladam, India. [doi: 10.1109/iss1.2017.8389459]

    81. Uddin MA, Stranieri A, Gondal I, Balasubramanian V. Continuous patient monitoring with a patient centric agent: a block
    architecture. IEEE Access 2018;6:32700-32726. [doi: 10.1109/access.2018.2846779]

    82. Evariant: Healthcare’s Only Patient for Life Platform. What is Healthcare Data Management and Why is it Important? URL:
    https://www.evariant.com/faq/why-is-healthcare-data-management-important [accessed 2020-02-13]

    83. Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, et al. Automated analysis of free speech predicts psychosis
    onset in high-risk youths. NPJ Schizophr 2015 Aug 26;1(1):15030-15037 [FREE Full text] [doi: 10.1038/npjschz.2015.30]
    [Medline: 27336038]

    84. Yu K, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, et al. Predicting non-small cell lung cancer prognosis by fully
    automated microscopic pathology image features. Nat Commun 2016 Aug 16;7:12474 [FREE Full text] [doi:
    10.1038/ncomms12474] [Medline: 27527408]

    85. Cruz-Roa A, Gilmore H, Basavanhally A, Feldman M, Ganesan S, Shih NN, et al. Accurate and reproducible invasive
    breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Sci Rep 2017 Apr
    18;7:46450 [FREE Full text] [doi: 10.1038/srep46450] [Medline: 28418027]

    86. Liu Y, Gadepalli K, Norouzi M, Dahl GE, Kohlberger T, Boyko A, et al. arXiv. 2017. Detecting Cancer Metastases on
    Gigapixel Pathology Images URL: https://arxiv.org/abs/1703.02442 [accessed 2020-03-24]

    87. Richter AN, Khoshgoftaar TM. Efficient learning from big data for cancer risk modeling: a case study with melanoma.
    Comput Biol Med 2019 Jul;110:29-39. [doi: 10.1016/j.compbiomed.2019.04.039] [Medline: 31112896]

    88. Narula S, Shameer K, Salem Omar AM, Dudley JT, Sengupta PP. Machine-Learning Algorithms to Automate Morphological
    and FunctionaMachine-learning algorithms to automate morphological and functional assessments in 2D echocardiographyl
    Assessments in 2D Echocardiography. J Am Coll Cardiol 2016 Nov 29;68(21):2287-2295 [FREE Full text] [doi:
    10.1016/j.jacc.2016.08.062] [Medline: 27884247]

    89. Kiral-Kornek I, Roy S, Nurse E, Mashford B, Karoly P, Carroll T, et al. Epileptic seizure prediction using big data and
    deep learning: toward a mobile system. EBioMedicine 2018 Jan;27:103-111 [FREE Full text] [doi:
    10.1016/j.ebiom.2017.11.032] [Medline: 29262989]

    90. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health
    records. Digit Med 2018;1:18.

    91. Hsiao P, Chin-Ming CH, Li Y. Applied wearable devices for digital health based on novel cardiac force index of running
    performance:cross-sectional study. JMIR mHealth and uHealth (forthcoming). [doi: 10.2196/15331]

    92. Health Data Exploration-Personal Data for the Public. 2014. Personal Data for the Public Good: New Opportunities to
    Enrich Understanding of Individual and Population Health URL: http://hdexplore.calit2.net/wp-content/uploads/2015/08/
    hdx_final_report_small [accessed 2020-02-13]

    93. Knight Lab. Personal Health Data as Public Good URL: https://knightlab.ucsd.edu/wordpress/?page_id=19 [accessed

    94. Bakar Institute. Bringing the Power of Computation to Today’s Spectrum of Data Will Yield Untold Health Insights and
    Patterns URL: https://bakarinstitute.ucsf.edu/research/ [accessed 2020-02-13]

    95. Digiconomist. Bitcoin Energy Consumption Index URL: https://digiconomist.net/bitcoin-energy-consumption [accessed

    96. Crush Crypto. What is Practical Byzantine Fault Tolerance (PBFT)? URL: https://crushcrypto.com/
    what-is-practical-byzantine-fault-tolerance/ [accessed 2020-02-13]

    97. Milutinovic M, He W, Wu H, Kanwal M. Proof of Luck: An Efficient Blockchain Consensus Protocol. In: Proceedings of
    the 1st Workshop on System Software for Trusted Execution. 2016 Presented at: SysTEX’16; December 12-16, 2016;
    Trento, Italy. [doi: 10.1145/3007788.3007790]

    98. Ismail L, Materwala H, Zeadally S. Lightweight blockchain for healthcare. IEEE Access 2019;7:149935-149951. [doi:


    EHR: electronic health record
    eHealth: electronic health
    IoT: Internet of Things
    mHealth: mobile health

    Edited by G Eysenbach; submitted 19.12.19; peer-reviewed by Z Sherali, A Behmanesh, CM Chu; comments to author 03.02.20;
    revised version received 13.02.20; accepted 01.03.20; published 07.07.20

    Please cite as:
    Ismail L, Materwala H, Karduck AP, Adem A
    Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review
    J Med Internet Res 2020;22(7):e17508
    URL: https://www.jmir.org/2020/7/e17508
    doi: 10.2196/17508

    ©Leila Ismail, Huned Materwala, Achim P Karduck, Abdu Adem. Originally published in the Journal of Medical Internet Research
    (http://www.jmir.org), 07.07.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution
    License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any
    medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete
    bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information
    must be included.


    Review Article
    Big Data Management for Healthcare Systems:
    Architecture, Requirements, and Implementation

    Naoual El aboudi and Laila Benhlima

    Department of Computer Science, Mohammadia School of Engineering, Mohammed V University, Rabat, Morocco

    Correspondence should be addressed to Naoual El aboudi; nawal.elaboudi@gmail.com

    Received 7 January 2018; Revised 22 May 2018; Accepted 27 May 2018; Published 21 June 2018

    Academic Editor: Florentino Fdez-Riverola

    Copyright © 2018 Naoual El aboudi and Laila Benhlima. This is an open access article distributed under the Creative Commons
    Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
    properly cited.

    The growing amount of data in the healthcare industry has made the adoption of big data techniques inevitable in order to improve the
    quality of healthcare delivery. Despite the integration of big data processing approaches and platforms in existing data management
    architectures for healthcare systems, these architectures face difficulties in preventing emergency cases. The main contribution
    of this paper is an extensible big data architecture based on both stream computing and batch computing, intended to
    further enhance the reliability of healthcare systems by generating real-time alerts and making accurate predictions about patient
    health condition. Based on the proposed architecture, a prototype implementation has been built for healthcare systems in order
    to generate real-time alerts. The prototype is based on Spark and MongoDB.

    1. Introduction

    The proportion of elderly people in society is growing
    worldwide [1]; this phenomenon, known as humanity’s aging,
    has many implications for healthcare services, especially
    in terms of costs. In such a situation, relying
    on classical systems may result in a decline in quality of life
    for millions of people. To overcome this problem,
    a number of healthcare systems have been designed. Their
    common principle is to transfer, on a periodic basis,
    medical parameters such as blood pressure, heart rate, glucose
    level, body temperature, and ECG signals to an automated
    system that monitors patients’ health condition in real time.
    Such systems provide quick assistance when
    needed, since data is analyzed continuously. Automating
    health monitoring favors a proactive approach that relieves
    medical facilities by saving costs related to hospitalization,
    and it also enhances healthcare services by reducing waiting
    time for consultations. Recently, the number of data sources
    in the healthcare industry has grown rapidly as a result of the
    widespread use of mobile and wearable sensor technologies,
    which has flooded the healthcare field with a huge amount of
    data. Therefore, it becomes challenging to perform healthcare
    data analysis based on traditional methods, which are unfit
    to handle the high volume of diversified medical data. In
    general, the healthcare domain has four categories of analytics:
    descriptive, diagnostic, predictive, and prescriptive analytics;
    a brief description of each is given below.

    Descriptive Analytics. It consists of describing current situations
    and reporting on them. Several techniques are employed to perform this level
    of analytics. For instance, descriptive statistics tools like
    histograms and charts are among the techniques used in
    descriptive analytics.
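
    As a minimal illustration of this level of analytics, the following Python sketch summarizes a small set of readings with a mean, a median, and a fixed-width histogram. The readings, the bin width, and the function name `histogram` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List


def histogram(values: List[int], bin_width: int) -> Dict[int, int]:
    """Count how many values fall into each bin, keyed by the bin's lower bound."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
systolic = [118, 121, 135, 142, 127, 119, 150, 133]

report = {
    "mean": mean(systolic),
    "median": median(systolic),
    "histogram": histogram(systolic, bin_width=10),
}
print(report)
```

    A report of this kind only describes what has already been observed; it does not explain or predict anything.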

    Diagnostic Analytics. It aims to explain why certain events
    occurred and which factors triggered them. For
    example, diagnostic analytics attempts to understand the
    reasons behind the regular readmission of some patients by
    using methods such as clustering and decision trees.

    Predictive Analytics. It reflects the ability to predict future
    events; it also helps in identifying trends and determining
    the probabilities of uncertain outcomes. One illustration of its
    role is predicting whether a patient is likely to develop
    complications. Predictive models are often built using machine learning

    Advances in Bioinformatics
    Volume 2018, Article ID 4059018, 10 pages




    Figure 1: Analytics for healthcare domain.

    Prescriptive Analytics. Its goal is to propose suitable actions
    leading to optimal decision-making. For instance, prescriptive
    analytics may suggest rejecting a given treatment when there is
    a high probability of a harmful side effect. Decision trees
    and Monte Carlo simulation are examples of methods applied
    to perform prescriptive analytics. Figure 1 illustrates the analytics
    phases for the healthcare domain [2]. The integration of big
    data technologies in healthcare analytics may lead to better
    performance of medical systems.

    In fact, big data refers to large datasets that combine
    the following characteristics (see [3]): volume, which refers
    to high amounts of data; velocity, which means that data is
    generated at a rapid pace; variety, which emphasizes that data
    comes in different formats; and, finally, veracity, which
    means that data originates from trustworthy sources.

    Another characteristic of big data is variability. It
    indicates variations that occur in the data flow rates. Indeed,
    velocity does not provide a consistent description of the data
    due to its periodic peaks and troughs. Another important
    aspect of big data is complexity; it arises from the fact that
    big data is often produced by many sources, which
    makes it necessary to perform many operations over the data. These
    operations include identifying relationships and cleansing
    and transforming data flowing from different origins.

    Moreover, Oracle introduced value as a key
    attribute of big data. According to Oracle, big data has a
    “low value density,” which means that raw data has a low
    value compared to its high volume. Nevertheless, analysis of
    large volumes of data may lead to obtaining a high value.

    In the context of healthcare, high volumes of data are
    generated by multiple medical sources, including, for
    example, biomedical images, lab test reports, physician-written
    notes, and health condition parameters allowing real-time
    patient health monitoring. In addition to its huge volume
    and its diversity, healthcare data flows at high speed. As a
    result, big data approaches offer tremendous opportunities
    regarding healthcare systems efficiency.

    The contribution of this research paper is to propose an
    extensible big data architecture for healthcare applications
    formed by several components capable of storing, processing,
    and analyzing the high amount of data in real time and
    batch modes. This paper demonstrates the potential of using
    big data analytics in the healthcare domain to find useful
    information in highly valuable data.

    The paper has been organized as follows: In Section 2, a
    background of big data computing approaches and big data
    platforms is provided. Recent contributions on big data for
    healthcare systems are reviewed in Section 3. In Section 4, the
    components of the proposed big data architecture for health-
    care are described. The implementation process is reported
    in Section 5. Conclusion is finally drawn in Section 6, along
    with recommendations for future research.

    2. Background

    2.1. An Overview of Big Data Approaches. Big data technologies
    have received great attention due to their successful
    handling of high-volume data compared to traditional
    approaches. Big data frameworks support all kinds of data
    (structured, semistructured, and unstructured) while
    providing several features. Those features include predictive
    model design and big data mining tools that allow a better
    decision-making process through the selection of relevant

    Big data processing can be performed in two ways:
    batch processing and stream processing; see [4]. The first
    method is based on analyzing data over a specified period of
    time; it is adopted when there are no constraints regarding
    the response time. On the other hand, stream processing is
    suitable for applications requiring real-time feedback. Batch
    processing aims to process a high volume of data by collecting
    and storing batches to be analyzed in order to generate

    Batch mode requires ingesting all data before processing
    it in a specified time. MapReduce represents a widely adopted
    solution in the field of batch computing; see [5]. It operates
    by splitting data into small pieces that are distributed
    to multiple nodes in order to obtain intermediate results.
    Once data processing by the nodes is finished, the outcomes
    are aggregated in order to generate the final results.
    Seeking to optimize the use of computational resources, MapReduce
    allocates processing tasks to nodes close to the data location. This
    model has had a lot of success in many applications,
    especially in the field of bioinformatics and healthcare. The batch
    processing framework has many characteristics, such as the
    ability to access all data and to perform many complex computation
    operations, and its latency is measured in minutes or
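
    The split, process, and aggregate flow described above for batch computing can be sketched in plain Python. This is a single-process illustration of the map and reduce idea, not the Hadoop API and not the authors' implementation; the record layout (patient identifier, heart rate), the sample blocks, and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (patient_id, heart_rate).
Record = Tuple[str, int]


def map_phase(block: Iterable[Record]) -> List[Tuple[str, int]]:
    """Turn one block of input into intermediate (key, value) pairs."""
    return [(patient_id, rate) for patient_id, rate in block]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the intermediate values of all blocks by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values of one key; here, the maximum heart rate."""
    return key, max(values)


# The input is split into blocks, as it would be across the nodes of a cluster.
blocks = [
    [("p1", 72), ("p2", 88)],
    [("p1", 95), ("p3", 60)],
    [("p2", 101), ("p1", 80)],
]

intermediate = [map_phase(block) for block in blocks]  # one map task per block
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)
```

    In a real cluster, each map task would run on the node that holds its block, and the grouping step would move intermediate pairs across the network; the sketch keeps only the data flow.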


    Stream Computing. In real applications such as healthcare,
    intelligent transportation, and finance, a high amount of
    data is produced in a continuous manner. When the need
    to process such data streams in real time arises, data
    analysis takes into consideration the continuous evolution of data
    and the permanent change in the statistical characteristics of
    data streams, referred to as concept drift; see [6]. Indeed,
    storing a large amount of data for further processing may
    be challenging in terms of memory resources. Moreover,
    real applications tend to produce noisy data that contain
    missing values along with redundant features, which makes
    data analysis complicated, as it requires significant
    computational time. Stream processing reduces this computational
    burden by performing simple and fast computations
    for one data element or for a window of recent data, and such
    computations take seconds at most.

    Big data stream mining methods, including classification,
    frequent pattern mining, and clustering, relieve computational
    effort through rapid extraction of the most relevant
    information; this objective is often achieved by mining data in
    a distributed manner. Those methods belong to one of the two
    following classes: data-based techniques and task-based techniques;
    see [7]. Data-based techniques allow summarizing
    the entire dataset or selecting a subset of the continuous flow
    of streaming data to be processed. Sampling is one of these
    techniques; it consists of choosing a small subset of data to be
    processed according to a statistical criterion. Another data-based
    method is load shedding, which drops a part of the
    entire data, while the sketching technique establishes a random
    projection on a feature set. The synopsis data structures method
    and the aggregation method also belong to the family of data-based
    techniques: the first one summarizes data streams,
    and the second one represents a number of elements by one
    element using a statistical measure.
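
    Two of the data-based techniques above, sampling and aggregation, can be sketched as follows. Reservoir sampling is used here as one possible statistical criterion for sampling (the paper does not name a specific one), and aggregation is shown as replacing each group of readings by its mean. The function names, the fixed seed, and the heart rate values are hypothetical.

```python
import random
from statistics import mean
from typing import Iterable, Iterator, List


def reservoir_sample(stream: Iterable[float], k: int, seed: int = 0) -> List[float]:
    """Keep a uniform random sample of at most k items from a stream of unknown length."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    reservoir: List[float] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


def aggregate(stream: Iterable[float], group_size: int) -> Iterator[float]:
    """Represent each run of group_size readings by a single value, their mean."""
    group: List[float] = []
    for item in stream:
        group.append(item)
        if len(group) == group_size:
            yield mean(group)
            group = []
    if group:                           # last, possibly shorter, group
        yield mean(group)


# Hypothetical heart rate readings (beats per minute).
heart_rates = [72, 75, 71, 90, 88, 74, 73, 110, 76, 70]

sample = reservoir_sample(heart_rates, k=3)
summary = list(aggregate(heart_rates, group_size=5))
print(sample, summary)
```

    Both functions read the stream once and keep only a bounded amount of state, which is the property that makes such techniques suitable for continuous data.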

    Task-based techniques update existing methods or design
    new ones to reduce computation time in data stream
    processing. They are categorized into approximation
    algorithms, which generate outputs within an acceptable
    error margin; sliding windows, which analyze recent data
    under the assumption that it is more useful than older data;
    and algorithm output granularity, which processes data
    according to the available memory and time constraints.

    Big data approaches are essential for modern healthcare
    analytics; they allow real-time extraction of relevant
    information from a large amount of patient data. As a
    result, alerts are generated when the prediction model
    detects possible complications. This process helps to prevent
    health emergencies; it also assists medical care
    professionals in decision-making regarding disease diagnosis
    and provides special care recommendations.

    2.2. Big Data Processing Frameworks. For batch processing,
    the MapReduce framework is widely adopted; it allows
    distributed analysis of big data on a cluster of machines.
    Simple computations are performed through two functions, map
    and reduce. MapReduce relies on a master/slave architecture:
    the master node divides the data into blocks, allocates
    processing tasks to slave nodes, and structures the data into
    a set of key/value pairs as input to the map tasks. Each
    worker assigned a map task reads the appropriate input data
    and writes the results of the map task into intermediate
    files. The reducer worker takes the results generated by the
    map tasks as input to the reduce task; finally, the results
    are written into final output files. Hadoop is an open source
    framework that stores and analyzes data in a parallel manner
    through clusters.
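
    The map, shuffle, and reduce steps described above can be
    sketched in plain Python on a single machine. This is not
    Hadoop code; the function names and the (patient, value)
    record layout are illustrative assumptions.

```python
from collections import defaultdict


def map_reading(record):
    """Map step: emit one (key, value) pair per input record."""
    patient_id, value = record
    yield patient_id, value


def shuffle(pairs):
    """Group intermediate values by key, as done between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(key, values):
    """Reduce step: collapse all values of one key into their mean."""
    return key, sum(values) / len(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially over a list of records."""
    intermediate = [pair for record in records for pair in map_reading(record)]
    return dict(reduce_mean(k, v) for k, v in shuffle(intermediate).items())
```

    In a real cluster, the map and reduce calls would run in
    parallel on different nodes and the intermediate pairs would
    be written to files rather than kept in a list.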

    It is composed of two main components: Hadoop MapReduce
    and the Hadoop Distributed File System (HDFS). HDFS stores
    data by replicating it across many nodes, while Hadoop
    MapReduce implements the MapReduce programming model. The
    master node stores metadata such as the locations of
    replicated blocks, and it identifies the locations of data
    nodes to recover missing blocks in case of failure. The data
    are split into several blocks, and processing is performed on
    the machine that holds each block. With Hadoop, other data
    storage tools can be used instead of HDFS, such as HBase,
    Cassandra, and relational databases. Data warehousing may be
    performed by other tools, for instance, Pig and Hive, while
    Mahout is employed for machine learning purposes. When stream
    processing is required, Hadoop may not be a suitable choice
    since all input data must be available before MapReduce tasks
    start. More recently, Storm from Twitter, S4 from Yahoo, and
    Spark were introduced to process incoming stream data. Each
    solution has its own principle.

    Storm. It is an open source framework for analyzing data in
    real time; see [8]. It is formed by spouts and bolts. A spout
    produces data or loads data from an input queue, and a bolt
    processes input streams and generates output streams. In a
    Storm program, a combination of spouts and bolts is called a
    topology. Storm has three kinds of nodes: the master node,
    named Nimbus; the worker node; and ZooKeeper. The master node
    distributes and coordinates the execution of the topology,
    while the worker node is responsible for executing spouts and
    bolts. Finally, ZooKeeper synchronizes distributed

    S4. It is a distributed stream processing engine inspired by
    the MapReduce model and designed to process data streams; see
    [9]. It was implemented by Yahoo in Java. Data streams are
    fed to S4 as events.

    Spark. It is applied for both batch and stream processing;
    therefore, Spark may be considered a powerful framework
    compared with other tools such as Hadoop and Storm; see
    [10]. It can access several data sources such as HDFS,
    Cassandra, and HBase. Spark provides several useful features,
    for example, iterative machine learning through the MLlib
    library, which provides efficient, fast algorithms;
    structured data analysis using Hive; graph processing based
    on GraphX; and Spark SQL, which retrieves data from many
    sources and manipulates them using SQL. Before processing
    data streams, Spark divides them into small portions and
    transforms them into a sequence of RDDs (Resilient
    Distributed Datasets) named a DStream (Discretised Stream).

    4 Advances in Bioinformatics

    Table 1: Big data processing solutions.

    Framework        Type             Latency          Developed by                Stream primitive  Stream source
    Hadoop           Batch            Minutes or more  Yahoo                       Key-value         HDFS
    Storm            Streaming        Subseconds       Twitter                     Tuples            Spouts
    Spark Streaming  Batch/streaming  Few seconds      Berkeley AMPLab             DStream           HDFS
    S4               Streaming        Few seconds      Yahoo                       Events            Networks
    Flink            Batch/streaming  Few seconds      Apache Software Foundation  Key-value         Kafka

    Apache Flink. It is an open source solution that analyzes
    data in both batch and real-time mode [11]. The programming
    models of Flink and MapReduce share many similarities. Flink
    allows iterative processing and real-time computation on
    stream data collected by tools such as Flume and Kafka.
    Apache Flink provides several features such as FlinkML, a
    machine learning library that provides many learning
    algorithms for fast and scalable big data

    MongoDB. It is a NoSQL database capable of storing a large
    amount of data. MongoDB relies on the JSON standard
    (JavaScript Object Notation) to store records; JSON is an
    open, human- and machine-readable format that makes data
    interchange easier compared to classical formats such as
    rows and tables. In addition, JSON scales better since
    join-based queries are not needed: the relevant data of a
    given record is contained in a single JSON document.
    Spark is easily integrated with MongoDB; see [12].

    Table 1 summarizes big data processing solutions.

    3. Big Data-Based Healthcare Systems

    The potential offered by big data approaches in healthcare
    analytics has attracted the attention of many researchers.
    In [13], recent advances in big data for health informatics
    and their role in disease management are presented, for
    instance, in the diagnosis, prevention, and treatment of
    several illnesses. The study shows that data privacy and
    security are challenging issues in healthcare systems.

    Raghupathi et al. described in [14] the architectural
    framework and challenges of big data healthcare analytics.
    Another study (see [15]) demonstrates the importance of
    security and privacy issues in successfully implementing
    big data healthcare systems. Belle et al. discuss in [16] the
    role of big data in improving the quality of care delivery
    by aggregating and processing the large volume of data
    generated by healthcare systems. In [17], data mining tech-
    niques for healthcare analytics are presented, especially those
    used in healthcare applications like survival analysis and
    patient similarity. Bochicchio et al. proposed in [18] a big
    data healthcare analytics framework for supporting multi-
    dimensional mining over big healthcare data. The objective
    of this framework is analyzing the huge volume of data by
    applying data mining methods. Sakr et al. presented in [19]
    a composite big data healthcare analytics framework, called
    Smarthealth, whose goal is to overcome the challenges raised
    by healthcare big data via ICT. In [20], the authors
    presented Wiki-Health, a big data platform that processes
    data produced by health sensors. This platform is formed by
    three layers: application, query and analysis, and data
    storage. The application layer ensures data access, data
    collection, security, and data sharing. The query and
    analysis layer provides data management and data analysis,
    while the data storage layer is in charge of storing
    data. Challenges regarding the design of
    such platforms, especially in terms of data privacy and data
    security, are highlighted in [21]. Baldominos et al. designed in
    [22] an intelligent big data healthcare management solution
    aimed at retrieving and aggregating data and predicting
    future values.

    Based on big data technologies, a few data processing
    systems for the healthcare domain have been designed to
    handle the large volume of data streams generated by
    medical devices; a brief description of the major ones is
    provided below.

    The Borealis-based Heart Rate Variability Monitor, presented
    in [23], belongs to the category of big data processing
    systems for healthcare; it processes data originating from
    various sources in order to perform the desired monitoring
    activities. It includes a stream transmitter that acts as
    an interface between the sensors collecting data and the
    Borealis application; it encapsulates the collected data in
    the Borealis format in order to obtain a single stream. The
    final stream is then transferred to the Borealis application
    for processing. The system also includes a graphical user
    interface (GUI) that allows physicians to select the
    patients whose health condition is to be closely
    monitored. Moreover, the graphical interface lets the
    medical staff choose the parameters they want to focus on
    for a monitoring task, and it allows visualization of the
    Borealis application outcomes. The system has several
    drawbacks; for instance, it does not include a machine
    learning component capable of making accurate predictions
    on patient health condition. Furthermore, adding
    an alarming component would enhance emergency cases

    The Hadoop-based medical emergency management system
    using IoT technology relies on sensors measuring medical
    parameters through different processes [24]. Those sensors
    may be devices mounted on the patient’s body or other types
    of medical devices capable of remote measurement.
    Before being transferred to the component called intelligent
    building (IB), the collected data flows through the primary
    medical device (PMD). Next, IB aggregates the input stream
    in its collection unit; the resulting data is then
    transferred to the Hadoop Processing Unit (HPU) to perform
    statistical analyses of the parameters measured by sensors,
    based on the MapReduce paradigm. The map function verifies
    sensor readings by comparing them with their corresponding
    normal thresholds. If readings are normal, they are stored
    in the database without further processing. If they are
    alarming, an alert is triggered and transmitted to the
    application layer. When sensors return values that are
    neither normal nor alarming, they must be analyzed more
    closely. The results of this analysis are collected from the
    different data nodes by the aggregation result unit through
    a reducer and then sent to the final decision server.
    Finally, the decision server receives the current results,
    applies machine learning classifiers and medical expert
    knowledge to past patient data for more accurate decisions,
    and generates outputs based on the Hadoop Processing Unit
    results. This system is based on the Hadoop ecosystem, which
    is suited to batch processing but does not support stream
    processing. Spark would therefore be a better choice for
    improving processing time using data stream mining
    approaches.
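
    The three-way routing performed by the map function above
    can be sketched as follows. The numeric ranges are
    hypothetical values chosen for illustration, not thresholds
    taken from [24], and the function names are our own.

```python
def triage(value, normal_range, alert_range):
    """Classify one reading as 'normal', 'alert', or 'review'.

    normal_range: (low, high) bounds of normal readings.
    alert_range: (low, high) bounds outside of which a reading is alarming.
    """
    normal_low, normal_high = normal_range
    alert_low, alert_high = alert_range
    if normal_low <= value <= normal_high:
        return "normal"   # stored without further processing
    if value < alert_low or value > alert_high:
        return "alert"    # triggers an alert to the application layer
    return "review"       # neither normal nor alarming: analyze closely


def map_readings(readings, normal_range, alert_range):
    """Route every reading to the list matching its triage class."""
    routed = {"normal": [], "alert": [], "review": []}
    for value in readings:
        routed[triage(value, normal_range, alert_range)].append(value)
    return routed
```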

    A Prototype of Healthcare Big Data Processing System
    based on Spark [25] is proposed to analyze the large amount
    of data generated by healthcare big data processing systems.
    It consists of two logical parts: a big data application
    service and a big data supporting platform that performs data
    analysis. The first part visualizes the processing results
    and acts as an interface between applications and data
    warehouse tools such as Hive or Spark SQL. The second is
    responsible for computation and for distributed storage with
    high capacity. This solution is based on Spark, which is very
    promising since it handles batch computing, stream computing,
    and ad hoc queries. The system has several drawbacks; for
    instance, its experimental platform does not include big
    data mining and big data analytics, which hampers the
    prediction capabilities that are vital for improving
    patient outcomes.

    In this paper, we summarize the added value of big
    data technologies for healthcare analytics by presenting an
    extensible big data architecture that combines the
    advantages of both batch and stream computing to generate
    real-time alerts and make accurate predictions about
    patient health condition. In this research, an architecture
    for the management and analysis of medical data was designed
    based on big data methods; it can be implemented via
    a combination of several big data technologies. Designing
    systems capable of handling both batch and real-time
    processing is a complex task and requires an effective
    conceptual architecture.

    4. An Extensible Big Data
    Architecture for Healthcare

    We are developing a system that has the advantage of being
    generic and can deal with various situations such as early
    disease diagnosis and emergency detection. In this study,
    we propose a new architecture aimed at handling medical
    big data originating from heterogeneous sources in different
    formats. Data management in this architecture is illustrated
    through the following scenario.

    Figure 2: The layer architecture.

    New medical data is sent simultaneously to both the batch
    layer and the streaming layer. In batch mode, data is stored
    in data nodes; it is then transmitted to the semantic module,
    which assigns meaning to the data using the ontology store;
    after that, cleaning and filtering operations are applied to
    the resulting data before processing it. In the next step,
    the prepared data is analyzed through two phases: feature
    selection and feature extraction. Finally, the prepared data
    is used to design models predicting patients’ future health
    condition. This mode is invoked periodically on an offline
    basis. In the stream scenario, data comes from multiple
    sources such as medical sensors attached to the patient’s
    body, measuring several medical parameters such as blood
    pressure. Then, the collected
    data is synchronized based on time and its missing values are

    Based on sliding window technique, the adaptive prepro-
    cessor splits data into blocks, and then it extracts relevant
    information for the predictor component in order to build a
    predictive model for every window tuple. Figure 2 represents
    the layer architecture of the proposal.

    4.1. Batch Processing Layer. Batch computing is performed
    on extracted data from prepared data store through different

    4.1.1. Data Acquisition. When a patient’s health condition is
    monitored continuously, several types of data are generated.
    Medical data may include structured data such as traditional
    Electronic Healthcare Records (EHRs), semistructured data
    such as logs produced by some medical devices, and
    unstructured data generated, for example, by biomedical
    imagery.

    Electronic Healthcare Records (EHR). An EHR contains a
    complete patient medical history stored in a digital format;
    it is formed by a multitude of medical data describing the
    patient’s health status, such as demographics, medications,
    diagnoses, laboratory tests, doctor’s notes, radiology
    documents, clinical information, and payment notes. Thus, the
    EHR is a valuable source of information for healthcare
    analytics. Furthermore, the EHR allows data to be exchanged
    among the community of healthcare professionals.

    Biomedical Images. Biomedical imaging is considered a
    powerful tool for disease detection and care delivery.
    Nevertheless, processing such images is challenging, as
    they include noisy data that needs to be discarded in order
    to help physicians make accurate decisions.

    Social Network Analysis. Performing social network analysis
    requires gathering data from social media such as social
    networking sites. The next step consists of extracting
    knowledge that could inform healthcare predictive analysis,
    such as detecting infectious illnesses. In general, social
    network data is marked by uncertainty, which makes its use in
    designing predictive models risky.

    Sensing Data. Sensors of different types are employed in
    healthcare monitoring solutions. Those devices are essential
    in monitoring a patient’s health, as they measure a wide
    range of medical indicators such as body temperature, blood
    pressure, respiratory rate, heart rate, and cardiovascular
    status. In order to ensure efficient health monitoring, the
    patient’s living area may be full of devices such as
    surveillance cameras, microphones, and pressure sensors.
    Consequently, the data volume generated by health monitoring
    systems tends to increase tremendously, which requires
    adopting sophisticated methods during the processing phase.

    Mobile Phone. Nowadays, the mobile phone is one of the
    most popular technological devices in the world. Since
    their early days, mobile phones have evolved from
    a basic communication tool into a complex device offering
    many features and services. They are currently equipped with
    several sensors such as satellite positioning services,
    accelerometers, and cameras. Due to their multiple
    capabilities and wide use, mobile phones are ideal candidates
    for health data collection, allowing the design of many
    successful healthcare applications such as pregnancy
    monitoring [26], child nutrition [27], and heart rate
    monitoring [28].

    The objective of data acquisition phase is to read the
    data gathered from healthcare sensors in several formats
    and then data flows through semantic module before being

    Semantic Module. It is based on ontologies, which constitute
    efficient tools when it comes to representing actionable
    knowledge in the field of biomedicine. In fact, ontologies
    have the ability to extract biomedical knowledge in a formal,
    powerful, and incremental way. They also allow automation
    and interoperability between different clinical information
    systems. Automation has a major benefit: it helps medical
    personnel process large amounts of patients’ data,
    especially given that these personnel are often overwhelmed
    by a series of healthcare tasks. Introducing automation in
    healthcare applications provides assistance to human medical
    staff, which enhances their overall performance. It should
    be highlighted that automation will help humans perform
    their duties rather
    than replace them. Interoperability is an important issue
    when dealing with medical data. In fact, healthcare databases
    lack homogeneity as they adopt different structures and
    terminologies. Therefore, it is difficult to share information
    and integrate healthcare data. In this context, ontologies may
    play a determinant role by establishing a common structure
    and semantics, which allows sharing and reuse of data across
    different systems. In other words, by defining a standard
    ontology format, it becomes possible to map heterogeneous
    databases into a common structure and terminology. For
    instance, the Web Ontology Language (OWL) represents the
    standard interchange format regarding ontology data that
    employs XML syntax.
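
    The idea of mapping heterogeneous databases into a common
    structure and terminology can be illustrated with a plain
    dictionary standing in for the ontology. A real system would
    rely on an OWL ontology and dedicated tooling; the source
    names and field names below are hypothetical.

```python
# Hypothetical term mappings from two source systems to a shared vocabulary.
TERM_MAP = {
    "hospital_a": {"pid": "patient_id", "bp_sys": "systolic_bp"},
    "hospital_b": {"PatientNo": "patient_id", "SystolicPressure": "systolic_bp"},
}


def to_common(record, source):
    """Rename the fields of a source record to the shared terminology.

    Fields that have no entry in the mapping are dropped.
    """
    mapping = TERM_MAP[source]
    return {mapping[field]: value
            for field, value in record.items()
            if field in mapping}
```

    Once both records use the same field names, they can be
    stored and queried together.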

    4.1.2. Data Preparation. Processing raw data without prepa-
    ration routines may require extra computational resources
    that are not affordable in big data context. Thus, it is recom-
    mended to make sure data is prepared properly, in order to
    obtain accurate predictive models and to enhance the relia-
    bility of data mining techniques. Data preparation consists of
    two steps: data cleaning and data filtering.

    Data Filtering. Data filtering in the presence of large size data
    is achieved by discarding information that is not useful for
    healthcare monitoring based on a defined criterion.

    Data Cleaning. It encompasses several components such as
    normalization, noise reduction, and missing data management.

    Several methods are used to eliminate noisy data and to
    estimate the values of missing data. In fact, medical records
    often include noisy information and may have missing data.
    Determining missing values in healthcare data is a critical
    process. Errors in filling missing values may affect the
    quality of the extracted knowledge and lead to incorrect
    results. In the healthcare domain, missing data should be
    handled with maximum precision, as wrong decisions may have
    serious consequences. The data mining field has many powerful
    algorithms for handling missing values, for instance, the
    Expectation-Maximization (EM) algorithm and the multiple
    imputation algorithm.

    Noise Treatment. In general, noisy data is treated according
    to two main approaches. The first consists of correcting
    noisy values with data polishing techniques; these
    techniques are difficult to implement and are applied only
    when the amount of noise is small. The second approach
    is based on noise filters, which identify and eliminate
    noisy instances in the training data; those filters require
    no modification of the adopted data mining methods.
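
    A noise filter of the second kind can be sketched in a few
    lines. The criterion used here, distance from the median
    measured in median absolute deviations with a cutoff of
    k = 3, is our assumption for illustration; the text does not
    prescribe a particular filter.

```python
from statistics import median


def filter_noisy(values, k=3.0):
    """Drop values farther than k median absolute deviations from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against; keep everything unchanged.
        return list(values)
    return [v for v in values if abs(v - center) <= k * mad]
```

    Because the filter only removes instances, the learning
    algorithm applied afterwards needs no change.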

    For instance, electronic medical records (EMRs) illustrate
    well the need for data cleaning, as they may contain noisy
    data and incomplete information. Data sparsity in EMRs
    originates in the irregular collection of parameters over
    time, since patient parameters are recorded only when
    patients are present in hospitals. In the case of biomedical
    imagery, many
    processing techniques have been applied in order to reduce

    Generally, the preparation of biomedical images starts with
    the identification (segmentation) of significant objects. On
    the other hand, data preparation is more challenging when
    dealing with raw social media data. In addition to its huge
    volume and its informal content, this kind of data has the
    critical aspect of including users’ personal information.
    Thus, data cleaning is a key factor for success in social
    network analysis. When the data preparation step ends, the
    processed data is stored in the prepared data store.

    4.1.3. Feature Extraction and Feature Selection. The
    proliferation of devices designed to collect medical data in
    recent years has tremendously increased both the number of
    features and the number of instances in healthcare
    monitoring. Therefore, selecting the most significant
    features becomes crucial when facing such high-volume data;
    see [29]. In this context, several techniques have been
    proposed to manage this issue, especially when handling
    thousands of features [29]. Feature extraction is another
    approach; it consists of deriving a reduced number of
    attributes from the original ones. Applying feature selection
    and extraction methods requires a statistical tools store.
    When this phase ends, the selected feature subset is used
    to build the predictive model.
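
    As a simple instance of feature selection, the sketch below
    keeps only the features whose variance exceeds a cutoff.
    Variance thresholding is one elementary criterion among many
    and is our choice for illustration; the feature names and
    the cutoff are hypothetical.

```python
from statistics import pvariance


def select_features(rows, names, min_variance):
    """Return the names of features whose variance across rows exceeds min_variance.

    rows: list of equal-length tuples, one tuple per instance.
    names: feature names, in the same order as the tuple positions.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    return [name for name, column in zip(names, columns)
            if pvariance(column) > min_variance]
```

    A feature that is constant across all patients carries no
    information for prediction and is discarded.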

    4.1.4. Predictive Model Design. The objective of this
    component is to build a model capable of producing
    predictions for new observations based on previous data. The
    quality of a given predictive model is evaluated by its
    accuracy. Those models are developed with tools available in
    the statistical and machine learning store provided by the
    suggested architecture. The results of batch processing are
    stored in the model store.

    4.2. Data Storage. Data storage is one of the most
    challenging tasks in a big data framework, especially in the
    case of healthcare monitoring systems, which involve large
    amounts of data. Traditional data analysis is therefore unfit
    to manage those systems. This component may be HDFS, a NoSQL
    database such as MongoDB, a SQL database, or a combination
    of them, which makes it more scalable and ensures high
    storage capacity. In the proposed system, the patient data
    collected from heterogeneous sources can be classified into
    structured data such as EHR, unstructured data such as
    biomedical images, or semistructured data such as XML and
    JSON documents. These data are stored in the raw data store
    in the target databases. Streaming data such as social media
    are stored in the stream temp data store.

    4.3. Stream Processing Layer. Stream data analysis layer is
    composed of data synchronization module, adaptive prepro-
    cessor module, and adaptive predictor module.

    Data Synchronization. The role of the data synchronization
    module is to make sure that data is processed in the correct
    time order. In addition, the data synchronization process
    discards measurements that are inconsistent and takes care
    of missing values. Inconsistent values are detected by
    defining thresholds on the incoming parameters.

    Adaptive Learning. In many applications, it is assumed that
    the data preprocessing task is performed by learning
    algorithms, or at least that data has already been
    preprocessed before its arrival. In the majority of cases,
    such expectations do not match reality. This is particularly
    true for our proposed system, which extracts streaming data
    from the stream temp store. The need to adapt to data changes
    led to the development of adaptive systems, and adaptability
    is a key factor in the long-term success of a big data
    system. In fact, preprocessing is not performed
    independently; it is rather a component of the adaptive
    system. Moreover, in order to stay reliable and maintain a
    certain degree of accuracy, predictive models must adapt
    when data changes occur. As a result, the prediction process
    may be considered part of the adaptive system, which combines
    two distinct parts: the adaptive preprocessor and the
    adaptive predictor.

    Adaptive Preprocessor. It starts by splitting the arriving
    data flow into time windows. The sliding window technique is
    adopted to split data streams into overlapping subsets of
    tuples (every tuple is included in multiple windows). For a
    given window, the average of every measure is computed and
    compared to a user-defined threshold. If an average exceeds
    the alarm threshold, it is stored and an emergency alert is
    generated; otherwise, it is simply stored. Once the
    comparisons with threshold values are complete, the
    information extraction phase selects the relevant features,
    which are transmitted to the adaptive predictor
    component; see [30].
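
    The windowing and threshold comparison described above can
    be sketched as follows. The window size, step, and threshold
    are parameters supplied by the caller, and the function
    names are hypothetical; overlapping windows require the step
    to be smaller than the window size.

```python
def sliding_windows(values, size, step):
    """Yield windows of `size` consecutive values, advancing by `step`."""
    for start in range(0, len(values) - size + 1, step):
        yield values[start:start + size]


def window_alerts(values, size, step, threshold):
    """Return one (window_mean, is_alert) pair per window.

    is_alert is True when the window mean exceeds the threshold.
    """
    results = []
    for window in sliding_windows(values, size, step):
        mean = sum(window) / len(window)
        results.append((mean, mean > threshold))
    return results
```

    Averaging over a window rather than testing single readings
    makes the alert less sensitive to one isolated noisy value.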

    Adaptive Predictor. In order to maintain a certain level
    of accuracy, predictors have to be updated as the data
    changes. Otherwise, they become less reliable over time due
    to data evolution. Therefore, the predictive model should
    take newly arrived samples into consideration while, at the
    same time, being able to generate predictions in real time.
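
    One way to obtain a predictor that updates with every new
    sample is a logistic regression trained by stochastic
    gradient descent, sketched below in plain Python. This is a
    minimal stand-in under our own assumptions (class name,
    learning rate, features scaled to a small range), not the
    Spark MLlib implementation.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time by gradient descent."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """Take one gradient step on the log-loss for a single (x, label) sample."""
        error = self.predict_proba(x) - label
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
```

    Calling update for each labeled sample as it arrives lets
    the model follow gradual changes in the data without
    retraining from scratch.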

    The adaptive feature requires a connection between the
    adaptive preprocessor and the adaptive predictor. Through
    that connection, the predictor sends feedback to the
    preprocessor indicating whether an update is needed, and
    the preprocessing unit then provides it, when necessary, with
    raw data via a given mapping function. The results of stream
    processing are transferred into the stream processing results
    store.

    4.4. Query Processor. The query processor aims to find the
    status of patients by combining the responses of queries sent
    to both the stream processing results store and the batch
    processing results store.
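
    A minimal sketch of this combination step is given below,
    with two dictionaries standing in for the batch and stream
    results stores. The store layout and the field names are
    assumptions made for illustration.

```python
def patient_status(patient_id, batch_results, stream_results):
    """Merge the batch prediction and the real-time alerts of one patient.

    batch_results: maps patient id to the latest batch prediction.
    stream_results: maps patient id to a list of real-time alerts.
    """
    return {
        "patient_id": patient_id,
        "predicted_risk": batch_results.get(patient_id),
        "alerts": stream_results.get(patient_id, []),
    }
```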

    4.5. Visualization Layer. The analytics layer produces
    multiple outputs, including, for instance, visualization of
    the patient health monitoring report and the predictive
    decision report. In the healthcare context, priority in
    real-time visualization is given to the most critical
    information in order to optimize decision-making and avoid
    emergencies. Examples of relevant information include patient
    dashboards tracking daily health condition, real-time alerts,
    and proactive messages generated by predictions.

    Figure 3: Big data architecture for healthcare systems.

    Figure 3 shows the proposed big data architecture for
    healthcare systems.

    5. Implementation Process for
    Detecting Emergency Cases

    In this prototype system, we aim to detect potential dangers
    to patients. Spark Streaming and MongoDB are chosen to
    implement the emergency detection module that appears in the
    visualization layer of the proposed architecture. The system
    employs Spark to read data from MongoDB in the batch layer.
    The batch jobs run at a regular time interval specified by
    the user. Spark Streaming is used for processing real-time
    data streams; it gets data directly from medical sources and
    detects abnormal situations based on the user’s thresholds.
    Then, Spark Streaming sends alerts to MongoDB, which are used
    to notify doctors about emergencies. Spark MLlib and Spark
    Streaming are adopted for real-time monitoring and online
    learning to predict whether the current state of a patient is
    dangerous or not, which is a supervised classification
    problem. The logistic regression model is selected to handle
    it. Figure 4 illustrates the implementation process of our
    proposal.

    Figure 4: The implementation process of our proposal.

    5.1. Diabetic Patient Case Study. Chronic patients must pay
    attention to numerous aspects of their daily life, such as diet,
    sport activity, medical analysis, and blood glucose levels.
    Medical care of such patients is challenging because many
    checks are performed several times in a single day; for
    instance, some diabetics measure their blood pressure several
    times a day. The objective of the proposed system is to allow
    doctors to monitor a diabetic patient's health condition in
    real time. First, the real-time alert detection reads directly
    from all the incoming data streams provided by sensor
    readings; then, for every window of the data stream, healthcare
    measures are compared with user-defined thresholds through
    MapReduce jobs in order to decide whether the current
    parameters are abnormal. In the following step, the average
    value of every medical measure is calculated and written
    into MongoDB for notification purposes. Figure 5 illustrates
    real-time monitoring of the blood pressure parameter and
    Figure 6 visualizes patient measures.
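
    The per-window processing just described (threshold comparison followed by averaging) can be sketched as follows. This is a minimal stand-in written in plain Python rather than Spark Streaming or MapReduce; the function name process_window, the tuple layout of a reading, and the threshold values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user-defined thresholds: measure type -> (low, high).
THRESHOLDS = {
    "blood pressure": (90.0, 120.0),
    "blood glucose": (70.0, 140.0),
}


def process_window(readings, thresholds=THRESHOLDS):
    """Process one window of (patient_id, measure_type, value) readings.

    Returns the out-of-range readings and the average value of each
    measure per patient, which a real system would write to its store.
    """
    alerts = []
    grouped = defaultdict(list)
    for patient_id, measure_type, value in readings:
        low, high = thresholds[measure_type]
        if not (low <= value <= high):
            alerts.append((patient_id, measure_type, value))
        grouped[(patient_id, measure_type)].append(value)
    averages = {key: mean(values) for key, values in grouped.items()}
    return alerts, averages


window = [
    ("PA12", "blood pressure", 100.0),
    ("PA12", "blood pressure", 130.0),
    ("PA12", "blood glucose", 95.0),
]
alerts, averages = process_window(window)
```

    Here the second blood pressure reading falls outside its range and is the only alert, while the averages are computed over every reading in the window.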

    The effectiveness of the proposal is evaluated by conducting
    experiments on a cluster of 3 nodes with identical settings,
    each configured with an Intel Core i7-4770 processor
    (3.40 GHz, 4 cores), 16 GB RAM, and Ubuntu 12.04 LTS with a
    64-bit Linux 3.11.0 kernel. Figure 5 illustrates
    the visualization of patient medical parameters measured by
    a given sensor, which the user selects through a GUI designed
    for that purpose.

    Figure 5: Real-time monitoring of the blood pressure parameter.

    Figure 6: Visualization of measured patient parameters.

    {
      "Idpatient": "PA12",
      "measuretype": "blood pressure",
      "value": "100mg",
      "threshold": "90-120mg",
      "measuredate": "2018-04-20"
    }

    Box 1: JSON document representing patient parameters in MongoDB.

    To evaluate the scalability of the proposal, we used an
    open-source EHR generator to produce patient medical data
    in HL7 FHIR format, which are loaded into MongoDB as JSON
    documents such as the one shown in Box 1.
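
    As an illustration of how a document shaped like Box 1 could be checked against its own threshold field, the sketch below parses the value and the range and reports whether the reading is abnormal. The helper names parse_value, parse_threshold, and is_abnormal are hypothetical, and the string formats are assumed from the single example in Box 1.

```python
import json
import re

NUMBER = r"(\d+(?:\.\d+)?)"
UNIT = r"([A-Za-z/%]*)"


def parse_value(text):
    """Split a reading such as '100mg' into (number, unit)."""
    match = re.fullmatch(rf"\s*{NUMBER}\s*{UNIT}\s*", text)
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    number, unit = match.groups()
    return float(number), unit


def parse_threshold(text):
    """Split a range such as '90-120mg' into (low, high, unit)."""
    match = re.fullmatch(rf"\s*{NUMBER}\s*-\s*{NUMBER}\s*{UNIT}\s*", text)
    if match is None:
        raise ValueError(f"unrecognised threshold: {text!r}")
    low, high, unit = match.groups()
    return float(low), float(high), unit


def is_abnormal(document):
    """Return True when the value lies outside the document's threshold range."""
    value, value_unit = parse_value(document["value"])
    low, high, threshold_unit = parse_threshold(document["threshold"])
    if value_unit != threshold_unit:
        raise ValueError("value and threshold use different units")
    return not (low <= value <= high)


# The document from Box 1, written as standard JSON.
box1 = json.loads("""
{
  "Idpatient": "PA12",
  "measuretype": "blood pressure",
  "value": "100mg",
  "threshold": "90-120mg",
  "measuredate": "2018-04-20"
}
""")

box1_abnormal = is_abnormal(box1)
```

    Storing the value and threshold as numbers with a separate unit field would remove the need for string parsing, but the sketch keeps the layout of Box 1 unchanged.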

    6. Conclusion

    In this paper, popular healthcare monitoring systems based
    on big data have been reviewed, and an overview of recent
    big data processing approaches and technologies has been
    provided. A big data processing architecture for the
    healthcare industry has then been presented; it is capable of
    handling the large volume of data generated by different
    medical sources in real time. The proposal is designed
    according to big data approaches. The main contribution of
    the proposed solution is twofold. First, it proposes a generic
    big data architecture for healthcare based on both batch
    computing and stream computing, simultaneously providing
    accurate predictions and online patient dashboards. Second,
    a prototype implementation based on Spark and MongoDB
    has been proposed in order to detect alarming cases and
    generate real-time alerts. In future work, we plan to handle
    missing values through the Expectation-Maximization (EM)
    algorithm and to implement the semantic module.

    Conflicts of Interest

    The authors declare that they have no conflicts of interest.


    [1] WHO, “Global Health and Aging,” 2011, http://www.who.int/
    ageing/publications/global health .

    [2] A. Gandomi and M. Haider, “Beyond the hype: big data con-
    cepts, methods, and analytics,” International Journal of Informa-
    tion Management, vol. 35, no. 2, pp. 137–144, 2015.

    [3] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile
    Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014.

    [4] S. Shahrivari, “Beyond batch processing: towards real-time and
    streaming big data,”TheComputer Journal , vol. 3, no. 4, pp. 117–
    129, 2014.

    [5] J. Dean and S.Ghemawat, “MapReduce: simplified data process-
    ing on large clusters,” Communications of the ACM, vol. 51, no.
    1, pp. 107–113, 2008.

    [6] N. Tatbul, “Streaming data integration: Challenges and oppor-
    tunities,” in Proceedings of the 2010 IEEE 26th International
    Conference on Data Engineering Workshops, ICDEW 2010, pp.
    155–158, usa, March 2010.

    [7] D. Singh and C. K. Reddy, “A survey on platforms for big data
    analytics,” Journal of Big Data, vol. 2, no. 1, 2015.

    [8] R. Evans, “Apache Storm, a Hands on Tutorial,” in Proceedings
    of the 2015 IEEE International Conference on Cloud Engineering
    (IC2E), pp. 2-2, Tempe, AZ, USA, March 2015.

    [9] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari, “S4: Dis-
    tributed stream computing platform,” in Proceedings of the
    10th IEEE International Conference on Data Mining Workshops
    (ICDMW ’10), pp. 170–177, Sydney, Australia, December 2010.

    [10] M. Zaharia, R. S. Xin, P.Wendell et al., “Apache spark: A unified
    engine for big data processing,” Communications of the ACM,
    vol. 59, no. 11, pp. 56–65, 2016.

    [11] F. Ellen and T. Kostas, Introduction to Apache Flink: Stream
    Processing for Real Time and Beyond, Inc, O’Reilly Media, 2016.



    10 Advances in Bioinformatics

    [12] D. Hows, P. Membrey, E. Plugge, and T. Hawkins, “Introduction
    to mongodb,” in The Definitive Guide to MongoDB, 16, p. 1,
    Springer, Berkeley, CA, 2015.

    [13] T. Huang, L. Lan, X. Fang, P. An, J. Min, and F.Wang, “Promises
    and challenges of big data computing in health sciences,” Big
    Data Research, vol. 2, no. 1, pp. 2–11, 2015.

    [14] W. Raghupathi andV. Raghupathi, “Big data analytics in health-
    care: promise and potential,” Health Information Science and
    Systems, vol. 2, article 3, 2014.

    [15] I. Olaronke and O. Oluwaseun, “Big data in healthcare:
    Prospects, challenges and resolutions,” inProceedings of the 2016
    Future Technologies Conference, FTC 2016, pp. 1152–1157, usa,
    December 2016.

    [16] A. Belle, R. Thiagarajan, S. M. R. Soroushmehr, F. Navidi, D.
    A. Beard, and K. Najarian, “Big data analytics in healthcare,”
    BioMed Research International, vol. 2015, Article ID 370194,

    [17] J. Sun and C. K. Reddy, “Big data analytics for healthcare,” in
    Proceedings of the 19th ACM SIGKDD International Conference
    on Knowledge Discovery and Data Mining, p. 1525, Chicago, Ill,
    USA, August 2013.

    [18] M. Bochicchio, A. Cuzzocrea, and L. Vaira, “A big data analytics
    framework for supporting multidimensional mining over big
    healthcare data,” in Proceedings of the 15th IEEE International
    Conference onMachine Learning andApplications, ICMLA 2016,
    pp. 508–513, usa, December 2016.

    [19] S. Sakr and A. Elgammal, “Towards a Comprehensive Data
    Analytics Framework for Smart Healthcare Services,” Big Data
    Research, vol. 4, pp. 44–58, 2016.

    [20] Y. Li, C. Wu, L. Guo, C.-H. Lee, and Y. Guo, “Wiki-health: A
    big data platform for health sensor data management,” Cloud
    Computing Applications for Quality Health Care Delivery, pp.
    59–77, 2014.

    [21] N. Poh, S. Tirunagari, andD.Windridge, “Challenges in design-
    ing an online healthcare platform for personalised patient
    analytics,” in Proceedings of the 2014 IEEE Symposium on Com-
    putational Intelligence in Big Data, CIBD 2014, usa, December

    [22] U. Çetintemel, D. Abadi, Y. Ahmad et al., “TheAurora andBore-
    alis Stream Processing Engines,” in Data Stream Management,
    Data-Centric Systems and Applications, pp. 337–359, Springer
    Berlin Heidelberg, Berlin, Heidelberg, 2016.

    [23] X. Jiang, S. Yoo, and J. Choi, “DSMS in ubiquitous-healthcare: A
    Borealis-based heart rate variability monitor,” in Proceedings of
    the 2011 4th International Conference on Biomedical Engineering
    and Informatics, BMEI 2011, vol. 4, pp. 2144–2147, October 2011.

    [24] M.M. Rathore, A. Ahmad, and A. Paul, “The Internet ofThings
    based medical emergency management using Hadoop ecosys-
    tem,” in Proceedings of the 14th IEEE SENSORS, IEEE, Busan,
    South Korea, November 2015.

    [25] W. Liu, Q. Li, Y. Cai, Y. Li, and X. Li, “A prototype of healthcare
    big data processing system based on Spark,” in Proceedings of
    the 8th International Conference on BioMedical Engineering and
    Informatics, BMEI 2015, pp. 516–520, chn, October 2015.

    [26] M. Bachiri, A. Idri, J. L. Fernández-Alemán, and A. Toval,
    “Mobile personal health records for pregnancy monitoring
    functionalities: Analysis and potential,” Computer Methods and
    Programs in Biomedicine, vol. 134, pp. 121–135, 2016.

    [27] A. Guyon, A. Bock, L. Buback, and B. Knittel, “Mobile-based
    nutrition and child healthmonitoring to informprogramdevel-
    opment: An experience fromLiberia,”GlobalHealth Science and
    Practice, vol. 4, no. 4, pp. 661–674, 2016.

    [28] P. Pelegris, K. Banitsas, T. Orbach, and K. Marias, “A novel
    method to detect heart beat rate using a mobile phone.,” Con-
    ference proceedings: IEEE Engineering in Medicine and Biology
    Society, pp. 5488–5491, 2010.

    [29] A. Jović, K. Brkić, and N. Bogunović, “A review of feature
    selection methods with applications,” in Proceedings of the 38th
    International Convention on Information and Communication
    Technology, Electronics and Microelectronics, MIPRO 2015, pp.
    1200–1205, Croatia, May 2015.

    [30] S. Ramı́rez-Gallego, B. Krawczyk, S. Garćıa, M. Woźniak, and
    F. Herrera, “A survey on data preprocessing for data stream
    mining: current status and future directions,” Neurocomputing,
    vol. 239, pp. 39–57, 2017.
























    Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

    Chiasson, Mike W; Davidson, Elizabeth
    MIS Quarterly; Dec 2005; 29, 4; ABI/INFORM Global
    pg. 591



    This proposal presents a comprehensive analysis and recommendations to bring the information governance of Jira Healthcare into line with industry best practices and legal requirements, and to manage our records better so that the workload of staff, doctors, and nurses is streamlined and patient outcomes improve.

    This proposal will address the following critical issues:

    · Regulatory requirements found in the Health Insurance Portability and Accountability Act of 1996 (HIPAA), in Title 21 CFR Part 11 (electronic records and electronic signatures), and in other regulations that may be identified later

    · Best Practices that are applicable to the healthcare industry

    · Risk Management and Mitigation

    · Information Security and Governance

    · Records and E-Records Management

    · Metrics to evaluate Information Governance (IG) Performance

    · Patient record keeping

    · Email and social media strategy

    · Cloud Computing strategy

    · Outline of a Proposed IG Strategic Plan

    Information governance is a sensitive part of any organization in the current market and a major influence on how the organization functions and performs. Every organization needs proper information governance to reduce its current problems, but it is especially critical for ours because we deal with human life and death. Our practice has serious issues with the flow of information, both internally and across the interfaces with other health organizations and with our patients. This is a major threat, since one part of the practice might fail to receive information on time, with serious repercussions for our patients. Chiasson & Davidson (2005) explained that an improper flow of information can result in poor management and planning, which makes it essential that information be consistent throughout the workplace. In our case, poor management can also affect patients' health.

    In the digital era, the confidentiality and privacy of personal data are paramount. To keep the trust of our patients and protect our company, we are compelled to comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). This act was passed to improve the efficiency and quality of the US healthcare system through better information sharing. As well as increasing the use of e-records, HIPAA has provisions to protect the security and privacy of protected health information. Present issues such as the duplication of workplace information threaten patients' records and might discourage patients from reaching out and opening up about personal issues. Left unresolved, these issues threaten the functioning of the entire workplace.

    Embracing technology within the work environment will make it easier to store, retrieve, and convey information to all parties within the workplace, which is relevant to how they perform (Ratna, 2019). Technology is essential, but it is important to develop information governance policies early in order to prevent serious missteps and non-compliance and to ensure proper functioning and performance. It is also important to introduce and strengthen information security within the workplace, since information technology brings an increased risk of compromise.

    As noted above, this Information Governance program is a new project that Jira Healthcare has decided to develop and make an integral part of our business model. In the more than 50 years that Jira Healthcare has been operating, we have adapted to many changes to keep up with technology and the times. The establishment of this new Information Governance department is one of them. As a practice, we have struggled with the shift to the digital era, in which data and information drive a large part of our business and streaming data must be real-time, accurate, accessible, and well protected. Our goal is to successfully create this new department, which will work specifically on governing information throughout the practice.


    Chiasson, M., & Davidson, E. (2005). Taking Industry Seriously in Information Systems Research. MIS Quarterly, 29(4), 591-605. doi:10.2307/25148701

    Ratna, H. (2019). The Importance of Effective Communication in Healthcare Practice. Harvard Public Health Review, 23, 1-6. doi:10.2307/48546767
